
error in gan example

Posted: Mon May 09, 2016 2:58 pm
by Nandan
Hi,

The GaN example crashes with the following error:
==========================================================================
Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine davcio (115):
error while writing from file "./tmp/gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
==========================================================================

Meanwhile, the SiC example runs to completion, and a GaAs test I ran also completes without error.

Do you have any suggestions for the GaN example? I am running it as is.

Thanks.

Nandan.

Re: error in gan example

Posted: Tue May 10, 2016 10:36 am
by sponce
Dear Nandan,

It looks like the error has to do with not being able to write to disk.

Maybe the disk quota was exceeded?

Maybe try not writing to a /tmp directory.
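A quick way to test the disk-space hypothesis before launching the run is to check the free space on the filesystem that holds the output directory (here `./tmp`, as in the example). This is a sketch assuming a POSIX `df`; the helper names `free_kib` and `check_space`, and whatever threshold you pass, are hypothetical and not part of QE or EPW. Note that `df` does not show per-user limits; on systems with quotas, the `quota` command (if installed) reports those.

```shell
#!/bin/sh
# Sketch (hypothetical helpers, not part of QE/EPW): check free space on the
# filesystem that holds a directory before writing large files such as the
# *.epmatwp* files into it.

# Print the available space in KiB. "df -P" uses the POSIX output format,
# one line per filesystem, where column 4 is "Available".
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return non-zero (with a message) if fewer than $2 KiB are free in $1.
check_space() {
  dir=$1
  need_kib=$2
  avail=$(free_kib "$dir")
  if [ -z "$avail" ]; then
    echo "could not query free space for $dir" >&2
    return 1
  fi
  if [ "$avail" -lt "$need_kib" ]; then
    echo "only $avail KiB free in $dir, need $need_kib KiB" >&2
    return 1
  fi
}
```

For example, `check_space ./tmp 10485760` fails with a message if less than 10 GiB (10 × 1024 × 1024 KiB) is free where the EPW files are written.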

Sam

Re: error in gan example

Posted: Tue May 10, 2016 2:21 pm
by Nandan
Dear Samuel,

I suspected it was because of the /tmp directory, but that is not the case. If I do not write to /tmp, I get:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from davcio : error # 115
error while writing from file "./gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

It is certainly not a disk quota issue. Is there anything else I should check?

An independent test of the GaN example on a different machine also does not finish, although there is no error message. It ends with:
===============================================================
q( 216 ) = ( -0.3333333 0.5773503 -0.3067271 )
BMN calculated


Writing epmatq on .epb files

rank 0 in job 11 east_53774 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
===============================================================

Regards,

Nandan.

Re: error in gan example

Posted: Tue May 10, 2016 5:55 pm
by Nandan
Additional execution information:

mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8  < scf.in > scf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8  < nscf.in > nscf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64  < epw.in > epw.out 2>&1 &


I believe this is a consistent set of processors and pools.

Nandan.

Re: error in gan example

Posted: Wed May 11, 2016 2:14 pm
by sponce
Dear Nandan,

EPW only supports k/q-point (pool) parallelization. EPW also reads the ground-state wavefunctions produced by the nscf run, so the nscf run should use the same parallelization as epw.x.

Therefore it should look like:

mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8  < scf.in > scf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 64  < nscf.in > nscf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64  < epw.in > epw.out 2>&1 &
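One way to keep the nscf and EPW steps consistent is to derive both `-npool` values from a single process count and to run the steps one after the other. If the three lines above are put in one script, the trailing `&` sends each command to the background, so all three would start at the same time. The sketch below assumes the install path used in this thread; `run_nscf_and_epw` is an illustrative name, and the `MPIEXEC` override is a hypothetical convenience, not an EPW feature.

```shell
#!/bin/sh
# Sketch: run nscf and EPW with one shared process count so that pw.x (nscf)
# and epw.x always use the same -np and the same -npool.
# Assumptions: QE_DIR matches your install; MPIEXEC is a hypothetical override
# (e.g. MPIEXEC=srun) and defaults to mpiexec.

QE_DIR=${QE_DIR:-$HOME/source/espresso-5.4.0}
MPIEXEC=${MPIEXEC:-mpiexec}

run_nscf_and_epw() {
  np=$1        # total number of MPI processes
  npool=$np    # EPW parallelizes over k/q points only: one pool per process

  # No trailing "&": epw.x must start only after the nscf run has finished.
  $MPIEXEC -np "$np" "$QE_DIR/bin/pw.x" -npool "$npool" \
    < nscf.in > nscf.out 2>&1 || return 1
  $MPIEXEC -np "$np" "$QE_DIR/EPW/bin/epw.x" -npool "$npool" \
    < epw.in > epw.out 2>&1
}
```

For example, `run_nscf_and_epw 64` reproduces the `-np 64 ... -npool 64` pair suggested above; the scf step can keep its own pool count.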


Best,

Samuel

Re: error in gan example

Posted: Thu May 12, 2016 2:37 pm
by Nandan
Dear Samuel,

Thanks for the correction. I am trying this out and will report back soon.

Nandan.

Re: error in gan example

Posted: Thu May 26, 2016 4:08 pm
by Nandan
An update on this issue.

The GaN example works only when epw.x is run on a single processor. I have tried:
1. mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out
2. mpiexec -np 8 epw.x -npool 8 < epw.in > epw.out

In both cases it crashes with the same error mentioned in my earlier emails.


The only one that works is:
epw.x < epw.in > epw.out

This completes successfully but is obviously slow. I cannot understand why the parallel jobs crash.

AlAs and GaAs parallel jobs complete successfully.

Could this be some compilation issue?

Nandan.

Re: error in gan example

Posted: Thu May 26, 2016 4:40 pm
by sponce
Dear Nandan,

Every time you change the number of CPUs, you need to redo the nscf calculation.

So you would need to run:

mpiexec -np 64 pw.x -npool 64 < nscf.in > nscf.out
mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out


If you did that, then it might be a compilation issue. Which MPI library are you using? Is it one of the tested ones listed at http://epw.org.uk/Main/TestFarm?
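Because the nscf run has to be repeated whenever the process count changes, a small guard can catch a stale nscf before epw.x is launched. This is a sketch assuming you record the process count yourself right after each nscf run; the stamp file `.nscf_np` and the functions `record_nscf_np` and `check_epw_np` are hypothetical helpers, not files or tools that EPW provides.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): remember how many processes the nscf run
# used and refuse to start epw.x with a different count.

STAMP=.nscf_np   # hypothetical stamp file in the run directory

# Call right after a successful nscf run with the -np value it used.
record_nscf_np() {
  echo "$1" > "$STAMP"
}

# Return 0 only if nscf was recorded with the same process count as $1.
check_epw_np() {
  np=$1
  if [ ! -f "$STAMP" ]; then
    echo "no nscf run recorded: run nscf first" >&2
    return 1
  fi
  nscf_np=$(cat "$STAMP")
  if [ "$nscf_np" != "$np" ]; then
    echo "nscf used $nscf_np processes but epw.x would use $np: redo nscf" >&2
    return 1
  fi
}
```

Typical use: run nscf with `-np 64 -npool 64`, call `record_nscf_np 64`, and launch epw.x only if `check_epw_np 64` succeeds.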

Best,

Samuel

Re: error in gan example

Posted: Fri May 27, 2016 4:28 pm
by Nandan
I did run nscf with the same number of processors and pools as epw.x.

I am going to recompile with a newer compiler and the flags from the TestFarm.

Let me get back to you on this.

Nandan.