error in gan example
Posted: Mon May 09, 2016 2:58 pm
by Nandan
Hi,
The GaN example crashes with the following error:
==========================================================================
Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine davcio (115):
error while writing from file "./tmp/gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
==========================================================================
Meanwhile, the SiC example runs to completion, and a GaAs test that I ran also completes without error.
Do you have any suggestions for the GaN example? I am running it as is.
Thanks.
Nandan.
Re: error in gan example
Posted: Tue May 10, 2016 10:36 am
by sponce
Dear Nandan,
It looks like the error has to do with not being able to write to disk.
Maybe a disk quota exceeded?
Maybe try not to write in a /tmp directory.
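One way to rule out a disk or quota problem is to write a scratch file in the output directory. A minimal sketch, assuming a POSIX shell with dd available; the function name can_write and the 1 MiB test size are hypothetical choices, not part of EPW:

```shell
# Hypothetical helper: try to write N MiB of zeros into a directory.
# Returns 0 if the write succeeds, non-zero otherwise.
can_write() {
    dir=$1
    mib=$2
    scratch="$dir/.write_test.$$"
    dd if=/dev/zero of="$scratch" bs=1048576 count="$mib" 2>/dev/null
    status=$?
    rm -f "$scratch"          # always clean up the scratch file
    return "$status"
}

# Example: test the current working directory.
if can_write . 1; then
    echo "write test passed"
else
    echo "write test failed"
fi
```

A size closer to that of the file being written would make the test more meaningful.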
Sam
Re: error in gan example
Posted: Tue May 10, 2016 2:21 pm
by Nandan
Dear Samuel,
I suspected that it was because of the /tmp directory, but that is not the case. If I do not write to /tmp, I get:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from davcio : error # 115
error while writing from file "./gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
It is certainly not a disk quota issue. Is there anything else that I should check?
An independent test of the GaN example on a different machine also does not finish, although there is no error message. It ends with:
===============================================================
q( 216 ) = ( -0.3333333 0.5773503 -0.3067271 )
BMN calculated
Writing epmatq on .epb files
rank 0 in job 11 east_53774 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
===============================================================
Regards,
Nandan.
Re: error in gan example
Posted: Tue May 10, 2016 5:55 pm
by Nandan
Additional execution information:
Code:
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8 < scf.in > scf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8 < nscf.in > nscf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64 < epw.in > epw.out 2>&1 &
I believe this is a consistent set of processors/pools.
Nandan.
Re: error in gan example
Posted: Wed May 11, 2016 2:14 pm
by sponce
Dear Nandan,
EPW only supports k/q-point parallelization. EPW also needs/reads the ground-state wavefunction (produced with the nscf run).
Therefore it should look like:
Code:
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8 < scf.in > scf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 64 < nscf.in > nscf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64 < epw.in > epw.out 2>&1 &
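The three steps can also be run one after another from a small script, so that the nscf and EPW runs always share the same processor and pool counts. This is only a sketch: the variable names NP, QE, and DRY_RUN and the helper run_step are hypothetical, and the script prints the commands by default instead of executing them.

```shell
# Assumed settings; adjust to the local installation.
NP=${NP:-64}                             # total MPI processes
QE=${QE:-$HOME/source/espresso-5.4.0}    # install path taken from the thread
DRY_RUN=${DRY_RUN:-1}                    # 1 = print only, 0 = execute

# run_step EXE NPOOL STEM: run EXE on NP processes with NPOOL pools,
# reading STEM.in and writing STEM.out.
run_step() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "mpiexec -np $NP $1 -npool $2 < $3.in > $3.out"
    else
        mpiexec -np "$NP" "$1" -npool "$2" < "$3.in" > "$3.out" 2>&1
    fi
}

# Each step starts only if the previous one succeeded.
run_step "$QE/bin/pw.x" 8 scf &&
run_step "$QE/bin/pw.x" "$NP" nscf &&
run_step "$QE/EPW/bin/epw.x" "$NP" epw
```

Unlike the commands above, no step is sent to the background with &, so each one finishes before the next begins.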
Best,
Samuel
Re: error in gan example
Posted: Thu May 12, 2016 2:37 pm
by Nandan
Dear Samuel,
Thanks for correcting. I am trying this out and will report soon.
Nandan.
Re: error in gan example
Posted: Thu May 26, 2016 4:08 pm
by Nandan
An update on this issue.
The GaN example works only when epw.x is run on a single processor. I have tried:
1. mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out
2. mpiexec -np 8 epw.x -npool 8 < epw.in > epw.out
In both cases it crashes with the same error mentioned in my earlier emails.
The only one that works is:
epw.x < epw.in > epw.out
This completes successfully but is obviously slow. I cannot understand why the parallel jobs crash.
AlAs and GaAs parallel jobs complete successfully.
Could this be some compilation issue?
Nandan.
Re: error in gan example
Posted: Thu May 26, 2016 4:40 pm
by sponce
Dear Nandan,
Every time you change the number of CPUs, you need to redo the nscf calculation.
So you would need to do:
Code:
mpiexec -np 64 pw.x -npool 64 < nscf.in > nscf.out
mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out
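The constraints discussed in this thread can be checked before submitting a job. A minimal sketch, assuming the rules as stated here (epw.x is run with one pool per process, and the nscf run uses the same pool count as EPW); the function name check_layout is hypothetical:

```shell
# check_layout NP NPOOL_NSCF NPOOL_EPW
# Returns 0 when the layout matches the rules assumed above.
check_layout() {
    np=$1; npool_nscf=$2; npool_epw=$3
    if [ "$npool_epw" -ne "$np" ]; then
        echo "epw.x: -npool ($npool_epw) should equal -np ($np)"
        return 1
    fi
    if [ "$npool_nscf" -ne "$npool_epw" ]; then
        echo "nscf and epw pool counts differ ($npool_nscf vs $npool_epw)"
        return 1
    fi
    echo "layout consistent"
}

check_layout 64 8 64 || true   # first attempt in the thread: flagged
check_layout 64 64 64          # corrected commands: accepted
```

A passing check does not rule out other causes, such as the build or the MPI library.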
If you did that, then it might be a compilation issue. Which MPI library are you using? Is it one of the tested ones listed at
http://epw.org.uk/Main/TestFarm ?
Best,
Samuel
Re: error in gan example
Posted: Fri May 27, 2016 4:28 pm
by Nandan
I did run the nscf with the same number of processors and npools as epw.
I am going to recompile with a newer compiler and the flags from the TestFarm.
Let me get back to you on this.
Nandan.