Hi,
The GaN example crashes with the following error:
==========================================================================
Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine davcio (115):
error while writing from file "./tmp/gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
==========================================================================
Meanwhile, the SiC example runs to completion, and I also tested GaAs, which completes without error.
Do you have any suggestions for the GaN example? I am running it as is.
Thanks.
Nandan.
Re: error in gan example
Dear Nandan,
It looks like the error has to do with not being able to write to disk.
Maybe a disk quota exceeded?
Maybe try not to write in a /tmp directory.
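To rule out a quota or disk-space problem, something along these lines might help (a sketch, assuming a Linux machine with quota tools installed):
Code:
# free space on the filesystem holding the output directory
df -h ./tmp
# per-user quota, if quotas are enabled on this filesystem
quota -s
# maximum file size the shell allows
ulimit -f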
Sam
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
Re: error in gan example
Dear Samuel,
I suspected that it was because of the /tmp directory, but that is not the case. If I do not write to /tmp:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from davcio : error # 115
error while writing from file "./gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
It is certainly not a disk quota issue. Is there anything else I should check?
An independent test of the GaN example on a different machine also does not finish, although there is no error message. It ends with:
===============================================================
q( 216 ) = ( -0.3333333 0.5773503 -0.3067271 )
BMN calculated
Writing epmatq on .epb files
rank 0 in job 11 east_53774 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
===============================================================
Regards,
Nandan.
Re: error in gan example
Additional execution information:
Code:
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8 < scf.in > scf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8 < nscf.in > nscf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64 < epw.in > epw.out 2>&1 &
I believe this is a consistent set of processors/pools.
Nandan.
Re: error in gan example
Dear Nandan,
EPW only supports k/q-point parallelization. EPW also needs/reads the ground-state wavefunction (produced with the nscf run).
Therefore it should look like:
Code:
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8 < scf.in > scf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 64 < nscf.in > nscf.out 2>&1 &
mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64 < epw.in > epw.out 2>&1 &
Best,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
Re: error in gan example
Dear Samuel,
Thanks for the correction. I am trying this out and will report back soon.
Nandan.
Re: error in gan example
An update on this issue.
The GaN example works only when epw.x is run on a single processor. I have tried:
1. mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out
2. mpiexec -np 8 epw.x -npool 8 < epw.in > epw.out
In both cases it crashes with the same error mentioned in my earlier emails.
The only one that works is:
epw.x < epw.in > epw.out
This completes successfully but is obviously slow. I cannot understand why the parallel jobs crash.
AlAs and GaAs parallel jobs complete successfully.
Could this be some compilation issue?
Nandan.
Re: error in gan example
Dear Nandan,
Every time you change the number of CPUs, you need to redo the nscf.in calculation.
So you would need to do:
Code:
mpiexec -np 64 pw.x -npool 64 < nscf.in > nscf.out
mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out
If you did that, then it might be a compilation issue. Which MPI library are you using? Is it one of the tested ones at http://epw.org.uk/Main/TestFarm ?
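For example, one of the following usually reveals the MPI stack (a sketch; the exact flags depend on the vendor):
Code:
# version banner of the launcher (OpenMPI, MPICH, Intel MPI, ...)
mpiexec --version
# underlying compiler and link line of the Fortran wrapper
mpif90 -show      # MPICH / Intel MPI
mpif90 --showme   # OpenMPI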
Best,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
Re: error in gan example
I did run the nscf calculation with the same number of processors and npools as epw.x.
I am going to recompile with a newer compiler and the flags from the TestFarm.
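Something like the following (a sketch; the exact compilers and flags will be taken from a tested TestFarm configuration):
Code:
cd ~/source/espresso-5.4.0
make veryclean
./configure MPIF90=mpif90 CC=mpicc   # example wrappers, to be adjusted to the local stack
make pw ph                           # EPW needs pw.x and the phonon code
cd EPW && make                       # builds EPW/bin/epw.x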
Let me get back to you on this.
Nandan.