error in gan example

Post here questions related to issues encountered while running the EPW code

Moderator: stiwari

Nandan
Posts: 44
Joined: Mon May 09, 2016 2:47 pm
Affiliation:

error in gan example

Post by Nandan »

Hi,

The GaN example crashes with the following error:
==========================================================================
Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine davcio (115):
error while writing from file "./tmp/gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
==========================================================================

Meanwhile the SiC example runs to completion, and a GaAs test that I ran also completes without error.

Do you have any suggestions for the GaN example? I am running it as is.

Thanks.

Nandan.

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: error in gan example

Post by sponce »

Dear Nandan,

It looks like the error comes from not being able to write to disk.

Maybe a disk quota exceeded?

Maybe try not to write in a /tmp directory.
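
A quick way to rule out the two causes suggested above is to check free space and write permission in the directory where the file is written. The following is only a minimal sketch: the helper name `check_writable` and the probe file name are illustrative and are not part of EPW or Quantum ESPRESSO, and the output directory is assumed to be the current one.

```shell
#!/bin/sh
# Sketch: check that the directory EPW writes to has free space and is writable.
# `check_writable` is a hypothetical helper, not an EPW or Quantum ESPRESSO tool.

check_writable() {
    # Try to create and then remove a small probe file in directory $1.
    probe="$1/.write_probe_$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable"
    else
        echo "not writable"
    fi
}

outdir="."            # assumption: the directory where gan.epmatwp1 is written
df -k "$outdir"       # free space (in KiB) on the file system holding outdir
check_writable "$outdir"
```

If the cluster enforces quotas, the site's quota command (often `quota -s`) reports the remaining allowance.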

Sam
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com

Nandan
Posts: 44
Joined: Mon May 09, 2016 2:47 pm
Affiliation:

Re: error in gan example

Post by Nandan »

Dear Samuel,

I suspected that it was because of the /tmp directory, but that is not the case. If I do not write to /tmp, I get:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from davcio : error # 115
error while writing from file "./gan.epmatwp1"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

It is certainly not a disk quota issue. Is there anything else that I should check?

An independent test of the GaN example on a different machine also does not finish, although there is no error message. It ends with:
===============================================================
q( 216 ) = ( -0.3333333 0.5773503 -0.3067271 )
BMN calculated


Writing epmatq on .epb files

rank 0 in job 11 east_53774 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
===============================================================
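
Signal 9 is SIGKILL, which a process cannot catch, so EPW itself prints no error message. A common cause on clusters is the kernel or the batch system killing a job that exceeds a memory limit, but that is an assumption here, not something the log confirms. A minimal sketch for decoding the signal number and inspecting the limits of the current shell (the helper name `signal_name` is illustrative):

```shell
#!/bin/sh
# Sketch: translate the signal number from the MPI abort message into a name
# and show the resource limits that apply to jobs started from this shell.

signal_name() {
    # `kill -l N` prints the name of signal number N.
    kill -l "$1"
}

signal_name 9      # prints KILL
ulimit -a          # memory and file-size limits for this shell
```

The file-size limit shown by `ulimit -a` is also worth a look, since a low value could make a large write fail.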

Regards,

Nandan.

Nandan
Posts: 44
Joined: Mon May 09, 2016 2:47 pm
Affiliation:

Re: error in gan example

Post by Nandan »

Additional execution information:


mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8  < scf.in > scf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8  < nscf.in > nscf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64  < epw.in > epw.out 2>&1 &


I believe this is a consistent set of processors and pools.

Nandan.

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: error in gan example

Post by sponce »

Dear Nandan,

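EPW only supports k/q-point (pool) parallelization, so the number of pools must equal the number of processes. EPW also reads the ground-state wavefunctions produced by the nscf run, so the nscf run must use the same number of pools as EPW.

Therefore it should look like: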


mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 8  < scf.in > scf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/bin/pw.x -npool 64  < nscf.in > nscf.out 2>&1 &

mpiexec -np 64 ~/source/espresso-5.4.0/EPW/bin/epw.x -npool 64  < epw.in > epw.out 2>&1 &


Best,

Samuel

Nandan
Posts: 44
Joined: Mon May 09, 2016 2:47 pm
Affiliation:

Re: error in gan example

Post by Nandan »

Dear Samuel,

Thanks for correcting. I am trying this out and will report soon.

Nandan.

Nandan
Posts: 44
Joined: Mon May 09, 2016 2:47 pm
Affiliation:

Re: error in gan example

Post by Nandan »

An update on this issue.

The GaN example works only when epw.x is run on a single processor. I have tried:
1. mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out
2. mpiexec -np 8 epw.x -npool 8 < epw.in > epw.out

In both cases it crashes with the same error mentioned in my earlier posts.


The only one that works is:
epw.x < epw.in > epw.out

This completes successfully but is obviously slow. I cannot understand why the parallel jobs crash.

AlAs and GaAs parallel jobs complete successfully.

Could this be some compilation issue?

Nandan.

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: error in gan example

Post by sponce »

Dear Nandan,

Every time you change the number of CPUs, you need to redo the nscf calculation.

So you would need to do:


mpiexec -np 64 pw.x -npool 64 < nscf.in > nscf.out
mpiexec -np 64 epw.x -npool 64 < epw.in > epw.out


If you did that, then it might be a compilation issue. Which MPI library are you using? Is it one of the tested ones listed at http://epw.org.uk/Main/TestFarm ?
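
The nscf and EPW steps above can be wrapped in a small script so that both runs always share the same process and pool count. This is a sketch only: `run_step`, `NP` and `DRY_RUN` are illustrative names, and the script assumes `mpiexec`, `pw.x` and `epw.x` are on the PATH. With `DRY_RUN=1` it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run nscf and EPW back to back with identical -np and -npool values.
# NP, DRY_RUN and run_step are hypothetical names chosen for this example.

NP=64          # number of MPI processes; also used as the number of pools
DRY_RUN=1      # 1 = print the commands only, 0 = actually run them

run_step() {
    # $1 = executable, $2 = input file, $3 = output file
    if [ "$DRY_RUN" = "1" ]; then
        echo "mpiexec -np $NP $1 -npool $NP < $2 > $3"
    else
        # Run in the foreground so the next step starts only after this one ends.
        mpiexec -np "$NP" "$1" -npool "$NP" < "$2" > "$3" 2>&1 || {
            echo "step failed: $1 (see $3)" >&2
            exit 1
        }
    fi
}

run_step pw.x  nscf.in nscf.out
run_step epw.x epw.in  epw.out
```

Changing `NP` in one place then changes both runs, which avoids reusing nscf output produced with a different pool count.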

Best,

Samuel

Nandan
Posts: 44
Joined: Mon May 09, 2016 2:47 pm
Affiliation:

Re: error in gan example

Post by Nandan »

I did run the nscf with the same number of processors and npools as epw.

I am going to recompile with a newer compiler and the flags from the TestFarm.

Let me get back to you on this.

Nandan.
