Unknown errors, possibly related to mpirun?

Post questions here related to issues encountered while running the EPW code

Moderator: stiwari

hanwooh

Post by hanwooh

Dear all,

I tried to run the EPW program on my system, but an unknown error occurred.

The part of the EPW output containing the error is shown below.

MMN
k points = 216 in 24 pools
1 of 9 on ionode
2 of 9 on ionode
3 of 9 on ionode
4 of 9 on ionode
5 of 9 on ionode
6 of 9 on ionode
7 of 9 on ionode
8 of 9 on ionode
9 of 9 on ionode
MMN calculated

Running Wannier90

Wannier Function centers (cartesian, alat) and spreads (ang):

( 1.62403 0.00324 0.25007) : 7.96177
( 2.28989 0.02367 0.33674) : 13.87144
( 1.81857 0.26323 0.40333) : 1.73350
( 1.27573 0.03271 0.32375) : 24.88504
( 0.98722 0.22850 0.00802) : 1.62034
( 1.20645 0.00112 0.17524) : 9.50708
( 1.31679 -0.18519 -0.03276) : 13.24065
( 1.87340 0.00553 0.02981) : 2.00881
( 0.99984 -0.22582 -0.00608) : 1.64715

-------------------------------------------------------------------
WANNIER : 29.28s CPU 30.18s WALL ( 1 calls)
-------------------------------------------------------------------
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[22804,1],0]
Exit code: 24
--------------------------------------------------------------------------

The output above is the last thing the EPW program printed.

Since I had already checked the Wannierization of this system independently, I suspect the error comes from mpirun.

Furthermore, I ran the examples (diamond, SiC, GaN, ...) successfully, without this error.

Has anyone encountered the same problem, or does anyone know how to solve it?

Best regards,

Woohyun Han

carla.verdi
Posts: 155
Joined: Thu Jan 14, 2016 10:52 am
Affiliation:

Post by carla.verdi

Dear Woohyun,

Have you checked whether the Wannier function centers are the same as the ones you got in your independent Wannier90 run?

Best
Carla

hanwooh

Post by hanwooh

Dear Carla,

Thank you for your reply.

As you suggested, I compared the Wannier function centers from EPW with those from independent wannier90 calculations.

There seem to be some differences between the two sets of centers, as shown below.

Final State
WF centre and spread 1 ( 6.378049, 1.497625, 0.313624 ) 1.55974913
WF centre and spread 2 ( 7.744868, 0.081913, 1.568056 ) 6.76808223
WF centre and spread 3 ( 5.817133, -0.176038, 1.671249 ) 13.60156150
WF centre and spread 4 ( 6.220397, -1.507516, 0.121997 ) 1.88320599
WF centre and spread 5 ( 6.752774, -1.693450, 0.007622 ) 1.62468714
WF centre and spread 6 ( 8.635778, -1.342604, 1.118327 ) 9.97006833
WF centre and spread 7 ( 10.832575, 0.283065, 1.784440 ) 5.61086126
WF centre and spread 8 ( 9.582330, -0.996237, 3.016617 ) 8.80800453
WF centre and spread 9 ( 5.810722, -0.127154, 2.674300 ) 1.36056578
Sum of centres and spreads ( 67.774627, -3.980397, 12.276231 ) 51.18678588

Final State
WF centre and spread 1 ( 11.151743, -1.210038, 0.319927 ) 11.15017833
WF centre and spread 2 ( 10.894665, -0.285612, 1.869398 ) 4.64760681
WF centre and spread 3 ( 12.183455, -1.746042, 2.620736 ) 1.23923842
WF centre and spread 4 ( 3.043918, -0.137469, 0.680681 ) 7.22229943
WF centre and spread 5 ( 5.689015, 0.283201, 1.098739 ) 13.53229528
WF centre and spread 6 ( 6.345690, -1.738214, 0.043451 ) 2.09678070
WF centre and spread 7 ( 6.665323, -1.612465, 0.130383 ) 1.72147046
WF centre and spread 8 ( 12.851847, -0.457410, 0.192081 ) 1.54709125
WF centre and spread 9 ( 8.010014, -0.040933, 1.303419 ) 5.81579091
Sum of centres and spreads ( 76.835670, -6.944982, 8.258814 ) 48.97275160
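One thing to keep in mind when comparing these lists with the EPW output: EPW prints the centres in Cartesian coordinates in units of alat, while the wannier90 .wout file lists them in angstrom, so the raw numbers cannot be compared directly. Below is a minimal Python sketch of the conversion. It assumes alat is known in bohr (the value used is hypothetical; take the real one from your own scf output) and that the i-th function in one list corresponds to the i-th in the other, which need not hold. Centres that differ by a lattice vector describe equivalent Wannier functions, and the sketch does not account for that.

```python
import math

# Bohr radius in angstrom (CODATA 2018).
ANG_PER_BOHR = 0.529177210903


def alat_to_angstrom(centre_alat, alat_bohr):
    """Convert a Cartesian Wannier centre from units of alat to angstrom.

    alat_bohr is the lattice parameter in bohr, as reported by pw.x.
    """
    scale = alat_bohr * ANG_PER_BOHR
    return tuple(c * scale for c in centre_alat)


def centre_distance(a, b):
    """Euclidean distance between two centres expressed in the same units."""
    return math.dist(a, b)


# Example: EPW centre 1 vs wannier90 centre 1 from the outputs in this thread.
# ALAT_BOHR is a hypothetical value, not taken from the poster's calculation.
ALAT_BOHR = 7.4
epw_centre = alat_to_angstrom((1.62403, 0.00324, 0.25007), ALAT_BOHR)
w90_centre = (6.378049, 1.497625, 0.313624)
print(f"distance = {centre_distance(epw_centre, w90_centre):.3f} angstrom")
```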

Could this difference cause serious errors?

I also have one more question, about the spreads of the Wannier functions.

In my calculations, the spreads of the Wannier functions seem to be larger than those in the example files.

Is it okay to use these Wannier functions?

Best regards,

Woohyun Han

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Post by sponce

Dear Woohyun Han,

Your Wannier spreads do seem quite large (i.e. the functions are not very localized).

Could you show us an image comparing the KS band structure interpolated with the Wannier functions to the one you get from a normal nscf run?

That might help determine whether your wannierization is correct.

Best,

Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com

hanwooh

Post by hanwooh

Dear Samuel,

Thank you for your reply.

I changed the projection tags, which reduced the spreads of the Wannier functions.

My final Wannier function spreads and a comparison between the KS band structure and the bands obtained from the WFs are shown below.

Wannier Function centers (cartesian, alat) and spreads (ang):

( 1.03732 0.00813 -0.15049) : 9.91240
( 1.22711 -0.00222 0.52194) : 9.12256
( 0.00633 -0.00169 0.01217) : 0.77883
( 0.01338 0.00268 -0.01891) : 2.17913
( -0.02199 0.00018 -0.00404) : 2.99396
( 0.00732 0.00259 0.00501) : 1.91158
( 1.43156 -0.00590 0.23216) : 7.71475


[Image: KS band structure compared with the bands interpolated from the Wannier functions]


I think the band structure is well reproduced by the WFs, but the same error occurred when running the EPW program.

-------------------------------------------------------------------
WANNIER : 29.21s CPU 31.63s WALL ( 1 calls)
-------------------------------------------------------------------
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[34846,1],0]
Exit code: 24
--------------------------------------------------------------------------

Could this error come from a memory problem or an incorrect compilation?

Best regards,

Woohyun Han

carla.verdi

Post by carla.verdi

Dear Woohyun

Glad to hear the Wannierization seems to be correct. However, it's hard to spot what the error could be without seeing the input. If you could share it here or in a private message, we will hopefully be of more help.

Best
Carla

hanwooh

Post by hanwooh

Dear Carla,

Thank you for your reply.

I am sharing my epw.in input file here and will send you the more detailed input files (scf.in, ph.in, ...) via a private message.

--
&inputepw

  iverbosity  = 0

  elph        = .true.
  epbwrite    = .true.
  epbread     = .false.

  epwwrite    = .true.
  epwread     = .false.

  nbndsub     =  7
  nbndskip    =  0

  wannierize  = .true.
  num_iter    = 300
  iprint      = 2
  dis_froz_max= 3.6

  proj(1)     = 'f=0.0416493,   0.85381741,  0.11121882:pz'
  proj(2)     = 'f=0.34806258,  0.39248434,  0.51198036:pz'
  proj(3)     = 'f=0.0,0.0,0.0:s'
  proj(4)     = 'f=0.0,0.0,0.0:p'
  proj(5)     = 'f=0.5,0.5,0.5:s'

  elinterp    = .false.
  phinterp    = .true.

  tshuffle2   = .true.
  tphases     = .false.

  elecselfen  = .true.
  phonselfen  = .true.
  a2f         = .false.

  parallel_k  = .true.
  parallel_q  = .false.

  fsthick     = 1.36056981 ! eV
  eptemp      = 300 ! K (same as PRB 76, 165108)
  degaussw    = 0.1 ! eV

  dvscf_dir   = '../phonons/save'
  nkf1        = 20
  nkf2        = 20
  nkf3        = 20

  nqf1        = 20
  nqf2        = 20
  nqf3        = 20

  nk1         = 6
  nk2         = 6
  nk3         = 6

  nq1         = 4
  nq2         = 4
  nq3         = 4
 /
      24 cartesian
   0.000000000000000E+00   0.000000000000000E+00   0.000000000000000E+00
   0.303730707609660E-02   0.000000000000000E+00   0.528049018077528E+00
  -0.607461415219320E-02   0.000000000000000E+00  -0.105609803615506E+01
   0.128701948833550E+00   0.450984496816358E+00  -0.242934511855876E+00
   0.131739255909647E+00   0.450984496816358E+00   0.285114506221652E+00
   0.122627334681357E+00   0.450984496816358E+00  -0.129903254801093E+01
   0.125664641757454E+00   0.450984496816358E+00  -0.770983529933404E+00
  -0.257403897667100E+00  -0.901968993632717E+00   0.485869023711752E+00
  -0.254366590591004E+00  -0.901968993632717E+00   0.101391804178928E+01
  -0.263478511819294E+00  -0.901968993632717E+00  -0.570229012443304E+00
   0.257403897667100E+00   0.000000000000000E+00  -0.485869023711752E+00
   0.260441204743197E+00   0.000000000000000E+00   0.421799943657760E-01
   0.251329283514907E+00   0.000000000000000E+00  -0.154196705986681E+01
   0.254366590591004E+00   0.000000000000000E+00  -0.101391804178928E+01
  -0.128701948833550E+00  -0.135295349044908E+01   0.242934511855876E+00
  -0.125664641757454E+00  -0.135295349044908E+01   0.770983529933404E+00
  -0.134776562985743E+00  -0.135295349044908E+01  -0.813163524299180E+00
  -0.131739255909647E+00  -0.135295349044908E+01  -0.285114506221652E+00
   0.000000000000000E+00  -0.901968993632717E+00   0.000000000000000E+00
   0.303730707609660E-02  -0.901968993632717E+00   0.528049018077528E+00
  -0.607461415219320E-02  -0.901968993632717E+00  -0.105609803615506E+01
  -0.514807795334201E+00   0.000000000000000E+00   0.971738047423504E+00
  -0.511770488258104E+00   0.000000000000000E+00   0.149978706550103E+01
  -0.520882409486394E+00   0.000000000000000E+00  -0.843599887315519E-01


Best regards,

Woohyun Han

carla.verdi

Post by carla.verdi

Dear Woohyun

The list of 24 q-points at the end of the input should contain a fourth column with the weights. I think this is causing the crash.

We'll try to add a clear error message for this case.
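A quick way to catch this kind of mistake before launching a long run is to check the list yourself. The Python sketch below assumes the layout used in the epw.in above (a header line with the number of q-points and the coordinate type, followed by one line per q-point) and only verifies that the count matches and that each line holds four numbers; it does not check whether the weights themselves are correct. The function name is hypothetical and not part of EPW.

```python
def check_qpoint_list(text):
    """Return a list of problems found in an epw.in-style q-point list.

    Expected layout: a header line '<nq> <coordinate type>' followed by nq
    lines, each holding three coordinates and a weight.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    if not rows:
        return ["empty q-point list"]
    try:
        nq = int(rows[0][0])
    except ValueError:
        return [f"header does not start with an integer: {' '.join(rows[0])}"]
    problems = []
    body = rows[1:]
    if len(body) != nq:
        problems.append(f"header says {nq} q-points but {len(body)} lines follow")
    for i, fields in enumerate(body, start=1):
        if len(fields) != 4:
            problems.append(
                f"q-point {i}: expected 4 columns (x, y, z, weight), found {len(fields)}"
            )
            continue
        try:
            for value in fields:
                float(value)  # Fortran-style 'E' exponents parse fine
        except ValueError:
            problems.append(f"q-point {i}: non-numeric entry")
    return problems


# The first two q-points of the list above (header count reduced to 2),
# which lack the weight column and are therefore flagged.
sample = """
      2 cartesian
   0.000000000000000E+00   0.000000000000000E+00   0.000000000000000E+00
   0.303730707609660E-02   0.000000000000000E+00   0.528049018077528E+00
"""
for problem in check_qpoint_list(sample):
    print(problem)
```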

Best
Carla

hanwooh

Post by hanwooh

Dear Carla,

Thank you for your crucial advice.

As you said, my mistake was adding the q-point list without its weights.

After I added the q-point weights and changed the nk and nq values so that the grids are commensurate, the EPW program ran without any errors.

Thank you again for your help.
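The commensuration condition can be checked in the same spirit. This Python sketch assumes the rule is that each coarse k-grid dimension (nk1, nk2, nk3) must be an integer multiple of the corresponding coarse q-grid dimension (nq1, nq2, nq3); with the values from the input posted earlier (6 and 4) it flags all three directions. The function name is hypothetical.

```python
def non_commensurate_directions(nk, nq):
    """Return the directions (1, 2, 3) in which nk is not a multiple of nq."""
    if len(nk) != 3 or len(nq) != 3:
        raise ValueError("expected three grid dimensions for both nk and nq")
    return [i for i, (k, q) in enumerate(zip(nk, nq), start=1) if k % q != 0]


print(non_commensurate_directions((6, 6, 6), (4, 4, 4)))  # -> [1, 2, 3]
print(non_commensurate_directions((8, 8, 8), (4, 4, 4)))  # -> []
```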

Best regards,

Woohyun Han
