EPW stops when DEALLOCATE mapg, and other gmap crashes
Posted: Tue Jun 02, 2020 11:32 pm
Hello,
I'm using the development version of EPW and have run into a few issues. I'm not sure whether they are edge cases, since none of them appear with the test suite examples. I can post the scf and nscf input files if necessary; for now I only include my epw.in file at the bottom of this post.
Crash type 1: If I run epw with
Code: Select all
mpirun -np $SLURM_NTASKS epw.x -npools 18 -i ./epw.in > epw.out
on 1 node (36 processors), the calculation proceeds as normal until right after the kmaps step. With this run command, the crash happens on the line
Code: Select all
DEALLOCATE(mapg, STAT = ierr)
in rotate.f90. Here's the last part of my epw.out:
Code: Select all
-------------------------------------------------------------------
WANNIER : 8549.48s CPU 8642.39s WALL ( 1 calls)
-------------------------------------------------------------------
Calculating kgmap
Progress kgmap: ########################################
kmaps : 0.14s CPU 0.26s WALL ( 1 calls)
Estimated size of gmap: ngxx = 7792
Symmetries of Bravais lattice: 4
Symmetries of crystal: 2
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 101560 RUNNING AT bdw-0192
= EXIT CODE: 134
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 101560 RUNNING AT bdw-0192
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
I made sure to use the same number of pools in my scf and nscf calculations as in EPW.
Crash type 2: The error is slightly different if I don't use the -npools 18 modifier. Running
Code: Select all
mpirun -np $SLURM_NTASKS epw.x -i ./epw.in > epw.out
produces a different crash, where nothing gets written to stdout, and my epw.out looks like
Code: Select all
Calculating kgmap
Progress kgmap: ########################################
kmaps : 0.01s CPU 0.04s WALL ( 1 calls)
Estimated size of gmap: ngxx = 337
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 168548 RUNNING AT bdw-0533
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
where I assume it hasn't gotten as far. I believe it is also crashing in something related to gmap, though, since the epw.out printout seems to stop at
Code: Select all
CALL readgmap(nkstot)
in elphon_shuffle_wrap.f90 (I have iverbosity == 1).
Crash type 3:
The above epw.in file has my nscf run at 8x8x8 k-points. I was planning on using 14x14x14 or 16x16x16 for the coarse k-mesh. I originally tried this, but EPW crashed in pw2wan2epw while calculating the MMN file, whereas pw2wannier90.x in Quantum ESPRESSO doesn't crash. I don't get any stdout from those crashes. Running with etf_mem = 1 didn't help. I did this without the
Code: Select all
-npools N
command, but it worked in pw2wannier90.x (though it took a while). To achieve exponential localization I needed something like 90 bands in my disentanglement window, so it's possibly a memory issue that pw2wannier90.x handles better than EPW's pw2wan2epw.
Keep in mind however I did not have these problems with EPW using norm-conserving pseudopotentials for LaTe3, but unfortunately those really don't work for the other rare earth systems I'm studying.
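To put a rough number on the memory suspicion in crash type 3, here is a back-of-the-envelope sketch. It assumes (I have not checked this against the source) that the overlap matrices for the MMN step are held as one double-complex array of shape (nbnd, nbnd, nnb, nks). The value nnb = 8 neighbours per k-point is only a placeholder, and the function names are made up for illustration; the 90 bands and the mesh sizes are the ones mentioned above.

```python
# Rough size of the M_mn overlap array, ASSUMING a single double-complex
# array of shape (nbnd, nbnd, nnb, nks). The array layout and nnb = 8 are
# assumptions for illustration, not taken from the EPW source.

BYTES_PER_COMPLEX = 16  # one double-precision complex number


def mmn_bytes(nbnd: int, nnb: int, nks: int) -> int:
    """Bytes needed to store nbnd*nbnd*nnb*nks double-complex numbers."""
    return nbnd * nbnd * nnb * nks * BYTES_PER_COMPLEX


def report(nbnd: int = 90, nnb: int = 8, meshes=(8, 14, 16)) -> dict:
    """Print and return the estimated array size for each n x n x n mesh."""
    sizes = {}
    for n in meshes:
        nks = n ** 3  # number of k-points in a uniform n x n x n mesh
        sizes[n] = mmn_bytes(nbnd, nnb, nks)
        print(f"{n}x{n}x{n}: {nks} k-points, {sizes[n] / 2**30:.2f} GiB")
    return sizes


if __name__ == "__main__":
    report()
```

Under these assumptions the array grows from about 0.5 GiB at 8x8x8 to about 2.6 GiB at 14x14x14 and about 4 GiB at 16x16x16. If the k-points were split over pools, nks per process would presumably shrink accordingly, which is one reason running without -npools N might matter here.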
Here is my epw.in:
Code: Select all
--
&inputepw
prefix = 'DyTe3'
outdir = './'
iverbosity = 1
elph = .true.
kmaps = .false.
epbwrite = .true.
epbread = .false.
epwwrite = .true.
epwread = .false.
use_ws = .true.
etf_mem = 0 ! let's see if it works
!vme = .false.
!lifc = .false.
!asr_typ = 'simple'
wannierize = .false. ! this is for the electrons, which we do ourselves
num_iter = 500
iprint = 2
nbndsub = 8
!nbndskip = 28 ! now removed from EPW
bands_skipped = 'exclude_bands : 1-14'
dis_win_min = 0.0
dis_win_max = 38.0
dis_froz_min = 7.12
dis_froz_max = 7.22
wdata(1) = 'dis_num_iter = 10000'
wdata(2) = 'dis_mix_ratio = 0.90'
wdata(3) = 'dis_conv_tol = 1E-7'
wdata(4) = 'translate_home_cell = .true.'
wdata(5) = 'write_xyz = .true.'
wdata(6) = 'write_hr = .true.'
wdata(7) = 'hr_cutoff = 0.005'
wdata(8) = 'write_rmn = .true.'
wdata(9) = 'use_ws_distance = .true.'
wdata(10) = 'num_print_cycles = 50'
wdata(11) = 'kmesh_tol = 0.0000001'
wdata(12) = 'fermi_surface_plot = .true.'
wdata(13) = 'fermi_surface_num_points = 41'
wdata(14) = 'fermi_energy = 7.1340'
proj(1) = 'f=0.9192953735,0.0807145153,0.7500000000:px:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
proj(2) = 'f=0.0807046265,0.9192854847,0.2500000000:px:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
proj(3) = 'f=0.5807034106,0.4192860631,0.7500000000:px:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
proj(4) = 'f=0.4192965894,0.5807139369,0.2500000000:px:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
proj(5) = 'f=0.9192953735,0.0807145153,0.7500000000:pz:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
proj(6) = 'f=0.0807046265,0.9192854847,0.2500000000:pz:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
proj(7) = 'f=0.5807034106,0.4192860631,0.7500000000:pz:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
proj(8) = 'f=0.4192965894,0.5807139369,0.2500000000:pz:z=0.70710678,0,0.70710678:x=0.70710678,0,-0.70710678'
elecselfen = .true.
phonselfen = .false.
a2f = .false.
prtgkk = .false.
nest_fn = .true. ! will give me the nesting function at least
assume_metal = .true.
!fsthick = 0.7 ! eV
eptemp = 293 ! K
degaussw = 0.02 ! eV
dvscf_dir = '/lcrc/project/nickelates/adam-adenchfi/DyTe3_primitive/phon/save'
!filkf = './LGX.txt'
!filqf = './LGX.txt'
nk1 = 8
nk2 = 8
nk3 = 8
nkf1 = 8
nkf2 = 8
nkf3 = 8
nq1 = 2
nq2 = 2
nq3 = 2
nqf1 = 2
nqf2 = 2
nqf3 = 2
/
8 cartesian
0.000000000 0.000000000 0.000000000
0.000000000 0.000000000 -1.554493425
0.000000000 -1.571586348 0.000000000
0.000000000 -1.571586348 -1.554493425
-0.500000000 -1.489927397 0.000000000
-0.500000000 -1.489927397 -1.554493425
-0.500000000 -3.061513745 0.000000000
-0.500000000 -3.061513745 -1.554493425
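A side note on the exit codes in the two crash logs above: POSIX shells conventionally report a process killed by a signal as 128 plus the signal number, so 134 would correspond to signal 6 (SIGABRT on Linux). The bare 6 may simply be the same signal number reported directly, but that reading of the Intel MPI banner is my assumption. A small sketch for decoding such values (the helper name is made up for illustration):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit status to a signal name where possible."""
    # POSIX shell convention: status = 128 + signal number.
    # Values <= 128 are treated as raw signal numbers, which is only
    # meaningful if the launcher reported the signal itself (an assumption).
    signum = code - 128 if code > 128 else code
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"no signal matches {code}"


if __name__ == "__main__":
    for code in (134, 6):
        print(code, "->", describe_exit_code(code))
```

On Linux both values decode to SIGABRT, which would be consistent with the runtime aborting (for example on a failed memory operation) rather than EPW stopping through its own error handler.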