Invalid memory reference -- double free or corruption (out) while doing MgB2 1x1x2 supercell calculation
Posted: Mon Nov 13, 2023 1:05 am
Hi,
I am trying to run an anisotropic Eliashberg calculation with EPW on a MgB2 1x1x2 supercell. The unit-cell calculation runs fine, but the 1x1x2 supercell calculation crashes with a segmentation fault during Wannierization (in compute_amn_para). An earlier run of the same supercell calculation aborted with "double free or corruption (out)" and the same backtrace. Because the error looks memory-related, I ran the job under valgrind, but I still do not understand what is causing it.
The error messages, the valgrind command and output, and the input and output files (epw.in, the end of epw.out and of MgB2_1x1x2.wout, scf.in and nscf.in) follow in that order.
Error message from the supercell run:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7ff8e9680dbf in ???
#1 0x7ff8e96d3cb3 in ???
#2 0x54f91b in __pw2wan2epw_MOD_compute_amn_para
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/pw2wan2epw.f90:1142
#3 0x556964 in __pw2wan2epw_MOD_pw2wan90epw
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/pw2wan2epw.f90:103
#4 0x4227d9 in __wannierization_MOD_wann_run
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/wannierization.f90:73
#5 0x40964d in epw
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/epw.f90:133
#6 0x40945c in main
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/epw.f90:20
Error message from the earlier run of the same supercell calculation:
double free or corruption (out)
Program received signal SIGABRT: Process abort signal.
valgrind command:
valgrind --leak-check=full srun epw.x $flags1 -nk 128 -input epw.in > epw.out 2> epw.err
valgrind output:
==462528== Memcheck, a memory error detector
==462528== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==462528== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==462528== Command: srun epw.x -nimage 1 -npool 16 -nband 1 -ntg 1 -ndiag 1 -nk 128 -input epw.in
==462528==
munmap_chunk(): invalid pointer
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7f85954a2dbf in ???
#1 0x7f85954a2d2b in ???
#2 0x7f85954a43e4 in ???
#3 0x7f85954e8c26 in ???
#4 0x7f85954f0cc9 in ???
#5 0x7f85954f0f9b in ???
#6 0x54f91b in __pw2wan2epw_MOD_compute_amn_para
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/pw2wan2epw.f90:1142
#7 0x556964 in __pw2wan2epw_MOD_pw2wan90epw
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/pw2wan2epw.f90:103
#8 0x4227d9 in __wannierization_MOD_wann_run
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/wannierization.f90:73
#9 0x40964d in epw
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/epw.f90:133
#10 0x40945c in main
at /global/common/software/nersc/pm-2021q4/sw/qe/pm-cpu/qe-7.0/EPW/src/epw.f90:20
srun: error: nid006696: task 107: Aborted
srun: Terminating StepId=18135667.2
slurmstepd: error: *** STEP 18135667.2 ON nid004583 CANCELLED AT 2023-11-12T23:26:14 ***
srun: error: nid005797: tasks 48-63: Terminated
srun: error: nid004773: tasks 16-31: Terminated
srun: error: nid004583: tasks 0-15: Terminated
srun: error: nid006134: tasks 80-95: Terminated
srun: error: nid005672: tasks 32-47: Terminated
srun: error: nid005866: tasks 64-79: Terminated
srun: error: nid006696: tasks 96-106,108-111: Terminated
srun: error: nid006779: tasks 112-127: Terminated
srun: Force Terminated StepId=18135667.2
==462529==
==462529== HEAP SUMMARY:
==462529== in use at exit: 684,400 bytes in 4,903 blocks
==462529== total heap usage: 23,096 allocs, 18,193 frees, 12,955,997 bytes allocated
==462529==
==462529== 0 bytes in 1 blocks are possibly lost in loss record 9 of 1,191
==462529== at 0x4A366A4: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==462529== by 0x531959F: register_state (in /lib64/libc-2.31.so)
==462529== by 0x531BE86: re_acquire_state_context (in /lib64/libc-2.31.so)
==462529== by 0x5322CCB: re_compile_internal (in /lib64/libc-2.31.so)
==462529== by 0x5328298: regcomp (in /lib64/libc-2.31.so)
==462529== by 0x4F42B57: s_p_hashtbl_create_cnt (parse_config.c:209)
==462529== by 0x4F4299B: s_p_hashtbl_create (parse_config.c:217)
==462529== by 0x4F54335: _init_slurm_conf (read_config.c:3215)
==462529== by 0x4F5A4F7: slurm_conf_init_load (read_config.c:3508)
==462529== by 0x4F5A859: slurm_conf_init (read_config.c:3533)
==462529== by 0x4EE6141: slurm_init (init.c:47)
==462529== by 0x411614: srun (srun.c:176)
==462529==
==462529== 0 bytes in 1 blocks are possibly lost in loss record 10 of 1,191
==462529== at 0x4A366A4: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==462529== by 0x531959F: register_state (in /lib64/libc-2.31.so)
==462529== by 0x531BE86: re_acquire_state_context (in /lib64/libc-2.31.so)
==462529== by 0x5322E12: re_compile_internal (in /lib64/libc-2.31.so)
==462529== by 0x5328298: regcomp (in /lib64/libc-2.31.so)
==462529== by 0x4F42B57: s_p_hashtbl_create_cnt (parse_config.c:209)
==462529== by 0x4F4299B: s_p_hashtbl_create (parse_config.c:217)
==462529== by 0x4F54335: _init_slurm_conf (read_config.c:3215)
==462529== by 0x4F5A4F7: slurm_conf_init_load (read_config.c:3508)
==462529== by 0x4F5A859: slurm_conf_init (read_config.c:3533)
==462529== by 0x4EE6141: slurm_init (init.c:47)
==462529== by 0x411614: srun (srun.c:176)
==462529==
==462529== 4 bytes in 1 blocks are possibly lost in loss record 19 of 1,191
==462529== at 0x4A366A4: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==462529== by 0x5319466: re_node_set_insert (in /lib64/libc-2.31.so)
==462529== by 0x531A78D: duplicate_node_closure (in /lib64/libc-2.31.so)
==462529== by 0x531B847: calc_eclosure_iter (in /lib64/libc-2.31.so)
==462529== by 0x5322A1C: re_compile_internal (in /lib64/libc-2.31.so)
==462529== by 0x5328298: regcomp (in /lib64/libc-2.31.so)
==462529== by 0x4F42B57: s_p_hashtbl_create_cnt (parse_config.c:209)
==462529== by 0x4F4299B: s_p_hashtbl_create (parse_config.c:217)
==462529== by 0x4F54335: _init_slurm_conf (read_config.c:3215)
==462529== by 0x4F5A4F7: slurm_conf_init_load (read_config.c:3508)
==462529== by 0x4F5A859: slurm_conf_init (read_config.c:3533)
==462529== by 0x4EE6141: slurm_init (init.c:47)
... (and so on...)
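A note on the valgrind report above: its header reads "Command: srun epw.x ...", and the loss records all come from Slurm configuration parsing (slurm_init, srun.c:176). That suggests valgrind instrumented the srun launcher rather than epw.x, which would explain why the report says nothing about compute_amn_para. The sketch below only assembles the two launch lines to show the difference in ordering. The extra valgrind options (--track-origins=yes, --log-file with %p for the process ID) are suggestions, not something from the original run.

```python
import shlex

# Flags copied from the "Command:" line of the valgrind report above.
epw_args = ["epw.x", "-nimage", "1", "-npool", "16", "-nband", "1",
            "-ntg", "1", "-ndiag", "1", "-nk", "128", "-input", "epw.in"]

# valgrind options; --track-origins and --log-file are assumed additions.
valgrind = ["valgrind", "--leak-check=full", "--track-origins=yes",
            "--log-file=valgrind.%p.log"]

# As posted: valgrind wraps srun, so only the launcher is checked.
as_posted = ["valgrind", "--leak-check=full", "srun"] + epw_args

# Reordered: srun starts valgrind on every task, and valgrind wraps epw.x.
reordered = ["srun"] + valgrind + epw_args

print("posted:   ", shlex.join(as_posted))
print("reordered:", shlex.join(reordered))
```

Running under valgrind is slow, so a smaller task count than the production run is probably needed for this test.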
Below are the input and output files for reference. At this point I am not sure what is causing the error. Could a selection of wrong wannier projections give this issue? Or k-grid selection?
Phonon calculation was done on 6x6x3 q-grid.
epw.in:
--
&inputepw
prefix = 'MgB2_1x1x2',
amass(1) = 24.305,
amass(2) = 10.811
outdir = './'
ep_coupling = .true.
elph = .true.
epbwrite = .true.
epbread = .false.
epwwrite = .true.
epwread = .false.
etf_mem = 1
nbndsub = 5,
wannierize = .true.
num_iter = 500
dis_froz_max= 8.8
proj(1) = 'B:pz'
proj(2) = 'f=0.5,1.0,0.25:s'
proj(3) = 'f=0.0,0.5,0.25:s'
proj(4) = 'f=0.5,0.5,0.25:s'
proj(5) = 'f=0.5,1.0,0.75:s'
proj(6) = 'f=0.0,0.5,0.75:s'
proj(7) = 'f=0.5,0.5,0.75:s'
iverbosity = 2
eps_acustic = 2.0 ! Lowest boundary for the phonon frequency
ephwrite = .true. ! Writes .ephmat files used when eliashberg = .true.
fsthick = 0.4 ! eV
degaussw = 0.10 ! eV
nsmear = 1
delta_smear = 0.04 ! eV
degaussq = 0.5 ! meV
nqstep = 500
eliashberg = .true.
laniso = .true.
limag = .true.
lpade = .true.
conv_thr_iaxis = 1.0d-4
wscut = 1.0 ! eV Upper limit over frequency integration/summation in the Eliashberg equations
nstemp = 1 ! Nr. of temps
temps = 15.00 ! K provide list of temperatures OR (nstemp and temps = tempsmin tempsmax for evenly spaced mode)
nsiter = 500
muc = 0.16
dvscf_dir = '../phonons/save'
nk1 = 6
nk2 = 6
nk3 = 3
nq1 = 6
nq2 = 6
nq3 = 3
mp_mesh_k = .true.
nkf1 = 20
nkf2 = 20
nkf3 = 20
nqf1 = 20
nqf2 = 20
nqf3 = 20
/
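On the question of whether the projections could be involved, one consistency check is to count the Wannier functions that the proj entries request and compare the total with nbndsub. The sketch below assumes the Wannier90 convention that a species label such as 'B:pz' places one projection on every atom of that species. The helper count_wannier_functions and the ORBITALS table are illustrative names, not part of EPW or Wannier90.

```python
from collections import Counter

# Atom list copied from ATOMIC_POSITIONS in scf.in.
atoms = ["Mg", "Mg", "B", "B", "B", "B"]

# proj(1) ... proj(7) copied from epw.in.
projections = [
    "B:pz",
    "f=0.5,1.0,0.25:s", "f=0.0,0.5,0.25:s", "f=0.5,0.5,0.25:s",
    "f=0.5,1.0,0.75:s", "f=0.0,0.5,0.75:s", "f=0.5,0.5,0.75:s",
]
nbndsub = 5  # value set in epw.in

# Functions per orbital label; only the labels used in this input are listed.
ORBITALS = {"s": 1, "pz": 1}

def count_wannier_functions(projections, atoms):
    """Count projections, expanding species labels over all matching atoms."""
    species_count = Counter(atoms)
    total = 0
    for entry in projections:
        site, orbital = entry.split(":")
        # 'f=x,y,z' is a single explicit site; a species label hits every atom.
        n_sites = 1 if site.startswith("f=") else species_count[site]
        total += n_sites * ORBITALS[orbital]
    return total

n_proj = count_wannier_functions(projections, atoms)
print(f"projections requested: {n_proj}, nbndsub: {nbndsub}")
```

Under that assumption the input requests 4 pz functions (one per B atom) plus 6 s functions, while nbndsub = 5. The "Initial Wannier projections" list in epw.out shows only five guiding functions, which is consistent with the list being cut at nbndsub. Whether this mismatch is what triggers the crash is not established here, but making the two numbers agree may be worth testing.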
epw.out stops here:
-------------------------------------------------------------------
Wannierization on 6 x 6 x 3 electronic grid
-------------------------------------------------------------------
Spin CASE ( default = unpolarized )
Initializing Wannier90
Initial Wannier projections
( 0.33333 0.66667 0.25000) : l = 1 mr = 1
( 0.66667 0.33333 0.75000) : l = 1 mr = 1
( 0.66667 0.33333 0.25000) : l = 1 mr = 1
( 0.33333 0.66667 0.75000) : l = 1 mr = 1
( 0.50000 1.00000 0.25000) : l = 0 mr = 1
- Number of bands is ( 12)
- Number of total bands is ( 12)
- Number of excluded bands is ( 0)
- Number of wannier functions is ( 5)
- All guiding functions are given
Reading data about k-point neighbours
- All neighbours are found
AMN
k points = 108 in 128 pools
1 of 1 on ionode
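The last lines above report 108 k-points spread over 128 pools, so some pools necessarily receive no k-point at all. The sketch below shows the arithmetic using a simple block distribution in which the remainder goes to the lowest-numbered pools. That scheme is an assumption for illustration and may differ in detail from what EPW does internally; the function name kpoints_per_pool is hypothetical.

```python
def kpoints_per_pool(nkstot, npool):
    """Block-distribute nkstot k-points over npool pools (assumed scheme)."""
    base, remainder = divmod(nkstot, npool)
    # The first `remainder` pools take one extra k-point.
    return [base + 1 if pool < remainder else base for pool in range(npool)]

counts = kpoints_per_pool(nkstot=108, npool=128)  # values from epw.out
empty_pools = counts.count(0)
print(f"pools with no k-point: {empty_pools} of {len(counts)}")
```

With 108 k-points and 128 pools, 20 pools are left empty whatever the distribution scheme. Note also that the command line passes both -npool 16 and -nk 128, which are two spellings of the same option, and the output shows that 128 was used. A run with a pool count of 108 or fewer would be a cheap way to check whether the empty pools play a role in the crash in compute_amn_para.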
MgB2_1x1x2.wout stops here:
Time to write kmesh 0.526 (sec)
MgB2_1x1x2.nnkp written.
Time to write kmesh 0.526 (sec)
Finished setting up k-point neighbours.
Exiting wannier_setup in wannier90 15:26:05
scf.in:
&control
calculation='scf',
prefix='MgB2_1x1x2',
pseudo_dir = '../pp/',
outdir='./',
tprnfor = .true.,
tstress = .true.,
etot_conv_thr = 1.0d-5
forc_conv_thr = 1.0d-4
/
&system
ibrav = 0,
nat= 6,
ntyp = 2,
ecutwfc = 40
smearing = 'mp'
occupations = 'smearing'
degauss = 0.02
/
&electrons
diagonalization = 'david'
mixing_mode = 'plain'
mixing_beta = 0.7
conv_thr = 1.0d-9
/
ATOMIC_SPECIES
Mg 24.305 Mg.pz-n-vbc.UPF
B 10.811 B.pz-vbc.UPF
ATOMIC_POSITIONS crystal
Mg 0.0000000000 0.0000000000 0.0000000000
Mg 0.0000000000 0.0000000000 0.5000000000
B 0.3333333400 0.6666666800 0.2500000000
B 0.6666666090 0.3333332950 0.7500000000
B 0.6666666090 0.3333332950 0.2500000000
B 0.3333333400 0.6666666800 0.7500000000
K_POINTS AUTOMATIC
12 12 6 0 0 0
CELL_PARAMETERS angstrom
3.0829999447 0.0000000000 0.0000000000
-1.5414999723 2.6699562720 0.0000000000
0.0000000000 0.0000000000 7.0419998169
nscf.in:
&control
calculation='nscf',
prefix='MgB2_1x1x2',
pseudo_dir = '../pp/',
outdir='./',
tprnfor = .true.,
tstress = .true.,
etot_conv_thr = 1.0d-5
forc_conv_thr = 1.0d-4
/
&system
ibrav = 0,
nat= 6,
ntyp = 2,
ecutwfc = 40
smearing = 'mp'
occupations = 'smearing'
degauss = 0.02
/
&electrons
diagonalization = 'david'
mixing_mode = 'plain'
mixing_beta = 0.7
conv_thr = 1.0d-9
/
ATOMIC_SPECIES
Mg 24.305 Mg.pz-n-vbc.UPF
B 10.811 B.pz-vbc.UPF
ATOMIC_POSITIONS crystal
Mg 0.0000000000 0.0000000000 0.0000000000
Mg 0.0000000000 0.0000000000 0.5000000000
B 0.3333333400 0.6666666800 0.2500000000
B 0.6666666090 0.3333332950 0.7500000000
B 0.6666666090 0.3333332950 0.2500000000
B 0.3333333400 0.6666666800 0.7500000000
CELL_PARAMETERS angstrom
3.0829999447 0.0000000000 0.0000000000
-1.5414999723 2.6699562720 0.0000000000
0.0000000000 0.0000000000 7.0419998169
K_POINTS crystal
108
0.0 0.0 0.0 0.00925925925926
0.0 0.0 0.333333333333 0.00925925925926
0.0 0.0 0.666666666667 0.00925925925926
0.0 0.166666666667 0.0 0.00925925925926
0.0 0.166666666667 0.333333333333 0.00925925925926
0.0 0.166666666667 0.666666666667 0.00925925925926
0.0 0.333333333333 0.0 0.00925925925926
0.0 0.333333333333 0.333333333333 0.00925925925926
0.0 0.333333333333 0.666666666667 0.00925925925926
0.0 0.5 0.0 0.00925925925926
0.0 0.5 0.333333333333 0.00925925925926
0.0 0.5 0.666666666667 0.00925925925926
0.0 0.666666666667 0.0 0.00925925925926
0.0 0.666666666667 0.333333333333 0.00925925925926
0.0 0.666666666667 0.666666666667 0.00925925925926
0.0 0.833333333333 0.0 0.00925925925926
0.0 0.833333333333 0.333333333333 0.00925925925926
0.0 0.833333333333 0.666666666667 0.00925925925926
0.166666666667 0.0 0.0 0.00925925925926
0.166666666667 0.0 0.333333333333 0.00925925925926
0.166666666667 0.0 0.666666666667 0.00925925925926
0.166666666667 0.166666666667 0.0 0.00925925925926
0.166666666667 0.166666666667 0.333333333333 0.00925925925926
0.166666666667 0.166666666667 0.666666666667 0.00925925925926
0.166666666667 0.333333333333 0.0 0.00925925925926
0.166666666667 0.333333333333 0.333333333333 0.00925925925926
0.166666666667 0.333333333333 0.666666666667 0.00925925925926
0.166666666667 0.5 0.0 0.00925925925926
0.166666666667 0.5 0.333333333333 0.00925925925926
0.166666666667 0.5 0.666666666667 0.00925925925926
0.166666666667 0.666666666667 0.0 0.00925925925926
0.166666666667 0.666666666667 0.333333333333 0.00925925925926
0.166666666667 0.666666666667 0.666666666667 0.00925925925926
0.166666666667 0.833333333333 0.0 0.00925925925926
0.166666666667 0.833333333333 0.333333333333 0.00925925925926
... (and so on...)
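For reference, the truncated K_POINTS list above is the full uniform 6x6x3 grid (108 points, each with weight 1/108), with the third index running fastest. A minimal sketch that regenerates such a list is given below; the function name kmesh is illustrative, and the number formatting is not meant to match the list above character for character.

```python
def kmesh(n1, n2, n3):
    """Return (k1, k2, k3, weight) for a uniform grid in crystal coordinates."""
    weight = 1.0 / (n1 * n2 * n3)
    return [
        (i / n1, j / n2, k / n3, weight)
        for i in range(n1)
        for j in range(n2)
        for k in range(n3)  # third index varies fastest, as in the list above
    ]

points = kmesh(6, 6, 3)  # must match nk1, nk2, nk3 in epw.in

print("K_POINTS crystal")
print(len(points))
for k1, k2, k3, w in points:
    print(f"{k1:.12f} {k2:.12f} {k3:.12f} {w:.14f}")
```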