slurmstepd: error: Detected 2 oom-kill event(s)

Post here questions linked with issue while running the EPW code

Moderator: stiwari

lixuejie
Posts: 2
Joined: Wed Dec 06, 2023 9:04 am
Affiliation: xjtu

slurmstepd: error: Detected 2 oom-kill event(s)

Post by lixuejie »

Dear EPW users,

I am using EPW 7.1 to calculate the superconducting critical temperature of the compound YB2, and I get this error every time:

"= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 508807 RUNNING AT cn14
= KILLED BY SIGNAL: 9 (Killed)"

The error file contains:
"slurmstepd: error: Detected 2 oom-kill event(s) in StepId=19809.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: cn14: task 0: Out Of Memory"

My input files for the scf, nscf, and EPW runs are below:

&control
calculation='scf'
prefix='YB2'
etot_conv_thr = 1.0d-5
forc_conv_thr = 1.0D-4, !Default: 1.0D-3 (a.u)
pseudo_dir='./'
outdir='./'
tprnfor = .true.,
tstress = .true.,
/

&system
ibrav=4,
celldm(1) = 6.224896721,
celldm(3) = 1.171172955,
nat=3,
ntyp=2,
ecutwfc=60,
smearing = 'mp'
occupations = 'smearing'
degauss = 0.02

/

&electrons
diagonalization = 'david'
mixing_mode = 'plain'
conv_thr=1.0d-9,
mixing_beta = 0.7,
/

ATOMIC_SPECIES
Y 88.906 Y_ONCV_PBE-1.0.upf
B 10.811 B_ONCV_PBE-1.0.upf

ATOMIC_POSITIONS crystal
Y 0.000000 0.000000 0.000000
B 0.666667 0.333333 0.500000
B 0.333333 0.666667 0.500000

K_POINTS automatic
12 12 12 0 0 0




&control
calculation='bands',
prefix='YB2',
pseudo_dir = './',
outdir='./',
tprnfor = .true.,
tstress = .true.,
etot_conv_thr = 1.0d-5
forc_conv_thr = 1.0d-4
/
&system
ibrav=4,
celldm(1) = 6.224896721,
celldm(3) = 1.171172955,
nat= 3,
ntyp = 2,
ecutwfc = 60
smearing = 'mp'
occupations = 'smearing'
degauss = 0.02
nbnd = 35
/
&electrons
diagonalization = 'david'
mixing_mode = 'plain'
mixing_beta = 0.7
conv_thr = 1.0d-9
/

ATOMIC_SPECIES
Y 88.906 Y_ONCV_PBE-1.0.upf
B 10.811 B_ONCV_PBE-1.0.upf

ATOMIC_POSITIONS crystal
Y 0.000000 0.000000 0.000000
B 0.666667 0.333333 0.500000
B 0.333333 0.666667 0.500000

K_POINTS crystal
1728
0.00000000 0.00000000 0.00000000 5.787037e-04
0.00000000 0.00000000 0.08333333 5.787037e-04
0.00000000 0.00000000 0.16666667 5.787037e-04
...

--
&inputepw
prefix = 'YB2',
amass(1) = 88.906,
amass(2) = 10.811
outdir = './'
max_memlt = 50

ep_coupling = .true.
elph = .true.
epbwrite = .true.
epbread = .false.

epwwrite = .true.
epwread = .false.
vme = .false.

etf_mem = 1

nbndsub = 7,

efermi_read = .true.
fermi_energy = 12.085876

wannierize = .true.
num_iter = 200

!dis_win_max = 14.8
!dis_win_min = 7.12
dis_froz_max = 12.32
dis_froz_min= 11.7

proj(1) = 'Y:dz2,dxy,dx2-y2'
proj(2) = 'B:pz,py'

wdata(1) = 'guiding_centres = .true.'
wdata(2) = 'dis_num_iter = 500'
wdata(3) = 'bands_plot = .true.'
wdata(4) = 'begin kpoint_path'
wdata(5) = 'G 0.0000000 0.0000000 0.0000000 K 0.3333333 0.3333333 0.0000000'
wdata(6) = 'K 0.3333333 0.3333333 0.0000000 M 0.5000000 0.0000000 0.0000000'
wdata(7) = 'M 0.5000000 0.0000000 0.0000000 G 0.0000000 0.0000000 0.0000000'
wdata(8) = 'G 0.0000000 0.0000000 0.0000000 A 0.0000000 0.0000000 0.5000000'
wdata(9) = 'A 0.0000000 0.0000000 0.5000000 H 0.3333333 0.3333333 0.5000000'
wdata(10) ='H 0.3333333 0.3333333 0.5000000 L 0.5000000 0.0000000 0.5000000'
wdata(11) ='L 0.5000000 0.0000000 0.5000000 A 0.0000000 0.0000000 0.5000000'
wdata(12) = 'end kpoint_path'
wdata(13) = 'bands_plot_format = gnuplot'
wdata(14)= 'use_ws_distance = T'
wdata(15)= 'conv_window = 4'
wdata(16) = 'kmesh_tol=0.00001'

iverbosity = 2

eps_acustic = 2.0 ! Lowest boundary for the phonon frequency
ephwrite = .true. ! Writes .ephmat files used when eliashberg = .true.

fsthick = 0.4 ! eV
degaussw = 0.10 ! eV
nsmear = 1
delta_smear = 0.04 ! eV

degaussq = 0.5 ! meV
nqstep = 500

eliashberg = .true.

laniso = .true.
limag = .true.
lpade = .true.

conv_thr_iaxis = 1.0d-4

wscut = 0.42 ! eV Upper limit over frequency integration/summation in the Eliashberg equations

nstemp = 15 ! Nr. of temps
temps = 2 30 ! K; provide a list of temperatures OR (nstemp and temps = tempsmin tempsmax for evenly spaced mode)

nsiter = 500

muc = 0.1

dvscf_dir = '../11.25e-p/save'

nk1 = 12
nk2 = 12
nk3 = 12

nq1 = 6
nq2 = 6
nq3 = 6

mp_mesh_k = .true.
nkf1 = 48
nkf2 = 48
nkf3 = 48

nqf1 = 24
nqf2 = 24
nqf3 = 24
/

I would be thankful if anybody could help me or give some suggestions.

Best regards,
xj Li

stiwari
Posts: 29
Joined: Mon Jun 26, 2023 9:48 pm
Affiliation: UT Austin

Re: slurmstepd: error: Detected 2 oom-kill event(s)

Post by stiwari »

Hi,

At first glance, it looks like the machine where you are running EPW does not have enough memory. If you are on an HPC cluster, try increasing the number of nodes while keeping the total number of cores the same, so that each process has more memory available, and see whether the error goes away. Alternatively, reduce the fine grids (nkf1-3 and nqf1-3) and check whether the error persists. If neither helps, please post your EPW output file (.out).
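
To give a feel for the numbers, here is a rough Python sketch of the two suggestions. It only counts points in the fine grids from your input (before any symmetry reduction from mp_mesh_k) and shows how the memory available per MPI rank changes when the same ranks are spread over more nodes. The 32 ranks and 256 GB per node are made-up values, not taken from your job, and the function names are just for illustration. It does not predict how much memory EPW will actually need.

```python
# Rough sketch only. The rank count and node memory are hypothetical values,
# not taken from the job in this thread.

def grid_points(n1, n2, n3):
    """Number of points in a uniform n1 x n2 x n3 grid (no symmetry reduction)."""
    return n1 * n2 * n3

def mem_per_rank_gb(node_mem_gb, total_ranks, nodes):
    """Memory available to each MPI rank if ranks are spread evenly over the nodes."""
    ranks_per_node = -(-total_ranks // nodes)  # ceiling division
    return node_mem_gb / ranks_per_node

# Fine grids from the posted EPW input (nkf1..3 = 48, nqf1..3 = 24).
print("fine k-points:", grid_points(48, 48, 48))
print("fine q-points:", grid_points(24, 24, 24))

# Halving every dimension of a grid cuts its point count by a factor of 8.
print("reduced k-points:", grid_points(24, 24, 24))
print("reduced q-points:", grid_points(12, 12, 12))

# Hypothetical job: 32 ranks on nodes with 256 GB each.
for nodes in (1, 2, 4):
    print(nodes, "node(s):", mem_per_rank_gb(256, 32, nodes), "GB per rank")
```

The full 48x48x48 grid has 110592 k-points and the 24x24x24 grid has 13824 q-points, so starting with smaller fine grids is a cheap first test. Likewise, going from 1 to 2 nodes with the same number of ranks doubles the memory each rank can use.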

Best regards,
Sabya.
