Error during benchmarking

This section is dedicated to compilation problems

Moderator: stiwari

Aa410733031
Posts: 19
Joined: Wed Aug 24, 2022 11:52 pm
Affiliation: NYCU

Error during benchmarking

Post by Aa410733031 »

Hello,
I'm trying to reproduce the Pb example from the 2023 school. I use QE 7.3 and EPW 5.8 (the EPW shipped inside q-e-qe-7.3), and I compiled the EPW code with the modules intel/2023 and intelmpi/2021.11. But when I run the Pb example in parallel I get an error.

jobscript
#!/bin/bash
#SBATCH -A MST113098
#SBATCH -J pb_tut
#SBATCH -p ctest
#SBATCH -n 8 ## Total cores
#SBATCH -c 1 ## Without this option, the controller will just try to allocate one processor per task.
#SBATCH -N 1 ## Number of nodes
#SBATCH -o %j.out
#SBATCH -e %j.err

## Ref: https://slurm.schedmd.com/sbatch.html
## Ref: https://man.twcc.ai/@TWCC-III-manual/Sy9-QqHiO
module purge
module load intel/2023 intelmpi/2021.11
export OMP_NUM_THREADS=1
#execute="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/PW/src/pw.x"
#execute_ph="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/bin/ph.x"
execute_epw="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/EPW/bin/epw.x"
MPIRUN="mpiexec.hydra -n $SLURM_NPROCS"

After running epw.x in parallel, it generates a CRASH file:
CRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from epw_readin : error # 1
Number of processes must be equal to product of number of pools and
number of images
Image parallelization can be used only in calculations on coarse grid.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ERROR
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Abort(1) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
Abort(1) on node 6 (rank 6 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 6

How can I solve this problem?
Thank you!
NYCU
Ray,Wu
stiwari
Posts: 50
Joined: Mon Jun 26, 2023 9:48 pm
Affiliation: UT Austin

Re: Error during benchmarking

Post by stiwari »

Hi Wu,

As the error suggests, the number of MPI tasks ("#SBATCH -n 8", i.e. 8 in your case) should equal the number of k-point pools.
Simply put, for your case you should use:
"mpiexec.hydra -n $SLURM_NPROCS <location of QE bin>/epw.x -nk $SLURM_NPROCS -in <input>"
Note that -nk should be the same as -n.
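For example, applied to the jobscript you posted (a minimal sketch that reuses your own variables and paths; adjust as needed):

module purge
module load intel/2023 intelmpi/2021.11
export OMP_NUM_THREADS=1

execute_epw="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/EPW/bin/epw.x"
MPIRUN="mpiexec.hydra -n $SLURM_NPROCS"

# -nk sets the number of k-point pools; making it equal to the number of
# MPI tasks means no image parallelization is requested, so the check in
# epw_readin passes.
$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw.in > epw.out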

Best,
Sabya.
Aa410733031
Posts: 19
Joined: Wed Aug 24, 2022 11:52 pm
Affiliation: NYCU

Re: Error during benchmarking

Post by Aa410733031 »

Dear stiwari,
This is my epw.in input file:
--
&inputepw
prefix = 'pb',
amass(1) = 207.2
outdir = './'

elph = .true.
epbwrite = .true.
epbread = .false.

epwwrite = .true.
epwread = .false.

nbndsub = 4
bands_skipped = 'exclude_bands = 1-5'

wannierize = .true.
num_iter = 300
dis_win_max = 21
dis_win_min = -3
dis_froz_min= -3
dis_froz_max= 13.5
proj(1) = 'Pb:sp3'

wdata(1) = 'bands_plot = .true.'
wdata(2) = 'begin kpoint_path'
wdata(3) = 'G 0.00 0.00 0.00 X 0.00 0.50 0.50'
wdata(4) = 'X 0.00 0.50 0.50 W 0.25 0.50 0.75'
wdata(5) = 'W 0.25 0.50 0.75 L 0.50 0.50 0.50'
wdata(6) = 'L 0.50 0.50 0.50 K 0.375 0.375 0.75'
wdata(7) = 'K 0.375 0.375 0.75 G 0.00 0.00 0.00'
wdata(8) = 'G 0.00 0.00 0.00 L 0.50 0.50 0.50'
wdata(9) = 'end kpoint_path'
wdata(10) = 'bands_plot_format = gnuplot'

iverbosity = 0

elecselfen = .false.
phonselfen = .true.


fsthick = 6 ! eV
temps = 0.075 ! K
degaussw = 0.05 ! eV

a2f = .true.

dvscf_dir = '../phonons/save'

nkf1 = 20
nkf2 = 20
nkf3 = 20

nqf1 = 20
nqf2 = 20
nqf3 = 20

nk1 = 6
nk2 = 6
nk3 = 6

nq1 = 6
nq2 = 6
nq3 = 6

/
and this is my jobscript:
#!/bin/bash
#SBATCH -A MST113098
#SBATCH -J pb_tut
#SBATCH -p ct56
#SBATCH -n 216 ## Total cores
#SBATCH -c 1 ## Without this option, the controller will just try to allocate one processor per task.
#SBATCH -N 6 ## Number of nodes
#SBATCH -o %j.out
#SBATCH -e %j.err

## Ref: https://slurm.schedmd.com/sbatch.html
## Ref: https://man.twcc.ai/@TWCC-III-manual/Sy9-QqHiO
module purge
module load intel/2023 intelmpi/2021.11
export OMP_NUM_THREADS=1
#execute="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/PW/src/pw.x"
#execute_ph="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/bin/ph.x"
execute_epw="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/EPW/bin/epw.x"
MPIRUN="mpiexec.hydra -n $SLURM_NPROCS"

#$MPIRUN $execute -inp scf.in > scf.out
#$MPIRUN $execute -inp nscf.in > nscf.out
#$MPIRUN $execute_ph -inp ph.in > ph.out
$MPIRUN $execute_epw -inp epw.in > epw.out
#$execute_epw -inp epw2.in > epw2.out

I tried to match my number of MPI tasks to nk1*nk2*nk3 = 216, but I still get the error:
--
CRASH:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from epw_readin : error # 1
Number of processes must be equal to product of number of pools and
number of images
Image parallelization can be used only in calculations on coarse grid.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I want to ask: which k-point grid should the number of tasks correspond to?


Thank you!
NYCU
Ray,Wu
stiwari
Posts: 50
Joined: Mon Jun 26, 2023 9:48 pm
Affiliation: UT Austin

Re: Error during benchmarking

Post by stiwari »

Hi Wu,

What you need to do is "$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw.in > epw.out". The same applies to epw2.in ("$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw2.in > epw2.out").

The issue here is that your MPI tasks need to be explicitly k-parallelized using "-nk".
EPW only parallelizes over the k-grid. Hence, when you do not specify -nk, the code automatically treats every task as an image (-ni equal to the task count and -nk 1), which triggers the error above. The current version, EPW v5.8, only works with -nk equal to the number of tasks.
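Concretely, the run lines in your jobscript would become (a sketch based on the variables you already define; only the -nk flag is added):

# one k-point pool per MPI task (216 pools for -n 216)
$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw.in > epw.out
$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw2.in > epw2.out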

Best regards,
Sabya.
Aa410733031
Posts: 19
Joined: Wed Aug 24, 2022 11:52 pm
Affiliation: NYCU

Re: Error during benchmarking

Post by Aa410733031 »

Dear stiwari,
I can now run epw.x in parallel, but there is still a problem when calculating the conductivity. Compared with the 2024 school solution, my calculated conductivity for lead is off by a factor of 3^0.5:
=============================================================================================
BTE in the self-energy relaxation time approximation (SERTA)
=============================================================================================

=============================================================================================
Temp Fermi DOS Population SR Conductivity
[K] [eV] [states/Ry] [carriers per cell] [Ohm.cm]^-1
=============================================================================================

100.000 11.7042 0.12315E+01 0.37435E-14 0.106043E+06 0.205347E-12 -0.113672E-11
-0.17764E-14 0.218169E-12 0.106043E+06 -0.205147E-12
0.19412E-14 -0.121606E-11 -0.204946E-12 0.106043E+06
150.000 11.7042 0.12151E+01 0.19429E-14 0.707340E+05 -0.889503E-13 0.743256E-13
0.00000E+00 -0.217968E-12 0.707340E+05 0.217968E-12
0.38997E-14 -0.226182E-12 0.889503E-13 0.707340E+05
200.000 11.7046 0.15833E+01 0.21649E-14 0.538310E+05 0.160271E-14 -0.251625E-12
0.44409E-15 -0.115395E-12 0.538310E+05 0.961625E-14
0.38858E-15 -0.431930E-12 -0.320542E-14 0.538310E+05
250.000 11.7050 0.20004E+01 0.16653E-14 0.429619E+05 0.256433E-13 0.294098E-12
0.00000E+00 -0.512867E-13 0.429619E+05 0.480813E-13
0.13879E-15 -0.208350E-13 -0.320542E-14 0.429619E+05
300.000 11.7051 0.22216E+01 -0.44409E-15 0.353196E+05 0.320463E-14 0.234794E-12
-0.22204E-15 -0.480820E-13 0.353196E+05 -0.288495E-13
-0.24980E-15 0.233992E-12 0.544913E-13 0.353196E+05
350.000 11.7051 0.22480E+01 -0.74937E-15 0.297562E+05 -0.673138E-13 0.327744E-12
0.00000E+00 -0.128201E-13 0.297562E+05 0.128232E-13
-0.11657E-14 0.354199E-12 0.192325E-13 0.297562E+05
400.000 11.7051 0.21747E+01 -0.10547E-14 0.256224E+05 0.320542E-14 0.392601E-13
0.00000E+00 -0.480813E-14 0.256224E+05 -0.657110E-13
0.88812E-15 0.184249E-13 -0.496840E-13 0.256224E+05
450.000 11.7050 0.20829E+01 0.97155E-15 0.224945E+05 0.112127E-13 -0.709449E-13
0.00000E+00 -0.100107E-13 0.224945E+05 -0.359983E-14
-0.69237E-15 0.461154E-13 0.159645E-14 0.224945E+05
500.000 11.7050 0.20180E+01 -0.33372E-15 0.200812E+05 0.482065E-14 -0.204320E-12
0.11102E-15 -0.208227E-13 0.200812E+05 0.428850E-13
-0.55511E-15 -0.223553E-12 0.192450E-13 0.200812E+05

=============================================================================================
Start solving iterative Boltzmann Transport Equation
=============================================================================================
and my epw1.out differs somewhat from the 2024 school solution:
===================================================================
Eliashberg Spectral Function in the Migdal Approximation
===================================================================

lambda : 1.7322402
lambda_tr : 1.4114304

Estimated Allen-Dynes Tc

logavg = 0.0002379 l_a2f = 1.7328824
mu = 0.10 Tc = 4.856972538456 K
mu = 0.12 Tc = 4.610447882497 K
mu = 0.14 Tc = 4.363342098283 K
mu = 0.16 Tc = 4.116029512202 K
mu = 0.18 Tc = 3.868931518310 K
mu = 0.20 Tc = 3.622520874872 K
I want to ask how I can fix this problem.
stiwari
Posts: 50
Joined: Mon Jun 26, 2023 9:48 pm
Affiliation: UT Austin

Re: Error during benchmarking

Post by stiwari »

Hi,

If I understood your question correctly, the conductivity values you obtain for lead are not close to experiment?
The tutorials used for the school are run on much coarser grids than usual. For converged results you have to use much finer grids.

Please provide your input/output files in case your problem is different from what I understood.
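As a rough sketch, a convergence test would increase the fine interpolation grids in epw.in step by step while keeping everything else fixed, for example (illustrative values only, not converged ones):

nkf1 = 40
nkf2 = 40
nkf3 = 40

nqf1 = 40
nqf2 = 40
nqf3 = 40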

Best regards,
Sabya.