Error during benchmarking

Aa410733031
Posts: 12
Joined: Wed Aug 24, 2022 11:52 pm
Affiliation: NYCU

Error during benchmarking

Post by Aa410733031 »

Hello,
I'm trying to reproduce the Pb example from the 2023 school. I am using QE 7.3 with EPW 5.8 (the EPW bundled in q-e-qe-7.3), compiled with the intel/2023 and intelmpi/2021.11 modules. But when I run the Pb example in parallel, I get an error.

jobscript
#!/bin/bash
#SBATCH -A MST113098
#SBATCH -J pb_tut
#SBATCH -p ctest
#SBATCH -n 8 ## Total cores
#SBATCH -c 1 ## Without this option, the controller will just try to allocate one processor per task.
#SBATCH -N 1 ## Number of nodes
#SBATCH -o %j.out
#SBATCH -e %j.err

## Ref: https://slurm.schedmd.com/sbatch.html
## Ref: https://man.twcc.ai/@TWCC-III-manual/Sy9-QqHiO
module purge
module load intel/2023 intelmpi/2021.11
export OMP_NUM_THREADS=1
#execute="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/PW/src/pw.x"
#execute_ph="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/bin/ph.x"
execute_epw="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/EPW/bin/epw.x"
MPIRUN="mpiexec.hydra -n $SLURM_NPROCS"

After running epw.x in parallel, it generates a CRASH file:
CRASH
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from epw_readin : error # 1
Number of processes must be equal to product of number of pools and
number of images
Image parallelization can be used only in calculations on coarse grid.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ERROR
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Abort(1) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
Abort(1) on node 6 (rank 6 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 6

How can I solve this problem?
Thank you!
NYCU
Ray,Wu

stiwari
Posts: 36
Joined: Mon Jun 26, 2023 9:48 pm
Affiliation: UT Austin

Re: Error during benchmarking

Post by stiwari »

Hi Wu,

As the error suggests, the number of tasks ("#SBATCH -n 8", i.e. 8 in your case) should equal the number of k-point pools.
Simply put, for your case you should use:
"mpiexec.hydra -n $SLURM_NPROCS <location of QE bin>/epw.x -nk $SLURM_NPROCS -in <input>"
Note that -nk should be the same as -n.
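Applied to the jobscript you posted, the run line would look like this (a sketch using the MPIRUN and execute_epw variables you already define; adjust the input file name if yours differs):

# 8 tasks split into 8 k-point pools and 1 image,
# so nprocs (8) = npools (8) x nimages (1), which is what epw_readin checks.
$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw.in > epw.out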

Best,
Sabya.

Aa410733031
Posts: 12
Joined: Wed Aug 24, 2022 11:52 pm
Affiliation: NYCU

Re: Error during benchmarking

Post by Aa410733031 »

Dear stiwari,
This is my epw.in input file:
--
&inputepw
prefix = 'pb',
amass(1) = 207.2
outdir = './'

elph = .true.
epbwrite = .true.
epbread = .false.

epwwrite = .true.
epwread = .false.

nbndsub = 4
bands_skipped = 'exclude_bands = 1-5'

wannierize = .true.
num_iter = 300
dis_win_max = 21
dis_win_min = -3
dis_froz_min= -3
dis_froz_max= 13.5
proj(1) = 'Pb:sp3'

wdata(1) = 'bands_plot = .true.'
wdata(2) = 'begin kpoint_path'
wdata(3) = 'G 0.00 0.00 0.00 X 0.00 0.50 0.50'
wdata(4) = 'X 0.00 0.50 0.50 W 0.25 0.50 0.75'
wdata(5) = 'W 0.25 0.50 0.75 L 0.50 0.50 0.50'
wdata(6) = 'L 0.50 0.50 0.50 K 0.375 0.375 0.75'
wdata(7) = 'K 0.375 0.375 0.75 G 0.00 0.00 0.00'
wdata(8) = 'G 0.00 0.00 0.00 L 0.50 0.50 0.50'
wdata(9) = 'end kpoint_path'
wdata(10) = 'bands_plot_format = gnuplot'

iverbosity = 0

elecselfen = .false.
phonselfen = .true.


fsthick = 6 ! eV
temps = 0.075 ! K
degaussw = 0.05 ! eV

a2f = .true.

dvscf_dir = '../phonons/save'

nkf1 = 20
nkf2 = 20
nkf3 = 20

nqf1 = 20
nqf2 = 20
nqf3 = 20

nk1 = 6
nk2 = 6
nk3 = 6

nq1 = 6
nq2 = 6
nq3 = 6

/
and this is my jobscript
#!/bin/bash
#SBATCH -A MST113098
#SBATCH -J pb_tut
#SBATCH -p ct56
#SBATCH -n 216 ## Total cores
#SBATCH -c 1 ## Without this option, the controller will just try to allocate one processor per task.
#SBATCH -N 6 ## Number of nodes
#SBATCH -o %j.out
#SBATCH -e %j.err

## Ref: https://slurm.schedmd.com/sbatch.html
## Ref: https://man.twcc.ai/@TWCC-III-manual/Sy9-QqHiO
module purge
module load intel/2023 intelmpi/2021.11
export OMP_NUM_THREADS=1
#execute="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/PW/src/pw.x"
#execute_ph="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/bin/ph.x"
execute_epw="/home/kmlin1970/pkg/epw/epw_5.8.0/q-e-qe-7.3/EPW/bin/epw.x"
MPIRUN="mpiexec.hydra -n $SLURM_NPROCS"

#$MPIRUN $execute -inp scf.in > scf.out
#$MPIRUN $execute -inp nscf.in > nscf.out
#$MPIRUN $execute_ph -inp ph.in > ph.out
$MPIRUN $execute_epw -inp epw.in > epw.out
#$execute_epw -inp epw2.in > epw2.out

I tried to match my number of tasks to nk1*nk2*nk3 = 6*6*6 = 216, but I still get the error:
--
CRASH:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from epw_readin : error # 1
Number of processes must be equal to product of number of pools and
number of images
Image parallelization can be used only in calculations on coarse grid.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I want to ask: which k-point count should the number of tasks correspond to?


Thank you!
NYCU
Ray,Wu

stiwari
Posts: 36
Joined: Mon Jun 26, 2023 9:48 pm
Affiliation: UT Austin

Re: Error during benchmarking

Post by stiwari »

Hi Wu,

What you need to do is "$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw.in > epw.out". The same goes for epw2.in ("$execute_epw -nk $SLURM_NPROCS -inp epw2.in > epw2.out").

The issue here is that your MPI tasks (216 in your latest jobscript) need to be k-parallelized using "-nk".
EPW only parallelizes over the k-grid. Hence, when you do not specify -nk, the code automatically assumes -nk 1 and assigns the remaining tasks to images (e.g. -ni 8 with your original 8-task script), which triggers the error you see. The current version, EPW v5.8, only works with -nk set to the number of tasks.
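Concretely, the execution lines of your jobscript would become something like this (a sketch using your own variables; the epw2.in step stays commented out until you reach it):

# 216 tasks split into 216 k-point pools and 1 image:
# nprocs (216) = npools (216) x nimages (1), so the epw_readin check passes.
$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw.in > epw.out
#$MPIRUN $execute_epw -nk $SLURM_NPROCS -inp epw2.in > epw2.out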

Best regards,
Sabya.
