Hi,
I'm supporting a user on our HPC facility running EPW from QE 6.3. Unfortunately the jobs the user is running are generating a very high load on our parallel file system (GPFS), to the extent that several (2-3) concurrent multi-node jobs (between 3-10 nodes each) make the file system unusable for other users.
Does anyone have advice on reducing this IO load? I believe that with QE (pw.x) you can set wfcdir to a local disk (for the per-processor files) and outdir to the parallel file system to reduce disk IO, as well as setting disk_io. However, for EPW everything seems to go via outdir, and setting it to a local disk for multi-node jobs results in MPI_FILE_OPEN errors.
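For reference, this is roughly the split I had in mind on the pw.x side (the paths and prefix below are just placeholders for our node-local scratch and the GPFS project space, not the user's actual directories):
&CONTROL
  calculation = 'scf'
  prefix      = 'XX'
  outdir      = '/gpfs/projects/XX/'  ! shared parallel file system (placeholder path)
  wfcdir      = '/tmp/scratch/'       ! node-local disk for the per-process wfc files (placeholder path)
  disk_io     = 'low'                 ! ask pw.x to write less to disk
/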
Any advice or suggestions would be welcome, apologies if I've misunderstood or missed something.
Thanks
Re: How to reduce load on parallel file system
Dear ganwar,
Thanks for your message.
Can you be more specific about the type of calculation the user is running?
A traditional EPW calculation will write .epb files (local to each node) as well as a .epmatwp1 file.
However, that latter file is then indeed read using MPI_READ (i.e. all the cores read the same big file). In our tests this should not stress the cluster too much; if it does, can you tell us which part of the code is responsible?
Now, if the user uses the newly implemented mobility calculation feature, then yes, everything is MPI_SEEK + MPI_WRITE.
I'm actually working on an alternative possibility where everything is local for the mobility.
It would also help if you could get the user to send a typical input file that generates the problem, as well as the size of the XX.epmatwp1 file.
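For clarity, by a "traditional" EPW calculation I mean a first run with roughly the following flags (this is only a sketch; the full set of restart combinations is described in the EPW documentation):
epbwrite = .true.    ! write the coarse-grid matrix elements to the prefix.epb files
epbread  = .false.
epwwrite = .true.    ! write the Wannier-representation prefix.epmatwp1 file
epwread  = .false.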
Thanks,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
Re: How to reduce load on parallel file system
Hi Samuel,
Thanks for your response.
I suspect the issue is really the .epb files, but given our 48-hour walltime limit the user said they wanted to keep these in case the jobs don't reach the epw phase in time. I had suggested that they set etf_mem, but this causes the jobs to run out of RAM.
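For reference, the etf_mem suggestion I had made was along these lines; my reading of its meaning may well be wrong, so please check it against the EPW input description:
etf_mem = 0    ! my understanding: keep more of the interpolated quantities in RAM instead of on disk,
               ! i.e. trade IO for memory (which is presumably why the jobs then run out of RAM)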
I've asked the user to look over your questions and either respond here or through me, so I'll let you know when I have more.
Thanks
Re: How to reduce load on parallel file system
Dear ganwar,
It would be very surprising if the generation of the .epb files were generating a lot of load on your filesystem.
Those files are produced locally (i.e. each core within a node should be writing to its own scratch, with no communication between nodes).
They should indeed keep those files until they have generated the XX.epmatwp1 file. Once that is done, it is safe for them to remove all the .epb files.
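Once the XX.epmatwp1 file exists, a restart only needs to read it back, with something like the following (a sketch from memory; please double-check the restart combinations in the EPW documentation):
epbwrite   = .false.
epbread    = .false.
epwwrite   = .false.
epwread    = .true.    ! read the existing XX.epmatwp1 file
wannierize = .false.   ! the Wannier functions have already been computed
kmaps      = .true.    ! reuse the existing k-point folding maps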
Best wishes,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
Re: How to reduce load on parallel file system
Hi Samuel,
"Those files are produced locally (.i.e. each cores within a node should be writing on its own scratch with no communication between nodes)."
While the files are produced locally, they seem to be written to the same directory as outdir, which in our case is a cluster-wide file system. It would be useful if there were a way to save those files in a location separate from outdir.
Having said that, I can see a job running at the moment that is only accessing a .epmatwp1 file, and that does seem to be generating a lot of IO, so I am probably wrong about the .epb files being the cause of the high IO.
Thanks
"Those files are produced locally (.i.e. each cores within a node should be writing on its own scratch with no communication between nodes)."
While the files are produced locally, they seem to be written to the same directory as "outdir" which in our case is a cluster-wide file system. It would be useful if there were a way to save those files in a separate location from outdir.
However having said that, I can see a job running at the moment which is only accessing a .epmatwp1 file and that does seem to be generating a lot of IO so I am probably wrong about the .epb files being the cause of the high IO.
Thanks
Re: How to reduce load on parallel file system
ganwar wrote: Hi,
I'm supporting a user on our HPC facility running EPW from QE 6.3. Unfortunately the jobs the user is running are generating a very high load on our parallel file system (GPFS), to the extent that several (2-3) concurrent multi-node jobs (between 3-10 nodes each) make the file system unusable for other users.
Does anyone have advice on reducing this IO load? I believe that with QE (pw.x) you can set wfcdir to a local disk (for the per-processor files) and outdir to the parallel file system to reduce disk IO, as well as setting disk_io. However, for EPW everything seems to go via outdir, and setting it to a local disk for multi-node jobs results in MPI_FILE_OPEN errors.
Any advice or suggestions would be welcome, apologies if I've misunderstood or missed something.
Thanks
Hi Samuel,
I am the HPC user mentioned here. Below is my input file for EPW.
--
&inputepw
prefix = 'NbCoSn',
amass(1) = 92.90638
amass(2) =58.933195
amass(3) =118.71
! outdir = '/tmp/esscmv/NbCoSn/'
! dvscf_dir = '/tinisgpfs/home/csc/esscmv/bandstructure_qe/NbCoSn/EPW_2/save'
outdir = './'
dvscf_dir = './save'
elph = .true.
kmaps = .true.
epbwrite = .true.
epbread = .false.
epwwrite = .true.
epwread = .false.
nbndsub = 12
nbndskip = 0
wannierize = .true.
num_iter = 300
dis_win_max = 25
dis_win_min = 0
dis_froz_min= 14
dis_froz_max= 25
wdata(1) = 'bands_plot = .true.'
wdata(2) = 'begin kpoint_path'
wdata(3) = 'G 0.00 0.00 0.00 X 0.00 0.50 0.50'
wdata(4) = 'X 0.00 0.50 0.50 W 0.25 0.50 0.75'
wdata(5) = 'W 0.25 0.50 0.75 L 0.50 0.50 0.50'
wdata(6) = 'L 0.50 0.50 0.50 K 0.375 0.375 0.75'
wdata(7) = 'K 0.375 0.375 0.75 G 0.00 0.00 0.00'
wdata(8) = 'G 0.00 0.00 0.00 L 0.50 0.50 0.50'
wdata(9) = 'end kpoint_path'
wdata(10) = 'bands_plot_format = gnuplot'
iverbosity = 3
etf_mem = 1
restart=.true.
restart_freq=1000
elecselfen = .true.
delta_approx= .true.
phonselfen = .false.
efermi_read = .true.
fermi_energy= 16.4224
fsthick = 2.5 ! eV
eptemp = 300 ! K
degaussw = 0.05 ! eV
a2f = .false.
nkf1 = 48
nkf2 = 48
nkf3 = 48
nqf1 = 48
nqf2 = 48
nqf3 = 48
nk1 = 24
nk2 = 24
nk3 = 24
nq1 = 6
nq2 = 6
nq3 = 6
/
16 cartesian
0.000000000000000E+00 0.000000000000000E+00 0.000000000000000E+00
0.117851130197756E+00 0.117851130197756E+00 -0.117851130197756E+00
0.235702260395511E+00 0.235702260395511E+00 -0.235702260395511E+00
-0.353553390593267E+00 -0.353553390593267E+00 0.353553390593267E+00
0.235702260395511E+00 -0.654205191118227E-17 0.654205191118227E-17
0.353553390593267E+00 0.117851130197756E+00 -0.117851130197756E+00
-0.235702260395511E+00 -0.471404520791023E+00 0.471404520791023E+00
-0.117851130197756E+00 -0.353553390593267E+00 0.353553390593267E+00
0.261682076447291E-16 -0.235702260395511E+00 0.235702260395511E+00
0.471404520791023E+00 -0.130841038223645E-16 0.130841038223645E-16
-0.117851130197756E+00 -0.589255650988778E+00 0.589255650988778E+00
-0.261682076447291E-16 -0.471404520791023E+00 0.471404520791023E+00
-0.707106781186534E+00 0.000000000000000E+00 0.000000000000000E+00
-0.235702260395511E+00 -0.471404520791023E+00 0.707106781186534E+00
-0.117851130197756E+00 -0.353553390593267E+00 0.589255650988778E+00
-0.707106781186534E+00 0.235702260395511E+00 0.261682076447291E-16
I would be really grateful if you could have a look and see if there is anything I can change here to solve this issue.
Regards,
Chathu
Re: How to reduce load on parallel file system
Hi Samuel,
Just to add to Chathu's comment, I've managed to get a little detail from the storage system regarding the load for a job that's currently active. As far as I can tell, the high IO (for this particular stage of the job) appears to be due to the MPI tasks (128 of them) reading from the epmatwp1 file, which is currently 95 GB. I don't know if this is helpful at all.
Re: How to reduce load on parallel file system
Hello,
Oh I see. Well, clearly the "coarse" grid is not coarse at all.
I can see that the user is using:
nk1 = 24
nk2 = 24
nk3 = 24
which is very dense.
The idea of our software EPW is precisely to avoid having to use such dense coarse grids, by relying on the properties of MLWFs.
This seems like total overkill to me.
At most the user should use
nk1 = 12
nk2 = 12
nk3 = 12
and would probably be fine with
nk1 = 6
nk2 = 6
nk3 = 6
This will drastically reduce the size of the epmatwp1 file, to roughly 3 GB, which will significantly decrease the IO.
Best wishes,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
Re: How to reduce load on parallel file system
Hi Samuel,
Apologies for not responding to your message earlier. I just wanted to confirm that, following your advice, Chathu has started running with a coarser grid, which has significantly reduced the load on the file system. Thanks once again for your advice.