How to reduce load on parallel file system

Post here questions related to issues encountered while running the EPW code

ganwar
Posts: 5
Joined: Wed May 08, 2019 11:22 am
Affiliation:

How to reduce load on parallel file system

Post by ganwar »

Hi,

I'm supporting a user on our HPC facility running EPW from QE 6.3. Unfortunately, the jobs the user is running are generating a very high load on our parallel file system (GPFS), to the extent that a few (2-3) concurrent multi-node (3-10 nodes) jobs make the file system unusable for other users.

Does anyone have advice on reducing this IO load? I believe that with QE (pw.x) you can set wfcdir to a local disk (for per-process files) and outdir to the parallel file system, as well as set disk_io, to reduce the disk IO. However, for EPW everything seems to go via outdir, and setting it to a local disk for multi-node jobs results in MPI_FILE_OPEN errors. The pw.x arrangement I have in mind is sketched below.
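For pw.x, the split I have in mind looks something like this (a sketch only; the prefix and paths are placeholders for our system, and the exact disk_io behaviour should be checked against the QE 6.3 documentation):

Code: Select all

&CONTROL
  calculation = 'scf'
  prefix      = 'myrun'                  ! placeholder name
  outdir      = '/gpfs/scratch/myrun/'   ! shared parallel file system (placeholder path)
  wfcdir      = '/tmp/myrun/'            ! node-local disk for per-process wavefunction files (placeholder path)
  disk_io     = 'low'                    ! reduce what pw.x writes back to disk
/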

Any advice or suggestions would be welcome, apologies if I've misunderstood or missed something.

Thanks

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: How to reduce load on parallel file system

Post by sponce »

Dear ganwar,

Thanks for your message.

Can you be more specific about the type of calculation the user is running?

A traditional EPW calculation will write .epb files (local on each node) as well as a .epmatwp1 file.
However, that latter file is then indeed read using MPI_READ (i.e., all the cores read the same big file). In our testing this should not stress the cluster too much. If it does, can you tell us which part of the code is responsible?

Now, if the user uses the newly implemented mobility calculation, then yes, everything is MPI_SEEK + MPI_WRITE.
I'm actually working on an alternative where everything is kept local for the mobility.

It would also help if you could get the user to send a typical input file that generates the problem, as well as the size of the XX.epmatwp1 file.

Thanks,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com

ganwar
Posts: 5
Joined: Wed May 08, 2019 11:22 am
Affiliation:

Re: How to reduce load on parallel file system

Post by ganwar »

Hi Samuel,

Thanks for your response.

I suspect the issue is really the .epb files, but given our walltime limit (48 hours) the user said they wanted to keep these in case the jobs don't reach the EPW phase in time (the restart pattern is sketched below). I had suggested setting etf_mem, but this causes the jobs to run out of RAM.
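For context, the restart pattern the user is relying on is roughly the following (a sketch based only on the epbread/epbwrite flags that appear in EPW input files; I haven't verified the exact combination against the QE 6.3 documentation):

Code: Select all

&inputepw
  ! first job: compute the coarse-grid matrix elements and write the .epb files
  epbwrite = .true.
  epbread  = .false.
  ! (all other settings unchanged)
/

&inputepw
  ! follow-up job: read the existing .epb files back instead of recomputing them
  epbwrite = .false.
  epbread  = .true.
  ! (all other settings unchanged)
/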

I've asked the user to look over your questions and either respond here or through me, so I'll let you know when I have more.

Thanks

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: How to reduce load on parallel file system

Post by sponce »

Dear ganwar,

It would be very surprising for the generation of the .epb files to create a lot of load on your filesystem.
Those files are produced locally (i.e., each core within a node should be writing to its own scratch, with no communication between nodes).

They should indeed keep those files until they have generated the XX.epmatwp1 file. Once that is done, it is safe for them to remove all the .epb files.

Best wishes,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com

ganwar
Posts: 5
Joined: Wed May 08, 2019 11:22 am
Affiliation:

Re: How to reduce load on parallel file system

Post by ganwar »

Hi Samuel,

"Those files are produced locally (.i.e. each cores within a node should be writing on its own scratch with no communication between nodes)."

While the files are produced locally, they seem to be written to the same directory as outdir, which in our case is on the cluster-wide file system. It would be useful if there were a way to save those files in a location separate from outdir.

Having said that, though, I can see a job running at the moment which is only accessing a .epmatwp1 file, and that does seem to be generating a lot of IO, so I am probably wrong about the .epb files being the cause of the high load.

Thanks

Chathu
Posts: 1
Joined: Wed May 29, 2019 7:12 pm
Affiliation:

Re: How to reduce load on parallel file system

Post by Chathu »

ganwar wrote: I'm supporting a user on our HPC facility running EPW from QE 6.3 [...] for EPW it seems that everything goes via outdir, and setting it to a local disk for multi-node jobs results in MPI_FILE_OPEN errors.


Hi Samuel,

I am the HPC user mentioned here. Below is my input file for EPW.

Code: Select all
&inputepw
prefix = 'NbCoSn',

amass(1) = 92.90638
amass(2) = 58.933195
amass(3) = 118.71
! outdir = '/tmp/esscmv/NbCoSn/'
! dvscf_dir = '/tinisgpfs/home/csc/esscmv/bandstructure_qe/NbCoSn/EPW_2/save'
outdir = './'
dvscf_dir = './save'


elph = .true.
kmaps = .true.
epbwrite = .true.
epbread = .false.

epwwrite = .true.
epwread = .false.

nbndsub = 12
nbndskip = 0

wannierize = .true.
num_iter = 300
dis_win_max = 25
dis_win_min = 0
dis_froz_min= 14
dis_froz_max= 25

wdata(1) = 'bands_plot = .true.'
wdata(2) = 'begin kpoint_path'
wdata(3) = 'G 0.00 0.00 0.00 X 0.00 0.50 0.50'
wdata(4) = 'X 0.00 0.50 0.50 W 0.25 0.50 0.75'
wdata(5) = 'W 0.25 0.50 0.75 L 0.50 0.50 0.50'
wdata(6) = 'L 0.50 0.50 0.50 K 0.375 0.375 0.75'
wdata(7) = 'K 0.375 0.375 0.75 G 0.00 0.00 0.00'
wdata(8) = 'G 0.00 0.00 0.00 L 0.50 0.50 0.50'
wdata(9) = 'end kpoint_path'
wdata(10) = 'bands_plot_format = gnuplot'

iverbosity = 3
etf_mem = 1
restart=.true.
restart_freq=1000

elecselfen = .true.
delta_approx= .true.
phonselfen = .false.
efermi_read = .true.
fermi_energy= 16.4224


fsthick = 2.5 ! eV
eptemp = 300 ! K
degaussw = 0.05 ! eV

a2f = .false.


nkf1 = 48
nkf2 = 48
nkf3 = 48

nqf1 = 48
nqf2 = 48
nqf3 = 48

nk1 = 24
nk2 = 24
nk3 = 24

nq1 = 6
nq2 = 6
nq3 = 6
/
16 cartesian
0.000000000000000E+00 0.000000000000000E+00 0.000000000000000E+00
0.117851130197756E+00 0.117851130197756E+00 -0.117851130197756E+00
0.235702260395511E+00 0.235702260395511E+00 -0.235702260395511E+00
-0.353553390593267E+00 -0.353553390593267E+00 0.353553390593267E+00
0.235702260395511E+00 -0.654205191118227E-17 0.654205191118227E-17
0.353553390593267E+00 0.117851130197756E+00 -0.117851130197756E+00
-0.235702260395511E+00 -0.471404520791023E+00 0.471404520791023E+00
-0.117851130197756E+00 -0.353553390593267E+00 0.353553390593267E+00
0.261682076447291E-16 -0.235702260395511E+00 0.235702260395511E+00
0.471404520791023E+00 -0.130841038223645E-16 0.130841038223645E-16
-0.117851130197756E+00 -0.589255650988778E+00 0.589255650988778E+00
-0.261682076447291E-16 -0.471404520791023E+00 0.471404520791023E+00
-0.707106781186534E+00 0.000000000000000E+00 0.000000000000000E+00
-0.235702260395511E+00 -0.471404520791023E+00 0.707106781186534E+00
-0.117851130197756E+00 -0.353553390593267E+00 0.589255650988778E+00
-0.707106781186534E+00 0.235702260395511E+00 0.261682076447291E-16


I would be really grateful if you could have a look and see if there is anything I can change here to resolve this issue.

Regards,
Chathu

ganwar
Posts: 5
Joined: Wed May 08, 2019 11:22 am
Affiliation:

Re: How to reduce load on parallel file system

Post by ganwar »

Hi Samuel,

Just to add to Chathu's comment: I've managed to get a little detail from the storage system regarding the load on the file system for a job that's currently active. As far as I can tell, the high IO (at this particular stage of the job) appears to be due to the MPI tasks (128 of them) reading from the epmatwp1 file, which is currently 95 GB. I don't know if this is helpful at all.
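As a rough upper bound on the traffic (a back-of-the-envelope estimate, assuming the worst case in which each task ends up reading the entire file, per the MPI_READ behaviour described earlier in the thread):

\[ 128 \ \text{tasks} \times 95\ \mathrm{GB} \approx 12\ \mathrm{TB} \ \text{of aggregate reads per pass over the file} \]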

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: How to reduce load on parallel file system

Post by sponce »

Hello,

Oh I see. Well, clearly the "coarse" grid is not coarse at all.
I can see that the user is using:

Code: Select all

nk1 = 24
nk2 = 24
nk3 = 24

which is very dense.

The whole idea of EPW is precisely to avoid such dense coarse grids by relying on the localization properties of maximally localized Wannier functions (MLWFs).
This grid seems like total overkill to me.
At most the user should use

Code: Select all

nk1 = 12
nk2 = 12
nk3 = 12


and would probably be fine with

Code: Select all

nk1 = 6
nk2 = 6
nk3 = 6


This will drastically reduce the size of the epmatwp1 file, to roughly 3 GB, which will significantly decrease the IO.
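As a rough scaling argument (assuming the dominant array in the file grows linearly with the number of coarse k-points nk1*nk2*nk3; other dimensions contribute as well, so treat this as an order-of-magnitude estimate):

\[ \mathrm{size} \propto n_{k1}\, n_{k2}\, n_{k3} \quad\Rightarrow\quad 95\ \mathrm{GB} \times \left(\tfrac{12}{24}\right)^{3} \approx 12\ \mathrm{GB}, \qquad 95\ \mathrm{GB} \times \left(\tfrac{6}{24}\right)^{3} \approx 1.5\ \mathrm{GB} \]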

Best wishes,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com

ganwar
Posts: 5
Joined: Wed May 08, 2019 11:22 am
Affiliation:

Re: How to reduce load on parallel file system

Post by ganwar »

Hi Samuel,

Apologies for not responding to your message earlier. I just wanted to confirm that, following your advice, Chathu has started running a coarser grid, which has significantly reduced the load on the file system. Thanks once again for your advice.
