How to reduce load on parallel file system
Posted: Tue May 21, 2019 2:56 pm
Hi,
I'm supporting a user on our HPC facility running epw from QE 6.3. Unfortunately, the jobs the user is running are generating a very high load on our parallel file system (GPFS), to the extent that several (2-3) concurrent multi-node (3-10 nodes each) jobs are making the file system unusable for other users.
Does anyone have advice on reducing this I/O load? I believe that with QE (pw.x) you can set wfcdir to a local disk (for the per-process files) and outdir to the parallel file system separately to reduce disk I/O, as well as setting disk_io. However, for epw everything seems to go via outdir, and setting it to a local disk for multi-node jobs results in MPI_FILE_OPEN errors.
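For reference, the pw.x split I had in mind looks something like the following (paths and prefix are placeholders, and the exact keywords should be checked against the pw.x input description for your QE version):

```
&CONTROL
  calculation = 'scf'
  prefix      = 'mysystem'            ! placeholder
  outdir      = '/gpfs/scratch/jobX'  ! shared parallel file system (placeholder path)
  wfcdir      = '/tmp/qe_wfc'         ! node-local disk for per-process wavefunction files (placeholder path)
  disk_io     = 'low'                 ! reduce wavefunction I/O ('none' avoids it where possible)
/
```

This works for pw.x because each MPI process only reads and writes its own wavefunction files in wfcdir, but epw apparently opens files in outdir collectively, so a node-local path isn't visible to all ranks.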
Any advice or suggestions would be welcome, apologies if I've misunderstood or missed something.
Thanks