Running EPW on HPC
Posted: Wed Oct 19, 2022 5:25 am
Hi all,
I'm currently doing quite a bit of work with EPW on TACC Frontera and was hoping for insight into how the code scales with core count. I'm using ~20 nodes (1120 cores) for a calculation on Nb, but the SLURM report shows very poor CPU usage, so I'm wondering whether I compiled something wrong or am running the program incorrectly.
Compiled QE + EPW with Intel 19.1.1 (MKL), --with-scalapack=intel, and FFTW3 (3.3.10)
Running 8x8x8 k and 8x8x8 q grids interpolated to 40x40x40 k-fine, 20x20x20 q-fine.
After calculating the phonons and using pp.py to gather the files EPW needs, I run:
*note: ibrun is TACC Frontera's mpirun*
ibrun -np 1120 pw.x -nk 56 -i scf.in > scf.out
ibrun -np 1120 pw.x -nk 1120 -i nscf.in > nscf.out
ibrun -np 1120 epw.x -nk 1120 -i epw.in > epw.out
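As a rough sanity check on the pool counts in the commands above, the arithmetic can be sketched in Python. This is only an illustration: it assumes the full, unreduced grids and a perfectly even split of points across pools, which is not necessarily how QE or EPW distribute work at every stage. The helper name points_per_pool is hypothetical.

```python
import math

def points_per_pool(grid, n_pools):
    """Return (total points, max points on one pool, idle pools),
    assuming an even split of an unreduced grid (an assumption,
    not QE's actual distribution logic)."""
    total = grid[0] * grid[1] * grid[2]
    max_per_pool = math.ceil(total / n_pools)
    idle_pools = n_pools - min(total, n_pools)
    return total, max_per_pool, idle_pools

n_pools = 1120  # value passed to -nk in the nscf and EPW runs
grids = {
    "coarse k (8x8x8)": (8, 8, 8),
    "fine k (40x40x40)": (40, 40, 40),
    "fine q (20x20x20)": (20, 20, 20),
}
for label, grid in grids.items():
    total, per_pool, idle = points_per_pool(grid, n_pools)
    print(f"{label}: {total} points, <= {per_pool} per pool, {idle} idle pools")
```

Under these assumptions the coarse 8x8x8 grid has fewer points (512) than there are pools (1120), so it may be worth checking how many pools actually receive work in each stage.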
I'm getting fine results (good spreads on my WFs; the decays are not quite converged, but I'm working on it), but after the run completes the SLURM command seff reports only ~10% CPU usage. Is this normal?
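For context, the efficiency figure that seff prints is, as far as I understand it, the total CPU time used divided by the core-walltime (allocated cores times elapsed wall time). A minimal sketch of that ratio follows; the run time and CPU time below are hypothetical placeholders chosen only to illustrate a 10% result, not values from this job.

```python
def cpu_efficiency(cpu_seconds, n_cores, wall_seconds):
    """CPU efficiency in percent: CPU time used / (cores * wall time)."""
    core_walltime = n_cores * wall_seconds
    return 100.0 * cpu_seconds / core_walltime

# Hypothetical numbers: 1120 allocated cores, a 1-hour run,
# and 403200 s of accumulated CPU time across all ranks.
eff = cpu_efficiency(cpu_seconds=403200, n_cores=1120, wall_seconds=3600)
print(f"CPU efficiency: {eff:.1f}%")
```

A low value can come from ranks that sit idle or wait on I/O and communication, but also from SLURM counting more cores in the allocation than the job actually runs ranks on, so the core count used in the denominator is worth confirming.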
Thanks,
Adam