The epmatwp file is too large, causing an EPW runtime error

guodonglin
Posts: 29
Joined: Mon Mar 29, 2021 5:56 am
Affiliation: CQU

Post by guodonglin »

Dear EPW developers, I am unsure if this is a bug.

With a 4x4x4 q-point grid:

In the epw.in input file, if the settings are: etf_mem = 0, mp_mesh_k = .false., epw_memdist = .true., the EPW run terminates with a runtime error.

If the settings are: etf_mem = 1, mp_mesh_k = .false., the EPW run completes successfully, but the SnSe.epmatwp file size is 33.37 GB.

When the q-point grid is reduced from 4x4x4 to 3x3x3:

With the settings: etf_mem = 0, mp_mesh_k = .false., epw_memdist = .true., the EPW run is successful, and the SnSe.epmatwp file size is 3.39 GB.

With the settings: etf_mem = 1, mp_mesh_k = .false., the EPW run is successful, and the SnSe.epmatwp file size is 3.39 GB.

In summary, for the 3x3x3 q-point grid both sets of settings work and give a 3.39 GB SnSe.epmatwp file. For the 4x4x4 q-point grid, only etf_mem = 1, mp_mesh_k = .false. runs successfully (33.37 GB SnSe.epmatwp file), while etf_mem = 0, mp_mesh_k = .false., epw_memdist = .true. fails. I therefore suspect that the SnSe.epmatwp file is too large, which leads to the EPW runtime error.
Attachments
in-out put.zip
(20.5 KiB)
Shashi
Posts: 58
Joined: Mon Feb 12, 2024 2:21 pm
Affiliation: SUNY Binghamton

Re: The epmatwp file is too large, causing an EPW runtime error

Post by Shashi »

Hi,

This is not a bug. A 4×4×4 q-grid has 64 q-points, and the corresponding k-point mesh is 8×8×8. The number of elements stored in epmatwp is given by:

nbndsub × nbndsub × nrr_k × nmodes × nrr_g

where nbndsub is the number of Wannier functions, nmodes is the number of phonon modes (3 × the number of atoms), nrr_k is proportional to the size of the coarse k-grid, and nrr_g is proportional to the size of the coarse q-grid.
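To make the scaling concrete, here is a minimal Python sketch that turns the product above into a size estimate. It assumes each element is a double-precision complex number (16 bytes); the function name and the idea of reading the dimensions off your own EPW output are illustrative, not part of EPW itself.

```python
def epmatwp_size_bytes(nbndsub, nrr_k, nmodes, nrr_g, bytes_per_element=16):
    """Estimate the epmatwp size in bytes from its dimensions.

    Assumption: one element is a double-precision complex number (16 bytes).
    """
    n_elements = nbndsub * nbndsub * nrr_k * nmodes * nrr_g
    return n_elements * bytes_per_element


def bytes_to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9
```

Because nrr_k and nrr_g both grow with the coarse grids, refining the k- and q-grids together increases the estimate much faster than refining either one alone.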

In your case, this results in epmatwp being 33.37 GB. When etf_mem = 0, this matrix is kept in memory on each node, causing the code to crash due to insufficient memory. You can either request more memory per job (e.g., more nodes while keeping the same number of processors) or switch to etf_mem = 1.

Using epw_memdist = .true. helps, but it only applies after the Wannier-to-Bloch transformation. Memory distribution for the Bloch-to-Wannier step has not yet been implemented.

When you used a 3×3×3 q-grid (likely with a 6×6×6 k-grid), epmatwp was 3.39 GB, which fits within the available memory, so etf_mem = 0 worked fine in that case.
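The reasoning above amounts to comparing the epmatwp size with the memory available on a node. A rough Python sketch of that check follows. The function name, the headroom factor of 0.8 (to leave room for everything else the run allocates), and the 32 GB node in the usage lines are all hypothetical; this is a rule of thumb, not something EPW does for you.

```python
def suggest_etf_mem(epmatwp_bytes, mem_per_node_bytes, headroom=0.8):
    """Return 0 if epmatwp plausibly fits in node memory, otherwise 1.

    Assumption: the whole matrix must fit on one node when etf_mem = 0,
    and only a fraction `headroom` of the node memory is usable for it.
    """
    if epmatwp_bytes <= headroom * mem_per_node_bytes:
        return 0
    return 1


# Hypothetical node with 32 GB of memory, file sizes taken from this thread.
small = suggest_etf_mem(3.39e9, 32e9)    # 3x3x3 q-grid case
large = suggest_etf_mem(33.37e9, 32e9)   # 4x4x4 q-grid case
```

With these hypothetical numbers the 3.39 GB case suggests etf_mem = 0 and the 33.37 GB case suggests etf_mem = 1, which matches the behaviour reported in the first post.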

I hope this clarifies the issue.

Regards,
Shashi