Dear EPW developers, I am unsure if this is a bug.
With a 4x4x4 q-point grid:
In the epw.in input file, with the settings etf_mem = 0, mp_mesh_k = .false., epw_memdist = .true., the EPW run terminates with an error.
If the settings are: etf_mem = 1, mp_mesh_k = .false., the EPW run completes successfully, but the SnSe.epmatwp file size is 33.37 GB.
When the q-point grid is reduced from 4x4x4 to 3x3x3:
With the settings: etf_mem = 0, mp_mesh_k = .false., epw_memdist = .true., the EPW run is successful, and the SnSe.epmatwp file size is 3.39 GB.
With the settings: etf_mem = 1, mp_mesh_k = .false., the EPW run is successful, and the SnSe.epmatwp file size is 3.39 GB.
In summary: for the 3x3x3 q-point grid both settings succeed, giving a SnSe.epmatwp file of 3.39 GB. For the 4x4x4 q-point grid, only etf_mem = 1, mp_mesh_k = .false. runs successfully (SnSe.epmatwp = 33.37 GB), while etf_mem = 0, mp_mesh_k = .false., epw_memdist = .true. causes the run to fail. I therefore suspect that the SnSe.epmatwp file is too large, leading to the EPW runtime error.
The epmatwp file is too large, causing an EPW runtime error
Moderator: stiwari
guodonglin
- Posts: 29
- Joined: Mon Mar 29, 2021 5:56 am
- Affiliation: CQU
- Attachments
- in-out put.zip (20.5 KiB)
Re: The epmatwp file is too large, causing an EPW runtime error
Hi,
This is not a bug. With a 4×4×4 q-grid, the number of q-points is 64, and the corresponding k-point mesh is 8×8×8. The size of epmatwp is given by:
nbndsub × nbndsub × nrr_k × nmodes × nrr_g
where nbndsub is the number of Wannier projections, nrr_k is proportional to the coarse k-grid, and nrr_g is proportional to the coarse q-grid.
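The scaling above can be turned into a quick back-of-the-envelope size estimate. A minimal sketch, assuming each matrix element is a double-precision complex number (16 bytes); the example dimension values below are hypothetical, not taken from your SnSe calculation:

```python
def epmatwp_size_gb(nbndsub, nrr_k, nmodes, nrr_g, bytes_per_element=16):
    """Estimate the epmatwp file size in GB.

    nbndsub : number of Wannier functions
    nrr_k   : number of Wigner-Seitz vectors for the coarse k-grid
    nmodes  : number of phonon modes (3 * number of atoms)
    nrr_g   : number of Wigner-Seitz vectors for the coarse q-grid

    Assumes each element is a double-precision complex (16 bytes).
    """
    n_elements = nbndsub * nbndsub * nrr_k * nmodes * nrr_g
    return n_elements * bytes_per_element / 1024**3

# Hypothetical example: 8 Wannier functions, 18 phonon modes,
# and 512 Wigner-Seitz vectors for both grids.
print(f"{epmatwp_size_gb(8, 512, 18, 512):.2f} GB")  # prints "4.50 GB"
```

Doubling each grid dimension roughly multiplies both nrr_k and nrr_g by 8, so the file grows by about a factor of 64, which is why the jump from 3x3x3 to 4x4x4 is so dramatic.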
In your case, this results in epmatwp being 33.37 GB. When etf_mem = 0, this matrix is kept in memory on each node, causing the code to crash due to insufficient memory. You can either request more memory per job (e.g., more nodes while keeping the same number of processors) or switch to etf_mem = 1.
Using epw_memdist = .true. helps, but it only applies after the Wannier-to-Bloch transformation. Memory distribution for the Bloch-to-Wannier step has not yet been implemented.
When you used a 3×3×3 q-grid (likely with a 6×6×6 k-grid), epmatwp was 3.39 GB, which fits within the available memory, so etf_mem = 0 worked fine in that case.
I hope this clarifies the issue.
Regards,
Shashi