NaN in output file when setting etf_mem=.false.

AgentZero
Posts: 44
Joined: Tue Jul 05, 2016 8:41 am
Affiliation:

NaN in output file when setting etf_mem=.false.

Post by AgentZero »

Dear all,

Because memory is limited for the calculation on my compound, I set etf_mem to .false..
However, I get "NaN" for all q-points in the EPW output file. For example:

ismear = 10 iq = 4095 coord.: 0.93750 0.93750 0.87500 wt: 0.00024
-------------------------------------------------------------------
lambda( 1 )= NaN gamma= NaN meV omega= 3.7006 meV
lambda( 2 )= NaN gamma= NaN meV omega= 7.5485 meV
lambda( 3 )= NaN gamma= NaN meV omega= 8.5620 meV
lambda( 4 )= NaN gamma= NaN meV omega= 37.1075 meV
lambda( 5 )= NaN gamma= NaN meV omega= 38.2527 meV
lambda( 6 )= NaN gamma= NaN meV omega= 46.6576 meV
lambda( tot )= NaN
-------------------------------------------------------------------
I use QE-5.4.0 and EPW-4.0.0, and I can reproduce the MgB2 results with these versions of the codes.
I also noticed that a file ***.epmatwp1 is created in the work directory (when etf_mem=.false.), but this file stays empty
during the whole calculation. A calculation on the same compound with small coarse k/q meshes and etf_mem=.true. gives
correct results. Could you give me some advice on solving the "NaN" problem? Are there other parameters that should be
changed at the same time when setting etf_mem=.false.? Or is there another way to reduce the memory used by EPW?

Thank you very much for your reply!

Miao Gao

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: NaN in output file when setting etf_mem=.false.

Post by sponce »

Dear Miao Gao,

Can you try a calculation with small coarse k/q meshes and etf_mem=.false. to see whether the problem is linked to memory?

I remember debugging that part at some point. It had to do with MPI reading the epmatwp1 file.
If that is the problem, a new version of QE/EPW should be available in September 2016.

If it's urgent, I could send you the modified files, but please run the test above first to be sure.

Best,

Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com

AgentZero

Re: NaN in output file when setting etf_mem=.false.

Post by AgentZero »

Dear Dr. Samuel,

Thank you very much for your reply! I have checked the NaN problem for MgB2 with etf_mem=.false..
I found that if the parameter "outdir" is the current directory, the results are correct, with no NaN in the output file.
If outdir is set to another directory, NaN appears in the output file. Do you agree with this explanation?


For my own compound, things are still strange. I use EPW-4.0.0 to calculate the electron-phonon coupling constant.
I summarize my results in the table below. For case 1, the electron-phonon coupling calculation
finishes normally. But when I increase the coarse k/q-mesh to 8x8x8, the calculation is interrupted (case 2).
The administrators of our cluster told me this may be related to insufficient memory on the computing node.
The last few lines in the epw.out file are:

"Writing epmatq on .epb files


band disentanglement is used: nbndsub = 8

Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file


Reading Hamiltonian, Dynamical matrix and EP vertex in Wann rep from file."

When I change etf_mem from .true. to .false. to save memory, the calculation continues without the above interruption, but it is very slow (case 3).
But now I am confused. Comparing case 1 and case 2, the fine k/q-meshes are the same. And the parameter etf_mem controls whether
the fine el-ph matrix elements in Bloch space are stored in memory. So the memory requirements in these two cases should be equal (am I right?).
Thus case 2 should also finish normally. Why does etf_mem affect the calculation?


As a further test, I reduced the fine k/q-mesh to 16x16x16 (case 4); such fine meshes should not cause any memory problem.
But I still encountered the same interruption as in case 2. It seems to me that the interruption is due to the 8x8x8 coarse meshes.
Since an 8x8x8 mesh has only 512 points, the memory required for this mesh should be very limited.
So why does the interruption occur? Could you please offer me some clues to solve this problem? Thanks again!

---------------------------------------------------------------------------------
       | coarse k/q mesh | fine k/q mesh     | etf_mem | calculation status
---------------------------------------------------------------------------------
case 1 | 6x6x6/6x6x6     | 60x60x60/30x30x30 | .true.  | correct
case 2 | 8x8x8/8x8x8     | 60x60x60/30x30x30 | .true.  | exits
case 3 | 8x8x8/8x8x8     | 60x60x60/30x30x30 | .false. | correct but very slow
case 4 | 8x8x8/8x8x8     | 16x16x16/16x16x16 | .true.  | exits
---------------------------------------------------------------------------------

Best regards,
Miao Gao

sponce
Site Admin

Re: NaN in output file when setting etf_mem=.false.

Post by sponce »

Dear Miao Gao,

Your conclusions are correct!

- I forgot about the outdir problem. It is solved in my development version, but QE 5.4 has a bug when outdir is used together with etf_mem=.false.. Therefore, just use outdir=' ' for the time being.

- Your table does make sense:
- If etf_mem = .true., all the coarse-grid el-ph matrix elements are loaded into memory. Therefore an 8x8x8 q-point grid takes much more memory than a 6x6x6 q-point grid. Just look at the size of your "save" folder.
- If etf_mem = .false., only one q-point from the coarse grid is read from file and placed into memory at a time. When that q-point is done, the next one is read, and so on. This explains why it is slower and why it takes much less memory.
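A rough back-of-the-envelope estimate shows why going from a 6x6x6 to an 8x8x8 coarse grid matters so much even though 512 points sounds small. This is a sketch under stated assumptions, not EPW's actual allocation logic: it assumes the Wannier-space el-ph array is complex double precision with a hypothetical shape (nbndsub, nbndsub, nrr_k, nmodes, nrr_g), and it approximates the assumed Wigner-Seitz counts nrr_k and nrr_g by the number of coarse k- and q-points. nbndsub = 8 is taken from the log above and nmodes = 6 from the six phonon branches in the first post.

```python
# Rough memory estimate for the Wannier-space el-ph array.
# ASSUMPTIONS (illustrative, not read from the EPW source):
#  - complex double precision, 16 bytes per element;
#  - hypothetical shape (nbndsub, nbndsub, nrr_k, nmodes, nrr_g);
#  - nrr_k / nrr_g approximated by the number of coarse k / q points
#    (real Wigner-Seitz counts are somewhat larger).

BYTES_PER_COMPLEX = 16
MIB = 1024 ** 2
GIB = 1024 ** 3


def n_points(mesh):
    """Number of points in an n1 x n2 x n3 grid."""
    n1, n2, n3 = mesh
    return n1 * n2 * n3


def epmatwp_bytes(nbndsub, nmodes, kmesh, qmesh):
    """Whole array: roughly what is held in memory with etf_mem=.true."""
    nrr_k = n_points(kmesh)
    nrr_g = n_points(qmesh)
    return nbndsub ** 2 * nrr_k * nmodes * nrr_g * BYTES_PER_COMPLEX


def slice_bytes(nbndsub, nmodes, kmesh):
    """One q-slice: roughly what is read at a time with etf_mem=.false."""
    return nbndsub ** 2 * n_points(kmesh) * nmodes * BYTES_PER_COMPLEX


if __name__ == "__main__":
    nbndsub, nmodes = 8, 6  # values taken from the posts in this thread
    for mesh in [(6, 6, 6), (8, 8, 8)]:
        full = epmatwp_bytes(nbndsub, nmodes, mesh, mesh)
        one = slice_bytes(nbndsub, nmodes, mesh)
        print(f"{mesh}: full {full / GIB:.2f} GiB, one slice {one / MIB:.1f} MiB")
```

Under these assumptions the full array grows from about 0.27 GiB (6x6x6) to 1.5 GiB (8x8x8), a factor (512/216)^2 ≈ 5.6, because both the k and the q dimensions grow with the coarse grid, while a single slice stays at a few MiB. The estimate ignores every other array EPW allocates and says nothing about how the data are distributed across MPI processes.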

As you can see, etf_mem has no effect on the fine k/q mesh. Increasing the fine k/q mesh does not cost more memory but takes more time.
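The time side of this statement can be illustrated by counting how many (k, q) pairs the fine grids in the table produce. This is a simplified sketch: it assumes the interpolation cost scales with (number of fine k-points) x (number of fine q-points) and ignores symmetry reduction and any energy-window filtering EPW may apply.

```python
# Rough count of fine-grid work in the el-ph interpolation.
# ASSUMPTION: cost ~ (fine k-points) x (fine q-points); symmetry
# reduction and energy-window filtering are ignored.


def n_points(mesh):
    """Number of points in an n1 x n2 x n3 grid."""
    n1, n2, n3 = mesh
    return n1 * n2 * n3


def kq_pairs(kfine, qfine):
    """Number of (k, q) pairs at which matrix elements are interpolated."""
    return n_points(kfine) * n_points(qfine)


if __name__ == "__main__":
    big = kq_pairs((60, 60, 60), (30, 30, 30))    # fine meshes of cases 1-3
    small = kq_pairs((16, 16, 16), (16, 16, 16))  # fine meshes of case 4
    print(f"cases 1-3: {big:,} pairs; case 4: {small:,} pairs; ratio {big / small:.0f}")
```

The 60x60x60/30x30x30 meshes give about 5.8e9 pairs versus about 1.7e7 for 16x16x16/16x16x16, so the run time differs by a factor of a few hundred. This is consistent with case 4 still failing: shrinking the fine grids reduces the work, not the memory taken by the coarse-grid data.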

Hope this helps,

Samuel

AgentZero

Re: NaN in output file when setting etf_mem=.false.

Post by AgentZero »

Dear Dr. Samuel Poncé,

Your reply is very helpful to me, thanks!

Best regards,
Miao Gao
