Why gap vs temperature is not the same as tutorial of MgB2?

Posted: Fri Sep 15, 2017 6:10 am
by balabi
Dear developers,
I tried to run EPW/examples/mgb2 following the tutorial at http://epw.org.uk/Documentation/MgB2
I then wanted to reproduce the graph in Fig. 4. However, after changing "nstemp = 1" to "nstemp = 6", this is what I got: see the picture at https://pasteboard.co/GKreH4s.png . You can see that the peak shape is quite different. I would like to know how to set the calculation parameters so that the plot matches the one in the tutorial.

best regards

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Sun Sep 17, 2017 6:08 pm
by sponce
Dear balabi,

You need to push the convergence further. Converged parameters are given in the EPW paper: http://www.sciencedirect.com/science/ar ... 5516302260

See Fig. 19 for example.

Best,
Samuel

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Mon Sep 18, 2017 4:27 pm
by balabi
Dear Samuel,

Thank you so much for your reply.

I am not sure I understand "You need to push the convergence further". Do you mean that I should increase nkf1, nkf2, nkf3 and nqf1, nqf2, nqf3? Or do I need to increase the phonon q mesh or the scf k mesh used before the EPW run? Or do I need to increase the Wannier nscf mesh? Which one matters most for a correct result?

According to the paper, I think you probably mean increasing nkf and nqf.

First, I tried increasing them to nkf1=30, nkf2=30, nkf3=30 and nqf1=30, nqf2=30, nqf3=30, and the plot looks a little better (see https://pasteboard.co/GKXlzH1.png ). Note that I actually set T from 15 K to 60 K, but T=60 K had a convergence problem when solving the anisotropic Eliashberg equations on the imaginary axis. The last iteration is
iter = 500 relerr = 9.3846423209E-02 abserr = 3.6523260131E-10 Znormi(1) = 1.8329993694E+00 Deltai(1) = 1.2461454162E-08

I am confused here and have several questions:
1. Why is abserr already small while relerr is still large? What is relerr relative to?
2. Does this mean that T=60 K is too high for convergence, or could it still converge if we increase the number of iterations?
3. If I only want to refine the details around the transition temperature, how do I set EPW to restart without repeating work?

Second, I tried increasing them to nkf1=60, nkf2=60, nkf3=60 and nqf1=30, nqf2=30, nqf3=30 as the paper says. It ran for hours and did not seem to complete, which is much slower than the previous run. So I checked epw.out and found that it was stuck at "Solve anisotropic Eliashberg equations on imaginary-axis " at the first temperature, T=15 K. In 6 hours it completed only 12 iterations. I noticed these lines before the iterations:
Size of required memory per pool : ~= 8.4274 Gb
AKeri is calculated on the fly since its size exceedes max_memlt

I only have 64 GB on one node and I am running 16 MPI processes, so the total could be as large as 16 x 8 = 128 GB. So:
1. What is AKeri?
2. What does "calculated on the fly" mean compared to "not on the fly"? Is "on the fly" the cause of the slowness?
3. How can I estimate the memory per pool before the calculation?
4. Any suggestions for speeding things up for users with little memory? Do I have to parallelize across nodes? How? Does EPW perform well across nodes?

I really appreciate your help.

best regards

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Tue Sep 19, 2017 1:18 am
by balabi
Dear Samuel,
I just tried to restart the calculation after aborting it at "Solve anisotropic Eliashberg equations on imaginary-axis".
I set the parameters below:

Code: Select all

epwread=.true.
kmaps=.true.
elph=.false.
wannierize=.false.


However, I got this error

Solve anisotropic Eliashberg equations
===================================================================


Finish reading .freq file

Fermi level (eV) = 7.4669054777E+00
DOS(states/spin/eV/Unit Cell) = 3.5407008461E-01
Electron smearing (eV) = 1.0000000000E-01
Fermi window (eV) = 4.0000000000E-01
Nr irreducible k-points within the Fermi shell = 816 out of 3234
2 bands within the Fermi window


Finish reading .egnv file


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine invmat (1):
error in DGETRF
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

stopping ...


What is wrong with it?

best regards

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Tue Sep 19, 2017 11:32 am
by sponce
Dear balabi,

Roxana is the expert in the superconducting part of EPW. I will try to answer what I can:

- Yes, you need 60x60x60 k and 30x30x30 q, and yes, it's going to be quite expensive.
- For the absolute and relative error: the difference between two very small numbers can be sizeable in relative terms even though it is very small in absolute terms.
- T=60K is above T_c, therefore the code cannot solve the Eliashberg equations. The superconducting gap is 0 at that temperature.
- Note that Tc depends on the fine grids you are using. With a 30x30x30 k-grid it might not be fully converged.
- AKeri is the kernel used in Eliashberg. You can try to raise max_memlt to make it fast (everything in memory). On the fly means it's reading/writing data instead of having everything in memory. Of course, you need to have enough memory on your cluster.
- You should definitely parallelize across nodes using MPI. EPW performs well, see the scaling tests: http://epw.org.uk/Main/Benchmarks
- For the restart, I think you need to set elph to true.
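
The point about absolute versus relative error can be illustrated with a small sketch. It is a generic illustration and not EPW's actual convergence formula: the function name `errors` and the definition of the relative error (absolute change divided by the magnitude of the new value) are assumptions made for this example. Only the Deltai(1) value is taken from the iteration line quoted earlier in the thread; the other numbers are made up, and units are arbitrary.

```python
def errors(delta_old, delta_new):
    """Return (abserr, relerr) between two successive gap estimates.

    Assumption: relerr is the absolute change divided by |delta_new|.
    EPW may define its relerr differently.
    """
    abserr = abs(delta_new - delta_old)
    relerr = abserr / abs(delta_new)
    return abserr, relerr


# Well below Tc: a finite gap that changes slightly between iterations
# (both values are illustrative only).
abs_lo, rel_lo = errors(7.000e-3, 7.001e-3)

# Above Tc: the gap collapses towards zero. Deltai(1) = 1.2461454162E-08 is
# from the quoted output; the previous-iteration value is hypothetical.
abs_hi, rel_hi = errors(1.36e-08, 1.2461454162e-08)

print(f"below Tc: abserr = {abs_lo:.3e}, relerr = {rel_lo:.3e}")
print(f"above Tc: abserr = {abs_hi:.3e}, relerr = {rel_hi:.3e}")
```

In the second case the absolute change is far smaller than in the first, yet the relative change is much larger, because the quantity it is measured against is itself close to zero.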

Best,
Samuel

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Tue Sep 19, 2017 2:09 pm
by balabi
Dear Samuel,

Thank you so much for your patient explanation.

I am quite confused about restarting. I think EPW could make restarts a little easier. At present, there are so many switches to take care of.

I have now tried several combinations of the parameters below

Code: Select all

  ep_coupling = .true.
  elph        = .true.
 
  kmaps       = .false.
 
  epbwrite    = .true.
  epbread     = .false.
 
  epwwrite = .true.
  epwread  = .false.
 
  ephwrite    = .true.
 
  wannierize  = .true.


The settings above are for a run without restart.

Then I tried to modify the switches. First, I set

Code: Select all

epwwrite = .false.   epwread  = .true.  kmaps=.true.

This follows the input documentation. But it is not enough: I get "must use same w90 rotation matrix for entire run" until I set

Code: Select all

wannierize=.false.

At this stage, epw.out prints
------------------------------------------------------------------------
RESTART - RESTART - RESTART - RESTART
Restart is done without reading PWSCF save file.
Be aware that some consistency checks are therefore not done.
------------------------------------------------------------------------

It seems that I am on the right track. And yes, it works, but with some repeated work.

First, it recalculates the epb files. So I tried to set

Code: Select all

epbwrite = .false.   epbread     = .true.

in order to read the epb files instead of recalculating them. But surprisingly, it doesn't work. It gives the error below
forrtl: severe (67): input statement requires too much data, unit 105, file /fs10/home/qhw_wang/HPC-nj/quantum_espresso/qe-dev-20170913/q-e/EPW/examples/mgb2-new/coarse_epw_grid_10/./MgB2.epb15
Image PC Routine Line Source
epw.x 0000000000F03356 Unknown Unknown Unknown
epw.x 0000000000F36A1E Unknown Unknown Unknown
epw.x 0000000000447B91 elphon_shuffle_wr 698 elphon_shuffle_wrap.f90
epw.x 0000000000407914 MAIN__ 150 epw.f90
epw.x 0000000000406C5E Unknown Unknown Unknown
libc-2.17.so 00002B2DA9553B35 __libc_start_main Unknown Unknown
epw.x 0000000000406B69 Unknown Unknown Unknown
forrtl: severe (67): input statement requires too much data, unit 105, file /fs10/home/qhw_wang/HPC-nj/quantum_espresso/qe-dev-20170913/q-e/EPW/examples/mgb2-new/coarse_epw_grid_10/./MgB2.epb1
Image PC Routine Line Source
epw.x 0000000000F03356 Unknown Unknown Unknown
.....
....


It turns out I have to set both options to false, like

Code: Select all

epbwrite = .false.   epbread     = .false.

But why?

Second, it recalculates ephmat. So I set

Code: Select all

ephwrite  = .false.

OK, now it doesn't recalculate ephmat. But I notice there is a step that is still quite time-consuming for a fine grid, namely
Number of ep-matrix elements per pool : 16443 ~= 128.46 Kb (@ 8 bytes/ DP)
Progression iq (fine) = 50/ 64000
Progression iq (fine) = 100/ 64000
Progression iq (fine) = 150/ 64000
Progression iq (fine) = 200/ 64000
Progression iq (fine) = 250/ 64000
...
...

I would like to know what this step is. The output above is from a 40x40x40 fine grid. The counter increases in steps of 50 until it reaches 64000, but after an hour on a 16-core machine it had only reached 49000. Is it possible to skip this step? It feels like repeated work as well. Am I right?
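
The numbers in this progress output can be checked with a little arithmetic. The sketch below confirms that a 40x40x40 fine q-grid gives 64000 points and estimates the remaining time, assuming (this is only an assumption) that every q-point takes about the same time to process. The variable names are made up for the example.

```python
# Fine q-grid quoted in the post.
nqf = (40, 40, 40)
total_q = nqf[0] * nqf[1] * nqf[2]  # number of fine q-points

# Progress quoted in the post: about 49000 q-points after one hour.
done_q = 49000
elapsed_hours = 1.0

# Assumption: constant cost per q-point, so the rate stays the same.
rate_per_hour = done_q / elapsed_hours
remaining_hours = (total_q - done_q) / rate_per_hour

print(f"total fine q-points : {total_q}")
print(f"estimated time left : {remaining_hours * 60:.1f} minutes")
```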

Finally, in the input documentation, under the item "eliashberg", there is this sentence
Note: To reuse .ephmat, .freq, .egnv, .ikmap files obtained in a previous run, one needs to set ep_coupling=.false., elph=.false., and ephwrite=.false. in the input file.

But you said to set elph=.true., so I don't understand what this sentence means.
Anyway, I also tried to set

Code: Select all

ep_coupling=.false., elph=.false.

But I got the error
Error in routine invmat (1): error in DGETRF

Why does this error occur?

Finally, I would still like to know how to estimate the memory per pool before the calculation, and why the memory usage scales so badly with the grid.

best regards

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Wed Sep 20, 2017 11:16 am
by sponce
Hello,

A typical restart calculation (where you want to restart the interpolation part, not within Eliashberg) is done with:

Code: Select all

  elph        = .true.
  kmaps       = .true.
  epbwrite    = .false.
  epbread     = .false.
  epwwrite    = .false.
  epwread     = .true.
  wannierize  = .false.


This means you restart from the epmatwp1 file (el-ph in Wannier representation) and you just redo the interpolation.
If you have the latest EPW, you can also do (does not work with Eliashberg):

Code: Select all

  restart    = .true.
  restart_freq = 500


This allows you to restart in the middle of the interpolation (if you have a lot of q-points).

Now, if the Eliashberg files from a previous run have been written to disk, you should also be able to restart. As written under the Eliashberg input variable on the website:

To reuse .ephmat, .freq, .egnv, .ikmap files obtained
in a previous run, one needs to set ep_coupling=.false.,
elph=.false., and ephwrite=.false. in the input file.

So I think it should be:

Code: Select all

  elph        = .false.
  kmaps       = .true.
  epbwrite    = .false.
  epbread     = .false.
  epwwrite    = .false.
  epwread     = .true.
  wannierize  = .false.
  ephwrite    = .false.
  ep_coupling = .false.


Let me know if that works. Be sure to have the 4 files .ephmat, .freq, .egnv, .ikmap in the directory.

For the scaling: the memory scales with the number of k-points per CPU. If you keep the number of CPUs constant but go from a 30x30x30 to a 60x60x60 k-grid, then the memory increases by a factor of 2^3 = 8.
Therefore, if you are using etf_mem and increase your number of pools, the memory per pool should decrease.
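
The scaling argument can be written down as a short sketch. It assumes, as stated above, that memory is simply proportional to the number of fine-grid k-points per pool; it ignores symmetry reduction and the Fermi window, so it is not the formula EPW uses to print "Size of required memory per pool". The function name `kpoints_per_pool` is made up for the example. The 8.4274 Gb per pool, the 16 MPI processes and the 64 GB node are the values reported earlier in the thread.

```python
def kpoints_per_pool(grid, npool):
    """Fine-grid k-points handled by each pool (no symmetry reduction)."""
    nk1, nk2, nk3 = grid
    return nk1 * nk2 * nk3 / npool


npool = 16
small = kpoints_per_pool((30, 30, 30), npool)
large = kpoints_per_pool((60, 60, 60), npool)
ratio = large / small
print(f"memory ratio, 60^3 vs 30^3 at fixed pools: {ratio:.0f}x")

# Values reported in the thread.
mem_per_pool_gb = 8.4274
node_memory_gb = 64.0
total_gb = npool * mem_per_pool_gb
fits = total_gb <= node_memory_gb
print(f"total for {npool} pools: {total_gb:.1f} Gb, fits on one node: {fits}")

# Doubling the number of pools halves the k-points (and, under the
# proportionality assumption, the memory) per pool.
more_pools = 2 * npool
print(f"k-points per pool with {more_pools} pools: "
      f"{kpoints_per_pool((60, 60, 60), more_pools):.0f}")
```

Under these assumptions the 16-pool run needs roughly twice the memory of a single 64 GB node, which is consistent with the advice to spread the pools over several nodes.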

Best,
Samuel

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Wed Sep 20, 2017 1:22 pm
by hlee
Dear Samuel:

I have very minor issues and questions related to your answers above:
(1) When restarting in the step of Eliashberg stuff, you indicate that we need to set as follows:
elph = .false.
kmaps = .true.
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.
wannierize = .false.
ephwrite = .false.
ep_coupling=.false.

But, I am wondering whether we have all necessary data in this case.
For example, when writing ".lambda_FS" and ".lambda_*.cube" files in the subroutine of evaluate_a2f_lambda, bg is imported from cell_base; I think that if we restart in this way, we don't have bg information.

(2) In the subroutines of mem_size_eliashberg and mem_integer_size_eliashberg,
the program stops with error messages when memlt_pool is larger than max_memlt.
However, I think that it is not necessary.
For example, the subroutine eliashberg_memlt_aniso_iaxis is implemented for the case in which memlt_pool > max_memlt; when memlt_pool > max_memlt, as you said, the calculation proceeds in the "on the fly" mode.

(3) You indicated that we need to set kmaps=.true. in the restart mode.
Also, EPW stops when (epwread .AND. .not. kmaps .AND. .not. epbread) is true in the subroutine epw_readin.
However, to the best of my knowledge, *.kmap and *.kgmap files are only necessary up to the calculation of the e-ph matrix elements, etc. in the Wannier basis on the coarse grid.
I think that once we have *.fmt and *.epmatwp1, *.epmatwe1, etc, we don't need *.kmap and *.kgmap any more.
Indeed, createkmap, createkmap_pw2, readgmap subroutines are not called in the interpolation stage from coarse to fine.

Sincerely,

Hyungjun Lee

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Thu Sep 21, 2017 9:17 am
by hlee
After some digging into the code, I realised that item (2) in my previous post is not correct; the check is necessary. The motivation for this check seems to be broader.
Sorry for the inconvenience.

Sincerely,

Hyungjun Lee

Re: Why gap vs temperature is not the same as tutorial of Mg

Posted: Fri Sep 22, 2017 1:56 am
by roxana
Hi,

To restart an Eliashberg calculation you will need to:

1) have .ephmat, .freq, .egnv, and .ikmap files from a previous run

2) set the following parameters in the input file

ep_coupling=.false.
elph=.false.

kmaps = .true.
epbwrite = .false.
epbread = .true.
epwwrite = .false.
epwread = .true.
wannierize = .false.

ephwrite=.false.
eliashberg = .true.
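
Before launching such a restart, it can help to confirm that the four outputs from step 1) are actually present. The sketch below is a hypothetical helper, not part of EPW: the name `missing_restart_files` is made up, and it assumes that each output is stored as `<prefix><extension>` (a file or a directory, possibly with a suffix such as a pool index) in the working directory. The prefix `MgB2` is the one that appears in the error messages earlier in the thread.

```python
from pathlib import Path

# Extensions listed in the thread as required for an Eliashberg restart.
REQUIRED_EXTENSIONS = (".ephmat", ".freq", ".egnv", ".ikmap")


def missing_restart_files(directory, prefix):
    """Return the required extensions with no matching entry in `directory`.

    Assumption: each output is named `<prefix><extension>`, optionally
    followed by a suffix, and may be either a file or a directory.
    """
    base = Path(directory)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not any(base.glob(f"{prefix}{ext}*"))]


if __name__ == "__main__":
    missing = missing_restart_files(".", "MgB2")
    if missing:
        print("missing restart outputs:", ", ".join(missing))
    else:
        print("all four restart outputs found")
```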

To answer Hyungjun's question, it is correct that we don't need *.kmap and *.kgmap for Eliashberg calculations once the e-ph matrix elements on the fine grids are calculated. Some of the EPW stop messages, such as the one for (epwread .AND. .not. kmaps .AND. .not. epbread), should not be triggered for a restarted Eliashberg calculation and will be updated in the near future.

Best,
Roxana
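
The two Eliashberg restart recipes given in this thread (Samuel's tentative one and Roxana's) differ in only a couple of flags, which is easy to miss when reading them side by side. The sketch below copies both sets of flags from the posts into plain dictionaries and prints the differences. The helper name `diff_flags` is made up, and a flag that is absent from a posted snippet is reported as `None`, which is not the same as it being set to `.false.` in the actual input file.

```python
# Flags as posted by Samuel ("So I think it should be:").
samuel = {
    "elph": False, "kmaps": True, "epbwrite": False, "epbread": False,
    "epwwrite": False, "epwread": True, "wannierize": False,
    "ephwrite": False, "ep_coupling": False,
}

# Flags as posted by Roxana ("To restart an Eliashberg calculation ...").
roxana = {
    "ep_coupling": False, "elph": False, "kmaps": True, "epbwrite": False,
    "epbread": True, "epwwrite": False, "epwread": True,
    "wannierize": False, "ephwrite": False, "eliashberg": True,
}


def diff_flags(a, b):
    """Map each flag whose value differs to the pair (value in a, value in b)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


for flag, (first, second) in diff_flags(samuel, roxana).items():
    print(f"{flag}: Samuel={first}, Roxana={second}")
```

Only `epbread` and `eliashberg` differ between the two posts; all other flags agree.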