GaAs runtime reduction

Post here questions related to issues encountered while running the EPW code

Moderator: stiwari


GaAs runtime reduction

Post by Vahid »

Dear EPW Users,

I have tested the EPW-5.4.0 output for the diamond example against the reference files provided in the software package, and I get comparable runtimes, so the issue below is likely not installation-related.

I have attempted to calculate the electron self-energy of GaAs using the following input:

Code:

--
&inputepw
  prefix      = 'gaas'
  amass(1) = 69.723
  amass(2) = 74.92160
  outdir      = './'

  iverbosity  = 0

  elph        = .true.
  epbwrite    = .true.
  epbread     = .false.
  lpolar      = .true.
  !etf_mem     = .false.

  epwwrite    =  .true.
  epwread     =  .false.

  nbndsub     =  8
  nbndskip    =  0

  wannierize  = .true.
  num_iter    = 500
  iprint      = 2
  dis_win_max = 15
  dis_froz_max= 4.7
  proj(1)     = 'Ga:s;px;py;pz'
  proj(2)     = 'As:s;px;py;pz'   

  elinterp    = .true.
  phinterp    = .true.

  tshuffle2   = .true.
  tphases     = .false.

  elecselfen  = .true.
  phonselfen  = .false.
  a2f         = .false.

  parallel_k  = .true.
  parallel_q  = .false.

  fsthick     = 2 ! eV
  eptemp      = 300 ! K
  degaussw    = 0.01 ! eV

  dvscf_dir   = '../phonons/save'
  filukk      = './gaas.ukk'

  nkf1        = 40
  nkf2        = 40
  nkf3        = 40

  nqf1        = 80
  nqf2        = 80
  nqf3        = 80
 
  nk1         = 8
  nk2         = 8
  nk3         = 8

  nq1         = 8
  nq2         = 8
  nq3         = 8
 /
       29 cartesian
         0.000000000   0.000000000   0.000000000   0.0039062
        -0.125000000   0.125000000  -0.125000000   0.0312500
        -0.250000000   0.250000000  -0.250000000   0.0312500
        -0.375000000   0.375000000  -0.375000000   0.0312500
         0.500000000  -0.500000000   0.500000000   0.0156250
         0.000000000   0.250000000   0.000000000   0.0234375
        -0.125000000   0.375000000  -0.125000000   0.0937500
        -0.250000000   0.500000000  -0.250000000   0.0937500
         0.625000000  -0.375000000   0.625000000   0.0937500
         0.500000000  -0.250000000   0.500000000   0.0937500
         0.375000000  -0.125000000   0.375000000   0.0937500
         0.250000000   0.000000000   0.250000000   0.0468750
         0.000000000   0.500000000   0.000000000   0.0234375
        -0.125000000   0.625000000  -0.125000000   0.0937500
         0.750000000  -0.250000000   0.750000000   0.0937500
         0.625000000  -0.125000000   0.625000000   0.0937500
         0.500000000   0.000000000   0.500000000   0.0468750
         0.000000000   0.750000000   0.000000000   0.0234375
         0.875000000  -0.125000000   0.875000000   0.0937500
         0.750000000   0.000000000   0.750000000   0.0468750
         0.000000000  -1.000000000   0.000000000   0.0117188
        -0.250000000   0.500000000   0.000000000   0.0937500
         0.625000000  -0.375000000   0.875000000   0.1875000
         0.500000000  -0.250000000   0.750000000   0.0937500
         0.750000000  -0.250000000   1.000000000   0.0937500
         0.625000000  -0.125000000   0.875000000   0.1875000
         0.500000000   0.000000000   0.750000000   0.0937500
        -0.250000000  -1.000000000   0.000000000   0.0468750
        -0.500000000  -1.000000000   0.000000000   0.0234375


The grids were taken from the PNAS 112, 5291 (2015) paper on GaAs. These fine grids are even less stringent than the 100x100x100 k- and q-grids reported recently (arXiv:1606.07074).

After 240 hours (the maximum walltime allowed on our cluster) on 64 CPUs, there was no result, as EPW was still calculating. I also tried 30x30x30 and 60x60x60 k- and q-grids, respectively, which finished in 83 hours on 64 CPUs, but those grids are insufficient for convergence. I have tried the following to reduce the runtime:

1. Using fsthick = 2 eV, as suggested by Professor Giustino (viewtopic.php?f=3&t=18), to speed up the calculation.

2. Using etf_mem = .false., since nothing seemed to happen for over a week. According to the manual, etf_mem = .true. is faster, although with etf_mem = .false. the user can see the progress in the output in terms of the q-points done. For the 240-hour run, I had set etf_mem = .true.

3. Parallelizing over q instead of k. The recent EPW paper suggests q-parallelization is possible, but even the diamond example segfaults on 8 CPUs with 24 GB of memory. It runs fine with k-parallelization.

I was wondering if there is something else I can do to reduce the runtime.
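
For context, here is a rough estimate of the extra cost of the 40x40x40 / 80x80x80 grids relative to the 30x30x30 / 60x60x60 run, assuming the self-energy runtime scales roughly with the product of the number of fine k- and q-points (my assumption, not a statement from the EPW developers):

\[
\frac{40^3 \times 80^3}{30^3 \times 60^3} = \left(\frac{4}{3}\right)^{6} \approx 5.6,
\qquad 83\ \mathrm{h} \times 5.6 \approx 465\ \mathrm{h}.
\]

If that scaling holds, the dense grids would need roughly twice our 240-hour limit even on the same 64 CPUs.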

Thank you,

Vahid

Vahid Askarpour
Department of Physics and Atmospheric Science
Dalhousie University,
Halifax, NS, Canada

Re: GaAs runtime reduction

Post by carla.verdi »

Dear Vahid

Unfortunately, there is nothing else you can do to reduce the runtime apart from increasing the number of CPUs, because you are computing on dense grids in both k and q. If you cannot increase the number of CPUs, or the runtime is still too long, one possible workaround is to split your fine k-grid into subsets, supply each subset via filkf, and calculate the self-energy for each subset in a separate run (see the sketch below).
If you want to monitor the progress in terms of the q-points done, you can modify line 448 in ephwann_shuffle.f90 and delete ".not. etf_mem" in the IF condition. We will include this change in the next release, as it is sensible to be able to monitor the progress also when etf_mem is .true.
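
For example, something along these lines could generate the filkf files. This is only an illustrative Python sketch, not part of EPW; the grid size, number of chunks, and file names are placeholders, and you should double-check the exact filkf format and weight convention against the documentation (the first line is expected to give the number of points and the coordinate type, as in the k-point lists in your input).

Code:

#!/usr/bin/env python3
# Illustrative sketch: split a uniform fine k-grid into several filkf-style
# files so that the self-energy can be computed in independent EPW runs.
# Grid size, chunk count, and file names are arbitrary placeholders.
import itertools

nk1 = nk2 = nk3 = 40                 # fine k-grid (as in the GaAs input above)
nchunks = 8                          # number of separate EPW runs

# Full uniform grid in crystal coordinates, equal weights.
kpts = [(i / nk1, j / nk2, k / nk3, 1.0 / (nk1 * nk2 * nk3))
        for i, j, k in itertools.product(range(nk1), range(nk2), range(nk3))]

# Each filkf-style file: "<npoints> crystal" on the first line,
# then "kx ky kz weight" on the following lines.
chunk_size = (len(kpts) + nchunks - 1) // nchunks
for n in range(nchunks):
    chunk = kpts[n * chunk_size:(n + 1) * chunk_size]
    with open('filkf_part%d.dat' % (n + 1), 'w') as f:
        f.write('%d crystal\n' % len(chunk))
        for kx, ky, kz, w in chunk:
            f.write('%14.10f %14.10f %14.10f %14.10f\n' % (kx, ky, kz, w))

Each run would then point filkf to one of these files (and omit nkf1, nkf2, nkf3), and the self-energies from the different runs can be collected afterwards.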

Best
Carla

Re: GaAs runtime reduction

Post by Vahid »

Dear Carla,

Thank you for your valuable suggestions. I may have to break up the k-grid as you suggested.

By the way, is parallel_q functional in EPW-5.4.0? I get a segmentation fault for the diamond example on 8 CPUs with 6 GB per CPU. This is where the code stops:

Code:

     Finished reading Wann rep data from file

      Using k-mesh file: meshes/path.dat
     Size of k point mesh for interpolation:        402
     Using uniform q-mesh:   50  50  50
     Size of q point mesh for interpolation:     125000
     Max number of q points per pool:            15625

     Fermi energy coarse grid =  12.903848 eV

     ===================================================================

     Fermi energy is read from the input file: Ef =  13.209862 eV

     ===================================================================

              ibndmin =     2  ebndmin =     0.872
              ibndmax =     4  ebndmax =     1.014


     Number of ep-matrix elements per pool :        10854 ~=   84.80 Kb (@ 8 bytes/ DP)


And this is my input file:

Code:

--
&inputepw
  prefix      = 'diam'
  amass(1)    = 12.01078
  outdir      = './'

  iverbosity  = 0

  elph        = .true.
  epbwrite    = .true.
  epbread     = .false.

  epwwrite    =  .true.
  epwread     =  .false.

  efermi_read = .true.
  fermi_energy = 13.209862

  nbndsub     =  4
  nbndskip    =  0

  wannierize  = .true.
  num_iter    = 300
  iprint      = 2
  dis_win_max = 12
  dis_froz_max= 7
  proj(1)     = 'f=0,0,0:l=-3'   

  elinterp    = .true.
  phinterp    = .true.

  tshuffle2   = .true.
  tphases     = .false.

  elecselfen  = .true.
  phonselfen  = .false.
  a2f         = .false.

  parallel_k  = .false.
  parallel_q  = .true.

  fsthick     = 1.36056981 ! eV
  eptemp      = 300 ! K (same as PRB 76, 165108)
  degaussw    = 0.1 ! eV

  dvscf_dir   = '../phonons/save'
  filukk      = './diam.ukk'
  filkf       = 'meshes/path.dat'
  nqf1        = 50
  nqf2        = 50
  nqf3        = 50
 
  nk1         = 6
  nk2         = 6
  nk3         = 6

  nq1         = 6
  nq2         = 6
  nq3         = 6
 /
      16 cartesian
   0.0000000   0.0000000   0.0000000  0.0092593
  -0.1666667   0.1666667  -0.1666667  0.0740741
  -0.3333333   0.3333333  -0.3333333  0.0740741
   0.5000000  -0.5000000   0.5000000  0.0370370
   0.0000000   0.3333333   0.0000000  0.0555556
  -0.1666667   0.5000000  -0.1666667  0.2222222
   0.6666667  -0.3333333   0.6666667  0.2222222
   0.5000000  -0.1666667   0.5000000  0.2222222
   0.3333333   0.0000000   0.3333333  0.1111111
   0.0000000   0.6666667   0.0000000  0.0555556
   0.8333333  -0.1666667   0.8333333  0.2222222
   0.6666667   0.0000000   0.6666667  0.1111111
   0.0000000  -1.0000000   0.0000000  0.0277778
   0.6666667  -0.3333333   1.0000000  0.2222222
   0.5000000  -0.1666667   0.8333333  0.2222222
  -0.3333333  -1.0000000   0.0000000  0.1111111



Gratefully,
Vahid

Re: GaAs runtime reduction

Post by sponce »

Hello Vahid,

The q-point parallelisation should be functional (I have now added q-parallel cases to the tests for the new release in September).

However, at this point I do not recommend using the q-parallelisation, as it is much slower than the k-parallelisation.

This is because I have spent time optimizing the k-parallelisation but not yet the q-parallelisation.

Best,

Samuel
Prof. Samuel Poncé
F.R.S.-FNRS Research Associate / Professor at UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com