GaAs runtime reduction

Post here questions related to issues encountered while running the EPW code

Moderator: stiwari

Vahid
Posts: 101
Joined: Fri Apr 08, 2016 11:02 pm
Affiliation:

GaAs runtime reduction

Post by Vahid »

Dear EPW Users,

I have tested EPW-5.4.0 on the diamond example and compared the output against the reference files provided in the software package; the runtimes are comparable, so the issue below is likely not installation-related.

I have attempted to calculate the electron self-energy of GaAs using the following input:

Code: Select all

--
&inputepw
  prefix      = 'gaas'
  amass(1) = 69.723
  amass(2) = 74.92160
  outdir      = './'

  iverbosity  = 0

  elph        = .true.
  epbwrite    = .true.
  epbread     = .false.
  lpolar      = .true.
  !etf_mem     = .false.

  epwwrite    =  .true.
  epwread     =  .false.

  nbndsub     =  8
  nbndskip    =  0

  wannierize  = .true.
  num_iter    = 500
  iprint      = 2
  dis_win_max = 15
  dis_froz_max= 4.7
  proj(1)     = 'Ga:s;px;py;pz'
  proj(2)     = 'As:s;px;py;pz'   

  elinterp    = .true.
  phinterp    = .true.

  tshuffle2   = .true.
  tphases     = .false.

  elecselfen  = .true.
  phonselfen  = .false.
  a2f         = .false.

  parallel_k  = .true.
  parallel_q  = .false.

  fsthick     = 2 ! eV
  eptemp      = 300 ! K
  degaussw    = 0.01 ! eV

  dvscf_dir   = '../phonons/save'
  filukk      = './gaas.ukk'

  nkf1        = 40
  nkf2        = 40
  nkf3        = 40

  nqf1        = 80
  nqf2        = 80
  nqf3        = 80
 
  nk1         = 8
  nk2         = 8
  nk3         = 8

  nq1         = 8
  nq2         = 8
  nq3         = 8
 /
       29 cartesian
         0.000000000   0.000000000   0.000000000   0.0039062
        -0.125000000   0.125000000  -0.125000000   0.0312500
        -0.250000000   0.250000000  -0.250000000   0.0312500
        -0.375000000   0.375000000  -0.375000000   0.0312500
         0.500000000  -0.500000000   0.500000000   0.0156250
         0.000000000   0.250000000   0.000000000   0.0234375
        -0.125000000   0.375000000  -0.125000000   0.0937500
        -0.250000000   0.500000000  -0.250000000   0.0937500
         0.625000000  -0.375000000   0.625000000   0.0937500
         0.500000000  -0.250000000   0.500000000   0.0937500
         0.375000000  -0.125000000   0.375000000   0.0937500
         0.250000000   0.000000000   0.250000000   0.0468750
         0.000000000   0.500000000   0.000000000   0.0234375
        -0.125000000   0.625000000  -0.125000000   0.0937500
         0.750000000  -0.250000000   0.750000000   0.0937500
         0.625000000  -0.125000000   0.625000000   0.0937500
         0.500000000   0.000000000   0.500000000   0.0468750
         0.000000000   0.750000000   0.000000000   0.0234375
         0.875000000  -0.125000000   0.875000000   0.0937500
         0.750000000   0.000000000   0.750000000   0.0468750
         0.000000000  -1.000000000   0.000000000   0.0117188
        -0.250000000   0.500000000   0.000000000   0.0937500
         0.625000000  -0.375000000   0.875000000   0.1875000
         0.500000000  -0.250000000   0.750000000   0.0937500
         0.750000000  -0.250000000   1.000000000   0.0937500
         0.625000000  -0.125000000   0.875000000   0.1875000
         0.500000000   0.000000000   0.750000000   0.0937500
        -0.250000000  -1.000000000   0.000000000   0.0468750
        -0.500000000  -1.000000000   0.000000000   0.0234375


The grids were taken from the PNAS 112, 5291 (2015) paper on GaAs. These fine grids are even less stringent than the 100x100x100 k- and q-grids reported recently (arXiv:1606.07074).

After 240 hours (the maximum walltime allowed on our cluster) on 64 CPUs, there was no result; EPW was still calculating. I also tried 30x30x30 and 60x60x60 k- and q-grids, respectively; that run finished in 83 hours on 64 CPUs, but those grids are insufficient for convergence. I have tried the following to reduce the runtime:

1. Use fsthick = 2 eV, as suggested by Professor Giustino (viewtopic.php?f=3&t=18), to speed up the calculation.

2. Set etf_mem = .false., since with .true. nothing seemed to appear in the output for over a week. According to the manual, etf_mem = .true. is faster, although with .false. the user can see the progress in terms of the q-points done. For the 240-hour run, I had set etf_mem = .true.

3. Parallelize over q instead of k. The recent EPW paper suggests q-parallelization is possible, but even the diamond example segfaults on 8 CPUs with 24 GB of memory. It runs fine with k-parallelization.

I was wondering if there is something else I can do to reduce the runtime.
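
For what it's worth, here is the naive estimate I used to compare the two runs, assuming the self-energy cost simply scales with the number of fine (k, q) pairs (my assumption, not something stated in the EPW documentation):

Code: Select all

# rough work estimate: number of fine (k, q) pairs for the two runs
small = 30**3 * 60**3      # grids of the run that finished in 83 hours
large = 40**3 * 80**3      # grids of the run that hit the 240-hour limit
ratio = large / small      # = (4/3)**6, about 5.6
print(ratio, 83 * ratio)   # ~5.6x more work, i.e. roughly 470 hours

Under that assumption the 40x40x40 / 80x80x80 run would need roughly 470 hours on the same 64 CPUs, well beyond our walltime limit.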

Thank you,

Vahid

Vahid Askarpour
Department of Physics and Atmospheric Science
Dalhousie University,
Halifax, NS, Canada

carla.verdi
Posts: 155
Joined: Thu Jan 14, 2016 10:52 am
Affiliation:

Re: GaAs runtime reduction

Post by carla.verdi »

Dear Vahid

Unfortunately, there is nothing else you can do to reduce the runtime apart from increasing the number of CPUs, because you are calculating dense grids in both k and q. If you cannot increase the number of CPUs, or if the runtime is still too long, one possibility is to split your fine k-grid and calculate the self-energy for different subsets of k-points provided via filkf.
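
As a rough illustration, the subsets could be generated with a small script like the one below (this is only a sketch, not part of EPW; it assumes filkf takes the same layout as the point lists in your input, i.e. a header line with the number of points and the coordinate type, followed by kx ky kz and a weight, so please check the coordinate keyword against the documentation):

Code: Select all

# split a uniform 40x40x40 fine k-grid into several filkf-style files,
# so the self-energy can be computed in independent EPW runs
nk = 40          # fine grid, matching nkf1 = nkf2 = nkf3 = 40
nchunks = 8      # number of separate runs

# uniform grid in crystal coordinates with equal weights
kpts = [(i / nk, j / nk, k / nk, 1.0 / nk**3)
        for i in range(nk) for j in range(nk) for k in range(nk)]

size = (len(kpts) + nchunks - 1) // nchunks
for n in range(nchunks):
    subset = kpts[n * size:(n + 1) * size]
    with open('kgrid_%02d.dat' % n, 'w') as f:
        f.write('%d crystal\n' % len(subset))
        for kx, ky, kz, w in subset:
            f.write('%14.10f %14.10f %14.10f %14.10f\n' % (kx, ky, kz, w))

Each run would then point filkf to one of these files, and the self-energies can be collected afterwards.
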
If you want to monitor the progress in terms of the q-points done, you can modify line 448 of ephwann_shuffle.f90 and delete ".not. etf_mem" from the IF condition. We will include this change in the next release, as it is sensible to be able to monitor the progress also when etf_mem is true.

Best
Carla

Vahid
Posts: 101
Joined: Fri Apr 08, 2016 11:02 pm
Affiliation:

Re: GaAs runtime reduction

Post by Vahid »

Dear Carla,

Thank you for your valuable suggestions. I may have to break up the k-grid as you suggested.

By the way, is the q-parallelization in EPW-5.4.0 functional? I get a segmentation fault for the diamond example on 8 CPUs with 6 GB per CPU. This is where the code stops:

Code: Select all

     Finished reading Wann rep data from file

      Using k-mesh file: meshes/path.dat
     Size of k point mesh for interpolation:        402
     Using uniform q-mesh:   50  50  50
     Size of q point mesh for interpolation:     125000
     Max number of q points per pool:            15625

     Fermi energy coarse grid =  12.903848 eV

     ===================================================================

     Fermi energy is read from the input file: Ef =  13.209862 eV

     ===================================================================

              ibndmin =     2  ebndmin =     0.872
              ibndmax =     4  ebndmax =     1.014


     Number of ep-matrix elements per pool :        10854 ~=   84.80 Kb (@ 8 bytes/ DP)


And this is my input file:

Code: Select all

--
&inputepw
  prefix      = 'diam'
  amass(1)    = 12.01078
  outdir      = './'

  iverbosity  = 0

  elph        = .true.
  epbwrite    = .true.
  epbread     = .false.

  epwwrite    =  .true.
  epwread     =  .false.

  efermi_read = .true.
  fermi_energy = 13.209862

  nbndsub     =  4
  nbndskip    =  0

  wannierize  = .true.
  num_iter    = 300
  iprint      = 2
  dis_win_max = 12
  dis_froz_max= 7
  proj(1)     = 'f=0,0,0:l=-3'   

  elinterp    = .true.
  phinterp    = .true.

  tshuffle2   = .true.
  tphases     = .false.

  elecselfen  = .true.
  phonselfen  = .false.
  a2f         = .false.

  parallel_k  = .false.
  parallel_q  = .true.

  fsthick     = 1.36056981 ! eV
  eptemp      = 300 ! K (same as PRB 76, 165108)
  degaussw    = 0.1 ! eV

  dvscf_dir   = '../phonons/save'
  filukk      = './diam.ukk'
  filkf       = 'meshes/path.dat'
  nqf1        = 50
  nqf2        = 50
  nqf3        = 50
 
  nk1         = 6
  nk2         = 6
  nk3         = 6

  nq1         = 6
  nq2         = 6
  nq3         = 6
 /
      16 cartesian
   0.0000000   0.0000000   0.0000000  0.0092593
  -0.1666667   0.1666667  -0.1666667  0.0740741
  -0.3333333   0.3333333  -0.3333333  0.0740741
   0.5000000  -0.5000000   0.5000000  0.0370370
   0.0000000   0.3333333   0.0000000  0.0555556
  -0.1666667   0.5000000  -0.1666667  0.2222222
   0.6666667  -0.3333333   0.6666667  0.2222222
   0.5000000  -0.1666667   0.5000000  0.2222222
   0.3333333   0.0000000   0.3333333  0.1111111
   0.0000000   0.6666667   0.0000000  0.0555556
   0.8333333  -0.1666667   0.8333333  0.2222222
   0.6666667   0.0000000   0.6666667  0.1111111
   0.0000000  -1.0000000   0.0000000  0.0277778
   0.6666667  -0.3333333   1.0000000  0.2222222
   0.5000000  -0.1666667   0.8333333  0.2222222
  -0.3333333  -1.0000000   0.0000000  0.1111111



Gratefully,
Vahid

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: GaAs runtime reduction

Post by sponce »

Hello Vahid,

The q-point parallelisation should be functional (I have now added it to the tests for the new release in September).

However, at this point I do not recommend using the q-parallelisation, as it is much slower than the k-parallelisation.

This is because I have spent time optimizing the k-parallelisation but not yet the q-parallelisation.

Best,

Samuel
Prof. Samuel Poncé
F.R.S.-FNRS Research Associate / Professor at UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
