Hi EPW users and developers,
I am trying to run the epw.x calculation below, but it fails because it hits the time limit. I have tried restarting it, but it repeats everything from the beginning (the Wannierization steps and all) and stops at the same point. I guess I have not been able to restart the calculation properly. Could you please suggest how I can restart from the last results, skipping the Wannierization and the previously computed steps?
I tried setting restart=.true. in the input, but the calculation did not restart. It ends after the time limit as shown below, just as it did without the restart flag. My input file is also included below.
     Inside velocity step 1
     Velocity matrix elements calculated
     Bloch2wane:          1 /         32
     Bloch2wane:          2 /         32
     Bloch2wane:          3 /         32
     Bloch2wane:          4 /         32
     Bloch2wane:          5 /         32
     Bloch2wane:          6 /         32
     Bloch2wane:          7 /         32
     Bloch2wane:          8 /         32
     Bloch2wane:          9 /         32
     Bloch2wane:         10 /         32
     Bloch2wane:         11 /         32
     Bloch2wane:         12 /         32
     Bloch2wane:         13 /         32
     Bloch2wane:         14 /         32
     Bloch2wane:         15 /         32
     Bloch2wane:         16 /         32
     Bloch2wane:         17 /         32
     Bloch2wane:         18 /         32
     Bloch2wane:         19 /         32
     Bloch2wane:         20 /         32
     Bloch2wane:         21 /         32
     Bloch2wane:         22 /         32
     Bloch2wane:         23 /         32
     Bloch2wane:         24 /         32
     Bloch2wane:         25 /         32
     Bloch2wane:         26 /         32
     Bloch2wane:         27 /         32
     Bloch2wane:         28 /         32
     Bloch2wane:         29 /         32
     Bloch2wane:         30 /         32
     Bloch2wane:         31 /         32
     Bloch2wane:         32 /         32
     Bloch2wanp:          1 /          1
     Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file
     ===================================================================
     Memory usage:  VmHWM =      1665Mb
                   VmPeak =      3380Mb
     ===================================================================
     Using uniform q-mesh:    16   16    8
     Size of q point mesh for interpolation:       2048
     Using uniform k-mesh:    16   16    8
     Size of k point mesh for interpolation:       4096
     Max number of k points per pool:               16
     Fermi energy coarse grid =   9.928846 eV
     Skipping the first   70 bands:
     The Fermi level will be determined with  44.00000 electrons
     Fermi energy is calculated from the fine k-mesh: Ef =  10.001024 eV
     ===================================================================
              ibndmin =    21  ebndmin =     9.601 eV
              ibndmax =    24  ebndmax =    10.401 eV
     Number of ep-matrix elements per pool :         4608 ~=   36.00 Kb (@ 8 bytes/ DP)
     A selecq.fmt file was found but re-created because selecqread == .FALSE.
     Number selected, total            100            100
     Number selected, total            200            200
     Number selected, total            300            300
     Number selected, total            400            400
     Number selected, total            500            500
     Number selected, total            600            600
     Number selected, total            700            700
     Number selected, total            800            800
     Number selected, total            900            900
     Number selected, total           1000           1000
     Number selected, total           1100           1100
     Number selected, total           1200           1200
     Number selected, total           1300           1300
     Number selected, total           1400           1400
     Number selected, total           1500           1500
     Number selected, total           1600           1600
     Number selected, total           1700           1700
     Number selected, total           1800           1800
     Number selected, total           1900           1900
     Number selected, total           2000           2000
     We only need to compute     2048 q-points
     Progression iq (fine) =        100/      2048
     Progression iq (fine) =        200/      2048
     Progression iq (fine) =        300/      2048
     Progression iq (fine) =        400/      2048
     Progression iq (fine) =        500/      2048
     Progression iq (fine) =        600/      2048
     Progression iq (fine) =        700/      2048
     Progression iq (fine) =        800/      2048
srun: error: nid001888: tasks 0-255: Power failure
&inputepw
  restart=.true.
  prefix      = 'hfte5'
  outdir      = './'
  dvscf_dir   = './save'
  elph        = .true.
  epwwrite    = .true.
  epwread     = .false.
  lpolar      = .true.
  wannierize  = .true.
  proj(1)     = 'Hf:d'
  proj(2)     = 'Te:p'
  vme='wannier'
nbndsub = 40
bands_skipped='exclude_bands = 1-70, 111-160'
num_iter = 50000
iprint=2
dis_win_max=15
dis_froz_max=13
dis_froz_min = 3
dis_win_min = 3
wdata(1)='bands_plot = .true.'
wdata(2) = 'begin kpoint_path'
wdata(3) = 'X 0.25122543 -0.25122543 0 G 0 0 0'
wdata(4)= 'G 0 0 0 Y 0.24908686 0.24908686 0'
wdata(5)='Y 0.24908686 0.24908686 0 S 0.5003123 -0.00213862 0'
wdata(6)='S 0.5003123 -0.00213862 0 R 0.5003123 -0.00213862 0.49836313'
wdata(7)='R 0.5003123 -0.00213862 0.49836313 Z 0 0 0.49836313'
wdata(8)='Z 0 0 0.49836313 G 0 0 0'
wdata(9)='end kpoint_path'
wdata(10)='bands_plot_format = gnuplot'
wdata(11)='use_ws_distance =T'
fsthick=0.4
band_plot= .false.
!filqf       = './XGYSRZ.dat'
!filkf       = './XGYSRZ.dat'
nkf1        = 16
nkf2        = 16
nkf3        = 8
nqf1        = 16
nqf2        = 16
nqf3        = 8
nk1         = 8
nk2         = 8
nk3         = 4
nq1         = 4
nq2         = 4
nq3         = 2
/
Best
Rijan
How to properly restart calculation?
Re: How to properly restart calculation?
Dear Rijan,
The restart flag only takes effect when a flag such as specfun_el or scattering is set to true. If none of them is true, no temporary files for restarting are generated. Your input file sets none of these flags, so the restart flag is ignored. In addition, no flags related to transport properties or superconductivity are set, so even if the calculation for all 2048 q-points completes, it is quite likely that no results will be written. If there is a specific calculation you wish to run, or a tutorial you are following, please let me know.
By the way, epwwrite is the flag that makes EPW write 'epwdata.fmt' and 'prefix.epmatwp'. Since you got the message "Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file", I assume these files have already been generated. They contain the Hamiltonian and the electron-phonon matrix elements in the Wannier representation, which are used for the subsequent interpolation to the Bloch representation. For the next calculation, setting 'wannierize=false', 'epwwrite=false', and 'epwread=true' will let you reuse the generated 'epwdata.fmt' and 'prefix.epmatwp'. The restart flag, by contrast, resumes the calculation from the last computed q-point if the interpolation over the 2048 q-points is interrupted. Again, it is only effective when certain other flags are set to true.
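As a rough sketch of the flag changes described above (a fragment only, assuming every other setting stays exactly as in the original input and that 'epwdata.fmt' and 'prefix.epmatwp' from the first run are still in the working directory), the top of the follow-up input might look like this:

```fortran
&inputepw
  prefix      = 'hfte5'
  outdir      = './'
  dvscf_dir   = './save'
  elph        = .true.
  wannierize  = .false.   ! skip the Wannierization, it was already done
  epwwrite    = .false.   ! do not rewrite epwdata.fmt and prefix.epmatwp
  epwread     = .true.    ! read the Wannier-representation files from the first run
  restart     = .true.    ! only takes effect together with flags such as
                          ! scattering or specfun_el; ignored otherwise
  ! ... remaining settings (lpolar, nbndsub, grids, fsthick, ...) unchanged ...
/
```

The comments are illustrative annotations, not part of the original input. Whether any additional files from the first run are required depends on the EPW version, so please check the documentation for your release.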
Best regards,
Hitoshi
Re: How to properly restart calculation?
Hi Hitoshi,
I am following the tutorial and, as in the tutorial, the transport properties are in epw2.in. This is my epw1.in (without the transport calculation), and it is this step that I needed to restart.
I simply set wannierize = .false. in my previous input (skipping the Wannierization, as it had already been done).
The calculation ended, but it did not print "JOB DONE" as we usually see in QE. I am pasting the last part of epw1.out below; did it finish completely?
===================================================================
Memory usage: VmHWM = 1452Mb
VmPeak = 3171Mb
===================================================================
Unfolding on the coarse grid
elphon_wrap : 27675.00s CPU 29169.86s WALL ( 1 calls)
INITIALIZATION:
set_drhoc : 0.51s CPU 0.53s WALL ( 33 calls)
init_vloc : 0.03s CPU 0.04s WALL ( 1 calls)
init_us_1 : 0.06s CPU 0.06s WALL ( 1 calls)
Electron-Phonon interpolation
ephwann : 11351.20s CPU 14248.58s WALL ( 1 calls)
ep-interp : 11168.31s CPU 13953.89s WALL ( 2048 calls)
Ham: step 1 : 0.00s CPU 0.00s WALL ( 1 calls)
Ham: step 2 : 0.10s CPU 0.19s WALL ( 1 calls)
ep: step 1 : 0.19s CPU 0.19s WALL ( 32 calls)
ep: step 2 : 111.28s CPU 147.36s WALL ( 32 calls)
DynW2B : 228.97s CPU 477.43s WALL ( 2048 calls)
HamW2B : 264.15s CPU 268.78s WALL ( 32792 calls)
ephW2Bp : 7798.78s CPU 10303.30s WALL ( 2048 calls)
ephW2B : 1651.61s CPU 1666.88s WALL ( 8330 calls)
vmewan2bloch : 969.19s CPU 979.99s WALL ( 32768 calls)
vmewan2bloch : 969.19s CPU 979.99s WALL ( 32768 calls)
Total program execution
EPW : 10h50m CPU 12h 3m WALL
% Copyright (C) 2016-2023 EPW-Collaboration
===============================================================================
Please consider citing the following papers.
% Paper describing the method on which EPW relies
F. Giustino and M. L. Cohen and S. G. Louie, Phys. Rev. B 76, 165108 (2007)
% Papers describing the EPW software
H. Lee et al., npj Comput. Mater. 9, 156 (2023)
S. Ponc\'e, E.R. Margine, C. Verdi and F. Giustino, Comput. Phys. Commun. 209, 116 (2016)
J. Noffsinger et al., Comput. Phys. Commun. 181, 2140 (2010)
% Since you used the [lpolar] input, please consider also citing
C. Verdi and F. Giustino, Phys. Rev. Lett. 115, 176401 (2015)
For your convenience, this information is also reported in the
functionality-dependent EPW.bib file.
===============================================================================
Re: How to properly restart calculation?
Hi Rijan,
The format of the footer in stdout has changed from previous versions. In recent versions, the message 'JOB DONE' is no longer printed. As you can see from the timing summary printed for each routine, your epw1.in calculation completed successfully.
So I guess your restart problem has been resolved. If you run into any issues with the transport calculations, please feel free to post.
Best,
Hitoshi