
How to properly restart calculation?

Posted: Tue May 07, 2024 4:25 pm
by rkarkee
Hi EPW users and developers,

I am trying to run the following epw.x calculation, but it fails because it hits the time limit. I have tried to restart it, but it repeats everything (the Wannierization steps and all) and stops at the same point. I suspect I am not restarting the calculation properly. Could you please suggest how I can restart from the last results, skipping the Wannierization and the previously computed steps?

I tried setting restart=.true. in the input, but the calculation did not restart. It ends at the time limit as shown below, exactly as it did without the restart flag. My input file is also given below.

Inside velocity step 1


Velocity matrix elements calculated


Bloch2wane: 1 / 32
Bloch2wane: 2 / 32
Bloch2wane: 3 / 32
Bloch2wane: 4 / 32
Bloch2wane: 5 / 32
Bloch2wane: 6 / 32
Bloch2wane: 7 / 32
Bloch2wane: 8 / 32
Bloch2wane: 9 / 32
Bloch2wane: 10 / 32
Bloch2wane: 11 / 32
Bloch2wane: 12 / 32
Bloch2wane: 13 / 32
Bloch2wane: 14 / 32
Bloch2wane: 15 / 32
Bloch2wane: 16 / 32
Bloch2wane: 17 / 32
Bloch2wane: 18 / 32
Bloch2wane: 19 / 32
Bloch2wane: 20 / 32
Bloch2wane: 21 / 32
Bloch2wane: 22 / 32
Bloch2wane: 23 / 32
Bloch2wane: 24 / 32
Bloch2wane: 25 / 32
Bloch2wane: 26 / 32
Bloch2wane: 27 / 32
Bloch2wane: 28 / 32
Bloch2wane: 29 / 32
Bloch2wane: 30 / 32
Bloch2wane: 31 / 32
Bloch2wane: 32 / 32
Bloch2wanp: 1 / 1

Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file

===================================================================
Memory usage: VmHWM = 1665Mb
VmPeak = 3380Mb
===================================================================

Using uniform q-mesh: 16 16 8
Size of q point mesh for interpolation: 2048
Using uniform k-mesh: 16 16 8
Size of k point mesh for interpolation: 4096
Max number of k points per pool: 16

Fermi energy coarse grid = 9.928846 eV

Skipping the first 70 bands:

The Fermi level will be determined with 44.00000 electrons

Fermi energy is calculated from the fine k-mesh: Ef = 10.001024 eV

===================================================================

ibndmin = 21 ebndmin = 9.601 eV
ibndmax = 24 ebndmax = 10.401 eV


Number of ep-matrix elements per pool : 4608 ~= 36.00 Kb (@ 8 bytes/ DP)

A selecq.fmt file was found but re-created because selecqread == .FALSE.
Number selected, total 100 100
Number selected, total 200 200
Number selected, total 300 300
Number selected, total 400 400
Number selected, total 500 500
Number selected, total 600 600
Number selected, total 700 700
Number selected, total 800 800
Number selected, total 900 900
Number selected, total 1000 1000
Number selected, total 1100 1100
Number selected, total 1200 1200
Number selected, total 1300 1300
Number selected, total 1400 1400
Number selected, total 1500 1500
Number selected, total 1600 1600
Number selected, total 1700 1700
Number selected, total 1800 1800
Number selected, total 1900 1900
Number selected, total 2000 2000
We only need to compute 2048 q-points

Progression iq (fine) = 100/ 2048
Progression iq (fine) = 200/ 2048
Progression iq (fine) = 300/ 2048
Progression iq (fine) = 400/ 2048
Progression iq (fine) = 500/ 2048
Progression iq (fine) = 600/ 2048
Progression iq (fine) = 700/ 2048
Progression iq (fine) = 800/ 2048
srun: error: nid001888: tasks 0-255: Power failure

&inputepw

restart=.true.
prefix = 'hfte5'
outdir = './'
dvscf_dir = './save'

elph = .true.
epwwrite = .true.
epwread = .false.
lpolar = .true.

wannierize = .true.
proj(1) = 'Hf:d'
proj(2) = 'Te:p'
vme='wannier'
nbndsub = 40
bands_skipped='exclude_bands = 1-70, 111-160'
num_iter = 50000
iprint=2

dis_win_max=15
dis_froz_max=13
dis_froz_min = 3
dis_win_min = 3


wdata(1)='bands_plot = .true.'
wdata(2) = 'begin kpoint_path'
wdata(3) = 'X 0.25122543 -0.25122543 0 G 0 0 0'
wdata(4)= 'G 0 0 0 Y 0.24908686 0.24908686 0'
wdata(5)='Y 0.24908686 0.24908686 0 S 0.5003123 -0.00213862 0'
wdata(6)='S 0.5003123 -0.00213862 0 R 0.5003123 -0.00213862 0.49836313'
wdata(7)='R 0.5003123 -0.00213862 0.49836313 Z 0 0 0.49836313'
wdata(8)='Z 0 0 0.49836313 G 0 0 0'
wdata(9)='end kpoint_path'
wdata(10)='bands_plot_format = gnuplot'
wdata(11)='use_ws_distance =T'

fsthick=0.4

band_plot= .false.

!filqf = './XGYSRZ.dat'
!filkf = './XGYSRZ.dat'

nkf1 = 16
nkf2 = 16
nkf3 = 8
nqf1 = 16
nqf2 = 16
nqf3 = 8
nk1 = 8
nk2 = 8
nk3 = 4
nq1 = 4
nq2 = 4
nq3 = 2
/



Best
Rijan

Re: How to properly restart calculation?

Posted: Wed May 08, 2024 5:42 pm
by hmori
Dear Rijan,

The restart flag is only effective when flags such as specfun_el or scattering are set to true. In other words, if none of them is true, no temporary files for restarting are generated. Looking at your input file, none of these flags is set, so the restart flag is ignored. In addition, no flags related to transport properties or superconductivity are set either, so even if the calculation for all 2048 q-points completes, it is quite likely that nothing will be written out. If there are specific calculations you wish to run, or if you are following a tutorial, please let me know.
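
For illustration only, a restart-capable run would pair the restart flag with one of those flags. The fragment below is a sketch that uses only the two flag names mentioned above; the other settings such a run needs are omitted, so it is not a complete input.

```fortran
! Sketch only, not a complete input: other scattering-related settings are omitted.
restart    = .true.   ! ask EPW to resume an interrupted q-point loop
scattering = .true.   ! one of the flags for which restart files are generated
```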

BTW, epwwrite is the flag that writes 'epwdata.fmt' and 'prefix.epmatwp'. Since you got the message "Writing Hamiltonian, Dynamical matrix and EP vertex in Wann rep to file", I assume these files have already been generated. They contain the Hamiltonian and the electron-phonon matrix elements in the Wannier representation, which are used for the subsequent interpolation to the Bloch representation. For the next calculation, setting 'wannierize=false', 'epwwrite=false', and 'epwread=true' will let you reuse the generated 'epwdata.fmt' and 'prefix.epmatwp'. The restart flag, on the other hand, is used to resume from the last computed q-point when the interpolation over the 2048 q-points is interrupted. And again, this flag is only effective when certain other flags are set to true.
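
As a minimal sketch of the flag changes described above, assuming the rest of the epw1.in posted earlier in this thread stays unchanged and that 'epwdata.fmt' and the .epmatwp file from the first run are still in the working directory:

```fortran
! Only the three flags that change; all other entries stay as in the original input.
wannierize = .false.  ! skip the Wannierization that was already done
epwwrite   = .false.  ! do not rewrite epwdata.fmt and prefix.epmatwp
epwread    = .true.   ! read the existing Wannier-representation files instead
```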

Best regards,
Hitoshi

Re: How to properly restart calculation?

Posted: Sat May 11, 2024 4:38 pm
by rkarkee
Hi Hitoshi,

I am following the tutorial and, as in the tutorial, I have the transport properties in epw2.in. This is my epw1.in (without transport calculations), and it was this step that I needed to restart.
I simply set wannierize = .false. in my previous input (skipping the Wannierization, as it was already done).

The calculation ended, but it did not print "JOB DONE" as we usually see in QE. I am posting the last part of epw1.out below; has it completed?

===================================================================
Memory usage: VmHWM = 1452Mb
VmPeak = 3171Mb
===================================================================


Unfolding on the coarse grid
elphon_wrap : 27675.00s CPU 29169.86s WALL ( 1 calls)

INITIALIZATION:

set_drhoc : 0.51s CPU 0.53s WALL ( 33 calls)
init_vloc : 0.03s CPU 0.04s WALL ( 1 calls)
init_us_1 : 0.06s CPU 0.06s WALL ( 1 calls)



Electron-Phonon interpolation
ephwann : 11351.20s CPU 14248.58s WALL ( 1 calls)
ep-interp : 11168.31s CPU 13953.89s WALL ( 2048 calls)

Ham: step 1 : 0.00s CPU 0.00s WALL ( 1 calls)
Ham: step 2 : 0.10s CPU 0.19s WALL ( 1 calls)
ep: step 1 : 0.19s CPU 0.19s WALL ( 32 calls)
ep: step 2 : 111.28s CPU 147.36s WALL ( 32 calls)
DynW2B : 228.97s CPU 477.43s WALL ( 2048 calls)
HamW2B : 264.15s CPU 268.78s WALL ( 32792 calls)
ephW2Bp : 7798.78s CPU 10303.30s WALL ( 2048 calls)
ephW2B : 1651.61s CPU 1666.88s WALL ( 8330 calls)
vmewan2bloch : 969.19s CPU 979.99s WALL ( 32768 calls)
vmewan2bloch : 969.19s CPU 979.99s WALL ( 32768 calls)


Total program execution
EPW : 10h50m CPU 12h 3m WALL

% Copyright (C) 2016-2023 EPW-Collaboration

===============================================================================
Please consider citing the following papers.

% Paper describing the method on which EPW relies
F. Giustino and M. L. Cohen and S. G. Louie, Phys. Rev. B 76, 165108 (2007)

% Papers describing the EPW software
H. Lee et al., npj Comput. Mater. 9, 156 (2023)
S. Ponc\'e, E.R. Margine, C. Verdi and F. Giustino, Comput. Phys. Commun. 209, 116 (2016)
J. Noffsinger et al., Comput. Phys. Commun. 181, 2140 (2010)


% Since you used the [lpolar] input, please consider also citing
C. Verdi and F. Giustino, Phys. Rev. Lett. 115, 176401 (2015)

For your convenience, this information is also reported in the
functionality-dependent EPW.bib file.
===============================================================================

Re: How to properly restart calculation?

Posted: Sun May 12, 2024 4:24 am
by hmori
Hi Rijan,

The format of the footer in stdout has changed from previous versions. In recent versions, the message 'JOB DONE' is no longer printed. As you can see from the timings reported for each routine, your epw1.in calculation has completed successfully.

So, I guess your problem with the restart has been resolved. If you encounter any issues with the transport properties calculations, please feel free to post.

Best,
Hitoshi