BAD TERMINATION using epw.x

eliephys78
Posts: 82
Joined: Thu May 05, 2016 5:18 pm
Affiliation:

Post by eliephys78 »

Dear all,

I am running EPW-4.3 with QE-6.2.1. Once the "amn" files have been calculated, the code stops with the following error:

"AMN calculated

MMN
k points = 576 in 512 pools
1 of 2 on ionode
2 of 2 on ionode

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 17975 RUNNING AT kcn464.local
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES"

Checking the error file I get:

"forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
epw.x 00000000010A1A21 Unknown Unknown Unknown
epw.x 00000000010A0177 Unknown Unknown Unknown
epw.x 0000000000FA3464 Unknown Unknown Unknown
epw.x 0000000000FA3276 Unknown Unknown Unknown
epw.x 0000000000F23B04 Unknown Unknown Unknown
epw.x 0000000000F2B887 Unknown Unknown Unknown
libpthread.so.0 00002ADC89925500 Unknown Unknown Unknown
libc.so.6 00002ADC89E863A7 Unknown Unknown Unknown
libmpi.so.12 00002ADC88E56034 Unknown Unknown Unknown
libmpi.so.12 00002ADC88E503E3 Unknown Unknown Unknown
libmpi.so.12 00002ADC88F61CE9 Unknown Unknown Unknown
libmpi.so.12 00002ADC88F619DF Unknown Unknown Unknown
libmpi.so.12 00002ADC88E2385C Unknown Unknown Unknown
libmpi.so.12 00002ADC88E2773D Unknown Unknown Unknown
libmpi.so.12 00002ADC88E26F6E Unknown Unknown Unknown
libmpifort.so.12 00002ADC889FEC57 Unknown Unknown Unknown
epw.x 0000000000D04526 reduce_base_real_ 303 mp_base.f90
epw.x 0000000000CF6519 mp_mp_mp_sum_c4d_ 1537 mp.f90
epw.x 00000000005059A6 compute_mmn_para_ 1125 pw2wan90epw.f90
epw.x 00000000004F2FED pw2wan90epw_ 78 pw2wan90epw.f90
epw.x 00000000004F100C wann_run_ 69 wannierize.f90
epw.x 0000000000412C6E MAIN__ 137 epw.f90
epw.x 0000000000411EDE Unknown Unknown Unknown
libc.so.6 00002ADC89DD5CDD Unknown Unknown Unknown
epw.x 0000000000411DE9 Unknown Unknown Unknown"

I suspect that this is due to the number of processors used. I am using

mpirun -np 512 /......./EPW/bin/epw.x -npool 512 < epw.in > epw.out

I need such a large number to accelerate the calculations. In fact I usually define the nodes in the pbs file as:

select= ;ncpus=;mpiprocs=;ompthreads=;

How can I define the variables above so that the number of processors equals the number of pools while using more than one node (that is, without having to put 512 processors on a single node)?
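For concreteness, the arithmetic behind such a request can be sketched as follows. This is only a sketch under stated assumptions: it assumes PBS Professional's `select` syntax with the four keys listed above, a pure-MPI run (one OpenMP thread per rank), and 64 cores per node, which is a guess about the cluster. The function name `pbs_select` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build a PBS Professional "select" request for a pure-MPI run.
# total_ranks    = number of MPI processes (equal to the number of pools here)
# cores_per_node = cores available on one node (an assumption about the cluster)
pbs_select() {
  local total_ranks=$1 cores_per_node=$2
  # Round up so that every MPI rank gets its own core.
  local nodes=$(( (total_ranks + cores_per_node - 1) / cores_per_node ))
  echo "select=${nodes}:ncpus=${cores_per_node}:mpiprocs=${cores_per_node}:ompthreads=1"
}

pbs_select 512 64
```

The printed string would go after `#PBS -l` in the job script, and the `mpirun -np 512 ... -npool 512` line stays unchanged; the scheduler then spreads the 512 ranks over the requested nodes.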

Thanks in advance

Elio
Physics Department
university of Rondonia Brazil
Porto Velho- Rondonia

Post by eliephys78 »

Just a quick update on the matter...

I performed the scf, nscf and epw calculations using np 64 and npool 64 by issuing

mpirun -np 64 /path/to/executable/pw.x -npool 64 <scf.in > scf.out

mpirun -np 64 /path/to/executable/pw.x -npool 64 <nscf.in > nscf.out

mpirun -np 64 /path/to/executable/epw.x -npool 64 <epw.in > epw.out

All went fine until the MMN calculation. The code simply froze at:

MMN
k points = 576 in 64 pools
1 of 9 on ionode
2 of 9 on ionode
3 of 9 on ionode
4 of 9 on ionode
5 of 9 on ionode
6 of 9 on ionode
7 of 9 on ionode
8 of 9 on ionode
9 of 9 on ionode
MMN calculated

Also, the .wout file did not contain the spreads of the Wannier functions, which are usually obtained when running the W90 code alone with the wannier90.x and pw2wannier90.x executables.

Any idea what is going on? Is it still a wrong choice of procs and npool?
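As a side note on the pool counts printed in the two logs, the numbers are consistent with the k-points being split as evenly as possible over the pools. The sketch below reproduces that arithmetic; the rule that leftover k-points go to the first pools is an assumption inferred from the "1 of 2 on ionode" line, not something read from the EPW source, and the function name `kpts_per_pool` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: how many k-points each pool gets if nk k-points are split as evenly
# as possible over npool pools (assumption: leftovers go to the first pools).
kpts_per_pool() {
  local nk=$1 npool=$2
  local base=$(( nk / npool ))    # every pool gets at least this many
  local extra=$(( nk % npool ))   # this many pools get one more
  if (( extra == 0 )); then
    echo "${npool} pools x ${base} k-points"
  else
    echo "${extra} pools x $(( base + 1 )) k-points, $(( npool - extra )) pools x ${base} k-points"
  fi
}

kpts_per_pool 576 512   # the first run:  -npool 512
kpts_per_pool 576 64    # the second run: -npool 64
```

With 512 pools the split is uneven (some pools hold 2 k-points, the rest hold 1), while with 64 pools every pool holds 9, which matches the "of 2" and "of 9" counters in the outputs.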

Regards

sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Post by sponce »

Hello,

It could be that EPW is not compiled properly.

Is the test suite running properly?

You can go to the test-suite directory (cd q-e/test-suite)
and then run

make run-custom-test-parallel testdir=epw_base

Does that work?

If not, then you have to recompile.

Best wishes,
Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com

Post by eliephys78 »

Dear Samuel,

Thanks for your reply. I have tried it for a system such as Al and it ran fine.

In any case, it looks like my calculation works when I omit the line wdata(1)= 'exclude_bands:.....'. The scf calculation has 40 bands, whereas there are 11 Wannier functions. The exclude_bands option works perfectly well in WANNIER90, but it seems to cause problems in EPW. I have now set the minimum and maximum of the frozen window so that it includes only those 11 bands and set nbndsub=11, without the exclude_bands option, and the calculations are running smoothly.

Regards

Post by eliephys78 »

Hello again,

The Wannier calculations have completed successfully. Now the code is freezing at the "kmap" calculation:
"Calculating kmap and kgmap
Progress kmap: ########################################"
It has been stuck there for more than one hour and 40 minutes. My system has 3 atoms per unit cell with 256 points on the coarse electronic grid (as a test at this stage). I am using 64 processors:

mpirun -np 64 /......./epw.x -npool 64 < epw1.in > epw1.out

which is 1 node with 64 processors.

I know that the kmap calculations are computationally expensive but is this normal?

If yes, is it possible to speed up the calculations by using more nodes or any other distribution of processors? What would the format be?

Sorry to bother you with this, but I am really eager to finish testing and move on to the real calculations, and it is taking a long time!

Regards

Post by sponce »

Dear eliephys78,

For your first issue, indeed, EPW did not support the "exclude_bands" option of Wannier90.

However, Roxana has been working on it, and it might already work in the most recent EPW development version.

For the second issue, it should indeed not take that much time.

Maybe try logging in to the node to make sure that the memory usage is not exploding and/or that the job is not dead.

If the code works for simple small systems, then it is most likely a problem with memory.
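A minimal sketch of such a check on a Linux compute node is given below. It assumes the node exposes the usual /proc filesystem and that you are allowed to ssh to the node running the job; the helper names `rss_kb` and `proc_state` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch for a Linux compute node: inspect one process through /proc.

# Resident memory of a process in kB (the VmRSS field of /proc/<pid>/status).
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# One-letter process state: R = running, S = sleeping,
# D = uninterruptible wait (often I/O), Z = zombie.
proc_state() {
  awk '/^State:/ {print $2}' "/proc/$1/status"
}

# Demonstration on the current shell. On the node one would instead loop over
# the epw.x processes, for example:
#   for pid in $(pgrep -x epw.x); do
#     echo "$pid $(proc_state "$pid") $(rss_kb "$pid") kB"
#   done
echo "$$ $(proc_state $$) $(rss_kb $$) kB"
```

If the per-process memory keeps growing towards the total reported by `free -m`, memory is the likely culprit; if no epw.x process is listed any more, the job has died without the scheduler noticing.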

Best wishes,
Samuel

Post by eliephys78 »

Dear Samuel,

Thanks a lot for your help, as always. I will try to figure out what is going on.

Regards
