EPW calculation is killed at dvscf calculation

Post here questions related to issues encountered while running the EPW code

Moderator: stiwari

Gautam Sharma Monty
Posts: 27
Joined: Mon Oct 22, 2018 10:31 am
Affiliation:

EPW calculation is killed at dvscf calculation

Post by Gautam Sharma Monty »

Dear Sir,
I am doing EPW calculations with QE-6.4 using the PBE functional with spin-orbit coupling (SOC). There are 5 irreducible q-points (25 q-points in total) in the phonon calculation. The code stops at an arbitrary irreducible point while computing dvscf for the q-points in its star. I have resubmitted the calculation many times with different numbers of cores (56, 84, and 112), but it stops every time, and I cannot understand why. Nothing is printed about the memory required for these calculations. Can you please help me?


The following is the first output, with 56 cores (2 nodes, 28 cores/node, 128 GB memory/node):
===================================================================
irreducible q point # 2
===================================================================

Symmetries of small group of q: 1

Number of q in the star = 6
List of q in the star:
1 0.000000000 0.230940108 0.000000000
2 -0.200000000 -0.115470054 0.000000000
3 0.200000000 -0.115470054 0.000000000
4 0.000000000 -0.230940108 0.000000000
5 0.200000000 0.115470054 0.000000000
6 -0.200000000 0.115470054 0.000000000
Dyn mat calculated from ifcs

q( 2 ) = ( 0.0000000 0.2309401 0.0000000 )
q( 3 ) = ( -0.2000000 -0.1154701 0.0000000 )
q( 4 ) = ( 0.2000000 -0.1154701 0.0000000 )
q( 5 ) = ( 0.0000000 -0.2309401 0.0000000 )


After this, the job is killed automatically.


The following is the second output, with 112 cores (4 nodes, 28 cores/node, 128 GB memory/node):

===================================================================
irreducible q point # 5
===================================================================

Symmetries of small group of q: 1

Number of q in the star = 6
List of q in the star:
1 0.200000000 0.577350269 0.000000000
2 -0.600000000 -0.115470054 0.000000000
3 0.400000000 -0.461880215 0.000000000
4 -0.200000000 -0.577350269 0.000000000
5 0.600000000 0.115470054 0.000000000
6 -0.400000000 0.461880215 0.000000000
Dyn mat calculated from ifcs

q( 20 ) = ( 0.2000000 0.5773503 0.0000000 )


After this, the job is killed automatically.
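The two stars printed above can be checked by hand: in each case the six q-points are images of the first one under rotations by multiples of 60 degrees about z. The small Python check below assumes the coordinates are Cartesian in units of 2π/alat (as ph.x prints them) and uses only the in-plane components, since qz = 0 for every point. It checks the geometry only and makes no claim about which point group the crystal actually has.

```python
import math

def rotate_z(q, angle):
    """Rotate the in-plane vector q = (qx, qy) by `angle` radians about z."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * q[0] - s * q[1], s * q[0] + c * q[1])

def c6_orbit(q):
    """Images of q under rotations by 0, 60, ..., 300 degrees about z."""
    return [rotate_z(q, k * math.pi / 3) for k in range(6)]

def same_points(a, b, tol=1e-6):
    """True if every point in a lies within tol of some point in b, and vice versa."""
    def covered(src, dst):
        return all(any(math.dist(p, d) < tol for d in dst) for p in src)
    return covered(a, b) and covered(b, a)

# In-plane components (qz = 0) copied from the two ph.x outputs above.
star_irr2 = [(0.000000000, 0.230940108), (-0.200000000, -0.115470054),
             (0.200000000, -0.115470054), (0.000000000, -0.230940108),
             (0.200000000, 0.115470054), (-0.200000000, 0.115470054)]
star_irr5 = [(0.200000000, 0.577350269), (-0.600000000, -0.115470054),
             (0.400000000, -0.461880215), (-0.200000000, -0.577350269),
             (0.600000000, 0.115470054), (-0.400000000, 0.461880215)]

for name, star in [("irreducible q #2", star_irr2), ("irreducible q #5", star_irr5)]:
    # Rotating the first q of the star should reproduce the whole printed list.
    print(name, same_points(c6_orbit(star[0]), star))
```

Both lines print True, so each star is just six symmetry-related copies of one q-point; the crash is not caused by an unusual star.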


The following is the error file:

Intel(R) Parallel Studio XE 2017 Update 4 for Linux*
Copyright (C) 2009-2017 Intel Corporation. All rights reserved.
=>> PBS: job killed: node 1 (cn02) requested job die, code 15009
[mpiexec@cn01] control_cb (../../pm/pmiserv/pmiserv_cb.c:781): connection to proxy 1 at host cn02 failed
[mpiexec@cn01] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@cn01] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:500): error waiting for event
[mpiexec@cn01] main (../../ui/mpich/mpiexec.c:1130): process manager error waiting for completion

hlee
Posts: 415
Joined: Thu Aug 03, 2017 12:24 pm
Affiliation: The University of Texas at Austin

Re: EPW calculation is killed at dvscf calculation

Post by hlee »

Dear Gautam Sharma Monty:

I don't know much about the EPW included in QE v6.4.
With the information given, I can't give you a clear answer; I would just suggest trying the most recent (development) version of EPW at https://gitlab.com/QEF.

Also if you suspect the memory issue, you could try to use 4 nodes and 14 cores/node (total of 56 cores).
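The arithmetic behind this suggestion is simple: with the same total of 56 MPI ranks the data are distributed the same way, so each rank needs roughly the same memory, but spreading the ranks over 4 nodes instead of 2 doubles the memory each rank can use. The sketch below assumes the 128 GB per node reported in the first post is shared evenly among the ranks on a node and ignores OS and MPI overhead, so the real figures are somewhat lower.

```python
def mem_per_rank_gb(node_mem_gb, ranks_per_node):
    """Memory available to each MPI rank if a node's RAM is shared evenly.

    Ignores memory used by the OS, MPI buffers, etc., so this is an upper bound.
    """
    return node_mem_gb / ranks_per_node

NODE_MEM_GB = 128  # per node, as reported in the first post

# Same total number of ranks (56), two different layouts.
for nodes, per_node in [(2, 28), (4, 14)]:
    total = nodes * per_node
    print(f"{nodes} nodes x {per_node} ranks = {total} ranks, "
          f"{mem_per_rank_gb(NODE_MEM_GB, per_node):.2f} GB per rank")
```

This prints about 4.57 GB per rank for 2 x 28 and 9.14 GB per rank for 4 x 14.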

However, before trying anything, you should make sure that you merged the files (dvscf, etc.) correctly if your phonon calculations were split (http://epw.phpbbhosts.co.uk/viewtopic.php?f=3&t=1270).
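A quick sanity check on merged files can catch a missing or truncated dvscf file. The sketch below assumes the collected layout is `<save_dir>/<prefix>.dvscf_q<N>` for N = 1 to nqirr, which is how we understand EPW's collection script to name them; treat that naming as an assumption and adapt it to your own directory. It also assumes every dvscf file holds the same number of equally sized records (3 x nat perturbations on the same FFT grid), so files of unequal size are a hint, not proof, that one was cut short.

```python
from pathlib import Path

def check_dvscf_files(save_dir, prefix, nqirr):
    """Return a list of problems found among the collected dvscf files.

    The file naming <prefix>.dvscf_q<N> is an assumption; adjust as needed.
    """
    problems = []
    sizes = {}
    for iq in range(1, nqirr + 1):
        path = Path(save_dir) / f"{prefix}.dvscf_q{iq}"
        if not path.is_file():
            problems.append(f"missing: {path}")
            continue
        size = path.stat().st_size
        if size == 0:
            problems.append(f"empty: {path}")
        sizes[iq] = size
    # All irreducible q should give files of identical size (see assumption above).
    if len(set(sizes.values())) > 1:
        problems.append(f"dvscf files differ in size: {sizes}")
    return problems
```

For example, `check_dvscf_files("save", "mat", 5)` (with a hypothetical prefix `mat`) returns an empty list when all five files are present and of equal size.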

Sincerely,

H. Lee

Gautam Sharma Monty

Re: EPW calculation is killed at dvscf calculation

Post by Gautam Sharma Monty »

Dear Sir,

I have collected the dvscf files correctly, and I recomputed the phonons for irreducible q-point 5 so that the calculation completed in a single run. So there should be no issue related to this.
Apart from this, I should point out an incompatibility between EPW and QE that is specific to SOC cases.
When phonons are computed with QE > 6.2, no dyn#.xml files are generated, but EPW requires these files in XML format. So one has to go back to QE-6.2, recover the phonon run, and obtain the dyn files in XML format. I think EPW should also accept the plain format in SOC cases. In other words,
the ifc.q2r.xml requirement should be dropped, and the ifc.q2r file generated with QE > 6.2 should work instead.
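To see which format a given set of dynamical-matrix files is in, a simple content sniff is enough. The sketch below is not EPW's check_is_xml_file; it assumes XML files start with `<` (an XML declaration or root element) while plain-text dyn files start with ordinary text such as "Dynamical matrix file". The glob pattern `*.dyn*` is a hypothetical default.

```python
from pathlib import Path

def looks_like_xml(path):
    """Guess whether a file is XML by checking whether its first non-blank
    characters open an XML tag."""
    with open(path, "r", errors="replace") as f:
        head = f.read(256).lstrip()
    return head.startswith("<")

def classify_dyn_files(directory, pattern="*.dyn*"):
    """Map each matching file name to 'xml' or 'plain'."""
    return {p.name: ("xml" if looks_like_xml(p) else "plain")
            for p in sorted(Path(directory).glob(pattern)) if p.is_file()}
```

Running `classify_dyn_files(".")` in the phonon directory gives a quick overview before EPW tries to read the files.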

hlee

Re: EPW calculation is killed at dvscf calculation

Post by hlee »

Dear Gautam Sharma Monty:

Regarding the xml issue:
Did you try the EPW included in QE v6.5, or the latest version of EPW? I didn't implement it, but I think this issue is addressed by a call to the subroutine check_is_xml_file.
Please check the subroutines read_ifc in io_epw.f90 and dynmat_asr in dynmat_asr.f90.

Sincerely,

H. Lee

Gautam Sharma Monty

Re: EPW calculation is killed at dvscf calculation

Post by Gautam Sharma Monty »

Thank you, Sir. I will check out the latest version.

Gautam Sharma Monty

Re: EPW calculation is killed at dvscf calculation

Post by Gautam Sharma Monty »

Dear Sir,
I am trying EPW calculations with QE-6.5, using the phonon dvscf and dyn#.xml files computed with QE-6.4, but the calculation is killed every time. Could it be due to the directory of phonon-dvscf files computed using QE-6.4? However, EPW does run with QE-6.4. Do I need to compute the phonons again with QE-6.5?

hlee

Re: EPW calculation is killed at dvscf calculation

Post by hlee »

Dear Gautam Sharma Monty:

>Could it be due to directory of phonon-dvscf files computed using qe-6.4.

I didn't check it, but you can check it yourself by performing test calculations for a simple system, for example, Pb with spin-orbit coupling.

>Do I need to compute the phonons again with QE-6.5?

I still think the main trouble might come from the step of merging the dvscf files, etc. Although you said there is no problem there, I can't confirm it.
In your case, it is not easy for me to give you a clear answer.

Sincerely,

H. Lee

Gautam Sharma Monty

Re: EPW calculation is killed at dvscf calculation

Post by Gautam Sharma Monty »

Dear Sir,

It was a memory issue, and it was resolved by following your suggestion: "Also if you suspect the memory issue, you could try to use 4 nodes and 14 cores/node (total of 56 cores)."
I am grateful for this.
