segmentation fault for high q-grid

Post here questions linked with issues encountered while running the EPW code

Moderator: stiwari

Nandan
Posts: 44
Joined: Mon May 09, 2016 2:47 pm
Affiliation:

segmentation fault for high q-grid

Post by Nandan »

Hi,

I keep getting :
===============================================
mpirun noticed that process rank 14 with PID 31325 on node ncb044.ceg.asu.edu exited on signal 11 (Segmentation fault)
===============================================

when the k-grid is 20x20x20 and the q-grid is 40x40x40 or higher.
With a 20x20x20 k-grid and a 20x20x20 q-grid the job runs successfully.

QE+EPW is compiled with GNU Fortran (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)

and LAPACK_LIBS = $(TOPDIR)/LAPACK/liblapack.a $(TOPDIR)/LAPACK/libblas.a

I considered memory to be an issue, so I checked /proc/meminfo:
MemTotal: 131930920 kB
MemFree: 103500560 kB
MemAvailable: 129815288 kB

Can you suggest where else I should look?
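Since MemTotal/MemFree look healthy, per-process limits are another place to look: a segfault on large grids is often a stack or virtual-memory limit rather than exhausted RAM. A quick, generic check (not EPW-specific, just a sketch):

```shell
# Show the per-process limits the job actually runs under.
# An "unlimited" stack is usually what you want for large Fortran arrays.
ulimit -s          # stack size (kB)
ulimit -v          # virtual memory (kB)
# Raise the stack limit for this shell before launching mpirun
# (the ceiling may be capped by the cluster administrator):
ulimit -s unlimited 2>/dev/null || true
```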
============================================================================================
There is one more issue. For the same job, if "epwread = .true." is set, and the job
is run for the same dense grid above, I get the following error message:


At line 221 of file ephwann_shuffle.f90
Fortran runtime error: Attempting to allocate already allocated variable 'tau'
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 683 on
node ncb044.ceg.asu.edu exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


Any help with both the issues would be great.

Thanks and regards,

Nandan.
sponce
Site Admin
Posts: 616
Joined: Wed Jan 13, 2016 7:25 pm
Affiliation: EPFL

Re: segmentation fault for high q-grid

Post by sponce »

Hello Nandan,

For a restart, tau should not be allocated before line 221.

Can you check that your crystal.fmt file is not corrupted?

You should also have:
kmaps = .true.
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.
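Put together, these restart tags sit alongside the rest of the &inputepw namelist, e.g. (a minimal sketch; every other tag stays as in your first run):

```fortran
&inputepw
  ! restart from the real-space Wannier representation
  kmaps    = .true.
  epbwrite = .false.
  epbread  = .false.
  epwwrite = .false.
  epwread  = .true.
  ! ... all other tags unchanged from the original input ...
/
```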

For the 40x40x40 issue, could you try a 40x40x40 k-grid with a 40x40x40 q-grid to see if it works?
If it is too expensive, you could try 30x30x30 for both k and q.

It should in principle work to have the q-grid denser than the k-grid, but it is not very common. In general you want the k-grid at least as dense as the q-grid.

Best,

Samuel
Prof. Samuel Poncé
Chercheur qualifié F.R.S.-FNRS / Professeur UCLouvain
Institute of Condensed Matter and Nanosciences
UCLouvain, Belgium
Web: https://www.samuelponce.com
Nandan

Re: segmentation fault for high q-grid

Post by Nandan »

Thanks Samuel.

The error was happening when the settings were as below. Since I have
elecselfen = .true., I have a q-grid denser than the k-grid:

elph = .true.
kmaps = .true.
epbread = .true.
epwread = .true.
etf_mem = .false.
wannierize = .false.
lpolar = .true.

elecselfen = .true.
phonselfen = .false.


The k/q = 40x40x40 run crashed immediately with the earlier segmentation-fault error.

I have now reduced it to 30x30x30, which is running with the suggested settings.

Nandan.
sponce

Re: segmentation fault for high q-grid

Post by sponce »

Hello,

As I mentioned in my previous post, you need epbread = .false. for a successful restart.

Best,

Samuel
Nandan

Re: segmentation fault for high q-grid

Post by Nandan »

If, as you suggested, I set

epbwrite = .false.
epbread = .false.

then what is actually happening?

Meanwhile I have been running the case with 30x30x30 k and q.

Nandan.
sponce

Re: segmentation fault for high q-grid

Post by sponce »

Well, with epwread = .true. the code reads from the real-space Wannier representation.

The .epb files are only intermediate quantities.
You do not need them once you have the XX.epmatwp1 files.
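One way to confirm the restart inputs are in place before setting epwread = .true. is to check that the Wannier-representation file and crystal.fmt exist and are non-empty ("prefix" below is a placeholder; substitute your actual calculation prefix):

```shell
# Verify restart files exist and are non-empty before an epwread run.
# "prefix" is a placeholder for the actual calculation prefix.
for f in prefix.epmatwp1 crystal.fmt; do
  if [ -s "$f" ]; then
    echo "ok: $f ($(wc -c < "$f") bytes)"
  else
    echo "missing or empty: $f"
  fi
done
```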

Best,

Samuel
Nandan

Re: segmentation fault for high q-grid

Post by Nandan »

GaN with the following input fails with the error:
===============
mpirun noticed that process rank 7 with PID 25546 on node ncb005 exited on signal 11 (Segmentation fault).
===============

elph = .true.
epbread = .false.
epbwrite = .true.
epwread = .false.
epwwrite = .true.

kmaps = .false.

etf_mem = .false.

nbndsub = 16
nbndskip = 0

dis_win_max = 21.5
dis_froz_max= 4

wannierize = .true.
num_iter = 5000
iprint = 2
proj(1) = 'random'
proj(2) = 'N:sp3'

elecselfen = .true.
phonselfen = .false.

parallel_k = .true.
parallel_q = .false.

fsthick = 1.d10
eptemp = 300.d0
degaussw = 0.002

dvscf_dir = '../phonons/save'
filukk = './gan.ukk'
lpolar = .true.
iverbosity = 3

nk1 = 6
nk2 = 6
nk3 = 6

nq1 = 6
nq2 = 6
nq3 = 6

nkf1 = 20
nkf2 = 20
nkf3 = 20

nqf1 = 50
nqf2 = 50
nqf3 = 50


Whereas a somewhat smaller k/q-grid runs without error. The crystal.fmt file seems fine.

Nandan.
Nandan

Re: segmentation fault for high q-grid

Post by Nandan »

I am still confused about the interplay between the presence of the
XX.epmatwp1 and XX.epmatwe1 files and the number of processors/pools.

A GaN calculation in which *.epmatwp1 and *.epmatwe1 were
written, using 40 processors/pools, completed without error.

So I restarted the job with
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.

and 120 processors/pools for a denser k/q mesh. That job crashed with a
segmentation fault.

Does the number of processors/pools have to be kept constant for subsequent
calculations on denser grids?

If both XX.epmatwp1 and XX.epmatwe1 are present, should the job
be restarted with:
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .false.



Regards,

Nandan.

ECE
Michigan State University.
sponce

Re: segmentation fault for high q-grid

Post by sponce »

Hello,

If you have EPW v4.1, you should indeed be able to do
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.

and use 120 processors/pools for a denser k/q mesh.
That should work regardless of the number of processors used before.

You should also have kmaps = .true.

Please provide more information about the segfault if possible (usually PBS generates .o and .e files that contain information, and Slurm generates slurm-*.out files; both should give more information than just "segfault").
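To pull the relevant lines out of the scheduler output, something like the following works (the file globs and patterns are assumptions; adjust them to your scheduler's naming scheme):

```shell
# Collect signal/memory-related lines from PBS (.o/.e) and Slurm output.
# File globs and patterns are assumptions; adapt them to your cluster.
grep -iE 'segmentation|signal [0-9]|killed|out of memory|oom' \
    *.o[0-9]* *.e[0-9]* slurm-*.out 2>/dev/null || true
```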

Best,

Samuel
Nandan

Re: segmentation fault for high q-grid

Post by Nandan »

Hi Samuel,

Thanks for the reply. I have sent the error message to your email.

I am using EPW 4.1 with:

kmaps=.true.
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.

Nandan.