segmentation fault for high q-grid

Posted: Mon Dec 05, 2016 4:31 pm
by Nandan
Hi,

I keep getting :
===============================================
mpirun noticed that process rank 14 with PID 31325 on node ncb044.ceg.asu.edu exited on signal 11 (Segmentation fault)
===============================================

when the k-grid is 20x20x20 and the q-grid is 40x40x40 or higher.
With a 20x20x20 k-grid and a 20x20x20 q-grid the job runs successfully.
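
For a sense of scale, the number of points in a uniform grid grows with the cube of its linear size. The short Python sketch below only counts grid points for the grids mentioned above; it says nothing about how EPW actually allocates memory, and `grid_points` is a hypothetical helper introduced here for illustration.

```python
def grid_points(n1, n2, n3):
    """Number of points in a uniform n1 x n2 x n3 grid."""
    return n1 * n2 * n3


# Grids quoted in this post: the fine k-grid and the two fine q-grids.
k_fine = grid_points(20, 20, 20)
q_working = grid_points(20, 20, 20)
q_failing = grid_points(40, 40, 40)

print("k points:", k_fine)
print("q points (working run):", q_working)
print("q points (failing run):", q_failing)
print("ratio of q points, failing / working:", q_failing // q_working)
```

Doubling each q-grid dimension multiplies the number of q points by eight, so the failing run handles far more q points than the working one.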

QE+EPW is compiled with : GNU Fortran (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)

and LAPACK_LIBS = $(TOPDIR)/LAPACK/liblapack.a $(TOPDIR)/LAPACK/libblas.a

I suspected memory might be the issue, so I checked with: less /proc/meminfo
MemTotal: 131930920 kB
MemFree: 103500560 kB
MemAvailable: 129815288 kB

Can you suggest where else I should look?
============================================================================================
There is one more issue. For the same job, if "epwread = .true." is set, and the job
is run for the same dense grid above, I get the following error message:


At line 221 of file ephwann_shuffle.f90
Fortran runtime error: Attempting to allocate already allocated variable 'tau'
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 683 on
node ncb044.ceg.asu.edu exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------


Any help with both issues would be great.

Thanks and regards,

Nandan.

Re: segmentation fault for high q-grid

Posted: Mon Dec 05, 2016 5:52 pm
by sponce
Hello Nandan,

For a restart, tau should not be allocated before line 221 of ephwann_shuffle.f90.

Can you check that your crystal.fmt file is not corrupted?

You should also have:
kmaps = .true.
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.
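
A minimal Python sketch of how one might check an input file against the restart settings listed above. It is only an illustration: the helper names are hypothetical, the parser only understands simple `name = .true./.false.` lines, and a flag that is absent from the input is reported as `None` because the sketch makes no assumption about EPW's default values.

```python
import re

# Restart settings recommended in this post (not an official EPW checklist).
RECOMMENDED = {
    "kmaps": True,
    "epbwrite": False,
    "epbread": False,
    "epwwrite": False,
    "epwread": True,
}

# Matches lines such as "epwread = .true." or "kmaps =.false."
LOGICAL_LINE = re.compile(r"^\s*(\w+)\s*=\s*\.(true|false)\.", re.IGNORECASE)


def read_logical_flags(text):
    """Collect 'name = .true./.false.' assignments from input text."""
    flags = {}
    for line in text.splitlines():
        match = LOGICAL_LINE.match(line)
        if match:
            flags[match.group(1).lower()] = match.group(2).lower() == "true"
    return flags


def restart_mismatches(text):
    """Return {flag: (found, recommended)} for flags that differ or are missing."""
    flags = read_logical_flags(text)
    return {
        name: (flags.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if flags.get(name) != wanted
    }


# Hypothetical input fragment used only to exercise the check.
example = """
kmaps    = .true.
epbwrite = .false.
epbread  = .true.
epwwrite = .false.
epwread  = .true.
"""
print(restart_mismatches(example))
```

An empty dictionary means the input agrees with the five settings above; any entry shows the value found next to the recommended one.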

For the 40x40x40 issue, could you try a 40x40x40 k-point grid with a 40x40x40 q-point grid to see if it works?
If it's too expensive, you could try 30x30x30 for both k and q.

In principle it should work to have the q grid > k grid, but it's not very common. In general you want k grid > q grid.

Best,

Samuel

Re: segmentation fault for high q-grid

Posted: Mon Dec 05, 2016 6:12 pm
by Nandan
Thanks Samuel.

The error was happening when the setting was as below. Since I have
elecselfen=.true., I have q-grid > k-grid:

elph = .true.
kmaps = .true.
epbread = .true.
epwread = .true.
etf_mem = .false.
wannierize = .false.
lpolar = .true.

elecselfen = .true.
phonselfen = .false.


The k/q = 40x40x40 run crashed immediately with the same segmentation fault as before.

I have now reduced it to 30x30x30, which is running with the suggested settings.

Nandan.

Re: segmentation fault for high q-grid

Posted: Mon Dec 05, 2016 7:54 pm
by sponce
Hello,

As I mentioned in my previous post, you need epbread = .false. for a successful restart.

Best,

Samuel

Re: segmentation fault for high q-grid

Posted: Tue Dec 06, 2016 5:33 pm
by Nandan
If, as you suggested, I set

epbwrite = .false.
epbread = .false.

then what is actually happening?

Meanwhile I have been running the case with 30x30x30 k and q.

Nandan.

Re: segmentation fault for high q-grid

Posted: Tue Dec 06, 2016 7:37 pm
by sponce
Well, it's reading from the real-space Wannier representation with epwread = .true.

The epb files are only intermediate quantities.
You do not need them once you have the XX.epmatwp1 files.
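
Before restarting, one could confirm that the files named in this thread are actually present. The Python sketch below is an assumption-laden illustration: it checks only for `XX.epmatwp1` (with `XX` taken to be a user-supplied prefix) and the `crystal.fmt` file mentioned earlier, the directory is whatever the caller passes in, and it does not verify file contents or claim to be a complete list of what EPW needs.

```python
from pathlib import Path


def missing_restart_files(directory, prefix):
    """List the files named in this thread that are absent from `directory`."""
    wanted = [f"{prefix}.epmatwp1", "crystal.fmt"]
    return [name for name in wanted if not (Path(directory) / name).is_file()]


# Hypothetical usage: both the directory "." and the prefix "gan" are placeholders.
print(missing_restart_files(".", "gan"))
```

An empty list means both files were found; existence alone does not rule out a corrupted file.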

Best,

Samuel

Re: segmentation fault for high q-grid

Posted: Tue Dec 06, 2016 8:12 pm
by Nandan
GaN with the following input fails with the error:
===============
mpirun noticed that process rank 7 with PID 25546 on node ncb005 exited on signal 11 (Segmentation fault).
===============

elph = .true.
epbread = .false.
epbwrite = .true.
epwread = .false.
epwwrite = .true.

kmaps = .false.

etf_mem = .false.

nbndsub = 16
nbndskip = 0

dis_win_max = 21.5
dis_froz_max= 4

wannierize = .true.
num_iter = 5000
iprint = 2
proj(1) = 'random'
proj(2) = 'N:sp3'

elecselfen = .true.
phonselfen = .false.

parallel_k = .true.
parallel_q = .false.

fsthick = 1.d10
eptemp = 300.d0
degaussw = 0.002

dvscf_dir = '../phonons/save'
filukk = './gan.ukk'
lpolar = .true.
iverbosity = 3

nk1 = 6
nk2 = 6
nk3 = 6

nq1 = 6
nq2 = 6
nq3 = 6

nkf1 = 20
nkf2 = 20
nkf3 = 20

nqf1 = 50
nqf2 = 50
nqf3 = 50


A somewhat smaller k/q grid, on the other hand, runs without error. The crystal.fmt file seems fine.

Nandan.

Re: segmentation fault for high q-grid

Posted: Tue Dec 20, 2016 7:53 pm
by Nandan
I am still confused about how the presence of the
XX.epmatwp1 and XX.epmatwe1 files relates to the number of processors/pools.

A GaN calculation in which *.epmatwp1 and *.epmatwe1 were
written, using 40 processors/pools, completed without error.

So I restarted the job with
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.

and 120 processors/pools for a denser k-q mesh. That job crashed with
a segmentation fault.

Do the numbers of processors and pools have to be kept constant, even for subsequent
calculations on denser grids?

If both XX.epmatwp1 and XX.epmatwe1 are present, should the job
be restarted with:
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .false.



Regards,

Nandan.

ECE
Michigan State University.

Re: segmentation fault for high q-grid

Posted: Tue Dec 20, 2016 9:24 pm
by sponce
Hello,

If you have EPW v4.1, you should indeed be able to do
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.

and 120 processor/pools for a denser k-q mesh.
That should work regardless of the number of processors used before.

You should also have kmaps =.true.

Please provide more information about the segfault if possible. Usually PBS generates .o and .e files that contain information, and Slurm generates slurm files; both should give more detail than just "segfault".

Best,

Samuel

Re: segmentation fault for high q-grid

Posted: Tue Dec 20, 2016 11:20 pm
by Nandan
Hi Samuel,

Thanks for the reply. I have sent the error message to your email.

I am using EPW-4.1 and I am using:

kmaps=.true.
epbwrite = .false.
epbread = .false.
epwwrite = .false.
epwread = .true.

Nandan.