segmentation fault for high q-grid
Posted: Mon Dec 05, 2016 4:31 pm
Hi,
I keep getting the following error:
===============================================
mpirun noticed that process rank 14 with PID 31325 on node ncb044.ceg.asu.edu exited on signal 11 (Segmentation fault)
===============================================
This happens when the k-grid is 20x20x20 and the q-grid is 40x40x40 or denser.
With a 20x20x20 k-grid and a 20x20x20 q-grid the job runs successfully.
QE+EPW is compiled with: GNU Fortran (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
and LAPACK_LIBS = $(TOPDIR)/LAPACK/liblapack.a $(TOPDIR)/LAPACK/libblas.a
I suspected memory might be the issue, so I checked it with: less /proc/meminfo
MemTotal: 131930920 kB
MemFree: 103500560 kB
MemAvailable: 129815288 kB
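For scale, here is a minimal Python sketch of the arithmetic behind these numbers. It only converts the /proc/meminfo values quoted above to GiB and counts the points in the two q-grids; it assumes nothing about how EPW actually allocates memory, and the function names are purely illustrative. Note also that /proc/meminfo reports node-wide totals, not what a single MPI rank can use.

```python
# Values copied from the /proc/meminfo output quoted in this post (kB).
MEMINFO_KB = {
    "MemTotal": 131930920,
    "MemFree": 103500560,
    "MemAvailable": 129815288,
}


def kb_to_gib(kb):
    """Convert a /proc/meminfo value (units of 1024 bytes) to GiB."""
    return kb / 1024**2


def grid_points(n1, n2, n3):
    """Number of points in a uniform n1 x n2 x n3 grid."""
    return n1 * n2 * n3


if __name__ == "__main__":
    for name, kb in MEMINFO_KB.items():
        print(f"{name}: {kb_to_gib(kb):.1f} GiB")

    q_ok = grid_points(20, 20, 20)    # q-grid for which the job runs
    q_fail = grid_points(40, 40, 40)  # q-grid for which the job segfaults
    print(f"q-points (20x20x20): {q_ok}")
    print(f"q-points (40x40x40): {q_fail}")
    print(f"ratio: {q_fail // q_ok}")
```

The 40x40x40 q-grid has 8 times as many points as the 20x20x20 one, so any array that scales with the number of q-points grows by the same factor. Whether that is what triggers the segmentation fault here is not established by this arithmetic.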
Can you suggest where else I should look?
============================================================================================
There is one more issue. For the same job, if "epwread = .true." is set and the job
is run with the same dense grid as above, I get the following error message:
At line 221 of file ephwann_shuffle.f90
Fortran runtime error: Attempting to allocate already allocated variable 'tau'
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 683 on
node ncb044.ceg.asu.edu exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Any help with both issues would be great.
Thanks and regards,
Nandan.