Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

errors from the calculation software

vegalew
Beginner
Dear all,

I am using a DFT academic software named ESPRESSO/PWSCF to calculate the electronic structure of my model.
I could calculate 72 atoms successfully on my system using openmpi. But when I wanted to calculate 84 atoms, an error message stopped my calculation. Then I tried mpich2 on the same system. With mpich2 I could calculate 120 atoms. But the error appeared again when I wanted to relax 132 atoms. I have been stuck on this troublesome problem for quite a long time. Could someone give me some suggestions to cope with this?
The error message was something like this:
[node1][0,1,12][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8D51C Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFD060 Unknown Unknown Unknown

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
mca_oob_tcp.so 55F911B4 Unknown Unknown Unknown
Unknown 00000001 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
pw.x 0813EE72 Unknown Unknown Unknown
pw.x 0813E577 Unknown Unknown Unknown

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE40E Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE40E Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8C50F Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFD060 Unknown Unknown Unknown

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE40E Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8C50F Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFD060 Unknown Unknown Unknown

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
Unknown 00000003 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8C50B Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFCDC0 Unknown Unknown Unknown

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
[node1][0,1,12][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8BF47 Unknown Unknown Unknown
pw.x 080EA567 Unknown Unknown Unknown

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8BF3B Unknown Unknown Unknown
pw.x 081E3C7B Unknown Unknown Unknown

Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
[node8][0,1,23][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
mpirun noticed that job rank 14 with PID 3519 on node node3 exited on signal 11 (Segmentation fault).

To make my question easier to understand, here are the details of my system.
There are 8 nodes in my cluster, connected by Ethernet.
CPU: Intel Q6600
Memory: 8G per node
Main board: Intel S3000AH
Hard disk: Seagate 750G (7200)
OS: Red Hat Enterprise Linux 4 AS, Update 4
Fortran: Intel ifort 10.1.015
C: Intel icc 10.1.015
MPI: mpich2/openmpi
FFTW: fftw 2.1.5
MKL: 10.0.1.014
Thank you for reading. Any hints will be deeply appreciated.
vega
=================================================================================
Vega Lew (weijia liu)
PH.D Candidate in Chemical Engineering
State Key Laboratory of Materials-oriented Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
3 Replies
Andrey_Bespalov
New Contributor I

Are you sure that the application is linked with MKL? MKL does not contain a library named libblas.so.3. It looks like the error is in a BLAS implementation from some other vendor. How do you link your application?


forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8D51C Unknown Unknown Unknown
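
To see at a glance which executables and shared libraries show up in a long forrtl traceback like the one posted, the rows can be tallied by image name. This is only a minimal sketch: the function name `images_in_traceback` is made up for illustration, and it assumes each traceback row starts with the image name followed by a hexadecimal program counter, as in the log above.

```python
import re
from collections import Counter

# A traceback row: image name, whitespace, then a hex program counter.
# (Assumed layout, taken from the rows in the posted log.)
_ROW = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{8,16})\s")


def images_in_traceback(text):
    """Count how often each image (executable or library) appears."""
    counts = Counter()
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# A few rows copied from the posted error message.
sample = """forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8D51C Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
"""

for image, count in images_in_traceback(sample).most_common():
    print(image, count)
```

If libblas.so.3 appears in the tally and no MKL library does, that supports the suspicion that the BLAS calls are not going through MKL at run time.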

vegalew
Beginner

Dear sir,

thank you for your time.

Unfortunately, I am not very familiar with linking libraries. I just followed the manual of the software. I think my software was linked with MKL, because during the 'configure' process the software printed the detected libraries like this:

BLAS_LIBS=-L/opt/intel/mkl/10.0.1.014/lib/em64t -lmkl_em64t
LAPACK_LIBS= -L/opt/intel/mkl/10.0.1.014/lib/em64t -lmkl_em64t
FFT_LIBS=-L/home/vega/espresso-4.0/fftw/lib/ -lfftw

So my BLAS library must have something to do with MKL. Do you think so?

thank you for reading.

vega


Andrey_Bespalov
New Contributor I

Unfortunately, there is not enough information.

Could you 1) run ldd on your application and send the results, and 2) send the build logs of your application?
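
For the ldd step, a small script can pick out the BLAS, LAPACK and MKL entries from the output. This is a rough sketch, not a definitive tool: the helper names (`ldd`, `parse_ldd`, `blas_related`) are invented here, the sample text is made-up ldd output rather than output from the actual pw.x, and it assumes the usual `name => path (address)` layout that ldd prints on Linux.

```python
import re
import subprocess

# One ldd row: "name => path (0xaddress)"; the address part is optional,
# and the path may read "not found". (Assumed Linux ldd layout.)
_LDD_ROW = re.compile(
    r"^\s*(\S+)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)


def ldd(binary):
    """Run ldd on a binary and return its text output."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_ldd(output):
    """Map each shared library name to the path it resolves to."""
    deps = {}
    for line in output.splitlines():
        match = _LDD_ROW.match(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def blas_related(deps):
    """Keep only entries whose name mentions blas, lapack or mkl."""
    keys = ("blas", "lapack", "mkl")
    return {
        name: path
        for name, path in deps.items()
        if any(key in name.lower() for key in keys)
    }


# Hypothetical ldd output, invented for illustration only.
sample = (
    "\tlibblas.so.3 => /usr/lib/libblas.so.3 (0x55a6e000)\n"
    "\tlibm.so.6 => /lib/libm.so.6 (0x55b10000)\n"
    "\tlibc.so.6 => /lib/libc.so.6 (0x55c00000)\n"
)

print(blas_related(parse_ldd(sample)))
```

On the cluster, the same check would be `blas_related(parse_ldd(ldd(path)))`, where `path` is the location of the pw.x binary that mpirun actually launches.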
