Hi,
I am trying to use the MKL PBLAS/ScaLAPACK routines as proposed in the following article: http://software.intel.com/en-us/articles/using-cluster-mkl-pblasscalapack-fortran-routine-in-your-c-program. The source code (downloadable from the same site) is also attached to this post.
I am using the Intel® Composer 2011.2.137, compiler icc 12.0.2 20110112, and OpenMPI 1.4.3.
Following the Intel® Math Kernel Library Link Line Advisor, I am compiling with
mpicc -w -o pdgemv pdgemv.c -I$(MKLROOT)/include -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -limf -lm -openmp -DMKL_ILP64
Compiling is fine, but running the program via
mpirun -n 4 ./pdgemv
causes the following segmentation fault:
[node266:15074] *** Process received signal ***
[node266:15074] Signal: Segmentation fault (11)
[node266:15074] Signal code: Address not mapped (1)
[node266:15074] Failing at address: 0x44000098
[node266:15074] [ 0] /lib64/libpthread.so.0 [0x3f8420eb10]
[node266:15074] [ 1] /openmpi/1.4.3/intel--co-2011.2.137--binary/lib/libmpi.so.0(MPI_Comm_size+0x5a) [0x2abdef96c17a]
[node266:15074] [ 2] /intel/co-2011.2.137/binary/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.so(ilp64_Cblacs_pinfo+0x92) [0x2abdef3be4a2]
[node266:15074] *** End of error message ***
I don't understand what is wrong and hope someone can help me. Thanks and kind regards.
Massi
Your code stopped on a segfault exception. In your case I think the failing address 0x44000098 could be either a faulting instruction pointer or a wrong memory address being referenced. Probably the referenced address is unreadable memory, or has not been mapped (heap), or has not been committed by your app.
Do you have any updates?
Hi iliyapolak,
thank you for your answer, but I don't understand how I can solve this issue.
I should also add that I cannot set the Linux environment variables as shown in the article I linked:
$source /opt/intel/mkl/10.x.x.0xx/tools/environment/mklvarsem64t.sh
$source /opt/intel/mpi/3.x.x/bin64/mpivars.sh
because I cannot find these paths and files in my version of Intel Composer for Linux.
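The script names and locations differ between MKL releases, so one way to locate them is to search the install tree. The following is only a minimal sketch, not an official Intel procedure: the function name `find_env_scripts` and the example root `/opt/intel` are assumptions made for illustration.

```shell
#!/bin/sh
# List candidate environment scripts (mklvars*.sh, iccvars*.sh, mpivars*.sh)
# below a given install root. The root is an assumption; pass your own.
find_env_scripts() {
    root="$1"
    [ -d "$root" ] || { echo "no such directory: $root" >&2; return 1; }
    find "$root" -type f \
        \( -name 'mklvars*.sh' -o -name 'iccvars*.sh' -o -name 'mpivars*.sh' \) \
        2>/dev/null | sort
}

# Example usage (hypothetical root):
#   find_env_scripts /opt/intel
#   . /path/printed/above/mklvars.sh intel64
```

Once a script is found, it is sourced with the architecture argument (for example `intel64`), so that `MKLROOT` and the library paths are set in the current shell.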
Hi Massimiliano,
Sorry, but I do not know how to solve it. At least you can ask the Intel developers for help.
By the way, if you want, you can post the call stack of the failed process. Maybe we can get some more relevant information about the bug.
Hi massi,
I saw you are using OpenMPI, but -lmkl_blacs_intelmpi_ilp64 is for Intel MPI and MPICH2. This may be the cause.
You may try commands like
source /opt/intel/composer_xe_2011.2.137/bin/iccvars.sh intel64
source /opt/intel/composer_xe_2011.2.137/mkl/bin/mklvars.sh intel64
(These two commands are the counterparts, in your version, of the scripts from the older MKL and compiler releases.)
plus your OpenMPI path settings.
And the link advisor line:
$(MKLROOT)/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_ilp64.a $(MKLROOT)/lib/intel64/libmkl_intel_thread.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_blacs_openmpi_ilp64.a -Wl,--end-group -lpthread -lm -DMKL_ILP64
and let us know how it works.
Best Regards,
Ying
libmkl_blacs_lp64.a
    LP64 version of BLACS routines supporting the following MPICH versions:
    - Myricom* MPICH version 1.2.5.10
    - ANL* MPICH version 1.2.5.2
libmkl_blacs_ilp64.a
    ILP64 version of BLACS routines supporting the following MPICH versions:
    - Myricom* MPICH version 1.2.5.10
    - ANL* MPICH version 1.2.5.2
libmkl_blacs_intelmpi_lp64.a
    LP64 version of BLACS routines supporting Intel MPI and MPICH2.
libmkl_blacs_intelmpi_ilp64.a
    ILP64 version of BLACS routines supporting Intel MPI and MPICH2.
libmkl_blacs_intelmpi20_lp64.a
    A soft link to lib/intel64/libmkl_blacs_intelmpi_lp64.a
libmkl_blacs_intelmpi20_ilp64.a
    A soft link to lib/intel64/libmkl_blacs_intelmpi_ilp64.a
libmkl_blacs_openmpi_lp64.a
    LP64 version of BLACS routines supporting OpenMPI.
libmkl_blacs_openmpi_ilp64.a
    ILP64 version of BLACS routines supporting OpenMPI.
libmkl_blacs_sgimpt_lp64.a
    LP64 version of BLACS routines supporting SGI MPT.
libmkl_blacs_sgimpt_ilp64.a
    ILP64 version of BLACS routines supporting SGI MPT.
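To make the mapping in this list concrete, here is a small sketch that picks the BLACS library name for a given MPI and integer interface. The function names and the flavor keywords (`openmpi`, `intelmpi`, `mpich2`, `sgimpt`) are this sketch's own convention, not anything defined by MKL, and the banner check only recognizes the "Open MPI" text that OpenMPI's `mpirun --version` is assumed to print.

```shell
#!/bin/sh
# Map an MPI flavor and an integer interface to the BLACS library name
# from the list above. Flavor keywords are this sketch's own convention.
blacs_lib() {
    flavor="$1"     # openmpi | intelmpi | mpich2 | sgimpt
    iface="$2"      # lp64 | ilp64
    case "$iface" in
        lp64|ilp64) ;;
        *) echo "unknown interface: $iface" >&2; return 1 ;;
    esac
    case "$flavor" in
        openmpi)         echo "mkl_blacs_openmpi_${iface}" ;;
        intelmpi|mpich2) echo "mkl_blacs_intelmpi_${iface}" ;;
        sgimpt)          echo "mkl_blacs_sgimpt_${iface}" ;;
        *) echo "unknown MPI flavor: $flavor" >&2; return 1 ;;
    esac
}

# Guess the flavor from a version banner string.
# Only the Open MPI banner is matched here; anything else is "unknown".
mpi_flavor_from_banner() {
    case "$1" in
        *"Open MPI"*) echo "openmpi" ;;
        *)            echo "unknown" ;;
    esac
}
```

A possible use is `blacs_lib "$(mpi_flavor_from_banner "$(mpirun --version 2>&1 | head -n 1)")" ilp64`, which for an OpenMPI installation would print `mkl_blacs_openmpi_ilp64`, the library to pass to `-l` on the link line.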
Massi, why not try another variant without ILP64? Is your problem really that huge? You can just try the ordinary LP64 version first.
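Whichever interface is chosen, the link line has to use it consistently: all suffixed MKL libraries should be either `_lp64` or `_ilp64`, and `-DMKL_ILP64` belongs only with the ILP64 set. Here is a rough sketch of such a check; the function name `check_mkl_interface` and its messages are made up for illustration, and it only looks at library names and the one define.

```shell
#!/bin/sh
# Check that a link line uses one MKL integer interface consistently.
# Prints "ok" or a short reason; exit status is 0 only for "ok".
# The body runs in a subshell (parentheses) so `set -f` does not leak out.
check_mkl_interface() (
    set -f                      # no filename globbing while splitting words
    has_ilp64=no; has_lp64=no; has_define=no
    for word in $1; do
        case "$word" in
            *mkl_*_ilp64*) has_ilp64=yes ;;
            *mkl_*_lp64*)  has_lp64=yes ;;
            -DMKL_ILP64)   has_define=yes ;;
        esac
    done
    if [ "$has_ilp64" = yes ] && [ "$has_lp64" = yes ]; then
        echo "mixed LP64 and ILP64 libraries"; exit 1
    fi
    if [ "$has_ilp64" = yes ] && [ "$has_define" = no ]; then
        echo "ILP64 libraries without -DMKL_ILP64"; exit 1
    fi
    if [ "$has_lp64" = yes ] && [ "$has_define" = yes ]; then
        echo "LP64 libraries with -DMKL_ILP64"; exit 1
    fi
    echo "ok"
)
```

Note that a consistent link line is necessary but not sufficient: under ILP64 the integer arguments passed to MKL routines are 64-bit, so a C program that declares them as plain `int` would not match the library. Whether that applies to pdgemv.c is not confirmed in this thread.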
Hi all,
first of all I want to thank you all for your help.
I have followed Ying H's hint, and I now set the environment variables with
source /opt/intel/composer_xe_2011.2.137/bin/iccvars.sh intel64
source /opt/intel/composer_xe_2011.2.137/mkl/bin/mklvars.sh intel64
and I link to mkl_blacs_openmpi_ilp64 instead of mkl_blacs_intelmpi_ilp64. I'm wondering about a note given by the Intel® Math Kernel Library Link Line Advisor:
If you are using a non-default MPI, assign the same appropriate value to MKL_BLACS_MPI on all nodes. Set MKL_BLACS_MPI variable to one of the following values: INTELMPI, MPICH2 or MSMPI.
Which value should I assign to MKL_BLACS_MPI, if I have to set it at all? Am I perhaps missing some other environment variable?
The program still compiles without any problem with
mpicc -o pdgemv pdgemv.c -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_ilp64 -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_openmpi_ilp64 -Wl,--end-group -liomp5 -lpthread -limf -lm -DMKL_ILP64
but at run time I get the following error from every mpi task:
[node078:02059] *** Process received signal ***
[node078:02059] Signal: Floating point exception (8)
[node078:02059] Signal code: Integer divide-by-zero (1)
[node078:02059] Failing at address: 0x2b8428108a7e
[node078:02059] [ 0] /lib64/libpthread.so.0 [0x3a9a80eb10]
[node078:02059] [ 1] composerxe-2011.2.137/mkl/lib/intel64/libmkl_scalapack_ilp64.so(numroc_+0xe) [0x2b8428108a7e]
[node078:02059] *** End of error message ***
Compiling and running in debug mode I get the following output
{ 1, 0}: On entry to
{ 1, 1}: On entry to
{ 0, 0}: On entry to
DESCINIT parameter number 6 had an illegal value
{ 0, 1}: On entry to
DESCINIT parameter number 6 had an illegal value
{ 0, 1}: On entry to
DESCINIT parameter number 6 had an illegal value
{ 1, 1}: On entry to
DESCINIT parameter number 6 had an illegal value
{ 1, 1}: On entry to
DESCINIT parameter number 6 had an illegal value
DESCINIT parameter number 6 had an illegal value
{ 0, 1}: On entry to
DESCINIT parameter number 6 had an illegal value
{ 0, 0}: On entry to
DESCINIT parameter number 6 had an illegal value
{ 0, 0}: On entry to
DESCINIT parameter number 6 had an illegal value
DESCINIT parameter number 6 had an illegal value
{ 1, 0}: On entry to
DESCINIT parameter number 6 had an illegal value
{ 1, 0}: On entry to
DESCINIT parameter number 6 had an illegal value
[node078:31137] *** Process received signal ***
[node078:31137] Signal: Floating point exception (8)
[node078:31137] Signal code: Integer divide-by-zero (1)
[node078:31137] Failing at address: 0x405f64
[node078:31137] [ 0] /lib64/libpthread.so.0 [0x3a9a80eb10]
[node078:31137] [ 1] pdgemv [0x405f64]
[node078:31137] [ 2] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3a9a01d994]
[node078:31137] [ 3] pdgemv [0x4057c9]
[node078:31137] *** End of error message ***
and analyzing the core file with gdb I get:
Program terminated with signal 8, Arithmetic exception.
#0 0x0000000000405f64 in main (argc=1, argv=0x200000001) at pdgemv.c:87
87 sat= (myrow*nb)+i+(i/nb)*nb;
The exception occurs because nb has actually been set to 0 by numroc_. I therefore think the warning from descinit arises as explained in this topic: http://software.intel.com/en-us/forums/topic/293296
I have also tried switching to LP64 as proposed by Gennady Fedorov, but nothing seems to change.
I still cannot fix this issue... Thank you all again. If you need any other information, please ask!
Ciao!
It seems that either variable "i" or "nb" could be 0. Can you post those values?
My advice is to step through the code around the call site of the arithmetic exception and post the values of the two variables mentioned in my previous post. Can you do that with GDB?
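Since a core file is already available, the values can also be read from it without a live session. Below is a minimal sketch that writes a GDB command file for that purpose; the function name `write_gdb_commands`, the file name `inspect.gdb`, and the core file name `core` are assumptions, while the variable names come from line 87 of pdgemv.c as quoted above.

```shell
#!/bin/sh
# Write a small GDB command file that inspects the variables used on
# pdgemv.c:87 in the faulting frame. The output path is the caller's choice.
write_gdb_commands() {
    out="$1"
    cat > "$out" <<'EOF'
frame 0
info locals
print nb
print i
print myrow
backtrace
EOF
}

# Example usage, assuming a core file named "core" next to the binary:
#   write_gdb_commands inspect.gdb
#   gdb -batch -x inspect.gdb ./pdgemv core
```

This only prints meaningful values if the program was built with debug information (`-g`), as in the debug build mentioned earlier in the thread.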
The value of nb seems to be modified and set to 0 after calling Cblacs_gridinfo...
So this is the culprit behind the division-by-zero exception.