
8-Byte BLACS Library and 4-/8-Byte MPI Library

John_Young
New Contributor I

In some of our work (using the Intel Fortran compiler), we are linking the Intel BLACS against the Intel MPI library and seem to be having some 4-/8-byte integer interface issues. The documentation is not very clear to us about what is happening.

On Windows, when we link the ILP64 ScaLAPACK/BLACS against the ILP64 version of the Intel MPI library, everything works great. This seems to indicate that the ILP64 ScaLAPACK/BLACS assume 8-byte integer interfaces to the Intel MPI library.

On Linux, when we link the ILP64 ScaLAPACK/BLACS against the ILP64 version of the Intel MPI library, weird things happen in calls to BLACS routines, but when we link the ILP64 ScaLAPACK/BLACS against the LP64 version of the Intel MPI library, things seem to work well. If we link against an 8-byte OpenMPI library using the ILP64 ScaLAPACK and mkl_blacs_openmpi_ilp64, everything works fine.

It seems to us that on Linux, mkl_blacs_intelmpi_ilp64 still assumes that the MPI library uses 4-byte integers, while on Windows, mkl_blacs_intelmpi_ilp64 assumes 8-byte integer MPI function interfaces.
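
To illustrate the kind of mismatch we suspect, here is a contrived sketch (not our actual code; it assumes a little-endian machine such as x86), showing how one 8-byte integer is seen as two 4-byte integers:

program int_mismatch
  implicit none
  integer(8) :: wide(2)
  integer(4) :: narrow(4)
  wide = (/ 0_8, 1_8 /)
  ! Reinterpret the 8-byte integers as 4-byte integers, the way a
  ! 4-byte MPI library would read an 8-byte INTEGER array.
  narrow = transfer(wide, narrow)
  print *, narrow   ! prints 0 0 1 0 on little-endian hardware
end program int_mismatch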

The MKL manual does not specify (as far as we can find) whether the ILP64 BLACS assumes a 4- or 8-byte integer MPI interface. Is there more information about which MPI library interface the BLACS libraries expect to be linked against?

Thanks,
John

Zhang_Z_Intel
Employee

What integer type do you use in your code? Is it INTEGER, without specifying kind?

If so, then the ILP64 interface always assumes INTEGER to be 8-byte, whether on Windows or Linux. To make sure you are using ILP64 correctly, do the following:

1. Include mkl_scalapack.h and mkl.h header files in your Fortran code.

2. Specify the -i8 (Linux) or the /4I8 (Windows) option to the ifort compiler.

3. Make sure your link line looks like: -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm

For more information on the link line, see http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
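
For example, a full compile-and-link command might look like this (a sketch assuming the mpiifort wrapper and a source file named test.F90; adjust paths for your installation):

mpiifort -i8 test.F90 \
    -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 \
    -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm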

John_Young
New Contributor I

Zhang Z (Intel) wrote:

What integer type do you use in your code? Is it INTEGER, without specifying kind?

If so, then the ILP64 interface always assumes INTEGER to be 8-byte, whether on Windows or Linux. To make sure you are using ILP64 correctly, do the following:

1. Include mkl_scalapack.h and mkl.h header files in your Fortran code.

2. Specify the -i8 (Linux) or the /4I8 (Windows) option to the ifort compiler.

3. Make sure your link line looks like: -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm

We are using 8-byte integers. Following the MPI manual instructions for ILP64, we do not use the Fortran MPI module but include the 'mpif.h' header file instead. I do not believe we can include mkl_scalapack.h and mkl.h in Fortran code, as they are not valid Fortran. We are using the proper compile flags (e.g., -i8) and the libraries shown in step 3.

Again, the issue is not our own code's interface to the MPI library. When we compile with the 4- or 8-byte MPI interface, our code uses the correct integer size in its direct MPI calls. However, on Linux, calls to the MKL BLACS routines (which call the MPI library under the hood) fail when we link the ILP64 MPI library with the ILP64 BLACS library, yet the BLACS routines do not fail when we link the LP64 MPI library with the ILP64 BLACS library.

Attached is a small test case. You can switch between the ILP64 and LP64 MPI libraries by modifying the two lines at the top of the file according to the comments. When we compile with ILP64 BLACS and ILP64 MPI, the code crashes in the blacs_gridmap call with the error that two of the nodes requested in the grid have the same id. This makes us think that the BLACS routines are calling the 4-byte MPI library: the first node in the grid map is an 8-byte 0, which a 4-byte MPI library would interpret as two 4-byte 0's. If we compile with ILP64 BLACS and LP64 MPI, this error does not occur. On Windows, we do not see this behavior, only on Linux.
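
For reference, the failing call pattern is essentially the following (a minimal sketch, not the attached file; the 1 x nprocs grid shape is illustrative):

program gridmap_sketch
  implicit none
  ! Built with -i8, so default INTEGER is 8 bytes to match the ILP64 BLACS.
  integer :: ictxt, mypnum, nprocs, i
  integer, allocatable :: usermap(:,:)

  call blacs_pinfo(mypnum, nprocs)   ! my rank and the process count
  call blacs_get(0, 0, ictxt)        ! default system context

  ! 1 x nprocs grid holding MPI ranks 0..nprocs-1
  allocate(usermap(1, nprocs))
  usermap(1,:) = (/ (i, i = 0, nprocs - 1) /)

  ! With ILP64 BLACS + ILP64 Intel MPI on Linux, this call reports
  ! that two nodes requested in the grid have the same id.
  call blacs_gridmap(ictxt, usermap, 1, 1, nprocs)

  call blacs_gridexit(ictxt)
  call blacs_exit(0)
end program gridmap_sketch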

Thanks,
John

John_Young
New Contributor I

I don't think the test case uploaded properly in my previous post. Here it is.

John

Zhang_Z_Intel
Employee

John,

Thank you for providing the test code. Can you please share with me:

1. The command line or makefile that you used to build the code?

2. The version numbers of the Intel Fortran compiler, Intel MKL, and Intel MPI?

Thanks.

Zhang

John_Young
New Contributor I

Zhang Z (Intel) wrote:

John,

Thank you for providing the test code. Can you please share with me:

1. The command line or makefile that you used to build the code?

2. The version numbers of the Intel Fortran compiler, Intel MKL, and Intel MPI?

Thanks.

Zhang

1. The compile line for the LP64 MPI library is

mpiifort -i8 -traceback test.F90  -L/share/cluster/RHEL6.2/x86_64/apps/intel/ict/composer_xe_2013.0.079/mkl/lib/intel64 \
   -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -liomp5

and I run with

mpirun -n 5 a.out

To use the ILP64 MPI library, I use the compile line below (after modifying test.F90 according to the comments in the file):

mpiifort -i8 -traceback -ilp64 test.F90  -L/share/cluster/RHEL6.2/x86_64/apps/intel/ict/composer_xe_2013.0.079/mkl/lib/intel64 \
   -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -liomp5

and I run with

mpirun -ilp64 -n 5 a.out
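
For what it's worth, we also sanity-check which MPI shared libraries the binary actually resolves at run time (a generic check, not part of the test case):

ldd a.out | grep -i mpi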

2. The output of mpiifort -v is below. The Fortran compiler is 12.1.0 and the MPI library is 3.2.1. The MKL version is 11.0.0 Product Build 20120801 for Intel(R) 64 architecture applications.

~/mpi_stuff/>mpiifort -v
mpiifort for the Intel(R) MPI Library 3.2.1 for Linux*
Copyright(C) 2003-2009, Intel Corporation.  All rights reserved.
ifort version 12.1.0


Zhang_Z_Intel
Employee

John,

Sorry for the delayed response. I tried your reproducer and it worked fine for me! My environment has the latest Intel Fortran compiler and Intel MPI installed. Your MPI installation is more than four years old, and I believe many bugs have been fixed during those four years. Would you update your MPI to the latest (Intel MPI 4.1) and try again?

My screen output from running your test code on Linux with ILP64 enabled is attached.

John_Young
New Contributor I

Hi Zhang,

Thank you very much for running our test case. I can ask about upgrading the Intel MPI, but the cluster we run on is out of our control, so I'm not sure whether we will be able to. By the way, did you try running the ILP64 BLACS with the LP64 MPI library? If so, did it run or crash?

Interestingly, when I run on my Windows machine (which has MPI 4.0 Update 3), the ILP64 MPI interface with the -i8 compiler flag works perfectly, as does the LP64 MPI interface with the -i8 compiler flag. Since I cannot find it anywhere in the MKL documentation, could you confirm explicitly which MPI interface (ILP64 or LP64) Intel recommends linking with the ILP64 BLACS?

Also, if you are able to, could you check one more thing in the test case I sent you? It was actually a test case for another possible ScaLAPACK bug in the QR factorization (http://software.intel.com/en-us/forums/topic/473803). Could you change the comments around line 160 of the file so that zz_size=200 instead of 400 and see if you observe any MPI receive error? We see a crash in this case (on both Windows and Linux), which we believe may be a bug in ScaLAPACK.

Thanks for your help.
John
