Maxime_B_
Beginner
276 Views

Problem compiling SuperLU_Dist 3.3 with Intel 14.0 (worked with Intel 2013)

Hi,

I am trying to compile SuperLU_Dist version 3.3 with the OpenMPI 1.6.5 wrapper around the Intel compiler icc version 14.0.1 (gcc version 4.8.0 compatibility), and it fails with a very strange error:

/software6/mpi/openmpi/1.6.5_intel/bin/mpicc -I/software6/mpi/openmpi/1.6.5_intel/include -I/software6/mpi/openmpi/1.6.5_intel/include -O3 -xHost -mkl -fPIC -m64 -fPIC -O3  -DAdd_ -DUSE_VENDOR_BLAS -c pdgstrf.c
make[1]: Leaving directory `/software6/src/petsc-3.4.3/externalpackages/SuperLU_DIST_3.3/SRC'
error #13002: unexpected CFE message argument:  e. The staggered cosine transform may be
warning #13003: message verification failed for: 556; reverting to internal message
pdgstrf.c(2672): warning #556: a value of type "int" cannot be assigned to an entity of type "MPI_Request"
            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */
                                      ^     

pdgstrf.c(2672): warning #152: Fatal error: Trigonometric Transform has failed to release the memory.
            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */
                                        ^     

compilation aborted for pdgstrf.c (code 1)
make[1]: *** [pdgstrf.o] Error 1

 

I understand that the code of SuperLU_Dist is non-standard (it assigns an int to a type MPI_Request), but why is the compiler crashing with this weird message :

error #13002: unexpected CFE message argument:  e. The staggered cosine transform may be
warning #13003: message verification failed for: 556; reverting to internal message

pdgstrf.c(2672): warning #152: Fatal error: Trigonometric Transform has failed to release the memory.

 

This seems to be a compiler bug, since it worked with Intel icc version 13.0.0 (gcc version 4.1.2 compatibility)

Thanks,

Maxime Boissonneault

36 Replies
Kittur_G_Intel
Employee
189 Views

Hi Maxime,

I installed OpenMPI 1.6.5 and compiled pdgstrf.c with your command-line options without any errors, so I am not able to reproduce the issue. If you can attach the preprocessed file (generated by passing the -P option) to this issue, I can try to reproduce it with that.

Thanks,  
Kittur 

Maxime_B_
Beginner
189 Views

Hi Kittur,

Attached is the pre-processed file.

Also, just to make sure we are compiling the same file, here is the SuperLU_Dist source code that I am trying to compile

http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_3.3.tar.gz

Thanks,

Maxime

Kittur_G_Intel
Employee
189 Views

Thanks for the attachment, Maxime. I'll take a look at it. BTW, just to make sure, can you also provide the system info (OS, gcc version, etc.)?

Thanks,

Kittur

Maxime_B_
Beginner
189 Views

Hi Kittur,

This is with CentOS6. GCC was built from source (replacing the default OS compiler) and is version 4.8.1.

Thanks,

Maxime

Kittur_G_Intel
Employee
189 Views

Hi Maxime,

Well, I tried using your .i file as well as the new SuperLU tar file, on RHEL 5.x as well as 6.2, and couldn't reproduce the issue (see below).

%icc -V

Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008

~/maxime$ ~/intel/openmpi/bin/mpicc -I~/projects/intel/openmpi/include -O3  -xHost -mkl -fPIC -m64 -fPIC -O3  -DAdd_ -DUSE_VENDOR_BLAS -c pdgstrf.i

pdgstrf.c(2672): warning #556: a value of type "int" cannot be assigned to an entity of type "MPI_Request"
            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */
                                      ^

pdgstrf.c(2672): warning #152: conversion of nonzero integer to pointer

            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */

%ls *.o

pdgstrf.o

=============================

BTW, I don't have a CentOS 6 system, but RHEL 6 is compatible with it (we don't officially support CentOS). The only difference is that the gcc version on that system is 4.4. In the meantime, I'll see if I can install gcc 4.8 on such a system and try again. Other than that, I don't know what else we can do, since CentOS is not officially supported (we always try to reproduce on compatible EL systems, FYI). You could also try compiling the file without OpenMPI to see whether the issue persists. If it doesn't, I wonder whether it's a bug in OpenMPI. Just a thought.

Regards,

Kittur

Maxime_B_
Beginner
189 Views

Hi Kittur,

I can reproduce it using icc directly:

/software6/compilers/intel/composer_xe_2013_sp1/bin/icc -O3 -xHost -no-prec-div -Mipa=fast,safe -xHost -fPIC -DDEBUGlevel=0 -DPRNTlevel=1 -DPROFlevel=0 -DAdd_ -fPIC -DUSE_VENDOR_BLAS -c pdgstrf.c -I/software6/mpi/openmpi/1.6.5_intel/include -pthread
icc: command line warning #10006: ignoring unknown option '-Mipa=fast,safe'
error #13002: unexpected CFE message argument:  e. The staggered cosine transform may be
warning #13003: message verification failed for: 556; reverting to internal message
pdgstrf.c(2672): warning #556: a value of type "int" cannot be assigned to an entity of type "MPI_Request"
            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */
                                      ^

pdgstrf.c(2672): warning #152: Fatal error: Trigonometric Transform has failed to release the memory.
            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */
                                        ^

compilation aborted for pdgstrf.c (code 1)

 

I got the command line from adding --showme to the mpicc command.
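
For readers who want to repeat this step: Open MPI's compiler wrappers accept `--showme`, which prints the underlying compiler command instead of running it. The following is a minimal sketch; the `MPICC` variable and the function name are assumptions introduced here for illustration, not part of Open MPI.

```shell
# Print the compiler command an Open MPI wrapper would run, without running it.
# MPICC is a hypothetical override for the wrapper location; it defaults to
# whatever "mpicc" is found on PATH.
show_wrapped_command() {
    "${MPICC:-mpicc}" --showme "$@"
}

# Example (not executed here):
#   show_wrapped_command -O3 -xHost -c pdgstrf.c
```

The printed command can then be run by hand, which takes the wrapper out of the picture when trying to reproduce the failure.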

Our OpenMPI was compiled with the same version of Intel, using the configure options :

./configure --prefix=$PREFIX \
     --with-threads --with-openib --enable-shared \
     --enable-static --with-ft=cr --enable-ft-thread \
     --with-io-romio-flags="--with-file-system=testfs+ufs+nfs+lustre" --with-tm

and then make && make install.

Kittur_G_Intel
Employee
189 Views

Hi Maxime,

I also tried it on an EL6 system with gcc 4.8.1 and couldn't reproduce it:

$/cts/tools/bin/gcc --version
gcc (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$~/intel/openmpi/bin/mpicc/mpicc -I/home/cmplr/usr4/kganesh1/projects/intel/openmpi/include -O3 -xHost -mkl -fPIC -m64 -fPIC -O3  -DAdd_ -DUSE_VENDOR_BLAS -c pdgstrf.i

pdgstrf.i(11347): warning #556: a value of type "int" cannot be assigned to an entity of type "MPI_Request"
            U_diag_blk_send_req[krow] = 1;
                                      ^

pdgstrf.i(11347): warning #152: conversion of nonzero integer to pointer
            U_diag_blk_send_req[krow] = 1;
                                        ^

------------------------------

So, basically can't reproduce the issue :-(

Regards,

Kittur

Kittur_G_Intel
Employee
189 Views

Interesting. I know RHEL systems are fully compatible with CentOS, so it's odd that I am not able to reproduce this.
Also, I did build and install OpenMPI using icc. Let me see what else might be going on, and I'll ping some of my peers to see if they recognize any further clues.
Regards,
Kittur

Kittur_G_Intel
Employee
189 Views

Maxime, the only other difference I see is that I left out a few options when building OpenMPI, so I'll try with those and see if that makes any difference. Thanks.

Regards, Kittur

Maxime_B_
Beginner
189 Views

Is there anything else I could try on my side, for example to get more verbose output or debugging information?

Maxime_B_
Beginner
189 Views

Maybe some particularities of our system matter: we do not have the OS-provided gcc/libstdc++ dating from gcc 4.4. We have instead built GCC 4.8 and its dependencies using GCC 4.4, then uninstalled GCC 4.4, leaving only the glibc (not c++) since this one is required for many more system packages.

The reason we are doing this is because we do not want our users to rely on old versions (at least not without knowing it).

Maxime

Kittur_G_Intel
Employee
189 Views

Maxime, well, I reinstalled OpenMPI and tried again. Same result: I couldn't reproduce the issue.

"We have instead built GCC 4.8 and its dependencies using GCC 4.4, then uninstalled GCC 4.4, leaving only the glibc (not c++)"
What you describe here could be a factor. I am not sure, and will need to check with our front-end developers.
I'll update you as soon as I get more info. I appreciate your patience until then.


Regards,
Kittur

Maxime_B_
Beginner
189 Views

I can also give you access to the system I am compiling on, if that would help.

Thanks,

Maxime

Kittur_G_Intel
Employee
189 Views

Hi Maxime,

Well, from the compiler side there appears to be no issue, but I've passed this on to the MKL team to find out whether there's an issue with MKL. I'll get back to you as soon as I have an update. Much appreciated.

Regards,
Kittur

Kittur_G_Intel
Employee
189 Views

Hi Maxime,

Our front-end team let me know that this diagnostic comes from the diagnostic infrastructure, which grabs messages from the catalog and verifies their contents. Messages like this usually mean that the compiler is picking up the wrong message catalogs. The catalogs are located via the NLSPATH environment variable.

So, please check whether this variable is set and, if so, that it points to a known location that matches the compiler being invoked.

If it is an NLSPATH problem, you can either set it to the proper value or unset it completely and the internal compiler diagnostics will be used instead of the catalog.
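
The check described above can be sketched in a POSIX shell as follows. The function name `show_nlspath` is hypothetical; it only reports whether NLSPATH is set and, if so, prints one search entry per line so each can be compared against the install prefix of the icc being invoked.

```shell
# Report whether NLSPATH is set; if it is, print one entry per line.
show_nlspath() {
    if [ -z "${NLSPATH+set}" ]; then
        echo "NLSPATH is unset"
    else
        # Entries are colon-separated, like PATH.
        printf '%s\n' "$NLSPATH" | tr ':' '\n'
    fi
}

show_nlspath
```

To try the "unset it completely" option without changing the login environment, the compile can be run in a subshell, for example `(unset NLSPATH; icc -c pdgstrf.c)`.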

Could you please try the above and let me know if it resolves the issue? I much appreciate your patience and your quick responses.

Regards,
Kittur

Maxime_B_
Beginner
189 Views

Hi Kittur,

The NLSPATH environment variable is not set on our system. What should it be set to ?

Maxime

Kittur_G_Intel
Employee
189 Views

Hi Maxime,

That's strange, since the compiler is somehow picking up the wrong message catalog. It could be that you have not sourced the icc environment file "compilervars.sh".

Can you do the following:

1) Go to the bin directory where icc is installed and run:

% source compilervars.sh intel64 (if 64bit system)

NLSPATH should now point to where the message catalogs are. Then try compiling again and let me know.

Regards.

Maxime_B_
Beginner
124 Views

Hi Kittur,

I did this, but it does not change anything:

[mboisson@colosse3 SRC]$ . /software6/compilers/intel/composer_xe_2013_sp1/bin/compilervars.sh intel64

[mboisson@colosse3 SRC]$ env | grep NLS
NLSPATH=/software6/compilers/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/locale/%l_%t/%N:/software6/compilers/intel/composer_xe_2013_sp1.1.106/ipp/lib/intel64/locale/%l_%t/%N:/software6/compilers/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/locale/%l_%t/%N:/software6/compilers/intel/composer_xe_2013_sp1.1.106/debugger/gdb/intel64_mic/py26/share/locale/%l_%t/%N:/software6/compilers/intel/composer_xe_2013_sp1.1.106/debugger/gdb/intel64/py26/share/locale/%l_%t/%N:/software6/compilers/intel/composer_xe_2013_sp1.1.106/debugger/intel64/locale/%l_%t/%N:/software6/compilers/intel/composer_xe_2013_sp1/mkl/lib/intel64/locale/en_US/mkl_msg.cat
[mboisson@colosse3 SRC]$ /software6/compilers/intel/composer_xe_2013_sp1/bin/icc -O3 -xHost -no-prec-div -Mipa=fast,safe -xHost -fPIC -DDEBUGlevel=0 -DPRNTlevel=1 -DPROFlevel=0 -DAdd_ -fPIC -DUSE_VENDOR_BLAS -c pdgstrf.c -I/software6/mpi/openmpi/1.6.5_intel/include -pthread
icc: command line warning #10006: ignoring unknown option '-Mipa=fast,safe'
error #13002: unexpected CFE message argument:  e. The staggered cosine transform may be
warning #13003: message verification failed for: 556; reverting to internal message
pdgstrf.c(2672): warning #556: a value of type "int" cannot be assigned to an entity of type "MPI_Request"
            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */
                                      ^

pdgstrf.c(2672): warning #152: Fatal error: Trigonometric Transform has failed to release the memory.
            U_diag_blk_send_req[krow] = 1; /* flag outstanding Isend */
                                        ^

compilation aborted for pdgstrf.c (code 1)
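
One detail in the NLSPATH printed above may be worth a closer look: the last entry ends in `mkl_msg.cat` rather than in a `%N` template. Under POSIX `catopen()`, `%N` is replaced by the requested catalog name, so an entry without it names one fixed file and can match a lookup for any catalog if the earlier templated entries do not resolve (for example, when no directory exists for the locale's `%l_%t` value). Whether that is what happens here is an assumption, not something the thread confirms, although the stray text about trigonometric transforms does not read like a compiler diagnostic. A minimal sketch for listing such entries, with a hypothetical function name:

```shell
# Print every NLSPATH entry that contains no %N placeholder.
# $1: colon-separated NLSPATH-style string (defaults to $NLSPATH).
# The body runs in a subshell so the IFS and globbing changes do not leak.
list_literal_nlspath_entries() (
    IFS=:
    set -f                        # no glob expansion while splitting
    for entry in ${1-${NLSPATH-}}; do
        case $entry in
            ''|*%N*) ;;           # empty or templated entry: skip
            *) printf '%s\n' "$entry" ;;
        esac
    done
)

# Example (not executed here):
#   list_literal_nlspath_entries "$NLSPATH"
```

If an entry is reported, one experiment would be to rerun the compile with that entry removed from NLSPATH, or with NLSPATH unset, and see whether the diagnostics change.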
 
