Solved: NaN values in INTEL Fortran Example Using RCI FGMRES Solver with mkl-10.1

miramin · 05-27-2010

Hi there,

I am trying to get the hang of one of the MKL iterative solvers, RCI FGMRES, and I haven't been able to reproduce the correct results of Intel's own example, which is on this page:

http://www.intel.com/software/products/mkl/docs/WebHelp/appendices/mkl_appC_ISS.html#mkl_appC_ISS

under the title:

" Fortran Example of Using RCI (Preconditioned) Flexible Generalized Minimal Residual Solver.

Fortran example results for a non-symmetric indefinite system. Upon successful execution of the solver, the following result is printed (up to rounding errors that depend on the computer system used): "

I am using the Intel compiler intel-fc-10.1 and MKL library version 10.1.
I converted the Intel example to Fortran 90, using INTEGER(KIND=8) and REAL(KIND=8) instead of the F77 declarations INTEGER and DOUBLE PRECISION.

I was able to compile this code with the following command:

ifort RCIFGMRES_INTEL.f90 -g -o ISS.exe -I /usr/local/compilers/Intel/mkl-10.1/include -L$MKL_HOME/lib/em64t -lmkl -lmkl_lapack -lguide -Wl,--start-group $MKL_HOME/lib/em64/libmkl_solver_ilp64_sequential.a $MKL_HOME/lib/em64t/libmkl_intel_ilp64.a $MKL_HOME/lib/em64t/libmkl_solver_ilp64_sequential.a $MKL_HOME/lib/em64t/libmkl_core.a -lpthread -Wl,--end-group

No error in compiling!

When I run the program, I get the following output:

[smirsa1@philip1 ISS]$ ./ISS.exe
--------------------------------------------------
The SIMPLEST example of usage of RCI FGMRES solver
to solve a non-symmetric indefinite non-degenerate
algebraic system of linear equations
--------------------------------------------------

The system has been solved

The following solution has been obtained:
COMPUTED_SOLUTION(1)= NaN
COMPUTED_SOLUTION(2)= -Infinity
COMPUTED_SOLUTION(3)= NaN
COMPUTED_SOLUTION(4)= Infinity
COMPUTED_SOLUTION(5)= NaN

The expected solution is:
EXPECTED_SOLUTION(1)=-0.100E+01
EXPECTED_SOLUTION(2)= 0.100E+01
EXPECTED_SOLUTION(3)= 0.000E+00
EXPECTED_SOLUTION(4)= 0.100E+01
EXPECTED_SOLUTION(5)=-0.100E+01

Number of iterations: 1
This example may have FAILED as either the number of iterations differs from the expected number of iterations 5 or the computed solution differs much from the expected solution (Euclidean norm is NaN ), or both.
1
[smirsa1@philip1 ISS]$

Have you seen this behavior before? I can't figure out what I am doing wrong. I would be very grateful if somebody could tell me where this error is coming from.

I have also attached my source code to this message.

Thanks for reading! :)

mecej4 · 05-27-2010

Replace all instances of INTEGER(KIND=8) with INTEGER. With this change, using the current versions of IFORT and MKL, I obtained:

...
...
Number of iterations: 5
This example has successfully PASSED through all steps of computation!

Unless the documentation for the MKL library explicitly specifies INTEGER(KIND=8) for a subroutine argument, use the default integer kind.

Note also that hard-coding numbers after KIND= makes your code non-portable, because kind values are compiler-specific. Use the intrinsic functions SELECTED_INT_KIND, SELECTED_REAL_KIND, etc., instead.
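A minimal sketch of that approach, assuming a Fortran 95 or later compiler; the module name `portable_kinds` and the names `dp`, `i64` and `default_int_bits` are hypothetical, chosen only for illustration. Kinds are requested by precision and range rather than by a byte count such as KIND=8, and arguments passed to MKL keep the default INTEGER. For reference, MKL's LP64 interface libraries (the `_lp64` ones in the working link line further down) expect 32-bit default integers, while the ILP64 libraries (the `_ilp64` ones in the first link line of this thread) expect 64-bit integers, which with ifort means compiling with -i8.

```fortran
! Minimal sketch (hypothetical module): request kinds by precision and range
! instead of hard-coding byte counts such as KIND=8.
module portable_kinds
  implicit none
  private
  public :: dp, i64, default_int_bits

  ! At least 15 significant decimal digits and exponent range 10**307
  ! (IEEE double precision on common platforms).
  integer, parameter :: dp = selected_real_kind(p=15, r=307)

  ! Integers of magnitude up to at least 10**18 (64-bit on common platforms).
  integer, parameter :: i64 = selected_int_kind(18)

contains

  ! Bits in a default INTEGER: 32 matches MKL's LP64 interface,
  ! 64 matches its ILP64 interface (ifort -i8).
  pure integer function default_int_bits()
    default_int_bits = bit_size(0)
  end function default_int_bits

end module portable_kinds
```

A program would `use portable_kinds`, declare the solver data as, for example, `real(dp) :: x(5)`, keep plain `integer` for MKL arguments when linking the LP64 libraries, and can print `default_int_bits()` as a quick check of the integer size currently in effect.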

miramin · 05-27-2010

Thank you, mecej4, for your kind help, but it does not work for me. I did exactly as you said and I still get the same NaN answers.

What do you mean by "current versions of IFORT and MKL"?
Which versions of the Intel compiler and the MKL library are you using?

Thanks a lot.
Amin

Gennady_F_Intel · 05-27-2010

Amin,

You can find the latest versions of the Intel compilers (both C/C++ and Fortran) and MKL here.

In other words, the current (latest) versions are MKL 10.2 Update 5 and Intel Compiler 11.1.

--Gennady

mecej4 · 05-28-2010

I took the "Intel MKL RCI (P)CG Fortran-77 example" straight from the MKL manual and ran it, without making any changes to the source code, on another system:
Linux 2.6.18-194.el5 #1 SMP Tue Mar 16 21:57:01 EDT 2010 ia64 ia64 ia64 GNU/Linux

using V10.1 of the Intel compiler and V10.1 of MKL. I compiled it with

ifort -I/opt/intel/mkl101/include/ rci.F -L$MKLPATH $MKLPATH/libmkl_solver_lp64_sequential.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread

and obtained

[mecej4@login1 LANG]$ ./a.out
The system has been solved
The following solution obtained
1.000 0.000 1.000 0.000
1.000 0.000 1.000 0.000
expected solution
1.000 0.000 1.000 0.000
1.000 0.000 1.000 0.000
Number of iterations: 8
This example has successfully PASSED through all steps of computation!
0

miramin · 05-28-2010

Dear Mecej4,

You are solving the Conjugate Gradient solver example, not the FGMRES one: "Fortran Example to Solve Non-Symmetric Indefinite System with Intel MKL RCI (P)FGMRES ((Preconditioned) Flexible Generalized Minimal RESidual method)".

BTW, I copied the whole FGMRES example [F77] and was able to compile it on my system with the following command:

ifort exm.f -g -o I.exe -I /usr/local/compilers/Intel/mkl-10.1/include -L$MKL_HOME/lib/em64t -lmkl -lmkl_lapack -lguide -Wl,--start-group $MKL_HOME/lib/em64t/libmkl_solver_ilp64_sequential.a $MKL_HOME/lib/em64t/libmkl_intel_ilp64.a $MKL_HOME/lib/em64t/libmkl_core.a -lpthread -lmkl -lmkl_lapack -lguide -Wl,--end-group -lpthread

When I run it, though, it aborts!

[smirsa1@philip1 ISS]$ ./I.exe
--------------------------------------------------
The SIMPLEST example of usage of RCI FGMRES solver
to solve a non-symmetric indefinite non-degenerate
algebraic system of linear equations
--------------------------------------------------
Aborted
[smirsa1@philip1 ISS]$

My compiler version is 10.1 and my MKL version is 10.1.

Maybe I could use a debugger to figure out what is going wrong, but I have a feeling the problem comes from the calls to the MKL library.

mecej4 · 05-28-2010

I downloaded the fixed-format file from the first URL that you posted above. I also downloaded the free-format file that you attached, changed INTEGER*8 to INTEGER, and ran both programs with two versions of IFORT and MKL: (i) the current versions (IFORT 11.1, MKL 10.2) on Linux x64, and (ii) older versions (10.1 of both) on Linux IA-64.

All four runs gave me correct results.

Are you processing the Fortran sources correctly? It occurred to me that you may not be aware that file names on Linux are case-sensitive, and that to have the preprocessor run you must give your source files the suffix .F (for fixed format) or .F90 (for free format) rather than .f or .f90. Alternatively, you can use the -fpp compiler switch for the same purpose.

What OS are you running all this on?

miramin · 05-28-2010

Mecej4,

My OS is:

Linux philip1 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

I renamed my modified code to *.F90 and Intel's original example code to *.F. Both compiled without problems, but both aborted when run. In the free-format file I had changed INTEGER*8 to INTEGER.

This is a strange error. I also tried the -fpp flag: the code compiles, but again it aborts. I am not sure what I am doing wrong.

mecej4 · 05-29-2010

Well, I am stumped. Typically, we run into problems with sample code from manuals (such as the MKL manual) as part of the "initial acceptance test". If there are problems with paths, prerequisite software, or OS/hardware incompatibilities, we find out because the example problems don't run correctly. But you seem to have everything set up properly.

There are a few things to try. They will take time, and whether to try them is up to you.

0. Seek out another machine with a similar software setup, and see if you can reproduce the issue.

1. Compile with -g as an additional compiler flag, then run. If the program again prints Aborted, we are in business: run it under a debugger such as gdb. Start it with the command gdb ./a.out, then enter the gdb command r. After the Aborted message is printed, enter bt full. The resulting backtrace may help locate the problem.

2. Similarly, you may use a symbolic debugger, such as idb (which comes with IFORT) or ddd (a graphical front end to gdb), to find the location in your code where the abort occurs.

3. The old standby is to narrow down the location of the error by inserting WRITE statements.
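For item 3, a minimal sketch of a WRITE-based check, assuming a compiler with Fortran 2003 `ieee_arithmetic` and `iso_fortran_env` support; the module `nan_watch` and the procedures `first_bad_index`, `report_bad` and `within_tol` are hypothetical names for illustration, not part of MKL. Calling `report_bad` on the solution or work array after each RCI step reports where the first NaN or Infinity appears. It also shows why a convergence test should be written as `nrm <= tol`: every ordered comparison with a NaN is false, so a test of the form `nrm > tol` would let a NaN norm, like the one in the first output of this thread, go unflagged.

```fortran
! Minimal sketch (hypothetical module): locate NaN/Infinity entries while
! debugging with WRITE statements. Requires Fortran 2003 IEEE support.
module nan_watch
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  private
  public :: first_bad_index, report_bad, within_tol

  integer, parameter :: dp = selected_real_kind(p=15, r=307)

contains

  ! Index of the first NaN or +/-Infinity in v, or 0 if every entry is finite.
  integer function first_bad_index(v)
    real(dp), intent(in) :: v(:)
    integer :: i
    first_bad_index = 0
    do i = 1, size(v)
      if (.not. ieee_is_finite(v(i))) then
        first_bad_index = i
        return
      end if
    end do
  end function first_bad_index

  ! Writes one diagnostic line to standard error if v contains a non-finite value.
  subroutine report_bad(label, v)
    character(len=*), intent(in) :: label
    real(dp), intent(in) :: v(:)
    integer :: k
    k = first_bad_index(v)
    if (k > 0) write (error_unit, '(a,a,i0,a,es12.4)') &
        label, ': first non-finite entry at index ', k, ' = ', v(k)
  end subroutine report_bad

  ! Convergence test written so that a NaN norm fails: any comparison with NaN
  ! is false, so "nrm <= tol" rejects NaN whereas "nrm > tol" would not flag it.
  pure logical function within_tol(nrm, tol)
    real(dp), intent(in) :: nrm, tol
    within_tol = nrm <= tol
  end function within_tol

end module nan_watch
```

In a reverse-communication loop, a line such as `call report_bad('after RCI step', computed_solution)` placed after each solver call narrows down the first step that produces a non-finite value.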

I regret that I have no better suggestions.

miramin · 05-31-2010

Dear mecej4,

I was finally able to run the Intel example code successfully. The administrator of the system I am using installed the latest version of MKL (10.2), and after changing INTEGER*8 to INTEGER and renaming the file from *.f90 to *.F90, the problem was fixed and everything seems to be working now.

So I think my problem was that I was using the latest version of the Intel compiler together with version 10.1 of MKL. Now that I am using the latest Intel compiler with MKL 10.2, everything seems to be working.

Thank you for your input and help; I am really grateful to you.

Cheers,
Amin