Topic in Intel® oneAPI Math Kernel Library

matrix inversion precision issues on two different processors

Vineet_Y_ — Mon, 01 Apr 2013 20:36:23 GMT

I am seeing very serious precision issues when using Intel MKL LAPACK for matrix inversion:

Steps:

(1) I inverted a matrix using Matlab/Octave

(2) I used dgetrf and dgetri to invert the same matrix on two machines: (a) an Intel(R) Core(TM) i7-2600 CPU at 3.40GHz with 16GB of RAM (the machine used for the test code), running Windows, with Intel Parallel Composer XE 2013, and (b) an Intel(R) Xeon(R) CPU X5660 at 2.80GHz, running Linux, with Composer XE 2011

(3) The problem is that the difference between the inverse obtained with Matlab/Octave and the one obtained with dgetrf and dgetri is not the same on the two machines. That there are differences is not an issue, but the fact that the differences depend on the processor is creating problems in large simulations. The answers obtained on the Intel Xeon machine with Intel Composer XE 2011 are more accurate than those obtained by running the same code on the Windows machine

At this moment I think I am overlooking something, i.e. making a big mistake. Any advice on solving this issue would be greatly appreciated. I have attached sample code to highlight the issue, but I was not able to upload the input binary files to the forum (the upload was taking a very long time)

Many thanks

Vineet
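One way to compare the two machines without relying on Matlab/Octave as a reference is to measure how far A * inv(A) is from the identity on each machine. The sketch below is plain C and does not call MKL; the function name inverse_residual and the row-major layout are assumptions made for illustration, and the inverse passed in could come from dgetrf/dgetri or from any other source.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute entry of (A * Ainv - I) for n x n row-major matrices.
   A small value means Ainv is a good inverse of A, whatever produced it.
   Hypothetical helper for illustration; it is not part of MKL. */
double inverse_residual(const double *a, const double *ainv, size_t n)
{
    assert(a != NULL && ainv != NULL && n > 0);
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Entry (i, j) of the product A * Ainv. */
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * ainv[k * n + j];
            /* Distance from the matching identity entry. */
            double err = sum - (i == j ? 1.0 : 0.0);
            if (err < 0.0)
                err = -err;
            if (err > worst)
                worst = err;
        }
    }
    return worst;
}
```

Computing this residual for the inverse produced on each machine gives a figure that depends only on the input matrix and that machine's result. For an ill-conditioned matrix, both residuals can be large even when each library behaves correctly.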


SergeyKostrov — Tue, 02 Apr 2013 00:15:00 GMT

Hi Vineet,

>>...The answers received by using Intel Xeon Processors and Intel Composer XE 2011 Machine are more accurate [ SK: On Linux ]
>>than what is obtained by using the same code on windows machine...

Please post the command lines for both cases ( sorry, I don't want to make any suggestions before I see all the options used ). Next, I'll be able to verify calculations only on Windows 7 Professional with Intel Parallel Studio XE 2013 Update 2.

Also, would you be able to execute a couple of simple C/C++ tests ( I'll provide portable C/C++ code ) to verify precision control functionality on both systems?


Vineet_Y_ — Tue, 02 Apr 2013 03:05:18 GMT

Here are the command lines you requested. Send me the C/C++ files and I will execute them to verify precision control.

For linux (Intel Xeon processor)

ifort source1.f90 -heap-arrays -openmp -L /share/apps/intel/composer_xe_2011_sp1.7.256/mkl/lib/intel64/ -I /share/apps/intel/composer_xe_2011_sp1.7.256/mkl/include/ -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -o source1.exe

For Windows

Compiling with Intel(R) Visual Fortran Compiler XE 13.1.0.149 [Intel(R) 64]...

ifort /nologo /debug:full /O2 /I"C:\Program Files (x86)\Intel\Composer XE 2013\mkl\include" /warn:interfaces /module:"x64\Debug\\" /object:"x64\Debug\\" /Fd"x64\Debug\vc100.pdb" /traceback /check:none /libs:static /threads /dbglibs /Qmkl:parallel /c -heap-arrays /Qvc10 /Qlocation,link,"C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\\bin\amd64" "C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\Source1.f90"

Linking...

Link /OUT:"x64\Debug\source1.f90.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\Intel\Composer XE 2013\mkl\lib\intel64" /MANIFEST /MANIFESTFILE:"C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\x64\Debug\source1.f90.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\x64\Debug\source1.f90.pdb" /SUBSYSTEM:CONSOLE /IMPLIB:"C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\x64\Debug\source1.f90.lib" mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib mkl_lapack95_lp64.lib "x64\Debug\Source1.obj" "x64\Debug\Source2.obj"


SergeyKostrov — Tue, 02 Apr 2013 05:04:36 GMT

>>...Send me the C/C++ files and I will execute them to verify precision control...

Here they are. There are two solutions ( VS 2008 ), one for the Intel C++ compiler and one for the Microsoft C++ compiler. Note: the /Qlong-double and /Qpc80 options are used for the Intel C++ compiler.
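The test sources themselves are not reproduced in the thread. As a rough, hypothetical stand-in for the first check (how wide 'long double' is and how intermediate results are evaluated), a minimal sketch using only standard C99 macros from float.h could look like this; the function name describe_fp_environment is an assumption, and the sketch does not change the x87 precision-control setting the way the Windows-specific tests do.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Writes a one-line summary of the floating-point setup into buf and
   returns the number of characters snprintf reports.
   Hypothetical helper; not taken from the test solutions in the thread. */
int describe_fp_environment(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "sizeof(long double)=%zu LDBL_MANT_DIG=%d "
                    "DBL_MANT_DIG=%d FLT_EVAL_METHOD=%d",
                    sizeof(long double), /* storage size in bytes */
                    LDBL_MANT_DIG,       /* mantissa bits of long double */
                    DBL_MANT_DIG,        /* mantissa bits of double */
                    (int)FLT_EVAL_METHOD /* how intermediates are evaluated */);
}
```

Running the same report on both systems, with the same options used for the real build, shows whether the two builds agree on the width of 'long double' and on the evaluation method before any numerical results are compared.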


SergeyKostrov — Tue, 02 Apr 2013 05:08:38 GMT

Outputs for Reference:

[ Intel C++ compiler ( 16-byte long double data type is used (!) ) ]

32-bit Windows platform - Configuration: RELEASE

Test-Case 1
Size of [ long double ] is: 16

Test-Case 2
_CW_DEFAULT & ALLBITSON: 0x9001F
_PC_24 & _MCW_PC : 0xA001F
_PC_53 & _MCW_PC : 0x9001F
_PC_64 & _MCW_PC : 0x8001F

Test-Case 3.1 Accuracy _CW_DEFAULT - long double - Result: 1.0000000000079181
Sub-Test 3.2 Accuracy _PC_24 - long double - Result: 1.0090389251708984
Test-Case 3.3 Accuracy _PC_53 - long double - Result: 1.0000000000079181
Test-Case 3.4 Accuracy _PC_64 - long double - Result: 1.0000000000000109

Test-Case 4

Matrix A
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

Matrix B
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

MFPT Used

Matrix C - Result
13826808.0 14187608.0 14548408.0 14909208.0 15270008.0 15630808.0 15991608.0 16352408.0
32393208.0 33394008.0 34394808.0 35395608.0 36396408.0 37397208.0 38398008.0 39398808.0
50959608.0 52600408.0 54241208.0 55882008.0 57522808.0 59163608.0 60804408.0 62445208.0
69526008.0 71806808.0 74087608.0 76368408.0 78649208.0 80930008.0 83210808.0 85491608.0
88092408.0 91013208.0 93934008.0 96854808.0 99775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462800.0 128023600.0 131584400.0
125225200.0 129426000.0 133626800.0 137827600.0 142028400.0 146229200.0 150430000.0 154630800.0
143791600.0 148632400.0 153473200.0 158314000.0 163154800.0 167995600.0 172836400.0 177677200.0

Press ESC to Exit...

////////////////////////////////////////////////////////////////////////////////

[ Microsoft C++ compiler ]

32-bit Windows platform - Configuration: RELEASE

Test-Case 1
Size of [ long double ] is: 8

Test-Case 2
_CW_DEFAULT & ALLBITSON: 0x9001F
_PC_24 & _MCW_PC : 0xA001F
_PC_53 & _MCW_PC : 0x9001F
_PC_64 & _MCW_PC : 0x8001F

Test-Case 3.1 Accuracy _CW_DEFAULT - long double - Result: 1.0000000000079181
Sub-Test 3.2 Accuracy _PC_24 - long double - Result: 1.0090389251708984
Test-Case 3.3 Accuracy _PC_53 - long double - Result: 1.0000000000079181
Test-Case 3.4 Accuracy _PC_64 - long double - Result: 1.0000000000079181

Test-Case 4

Matrix A
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

Matrix B
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

MFPT Used

Matrix C - Result
13826808.0 14187608.0 14548408.0 14909208.0 15270008.0 15630808.0 15991608.0 16352408.0
32393208.0 33394008.0 34394808.0 35395608.0 36396408.0 37397208.0 38398008.0 39398808.0
50959608.0 52600408.0 54241208.0 55882008.0 57522808.0 59163608.0 60804408.0 62445208.0
69526008.0 71806808.0 74087608.0 76368408.0 78649208.0 80930008.0 83210808.0 85491608.0
88092408.0 91013208.0 93934008.0 96854808.0 99775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462808.0 128023608.0 131584408.0
125225208.0 129426008.0 133626808.0 137827616.0 142028416.0 146229216.0 150430016.0 154630816.0
143791616.0 148632416.0 153473216.0 158314016.0 163154816.0 167995616.0 172836416.0 177677216.0

Press ESC to Exit...

////////////////////////////////////////////////////////////////////////////////
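The two Matrix C listings agree up to 120902008.0 and differ only in the last digits from there on (for example 124462800.0 from the Intel build against 124462808.0 from the Microsoft build). Differences of that size are what single-precision rounding produces once values exceed 2^24, although the thread does not say which floating-point type the test used. The sketch below is a hypothetical reconstruction, not the original test: it rebuilds the 8 x 8 test matrix from the pattern visible in the listings (101, 201, ..., 6401 in row-major order) and computes one element of the product with float and with double accumulation. The function names are assumptions.

```c
#include <assert.h>

/* Element (row, col) of C = A * B, where A and B are both the 8 x 8 matrix
   with entry (i, j) equal to 101 + 100 * (8 * i + j), as in the listings.
   This version stores and accumulates in single precision. */
float product_element_float(int row, int col)
{
    assert(row >= 0 && row < 8 && col >= 0 && col < 8);
    float sum = 0.0f;
    for (int k = 0; k < 8; ++k) {
        float a = 101.0f + 100.0f * (float)(8 * row + k);
        float b = 101.0f + 100.0f * (float)(8 * k + col);
        sum += a * b;
    }
    return sum;
}

/* Same element, stored and accumulated in double precision. Every product
   and partial sum is an integer far below 2^53, so the result is exact. */
double product_element_double(int row, int col)
{
    assert(row >= 0 && row < 8 && col >= 0 && col < 8);
    double sum = 0.0;
    for (int k = 0; k < 8; ++k) {
        double a = 101.0 + 100.0 * (double)(8 * row + k);
        double b = 101.0 + 100.0 * (double)(8 * k + col);
        sum += a * b;
    }
    return sum;
}
```

For row 5, column 5 (zero-based) the double version returns exactly 124462808, the value shown in the Microsoft listing. The float version may land on a neighbouring representable value, depending on how the compiler evaluates intermediate results, which is the kind of behaviour the instruction-set and precision options are meant to pin down.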


SergeyKostrov — Tue, 02 Apr 2013 23:24:00 GMT

Hi Vineet,

Here are a couple of notes. Overall, try the same set of command line options on both platforms ( the options below are for Windows ):

- Use the same instruction set, for example SSE2 ( /QxSSE2 ), or SSE4.2 ( /QxSSE4.2 )
- Use /fp:precise, /Qprec, /Qpc:64 or /Qpc:80 with /Qlong-double ( it enables the 80-bit 'long double' data type when the Intel C++ compiler is used )
- OpenMP is used on the Linux platform, and I don't see the /Qopenmp switch on the Windows platform
- Verify an OpenMP report with /Qopenmp-report{ 0| 1| 2 } ( it controls the OpenMP parallelizer diagnostic level )