Fast inversion of 6x6 matrix

Michal_Kvasnicka · ‎03-21-2012

I am looking for fast and robust C/C++ code for 6x6 matrix (double precision, general matrix) inversion optimized for current INTEL Xeon CPUs.

I found the following Intel document (http://www.intel.com/design/pentiumiii/sml/245044.htm), but I have no idea how to recompile this code under C/C++ Intel compiler 12.x.

I tried to use IPP matrix inversion function too, but the matrix is probably near singular, and this function gives very often wrong results.

Is there any highly optimized and robust 6x6 matrix inversion function for Intel compiler?

TimP · ‎03-21-2012

You appear to have a contradiction here. If your data require full pivoting for reliable solution, you aren't likely to be able to optimize better than a competent compiler can do. You will need first to find and test a reliable implementation, then do what is possible e.g. with #pragma vector aligned to get it vectorized. If you specialize with fixed dimensions at compile time, you should get better optimization than you would in a variable sized case.

Michal_Kvasnicka · ‎03-21-2012

OK ... I need to know now, what is the proper way to recompile C/C++ SIMD enhanced code mentioned in the document (http://www.intel.com/design/pentiumiii/sml/245044.htm)by Intel C++ compiler 12.x

SergeyKostrov · ‎03-23-2012

Quoting Michal Kvasnicka

OK ... I need to know now, what is the proper way to recompile C/C++ SIMD enhanced code mentioned in the document (http://www.intel.com/design/pentiumiii/sml/245044.htm)by Intel C++ compiler 12.x

Hi,

I'd like to ask a question: Did you have any compilation issues or problems? If Yes, could you provide some details?

Best regards,
Sergey

Michal_Kvasnicka · ‎03-23-2012

Just open a document (http://www.intel.com/design/pentiumiii/sml/245044.htm) and try to recompile the code on section 5.2 C Code in General Case with Streaming SIMD Extensions (page 10).

Obviously, a lot of header files are missing + initializations, etc. ... so finaly the code in this form is not possible to compile by current Intel C++ compiler. There is no information about proper compiler switches, options, linking libraries, etc. ... nothing!

Michal

SergeyKostrov · ‎03-25-2012

Quoting Michal Kvasnicka

...Obviously, a lot of header files are missing + initializations, etc. ... so finaly the code in this form is not possible to compile by current Intel C++ compiler. There is no information about proper compiler switches, options, linking libraries, etc. ... nothing!..

If youinclude the following headers the codes will be compiled:

[bash] ... #include #include #include #include ... [/bash]

SergeyKostrov · ‎03-25-2012

Quoting Michal Kvasnicka

Just open a document (http://www.intel.com/design/pentiumiii/sml/245044.htm) and try to recompile the code on section 5.2 C Code in General Case with Streaming SIMD Extensions (page 10)...

I compiled and tested that function with a couple of modifications:

[cpp]#define EPSILON ( 1.19209289e-07f ) // Single-Precision ( 23-bit ) ... void PIII_Invert_6x6( float *src ); void PIII_Invert_6x6( float *src ) { ... __m128 tmp1 = { 0.0f }, tmp2 = { 0.0f }, tmp3 = { 0.0f }; ... } [/cpp]
Here are results of my test case:

[cpp] // Matrix A float fMatrix[6][6] = { { 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f }, { 0.0f, 2.0f, 0.0f, 0.0f, 0.0f, 0.0f }, { 0.0f, 0.0f, 3.0f, 0.0f, 0.0f, 0.0f }, { 0.0f, 0.0f, 0.0f, 4.0f, 0.0f, 0.0f }, { 0.0f, 0.0f, 0.0f, 0.0f, 5.0f, 0.0f }, { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 6.0f }, }; PIII_Invert_6x6( &fMatrix[0][0] ); // Inverse of Matrix A // 0.9999999403953 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.4999999701976 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.3333333134651 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.2499999850988 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.1999999880790 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.1666666567325 DisplayMatrixF( &fMatrix[0][0], 6, 6, RTtrue ); // Verification ( Inv(A) * A = I ) // 0.9999999403953 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.9999999403953 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.9999999403953 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.9999999403953 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.9999999403953 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.9999999403953 // If a rounding applied it is an Identity matrix // 1.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 1.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 1.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 1.0000000000000 0.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 1.0000000000000 0.0000000000000 // 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 0.0000000000000 1.0000000000000 [/cpp]Best regards,
Sergey

Michal_Kvasnicka · ‎03-25-2012

Sergey ... thank you very much!!!

So, the last question is: Is possible to tranform this code from single to double precision? I am not sure if the SIMD extensions support double precision or not.

Do you know and could you recomend me any other C/C++ implementation of highly optimized 6x6 matrix (general, double precision) inversion code. The 6x6 inversion matrix function plays crucial role in our simulation code. I am looking for any suitable option on this topic to deploy maximum computing power provided by current Xeon CPUs.

Michal

SergeyKostrov · ‎03-26-2012

Quoting Michal Kvasnicka

Sergey ... thank you very much!!!

So, the last question is: Is possible to tranform this code from single to double precision? I am not sure if the SIMD extensions
support double precision or not.

[SergeyK] I think it is possible.

Do you know and could you recomend me any other C/C++ implementation of highly optimized 6x6 matrix (general, double precision)
inversion code.

[SergeyK] Intel IPP library has a set of highly optimized functions in a Matrix Processing domain ( ippm ).
Another option to consideris aGNU Scientific Library ( GSL ).

SergeyKostrov · ‎03-26-2012

Quoting Michal Kvasnicka

Sergey ... thank you very much!!!

So, the last question is: Is possible to tranform this code from single to double precision?..

Yes. Take into account thatcode changesfor the2nd and 3rd functions will be significant. In a couple of words,
you will need to change a basic type from '__m128' to '__m128d'( only 2 double-precision values ) and all
the rest 200 - 300 code lines...

I wouldmodifythe 1st C-like function ( 'double' type instead of 'float') and than I would try to optimize it as better as possible.

Best regards,
Sergey

SergeyKostrov · ‎03-26-2012

For a double-precision type an epsilon is:

[cpp]#define EPSILON ( 2.2204460492503131e-016L ) // Double-Precision ( 64-bit )
[/cpp]

fast inversion of 6x6 matrix