IPP initialization issue with .NET

Gregory_C_ · ‎04-19-2013

When I run this code, 7 is written to the console (SSE 2 is enabled). If I run some FFT methods and re-run initialization, I get 3295 (SSE 4.2). What can I do to get consistent results from this API? I am using .NET 4.0 with P/Invoke signatures for all methods used below:

    class Program
    {
        static void Main(string[] args)
        {
            IppCpuType cpuType = core.ippGetCpuType();
            core.ippInitCpu(cpuType);
            if (cpuType == IppCpuType.ippCpuAVX)
                core.ippEnableCpu(cpuType);
            core.ippInit();
            ulong features = core.ippGetEnabledCpuFeatures();
            Console.WriteLine(features);
        }
    }

Thank you,
Greg Chernis

SergeyKostrov · ‎04-19-2013

As far as I know, there is an issue with the ippInit function ( a bug was detected several days ago ). Which instruction set do you want to use, SSE2 or SSE4?

SergeyKostrov · ‎04-19-2013

This is a follow-up. Please take a look at:

Forum Topic: Load and Unload issues with Waterfall DLLs ( Instruction Set specific )
Web-link: software.intel.com/en-us/forums/topic/385488

Gregory_C_ · ‎04-22-2013

Hello Sergey,

I have access to Westmere and Sandy Bridge Xeon-based machines. I'd like to use SSE 4.2 or AVX, whichever is available.

Thank you,

-Greg Chernis

Gregory_C_ · ‎04-22-2013

Also, I don't see how the mentioned issue is related to the issue I am having...

SergeyKostrov · ‎04-22-2013

Hi Gregory,

>>...If I run some FFT methods and re-run initialization, I get 3295 (SSE 4.2).

I checked the ippdefs.h header:

    ...
    typedef enum {
        ...
        ippCpuSSE42 = 0x45, /* Processor supports Streaming SIMD Extensions 4.2 instruction set */
        ippCpuAVX = 0x46, /* Processor supports Advanced Vector Extensions instruction set */
        ...
    } IppCpuType;
    ...

and I don't see any code / number that matches 3295. So, could you explain how you got it?

>>...What can I do to get consistent results from this API?

Ideally, I would initialize ( with ippInit ) at the beginning and would not re-initialize until all processing is completed. I'd like to understand why you need to re-initialize the IPP libraries after some processing is done.

Gregory_C_ · ‎04-26-2013

The IPP Architecture Reference Manual, Volume 1 describes GetEnabledCpuFeatures() as a method that returns a set of flags, which are also described in ippcore.h. They are the same flags as in GetCpuFeatures(). 3295, or CDF in hexadecimal, has the bit for SSE 4.2 set. That's how I know that all is well. It appears that I can get 7 (SSE 2 only) right after initialization, but things get better (SSE 4.2) when I run the initialization code again.

Thanks for looking at this with me,

-Greg
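For readers following the numbers in this post, the masks can be decoded bit by bit. The following is a minimal sketch in plain C that does not link against IPP. The bit values are assumed to mirror the ippCPUID_* constants in ippcore.h and should be checked against the header of the IPP version in use; describe_features and the short names in the table are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One CPU feature bit and a short label for it. */
typedef struct {
    uint64_t bit;
    const char *name;
} FeatureFlag;

/* ASSUMPTION: values modeled on the ippCPUID_* flags in ippcore.h.
   Verify them against the header that ships with your IPP version. */
static const FeatureFlag kFlags[] = {
    {0x001, "MMX"},   {0x002, "SSE"},    {0x004, "SSE2"},   {0x008, "SSE3"},
    {0x010, "SSSE3"}, {0x020, "MOVBE"},  {0x040, "SSE4.1"}, {0x080, "SSE4.2"},
    {0x100, "AVX"},   {0x200, "AVX_OS"}, {0x400, "AES"},    {0x800, "CLMUL"},
};

/* Hypothetical helper (not an IPP function): writes the names of the set
   bits into out, separated by spaces, and returns how many were written. */
static int describe_features(uint64_t mask, char *out, size_t out_size)
{
    size_t used = 0;
    int count = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof kFlags / sizeof kFlags[0]; ++i) {
        if ((mask & kFlags[i].bit) == 0)
            continue;
        int n = snprintf(out + used, out_size - used, "%s%s",
                         count > 0 ? " " : "", kFlags[i].name);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0'; /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

Under the assumed table, 7 decodes to MMX, SSE and SSE2, while 3295 (hex CDF) additionally carries SSE3, SSSE3, SSE4.1, SSE4.2, AES and CLMUL, which is consistent with the SSE 4.2 reading described in the post.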

Gregory_C_ · ‎04-26-2013

I should also mention that the problem is intermittent, though reproducible.

Igor_A_Intel · ‎04-28-2013

Hi Greg,

You should use only the ippInit() function. Don't use ippEnableCpu at all: it has already been deprecated and does nothing. It is also not clear to me from your code why you call ippInitCpu. You are mixing two different methods: CpuType (the deprecated approach, don't use it) and CpuFeatures. All you need is to (1) call ippInit and (2) then call GetCpuFeatures. All other calls in your initialization code are unnecessary.

regards, Igor

Gregory_C_ · ‎04-29-2013

Igor,

You're probably looking at a different manual from the one I am reading (document number A24968-036US). This manual does not describe the deprecation the way you do.

I am using redistributable DLLs.

If I simply run GetEnabledCpuFeatures() before and after a call to FFTGetSize_C_32fc(), I get 7 (SSE2 enabled) before and hex CDF (SSE 4.2 enabled) after the call. On a Sandy Bridge machine, I get hex FDF (AVX enabled).

Does this look like the correct way to do things?
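The before/after comparison described in this post can be made explicit by diffing the two masks. The sketch below is plain C with no IPP dependency. The three FEAT_* values are assumed to match ippCPUID_SSE42, ippCPUID_AVX and ippAVX_ENABLEDBYOS in ippcore.h, and features_gained, best_isa and IsaLevel are hypothetical names used only for illustration. Treating AVX as usable only when both the CPU bit and the OS bit are set is likewise an assumption about how the flags are meant to be read.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: values modeled on ippcore.h; verify against your IPP version. */
#define FEAT_SSE42  UINT64_C(0x080) /* assumed ippCPUID_SSE42 */
#define FEAT_AVX    UINT64_C(0x100) /* assumed ippCPUID_AVX */
#define FEAT_AVX_OS UINT64_C(0x200) /* assumed ippAVX_ENABLEDBYOS */

typedef enum { ISA_BASELINE, ISA_SSE42, ISA_AVX } IsaLevel;

/* Bits that are set in `after` but were not set in `before`. */
static uint64_t features_gained(uint64_t before, uint64_t after)
{
    return after & ~before;
}

/* Highest of the three levels discussed in the thread that the mask allows. */
static IsaLevel best_isa(uint64_t mask)
{
    const uint64_t avx_ready = FEAT_AVX | FEAT_AVX_OS;

    if ((mask & avx_ready) == avx_ready)
        return ISA_AVX;
    if ((mask & FEAT_SSE42) != 0)
        return ISA_SSE42;
    return ISA_BASELINE;
}

/* Replays the masks reported in the thread: 7, hex CDF, hex FDF. */
static void replay_thread_values(void)
{
    assert(best_isa(0x7) == ISA_BASELINE);
    assert(best_isa(0xCDF) == ISA_SSE42);
    assert(best_isa(0xFDF) == ISA_AVX);
    assert(features_gained(0x7, 0xCDF) == 0xCD8);
}
```

A check like best_isa, run once after the first IPP call that triggers dispatching, gives a single place to log which code path the process ended up on.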

SergeyKostrov · ‎04-29-2013

>>...If I simply run GetEnabledCpuFeatures() before and after a call to FFTGetSize_C_32fc(), I get 7 (SSE2 enabled) before and
>>hex CDF (SSE 4.2 enabled) after the call. On a Sandy Bridge machine, I get hex FDF (AVX enabled)...

I wonder if you could execute pure C/C++ tests ( without .NET ) on your computers?

Gregory_C_ · ‎04-30-2013

I surely can run native code, but I prefer .NET code as I will have to inter-operate with native code from a large existing .NET application.

SergeyKostrov · ‎05-01-2013

>>...I surely can run native code, but I prefer .NET code...

Gregory, I asked whether you could run a simple test ( implemented in C/C++ ), if that is possible. I understand that re-implementing some of the .NET code is not an option. I will also follow up with some advice later.

Gregory_C_ · ‎05-01-2013

It looks like checking for enabled features after a call to GetSize() routines works well. Case closed. Sergey and Igor, thank you for all the help!

SergeyKostrov · ‎05-01-2013

Hi Gregory,

>>...It looks like checking for enabled features after a call to GetSize() routines works well...

Thanks for the update. Let me do one more post. You should always watch out for CPU Dispatching DLLs ( also known as Waterfall DLLs ); this applies to both the IPP and MKL libraries. If an incorrect set of CPU Dispatching DLLs is used, it usually hurts application performance. Here is an example with MKL:

[ Test 1 - 64-bit Windows 7 - Default SSE2 DLLs are used ]

    > Test1153 Start <
    Sub-Test 1.1 - Runtime binding of MKL functions
    Dynamic library mkl_rt.dll loaded
    Initialization Done
    Sub-Test 1.3
    Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
    Major version : 11
    Minor version : 0
    Update version : 2
    Product status : Product
    Build : 20130124
    Processor optimization: Default processor
    Sub-Test 3.2 - SGEMM
    Matrix multiplication C[ 8192x8192 ] = A[ 8192x8192 ] * B[ 8192x8192 ]
    Allocating memory for matrices ( 32-byte alignment )
    Intializing matrix data
    Measuring performance of SGEMM function
    Iteration 01 - Completed in 17.847 secs
    Iteration 02 - Completed in 16.895 secs
    Iteration 03 - Completed in 16.614 secs
    Iteration 04 - Completed in 16.661 secs
    Iteration 05 - Completed in 17.515 secs
    Deallocating memory
    Dynamic library mkl_rt.dll unloaded
    > Test1153 End <

[ Test 2 - 64-bit Windows 7 - AVX DLLs are used ]

    > Test1153 Start <
    Sub-Test 1.1 - Runtime binding of MKL functions
    Dynamic library mkl_rt.dll loaded
    Initialization Done
    Sub-Test 1.3
    Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
    Major version : 11
    Minor version : 0
    Update version : 2
    Product status : Product
    Build : 20130124
    Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor
    Sub-Test 3.2 - SGEMM
    Matrix multiplication C[ 8192x8192 ] = A[ 8192x8192 ] * B[ 8192x8192 ]
    Allocating memory for matrices ( 32-byte alignment )
    Intializing matrix data
    Measuring performance of SGEMM function
    Iteration 01 - Completed in 8.237 secs
    Iteration 02 - Completed in 7.457 secs
    Iteration 03 - Completed in 7.566 secs
    Iteration 04 - Completed in 7.488 secs
    Iteration 05 - Completed in 7.550 secs
    Deallocating memory
    Dynamic library mkl_rt.dll unloaded

As you can see, Test 2 runs roughly twice as fast (!). Sorry for a test with MKL, but it clearly demonstrates how performance is negatively affected.
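The size of the gap between the two runs can be worked out from the iteration times in the log. This is a small sketch in plain C; the timing arrays are copied from the two MKL runs quoted in this post, and mean_seconds and speedup are hypothetical helper names, not MKL or IPP functions.

```c
#include <assert.h>
#include <stddef.h>

/* Iteration times in seconds, copied from the two SGEMM runs in the post. */
static const double kSse2Run[] = {17.847, 16.895, 16.614, 16.661, 17.515};
static const double kAvxRun[]  = {8.237, 7.457, 7.566, 7.488, 7.550};

/* Arithmetic mean of n timings. */
static double mean_seconds(const double *t, size_t n)
{
    double sum = 0.0;

    assert(t != NULL && n > 0);
    for (size_t i = 0; i < n; ++i)
        sum += t[i];
    return sum / (double)n;
}

/* Ratio of the mean slow time to the mean fast time. */
static double speedup(const double *slow, size_t n_slow,
                      const double *fast, size_t n_fast)
{
    return mean_seconds(slow, n_slow) / mean_seconds(fast, n_fast);
}
```

Calling speedup(kSse2Run, 5, kAvxRun, 5) gives a ratio of about 2.2, with mean times of about 17.1 and 7.7 seconds for the SSE2 and AVX runs.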

Gregory_C_ · ‎05-01-2013

When IPP is properly initialized, I see nearly doubled performance with AVX on otherwise similar machines!