
SIMD instruction not being created from IPPS

I was under the impression that calls to functions that operate on vector data, like the functions in IPPS, would get translated into SIMD instructions. My previous uses of IPP have always resulted in vTune reporting the use of SIMD instructions. However, my current use of IPP isn't resulting in SIMD instructions. The following code sample is IPP code that I substituted for a for-loop that originally processed the vector.

/*convert new_speech to 32s for use with IPP */
ippsConvert_16s32s(new_speech, new_speech32s, L_FRAME + L_FILT);
//16384 = 2^14, so multiplying by 16384 and then shifting left by 1 is the same as << 15
ippsLShiftC_32s(new_speech32s, 15, ipp_speech_vector1, L_FRAME+L_FILT);
ipp_speech_vector2[0] = (Ipp32s) st->mem_preemph * (Ipp32s) mu;
//speech_vector2 = new_speech[i-1]*mu for i > 0
ippsMulC_32s_Sfs(new_speech32s, (Ipp32s) mu, ipp_speech_vector2+1, L_FRAME+L_FILT-1, 0);
ippsMulC_32s_ISfs(2, ipp_speech_vector2, L_FRAME+L_FILT, 0);
//subtract speechvector2 from speechvector1 and store in sv1
ippsSub_32s_ISfs(ipp_speech_vector2, ipp_speech_vector1, L_FRAME+L_FILT, 0);
//find max absolute value
ippsMaxAbs_32s(ipp_speech_vector1, L_FRAME+L_FILT, &L_max);
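For comparison, the quantity those calls compute can be written as a plain scalar loop. This is only a sketch reconstructed from the comments in the snippet above, not the original for-loop (which is not shown). The function name preemph_max_abs and the use of 64-bit intermediates are assumptions made for illustration, and any saturation performed by the IPP fixed-point functions is ignored.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical scalar reference for the IPP sequence above:
 * returns max over i of |x[i] * 2^15 - 2 * mu * x[i-1]|,
 * where x[-1] is supplied as mem_preemph.
 * 64-bit intermediates are used so nothing overflows;
 * saturation is deliberately not modelled. */
int64_t preemph_max_abs(const int16_t *x, size_t n,
                        int16_t mem_preemph, int16_t mu)
{
    int64_t max_abs = 0;
    int64_t prev = mem_preemph;          /* previous input sample */

    for (size_t i = 0; i < n; ++i) {
        int64_t cur = x[i];
        /* multiply by 32768 instead of shifting a possibly negative value */
        int64_t d = cur * 32768 - 2 * (int64_t)mu * prev;
        if (d < 0)
            d = -d;
        if (d > max_abs)
            max_abs = d;
        prev = cur;
    }
    return max_abs;
}
```

Running a loop like this next to the IPP version on the same input is a quick way to check that both paths agree before comparing their timings.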

I thought that all of these calls would generate SIMD instructions and thus use the hardware to its maximum. However, vTune is not reporting this and my application has not sped up. Am I under the wrong impression about the IPP libraries? Or could I be missing something?

I am running 64-bit Fedora 10 on a quad-core Q9400 at 2.66 GHz. I called the ippGetCpuFeatures function and it returned 0x5F, which means the CPU supports MMX and SSE through SSE4.1. This is also the same machine on which I have gotten stunning performance out of IPP and SIMD instructions before.
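As a small sketch of how a mask such as 0x5F breaks down into individual features: the bit values below are an assumption, chosen to match the layout of the ippCPUID_* constants as I understand them, and should be checked against the ippdefs.h shipped with your IPP version. The helper describe_features is hypothetical and is not part of IPP.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed feature bits (mirroring ippCPUID_* as I understand them;
 * verify against your ippdefs.h). */
static const struct { unsigned bit; const char *name; } feature_table[] = {
    { 0x01, "MMX"    },
    { 0x02, "SSE"    },
    { 0x04, "SSE2"   },
    { 0x08, "SSE3"   },
    { 0x10, "SSSE3"  },
    { 0x20, "MOVBE"  },
    { 0x40, "SSE4.1" },
    { 0x80, "SSE4.2" },
};

/* Hypothetical helper: writes the space-separated names of the features
 * set in mask into out (capacity cap) and returns how many were written. */
int describe_features(unsigned mask, char *out, size_t cap)
{
    int count = 0;
    size_t used = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof feature_table / sizeof feature_table[0]; ++i) {
        if (!(mask & feature_table[i].bit))
            continue;
        int w = snprintf(out + used, cap - used, "%s%s",
                         count ? " " : "", feature_table[i].name);
        if (w < 0 || (size_t)w >= cap - used)
            break;                       /* buffer full: stop early */
        used += (size_t)w;
        ++count;
    }
    return count;
}
```

Under these assumed bit values, 0x5F has every bit from MMX through SSSE3 set plus the SSE4.1 bit, with MOVBE and SSE4.2 clear, which is consistent with the reading given above.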

I was compiling with the -g, -lpthread and -i-static flags and linking against /opt/intel/Compiler/11.1/059/ipp/em64t/sharedlib/ because I am using IPPS functions. All the functions I am using start with ipps, so I assume that is the only library I need to link against. This didn't result in SIMD instructions, so I looked for another way. Since I am using the Intel compiler, I found that I can just pass the -ipp flag to icc and it will take care of all the library linking for me. I did that and it compiles, but vTune still isn't reporting any SIMD_INSTR_RETIRED events or samples (side note on that: what is the difference between an event and a sample in vTune?).

I would assume that functions like ippsLShiftC_32s, ippsMulC_32s_Sfs and ippsSub_32s_ISfs should all generate SIMD instructions. Right?

Thanks for any help!!
2 Replies

I use IPP under Windows, so I hope this applies. When you use the -static flag, static versions of the libraries are used, correct? If so, you need to call ippInit() in your application so that IPP knows which processor architecture you're running on. This is not necessary with the dynamically linked IPP. If you don't call ippInit(), the least optimized (most compatible) version of the functions will be called.



Yes, regardless of the OS, if you use the static library you must explicitly initialize the library using the ippInit() function (see the KB article "IPP Dispatcher Control Functions - ipp*Init*() functions" for more info).

If you don't initialize the library dispatcher, the PX slice of the library will be used when compiling for IA-32, which does not use any SSE instructions. If you build for an Intel 64 (em64t) system, you will get the MX library slice, which does contain some SSE2 instructions but generally contains only generic instructions.
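To illustrate why the missing initialization matters, here is a toy model of a dispatcher in plain C. It is not IPP's implementation: every name in it (model_init, model_max_abs, model_active_slice) is hypothetical, the "optimized" routine is only an unrolled stand-in for real SIMD code, and the 0x40 bit for SSE4.1 is an assumption. The point is only that the entry point keeps routing to the generic code until an init call selects something better, which is the role the replies above give to ippInit() in a statically linked application.

```c
#include <stdint.h>
#include <stddef.h>

typedef int32_t (*max_abs_fn)(const int32_t *, size_t);

/* |v| clamped to INT32_MAX so that INT32_MIN does not overflow. */
static int32_t abs_sat(int32_t v)
{
    int64_t a = v < 0 ? -(int64_t)v : (int64_t)v;
    return a > INT32_MAX ? INT32_MAX : (int32_t)a;
}

/* Generic slice: straightforward loop. */
static int32_t max_abs_generic(const int32_t *x, size_t n)
{
    int32_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t a = abs_sat(x[i]);
        if (a > m) m = a;
    }
    return m;
}

/* Stand-in for an optimized slice: four independent running maxima. */
static int32_t max_abs_optimized(const int32_t *x, size_t n)
{
    int32_t m[4] = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; ++k) {
            int32_t a = abs_sat(x[i + k]);
            if (a > m[k]) m[k] = a;
        }
    for (; i < n; ++i) {                 /* leftover tail elements */
        int32_t a = abs_sat(x[i]);
        if (a > m[0]) m[0] = a;
    }
    for (size_t k = 1; k < 4; ++k)
        if (m[k] > m[0]) m[0] = m[k];
    return m[0];
}

/* Until model_init() runs, calls go to the generic slice. */
static max_abs_fn active_fn = max_abs_generic;
static const char *active_name = "generic";

/* Hypothetical init: pick a slice from a CPU feature mask
 * (0x40 is assumed here to mean SSE4.1). */
void model_init(unsigned cpu_features)
{
    if (cpu_features & 0x40u) {
        active_fn = max_abs_optimized;
        active_name = "optimized";
    }
}

const char *model_active_slice(void) { return active_name; }

int32_t model_max_abs(const int32_t *x, size_t n) { return active_fn(x, n); }
```

In this model, calling model_max_abs before model_init(0x5F) returns the correct result but through the generic path. In the real application, the analogous step would be calling ippInit() once at startup, before the first ipps call.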