/*convert new_speech to 32s for use with IPP */
ippsConvert_16s32s(new_speech, new_speech32s, L_FRAME + L_FILT);
//16384 = 2^14, so multiplying by 16384 and then shifting left by one is the same as << 15
ippsLShiftC_32s(new_speech32s, 15, ipp_speech_vector1, L_FRAME+L_FILT);
//first element uses the saved sample from the previous frame
ipp_speech_vector2[0] = (Ipp32s) st->mem_preemph * (Ipp32s) mu;
//speech_vector2 = new_speech[i-1]*mu for i > 0
ippsMulC_32s_Sfs(new_speech32s, (Ipp32s) mu, ipp_speech_vector2+1, L_FRAME+L_FILT-1, 0);
ippsMulC_32s_ISfs(2, ipp_speech_vector2, L_FRAME+L_FILT, 0);
//subtract speechvector2 from speechvector1 and store in sv1
ippsSub_32s_ISfs(ipp_speech_vector2, ipp_speech_vector1, L_FRAME+L_FILT, 0);
//find max absolute value
ippsMaxAbs_32s(ipp_speech_vector1, L_FRAME+L_FILT, &L_max);
I thought that all of these calls would generate SIMD instructions and so use the hardware to its maximum. However, VTune is not reporting this and my application has not sped up. Am I under the wrong impression about the IPP libraries? Or could I be missing something?
I am running Fedora 10 64-bit on a quad-core Q9400 at 2.66 GHz. I called the ippGetCpuFeatures function and it returned 0x5F, which means the CPU supports MMX and SSE through 4.1. This is also the same machine on which I have gotten stunning performance out of IPP and the SIMD instructions before.
I used to compile with the -g, -lpthread and -i-static flags and link against /opt/intel/Compiler/11.1/059/ipp/em64t/sharedlib/libippsem64t.so because I am using IPPS functions. All the functions I am using start with ipps, so I assume that is the only library I need to link against. This didn't result in SIMD instructions, so I looked for another way. Since I am using the Intel compiler, I found that I can just pass the -ipp flag to icc and it'll take care of all the library linking for me. I did that and it compiles, but VTune still isn't reporting any SIMD_INSTR_RETIRED events or samples (side note on that: what is the difference between an event and a sample in VTune?).
I would assume that functions like ippsLShiftC_32s, ippsMulC_32s_Sfs and ippsSub_32s_ISfs should all generate SIMD instructions. Right?
Thanks for any help!!
Yes, regardless of the OS, if you use the static library you must explicitly initialize the library using the ippInit() function (see the KB article "IPP Dispatcher Control Functions - ipp*Init*() functions" for more info).
If you don't initialize the library dispatcher, the PX slice of the library will be used when compiling for IA-32, and that slice does not use any SSE instructions. If you build for an Intel 64 (em64t) system you will get the MX slice, which does contain some SSE2 instructions but otherwise generally contains only generic instructions.
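When switching library slices, it can help to have a plain scalar loop to compare the IPP output against, so that any speed-up is not bought with a changed result. The following is only a sketch: it assumes the snippet in the question computes y[i] = (x[i] << 15) - 2 * mu * x[i-1], with st->mem_preemph standing in for x[-1], followed by a max-abs search. The name preemph_ref is hypothetical and not part of IPP. Intermediate values are held in 64 bits and clamped once at the end, which may differ from IPP's per-call saturation in extreme cases.

```c
#include <stdint.h>

/* Hypothetical scalar reference for the pre-emphasis step in the question.
 * Assumed formula: y[i] = (x[i] << 15) - 2 * mu * x[i-1], x[-1] = mem.
 * Writes n results to y and returns the maximum absolute value found. */
static int32_t preemph_ref(const int16_t *x, int n, int16_t mu,
                           int16_t mem, int32_t *y)
{
    int64_t max_abs = 0;
    int16_t prev = mem;              /* sample carried over from the last frame */

    for (int i = 0; i < n; i++) {
        /* Multiply instead of shifting so negative inputs stay well defined. */
        int64_t v = (int64_t)x[i] * 32768 - 2 * (int64_t)mu * (int64_t)prev;

        /* Clamp to the 32-bit range (approximates IPP saturation). */
        if (v > INT32_MAX) v = INT32_MAX;
        if (v < INT32_MIN) v = INT32_MIN;
        y[i] = (int32_t)v;

        int64_t a = v < 0 ? -v : v;  /* absolute value, computed in 64 bits */
        if (a > max_abs) max_abs = a;

        prev = x[i];
    }

    if (max_abs > INT32_MAX) max_abs = INT32_MAX;
    return (int32_t)max_abs;
}
```

Calling preemph_ref(new_speech, L_FRAME + L_FILT, mu, st->mem_preemph, ref) on the same input and comparing ref with ipp_speech_vector1 element by element would show whether the two paths agree.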