I was under the impression that calls to functions operating on vector data, such as the functions in IPPS, would be translated into SIMD instructions. I am using a Core 2 Quad Q9400 CPU, which I know has SIMD capabilities. I have used the Intel libraries before, and after profiling with VTune I noticed that the application was executing SIMD instructions, which it wasn't doing before the use of IPP. However, I have now removed a for loop that processed a vector and replaced it with the following code:
/*convert new_speech to 32s for use with IPP */
ippsConvert_16s32s(new_speech, new_speech32s, L_FRAME + L_FILT);
//16384 = 2^14, so multiplying by 16384 and then shifting left by one is like << 15
ippsLShiftC_32s(new_speech32s, 15, ipp_speech_vector1, L_FRAME+L_FILT);
ipp_speech_vector2[0] = (Ipp32s) st->mem_preemph * (Ipp32s) mu;
//speech_vector2 = new_speech[i-1]*mu for i > 0
ippsMulC_32s_Sfs(new_speech32s, (Ipp32s) mu, ipp_speech_vector2+1, L_FRAME+L_FILT-1, 0);
ippsMulC_32s_ISfs(2, ipp_speech_vector2, L_FRAME+L_FILT, 0);
//subtract speechvector2 from speechvector1 and store in sv1
ippsSub_32s_ISfs(ipp_speech_vector2, ipp_speech_vector1, L_FRAME+L_FILT, 0);
//find max absolute value
ippsMaxAbs_32s(ipp_speech_vector1, L_FRAME+L_FILT, &L_max);
I thought that all of these calls would execute SIMD instructions and thus use the hardware to its maximum. However, VTune is not reporting this and my application has not sped up. Am I under the wrong impression about the IPP libraries? Or could I be missing something?
Could you please also check how you link the IPP library, and on which kind of processor and OS? Maybe, for some reason, you are calling the "px" code, so no SIMD instructions are used.
Here is an article about the IPP optimized code, for your reference.
I am running Fedora 10 64-bit with a Quad Core Q9400 running at 2.66 GHz. I used the ippGetCpuFeatures function and it returned 0x5F, which means it supports MMX and SSE through 4.1.
I have used IPP before and got stunning performance from the SIMD. I am using it on the same computer and linking in the same manner as before. But the SIMD instructions are not being called.
I am compiling with the -g, -lpthread and -i-static flags (at one time I was using the -static-libgcc flag as well, but not anymore). I am also linking against /opt/intel/Compiler/11.1/059/ipp/em64t/sharedlib/libippsem64t.so because I am using IPPS functions. All the functions I am using start with ipps so I assume that is the only library I need to link against.
I am also calling ippStaticInit() at the very beginning of my program execution. Like I said, I was doing all of this before in my other application and SIMD instructions were being called. I don't think I am missing anything that I did then.
Is this all the information you need? I don't know why SIMD instructions are not being called. I appreciate all of your help!
I have attached my Makefile if that helps
I would assume that functions like ippsLShiftC_32s, ippsMulC_32s_Sfs and ippsSub_32s_ISfs should all execute SIMD instructions. Right?
Thanks again for any help!
Did you use the -axSSE4.1 option while compiling? As mentioned in the compiler documentation, this option can generate Intel SSE4 Vectorizing Compiler and Media Accelerator instructions for Intel processors. It can also generate Intel SSSE3, SSE3, SSE2, and SSE instructions, and it can optimize for the Intel 45nm Hi-k next generation Intel Core microarchitecture.
Another good article, by Dr. Sam Siewert, is "Using Intel Streaming SIMD Extensions and Intel Integrated Performance Primitives to Accelerate Algorithms".
A good example and explanation can be found in "Using the Intel C/C++ Compiler and Intel IPP Tools".
Yes, if you call IPP functions, they should execute SIMD instructions. For example, I tried ippsSub_32s_ISfs on my laptop, a Core 2 machine, which calls the "v8" code,
and VTune reports that SIMD instructions are used.
#define LEN 5000
for(int n=0; n
for( int i = 0; i <10000; i++ )
ippsSubC_32s_ISfs(mu, pSrc, LEN, 0);
Here is the VTune screenshot.
About sampling and events, in simple words (not exact): sampling is the act of collecting samples, and the event is the item you want to sample. For example, if you want to see whether SIMD instructions are used, you select the event SIMD_INST_RETIRED, and based on it you will get sample data (how many SIMD instructions were used). You can find the exact information in the VTune KB or forum.