I am using the IPP libraries with dispatching and dynamic linking on a Xeon 5450 under Linux (Intel 64 package). Although the dispatcher should detect and use the appropriate library, I noticed when profiling my code (with VTune) that the libraries being used have the y8 extension. According to the getting started guide, this extension is for Intel Core 2 Duo processors. The guide mentions u8 as being for the Xeon 5100 family, but that extension would not be appropriate for me either. As such, I was wondering which extension the dispatcher should normally choose for a Xeon 5450 (maybe there is a new set of libraries with a new extension) and, if the dispatcher does not select it, whether I can specify it manually as a linker flag (i.e. -l...) when compiling.
I am asking because my code does not run much faster with the IPP libraries so far. When profiling, I notice that the functions taking up the most time are the loops in the libippsy8.so library (used for ipps_crosscorr), along with a function called _kmp_wait_sleep from the libiomp5.so library (if anyone can tell me whether this function is called from any IPP functions, I would really appreciate it). I have no I/O in the main part of my program, so I don't know why this function would be called.
According to the Intel web site, the Xeon X5450 is a Core 2 processor (codenamed Harpertown), so I would expect IPP to dispatch the u8 libraries in a 64-bit environment and the v8 libraries in a 32-bit environment.
You can check which IPP libraries are dispatched on your system by running the IPP demo application (look at the Help -> About dialog, which should report the libraries dispatched at run time).
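If the demo application is not at hand, the same check can be sketched from inside the program itself. The example below is only an illustration, assuming Linux with glibc: it uses dl_iterate_phdr to list the shared objects currently loaded into the process and prints those whose path contains a given substring. The helper names (name_matches, count_loaded_objects) are made up for this sketch and are not part of IPP.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if the object path contains the substring, else 0. */
static int name_matches(const char *path, const char *needle)
{
    return path != NULL && needle != NULL && strstr(path, needle) != NULL;
}

/* State passed through dl_iterate_phdr to the callback. */
struct scan_state {
    const char *needle;  /* substring to look for, e.g. "libipp" */
    int matches;         /* number of matching objects seen so far */
};

/* Called once per loaded object; prints and counts the matching ones. */
static int print_if_match(struct dl_phdr_info *info, size_t size, void *data)
{
    struct scan_state *st = data;
    (void)size;
    if (name_matches(info->dlpi_name, st->needle)) {
        printf("loaded: %s\n", info->dlpi_name);
        st->matches++;
    }
    return 0;  /* 0 tells dl_iterate_phdr to keep iterating */
}

/* Hypothetical helper: list loaded objects matching needle, return the count. */
int count_loaded_objects(const char *needle)
{
    struct scan_state st = { needle, 0 };
    dl_iterate_phdr(print_if_match, &st);
    return st.matches;
}
```

Calling count_loaded_objects("libipp") from main after at least one IPP function has run should show whether a file such as libippsy8.so or libippsu8.so is actually in the process, assuming the dispatcher has loaded its CPU-specific library by that point.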
Since my dispatcher is choosing the y8 libraries, I forced the u8 libraries by compiling with (-L /opt/intel/Compiler/11.0/083/ipp/em64t/sharedlib -lippsu8 -lippmu8 -L /opt/intel/Compiler/11.0/083/lib/intel64 -liomp5 -lm -lpthread) instead of (-L /opt/intel/Compiler/11.0/083/ipp/em64t/sharedlib -lippsem64t -lippmem64t -lippcoreem64t -L /opt/intel/Compiler/11.0/083/lib/intel64 -liomp5 -lm -lpthread). Yet I am not seeing any improvement in performance. Should I include any additional libraries? I am starting to think that something is wrong with the libiomp5 library, as my profiler shows time being spent in the _kmp_wait_sleep function, which belongs to that library. Any other suggestions?
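One experiment that may help narrow this down: libiomp5 is the Intel OpenMP runtime, and time in its wait routine usually comes from worker threads that are idle rather than from I/O. The sketch below is an assumption-laden illustration, not a confirmed fix. It sets the standard OMP_NUM_THREADS variable and the Intel-specific KMP_BLOCKTIME variable from inside the program, assuming the runtime reads them when it first initializes; exporting them in the shell before launching the program is the safer route. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set the OpenMP thread count and the Intel runtime's
 * block time (milliseconds a thread spins before sleeping).
 * Must run before the first OpenMP or threaded IPP call.
 * Returns 0 on success, -1 if either variable could not be set. */
int limit_openmp_threads(const char *num_threads, const char *blocktime_ms)
{
    if (setenv("OMP_NUM_THREADS", num_threads, 1) != 0)
        return -1;
    if (setenv("KMP_BLOCKTIME", blocktime_ms, 1) != 0)
        return -1;
    return 0;
}

/* Hypothetical helper: 1 if the variable is set and equals expected, else 0. */
int env_equals(const char *name, const char *expected)
{
    const char *value = getenv(name);
    return value != NULL && strcmp(value, expected) == 0;
}
```

Calling limit_openmp_threads("1", "0") at the top of main and profiling again would show whether the time attributed to libiomp5 shrinks when only one thread is used. If it does, the wait time likely reflects threads idling between short threaded IPP calls rather than a fault in the library.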