Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Performance problem with IPP 5.1 vs. 4.0

bdirector
Beginner
467 Views
My predecessor used IPP 4.0 in an imaging application and chose to have the app run on one processor (P4 with SSE3), using the static libs with no dispatching. I want to run the app on a dual-core CPU, so I purchased 5.1. I decided to use the static-lib-with-dispatching model so the app would run on different processors. After a small amount of work I got it to compile. When I ran the program, it was about 3x slower than the version using 4.0 on the same single-processor PC. The timing info comes from timers in the app: we measure the time it takes to convert raw data to an image.

In going over the project settings, I saw that, for the 4.0 version, both the E-merged and merged libraries were included. He did include the header ipp_t7.h and did not make a call to ippStaticInitBest(). The changes I made were to replace the ipp_t7.h include with ippcore.h, add a call to ippStaticInit(), add ippcorel.lib, replace a few function calls (e.g. ippmDotProduct_vava_32f_4x1 with ippmDotProduct_vava_32f), and add additional arguments to function calls. The 5.1 version of the app runs correctly, i.e. I'm getting the images I expect.

The questions I have are: any idea why the 5.1 version of the app runs so slowly compared to the 4.0 version? Am I not following the correct steps to use static libs with dispatching? Or (grasping at straws) does including the E-merged libs when you link without dispatching somehow speed up the IPP functions?



3 Replies
Vladimir_Dudnik
Employee

Hi,

This was a limitation of the CPU dispatcher code in IPP 4.0. When IPP 4.0 was developed, we could not know which processors would become available over the following years, so the dispatcher chooses the PX code. If you initialize the library with ippStaticInitCpu, you can directly control which optimized code is launched.

In IPP 5.2 this limitation will be removed, and the dispatcher will choose the best optimized code from the previous generation (after checking, of course, that the instruction set is still available).
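
To make the two selection policies described in this reply concrete, here is a minimal, self-contained C sketch. It is not IPP code and does not reflect how the IPP dispatcher is actually implemented: the `cpu_id` enum, the `variant` table, and both `select_*` functions are hypothetical names invented for illustration, and the tags are only illustrative labels in the style of the "px" and "t7" codes mentioned in this thread. It also assumes, for simplicity, that a newer CPU supports every older instruction set, which a real dispatcher would have to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU identifiers, ordered oldest to newest.
   These are NOT IPP's IppCpuType values. */
typedef enum {
    CPU_GENERIC = 0,
    CPU_SSE,
    CPU_SSE2,
    CPU_SSE3,
    CPU_NEWER   /* a CPU released after the table below was written */
} cpu_id;

typedef struct {
    cpu_id      cpu;  /* CPU generation this variant was tuned for */
    const char *tag;  /* illustrative label for the variant */
} variant;

/* Sorted by cpu, with the generic fallback first. */
static const variant variants[] = {
    { CPU_GENERIC, "px" },
    { CPU_SSE,     "a6" },
    { CPU_SSE2,    "w7" },
    { CPU_SSE3,    "t7" },
};
static const size_t variant_count = sizeof variants / sizeof variants[0];

/* Policy 1: use a variant only on an exact match; any CPU the table
   does not know about falls back to the generic code. */
const variant *select_exact_or_generic(cpu_id detected)
{
    for (size_t i = 0; i < variant_count; ++i)
        if (variants[i].cpu == detected)
            return &variants[i];
    return &variants[0];
}

/* Policy 2: pick the newest variant that is not newer than the
   detected CPU (assumes older instruction sets remain available). */
const variant *select_best_previous(cpu_id detected)
{
    const variant *best = NULL;
    for (size_t i = 0; i < variant_count; ++i)
        if (variants[i].cpu <= detected)
            best = &variants[i];
    assert(best != NULL);  /* the generic entry always qualifies */
    return best;
}
```

Under the first policy, passing `CPU_NEWER` yields the generic entry, which is the kind of slowdown reported in the original post; under the second, the same input yields the SSE3 entry.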

For an application, it is always useful to print the IPP version, to see which processor-specific code was chosen by the dispatcher. You can do that with simple code like this (you can replace the ippj library with any IPP library you use):

 /* Assumes <stdio.h> and ippj.h are already included. */
 const IppLibraryVersion* ippj = ippjGetLibVersion();
 printf("Intel Integrated Performance Primitives\n");
 printf(" version: %s, [%d.%d.%d.%d]\n",
        ippj->Version, ippj->major, ippj->minor, ippj->build, ippj->majorBuild);
 printf(" name: %s\n", ippj->Name);
 printf(" date: %s\n", ippj->BuildDate);

Regards,
Vladimir

bdirector
Beginner
Using ippStaticInitCpu did the trick. Thanks.

I called ippGetCpuType and used the return value as input to ippStaticInitCpu. ippGetCpuType returned ippCpuEM64T instead of ippCpuP4HT2, which is what I was expecting. But the call to ippmGetLibVersion returned t7 as the targetCpu. I'm curious, why did GetCpuType return what it did? As I said in the original post, I'm using a single core P4.
Vladimir_Dudnik
Employee

Hi,

Actually, a single call to ippStaticInit() should work. In a typical application you do not need to detect the CPU type and call ippStaticInitCpu(), although it will of course work that way too. The purpose of ippStaticInitCpu() is to give you a way to change the default behaviour of the IPP dispatcher or to compare the performance of the different IPP optimized codes.
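
As a rough illustration of comparing the performance of different code paths, the following self-contained C sketch times two interchangeable implementations of the same operation through a function pointer. It does not use IPP: `dot_plain`, `dot_unrolled`, and `time_variant` are hypothetical helpers written for this example, and `clock()` measures CPU time at coarse resolution, so a real comparison would need many repetitions and realistic buffer sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Common signature so either implementation can be timed the same way. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Straightforward reference implementation. */
float dot_plain(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Same result, with four independent accumulators (manual unrolling). */
float dot_unrolled(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)       /* leftover elements when n is not a multiple of 4 */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

/* Runs fn `reps` times, stores the last result in *result, and returns
   the elapsed CPU time in seconds. */
double time_variant(dot_fn fn, const float *a, const float *b,
                    size_t n, int reps, float *result)
{
    assert(reps > 0);
    volatile float sink = 0.0f;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        sink = fn(a, b, n);
    clock_t end = clock();
    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_variant` once per implementation on the same buffers gives two timings to compare, and checking that both calls report the same result guards against benchmarking a variant that is fast but wrong.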

The IPP dispatcher can return ippCpuEM64T on processors that support Extended Memory 64-bit Technology; that's OK.

If you can build the JPEGView sample, the IPP information shown in its Help->About dialog can help you understand which code the IPP dispatcher chose by default.

Regards,
Vladimir
