Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Intel IPP 5.3 improvement gains



I downloaded the free evaluation version of "Intel Integrated Performance Primitives v5.3 for Windows* on IA-32 Intel Architecture". I tried the following simple code (in a release build) to compare the running time of IPP against a regular loop. I didn't find any improvement with IPP, and I would like to ask why. Thank you.

// The test function:
int test()
{
    clock_t start, end;
    int size = 100000000;
    float* vec = NULL;
    float* vec2 = NULL;
    vec = new float[size];
    vec2 = new float[size];

    IppiSize roiSize = {size, 1};

    IppStatus stat;

    if (vec == NULL || vec2 == NULL) cout << "cannot allocate ";

    float val = 3.0f;

    ippiSet_32f_C1R((Ipp32f) 2.0, (Ipp32f*) vec, size, roiSize);
    ippiSet_32f_C1R((Ipp32f) 2.0, (Ipp32f*) vec2, size, roiSize);

    // Regular loop: multiply every element by a constant
    start = clock();
    for (int i = 0; i < size; i++) vec2[i] = vec[i] * val;
    end = clock();
    cout << "elapsed time *:" << end - start << endl;

    ippiSet_32f_C1R((Ipp32f) 2.0, (Ipp32f*) vec, size, roiSize);
    ippiSet_32f_C1R((Ipp32f) 2.0, (Ipp32f*) vec2, size, roiSize);

    // Same multiplication through IPP
    start = clock();
    stat = ippmMul_vc_32f((const Ipp32f*) vec, sizeof(Ipp32f), val, (Ipp32f*) vec2, sizeof(Ipp32f), size);
    end = clock();
    cout << "elapsed time ipp*:" << end - start << endl;

    delete [] vec;
    delete [] vec2;
    return true;
}

1 Reply

First, there is a common mistake with the IPP step parameter. This parameter is the number of bytes (the distance in bytes) between two adjacent image rows. So you need to pass size * sizeof(Ipp32f) when you use it as a step parameter.
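
To make the byte-step convention concrete, here is a minimal sketch in plain C++ that does not call IPP at all. The helper names rowStepBytes and rowPtr are hypothetical, and the sketch assumes a tightly packed single-channel float image, so treat it as an illustration of the idea rather than as IPP's own implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: distance in BYTES between the starts of two adjacent
// rows of a tightly packed, single-channel float image.
inline int rowStepBytes(int widthPixels) {
    assert(widthPixels >= 0);
    return widthPixels * static_cast<int>(sizeof(float));
}

// Hypothetical helper: address of row y when consecutive rows are stepBytes
// apart. The offset is applied to a byte pointer, not to a float pointer.
inline float* rowPtr(float* base, int stepBytes, int y) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(base);
    return reinterpret_cast<float*>(bytes + static_cast<std::ptrdiff_t>(y) * stepBytes);
}
```

With this convention, a one-row image of size floats would use rowStepBytes(size) as its step, which is size * sizeof(float) bytes rather than size.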

Second, I would recommend using the ippsMulC_32f function from the signal processing domain, which is closer to the nature of your computation, instead of applying small-matrix operations to such big vectors.
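
As a rough sketch of that suggestion, the wrapper below multiplies a vector by a constant through ippsMulC_32f when IPP is available. The wrapper name scaleVector and the USE_IPP macro are assumptions made for this example (you would define USE_IPP yourself in a build that has the IPP headers and libraries); without it, the code falls back to a plain loop so the two paths can be timed against each other.

```cpp
#include <cassert>
#include <cstddef>

#ifdef USE_IPP            // assumption: set by your build when IPP is installed and linked
#include <ipps.h>         // signal processing domain header
#endif

// Hypothetical wrapper: dst[i] = src[i] * val for i in [0, len).
// Returns true on success, false on invalid arguments or an IPP error.
inline bool scaleVector(const float* src, float val, float* dst, int len) {
    if (src == NULL || dst == NULL || len <= 0) return false;
#ifdef USE_IPP
    // ippsMulC_32f takes a length in elements; there is no step parameter.
    return ippsMulC_32f(src, val, dst, len) == ippStsNoErr;
#else
    // Fallback: the same computation as the regular loop in the question.
    for (int i = 0; i < len; ++i) dst[i] = src[i] * val;
    return true;
#endif
}
```

In the test function above, the timed call would then be scaleVector(vec, val, vec2, size) in place of the ippmMul_vc_32f call.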

