Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

## MKL speed performance vs. IPP

Beginner
937 Views
I have noticed that a number of the vector routines in IPP are significantly faster than their counterparts in MKL. For example, on a Core i7 processor I have found that the ippsMul_32fc() function from IPP is about 45% faster than the vcMul() routine from MKL.

Is there a reasonable explanation for this?

BTW: I am comparing MKL 10.2.2 and IPP 6.1.2

-Simon
8 Replies
Honored Contributor III
A more definite example might be required. For instance, you are probably calling IPP through its C interface and getting faster results (at least for short vectors) than you could get from a Fortran-compatible call to MKL, and certainly faster than a cblas wrapper.
Moderator
Simon,
the main question in such cases is: what is the input size?
Beginner
Hello,

I have a very similar question regarding the performance of eigenvalue and eigenvector calculation (IPPM vs. MKL). Which library is recommended for matrix sizes between 20×20 and 40×40?

Thank you and best regards,
Tom
Moderator
Tom,
for such sizes I would recommend trying the non-threaded version of IPP first.
Beginner

Tom
Beginner
The input size in my example is 32768 complex elements. Here is an example of how my code looks.

=======================================================================

std::complex<float> x1[32768];
std::complex<float> x2[32768];
std::complex<float> y1[32768];   // IPP output
std::complex<float> y2[32768];   // MKL output

// Fill the x1 and x2 arrays with random data, uniformly distributed on [-10000,10000].
// This is similar to the MKL benchmark test vectors.
// Also note that I have a version of rand() that returns a uniform RV on [0,1].

float r1I, r1Q, r2I, r2Q;
for (int ii = 0; ii < 32768; ii++)
{
    r1I = -10000 + 20000 * rand();
    r1Q = -10000 + 20000 * rand();
    r2I = -10000 + 20000 * rand();
    r2Q = -10000 + 20000 * rand();
    x1[ii] = std::complex<float>(r1I, r1Q);
    x2[ii] = std::complex<float>(r2I, r2Q);
}

// Now make the call to IPP.
// The first call warms up the cache; time the second call.
ippsMul_32fc((const Ipp32fc *)x1, (const Ipp32fc *)x2, (Ipp32fc *)y1, 32768);

ippsMul_32fc((const Ipp32fc *)x1, (const Ipp32fc *)x2, (Ipp32fc *)y1, 32768);

//
// Now repeat the above for the MKL vector multiply routine.
//
vcMul(32768, (const MKL_Complex8 *)x1, (const MKL_Complex8 *)x2, (MKL_Complex8 *)y2);

vcMul(32768, (const MKL_Complex8 *)x1, (const MKL_Complex8 *)x2, (MKL_Complex8 *)y2);

=======================================================================

The above code is placed in a main() routine. The compilation takes the form:

icpc test.cpp -O3 -L/opt/intel/ipp/6.1.2.051/ia32/sharedlib -lipps -lippcore ...

-Simon
Employee
Simon,

The MKL function vcMul() gives a more accurate result by default than ippsMul_32fc().

To enable the less accurate but faster mode in MKL, call vmlSetMode(VML_EP) before the call to vcMul().

To use more accurate functions in IPP, check the Fixed-Accuracy Arithmetic Functions domain: ippsMul_32fc_A24() will be as accurate as vcMul() in VML_HA (default) mode.

- Ilya

Beginner
Ilya,

Excellent! Thanks for the explanation. I overlooked that difference.

-Simon