Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL speed performance vs. IPP

sgwood
Beginner
717 Views
I have noticed that a number of the vector routines in IPP are significantly faster than their counterparts in MKL. For example, on a Core i7 processor I have found that the ippsMul_32fc() function from IPP is about 45% faster than the vcMul() routine from MKL.

Is there a reasonable explanation for this?

BTW: I am comparing MKL 10.2.2 and IPP 6.1.2

-Simon
8 Replies
TimP
Honored Contributor III
A more definite example might be required. For instance, you are probably calling IPP through its C interface and getting faster results (at least for short vectors) than you could get through a Fortran-compatible call to MKL, and certainly faster than through a cblas wrapper.
Gennady_F_Intel
Moderator
Simon,
the main question in such cases is the input size. What size are you using?
--Gennady
Thomas_B_3
Beginner
Hello,

I have a very similar question regarding the performance of the eigenvalue and eigenvector calculation (IPPM vs. MKL). Which library is recommended for an input size of the matrix between 20 and 40?

Thank you and best regards,
Tom
Gennady_F_Intel
Moderator
Tom,
for matrices of those sizes, I would recommend trying the non-threaded version of IPP first.
--Gennady
Thomas_B_3
Beginner
Gennady,

thank you for your quick reply.

Tom
sgwood
Beginner
Gennady,
The input size in my example is 32768 complex elements. Here is an example of how my code looks.

=======================================================================

std::complex<float> x1[32768];
std::complex<float> x2[32768];
std::complex<float> y1[32768];   // IPP result
std::complex<float> y2[32768];   // MKL result

// Fill the x1 and x2 arrays with random data, uniformly distributed on [-10000,10000].
// This is similar to the MKL benchmark test vectors.
// Also note that I have a version of rand() that returns a uniform RV on [0,1].

for(int ii=0; ii<32768; ii++)
{
    float r1I = -10000 + 20000 * rand();
    float r1Q = -10000 + 20000 * rand();
    float r2I = -10000 + 20000 * rand();
    float r2Q = -10000 + 20000 * rand();
    x1[ii] = std::complex<float>(r1I,r1Q);
    x2[ii] = std::complex<float>(r2I,r2Q);
}

// Now make the call to IPP.
// The first call warms up the cache; time the second call.
ippsMul_32fc((Ipp32fc *)(&x1[0]), (Ipp32fc *)(&x2[0]), (Ipp32fc *)(&y1[0]), 32768);

// your "tic" timer here
ippsMul_32fc((Ipp32fc *)(&x1[0]), (Ipp32fc *)(&x2[0]), (Ipp32fc *)(&y1[0]), 32768);
// your "toc" timer here

//
// Now repeat the above for the MKL vector multiply routine
//
vcMul(32768, (MKL_Complex8 *)x1, (MKL_Complex8 *)x2, (MKL_Complex8 *)y2);

// your "tic" timer here
vcMul(32768, (MKL_Complex8 *)x1, (MKL_Complex8 *)x2, (MKL_Complex8 *)y2);
// your "toc" timer here

=======================================================================

The above code is placed in a main() routine. The compilation takes the form:

icpc test.cpp -O3 -L/opt/intel/ipp/6.1.2.051/ia32/sharedlib -lipps -lippcore ...

-Simon
Ilya_B_Intel
Employee
Simon,

The MKL function vcMul() gives a more accurate result by default than ippsMul_32fc().

To enable the less accurate but faster path in MKL, call vmlSetMode(VML_EP) before the call to vcMul().

To use more accurate functions in IPP, look at the Fixed-Accuracy Arithmetic Functions domain: ippsMul_32fc_A24() will be as accurate as vcMul() in VML_HA (the default) mode.

- Ilya

sgwood
Beginner
Ilya,

Excellent! Thanks for the explanation. I overlooked that difference.

-Simon