Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Intel VML slow

sdgkgp
Beginner
Hi,

I am a newbie so please bear with me if I provide irrelevant details.

I am trying to achieve the speeds reported in:
http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/_performanceall.html

for the log function vsLn()

My simple C script contains just one call to vsLn().

I compile it on Windows using:

g++ -I"C:/PROGRA~1/R/R-212~1.1/include" -I"C:/Progra~1/Intel/ComposerXE-2011/mkl/include" -O2 -Wall -c MKLvml_main.cc -o MKLvml_main.o

g++ -shared -s -static-libgcc -o MKLvml.dll tmp.def MKLvml_main.o C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_intel_c_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_sequential_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_core_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_rt.lib -LC:/PROGRA~1/R/R-212~1.1/bin/i386 -lR


As you can see, I am using the sequential library. I also tried the parallel one and the results are the same.


Can someone please suggest what I can do to improve the speed?

Currently, 10^8 log operations (a loop of 10^3 iterations, each computing the log of a vector of length 10^5) take around 6 s. The expected time is less than 0.5 s.

(The results I am getting are only a 2x improvement over the default log calculation. FYI, I am working inside R.)


Thanks.

Andrey_G_Intel2
Employee
Hi sdgkgp!

Could you provide a few more details?
1) Your sample program would be helpful to us.
2) On which CPU are you running your sample?

Andrey
Gennady_F_Intel
Moderator
We also need to know the exact version of MKL you are using. Could you please let us know the Package ID?
You can find it in the mklsupport.txt file ( \Documentation\ )
--Gennady
sdgkgp
Beginner
Hi Andrey,

1) My C code is as follows:

#include
#include "R.h"
#include "Rmath.h"
#include "mkl_vml.h"
#include "mkl.h"

extern "C" {

void get_mkl_log(float *fB, int *Blen, float *fA, int *Alen){
    vmlSetMode(VML_EP);
    MKL_INT vec_len = Alen[0];
    vsLn(vec_len, fA, fB);
    return;
}

}


As you can see, there are some R header files, which are there to let R talk to C++.

2) I am using Intel Core 2 Quad CPU Q9400 @ 2.66GHz


Please let me know what more details I can provide.
sdgkgp
Beginner
Hello Gennady,

It is

Package ID: w_mkl_10.3.2.154 w_ccompxe_2011.2.154 w_fcompxe_2011.2.154

Thanks again for looking into this. Looking forward to your reply.
Andrey_G_Intel2
Employee
sdgkgp,

Could you provide a full example? It will help us give an exact and quick answer. We also need to know how you fill the input vector, how you take your performance measurements, and so on.

Andrey

sdgkgp
Beginner
As I mentioned, this is done inside R:


dyn.load("C:/RPackages/MKLvml/src/MKLvml.dll")
N = 1e3
in_vec = as.single( runif(N) ) # generates random uniform numbers between 0 and 1
out_vec = as.single( vector("numeric",N) ) # allocates memory for out_vec

system.time( # for performance measurement (time taken)
    for (i in 1:1e5)
    {
        t <- .C("get_mkl_log", dB = out_vec, Blen = as.integer(N), dA = in_vec, Alen = as.integer(N) ) # actual call
    }
)

The output I get is:

   user  system elapsed
   4.53    0.00    4.54

which means the whole loop took 4.54 s of elapsed time.
sdgkgp
Beginner
The example above shows that the computation is being done at an effective rate of:

1e8 * 3.01 / 4.54 = 0.066 GHz

while my CPU runs at 2.66 GHz.

(1e8 log operations, each consuming 3.01 cycles, as given in the performance docs for vsLn in EP mode)
Andrey_G_Intel2
Employee
We will try to reproduce your situation. But I can say right now that you did not measure vsLn performance alone: your measurement also includes the overhead of loading the MKL DLLs, the call to vmlSetMode, and possibly other overheads.

Andrey
barragan_villanueva_
Valued Contributor I
Hi,

Your linking line:

g++ -shared -s -static-libgcc -o MKLvml.dll tmp.def MKLvml_main.o C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_intel_c_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_sequential_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_core_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_rt.lib -LC:/PROGRA~1/R/R-212~1.1/bin/i386 -lR

uses the sequential libraries together with mkl_rt :( You'd better use a single linking model.
Please try the MKL Link Line Advisor.

To eliminate the overhead of loading dynamic libraries, please use only static libraries if possible.
sdgkgp
Beginner
Thank you everyone for your answers.

I was making a mistake in my performance evaluation.

It turns out R has a lot of overhead when communicating data to C and that is why it is so slow.

When I compute the timing from inside C, the numbers match with those reported in the performance docs.
Sergey_M_Intel2
Employee
Hi sdgkgp,

It still makes sense to understand why the overhead of R calling a third-party DLL is so big. We will experiment on our side and report back. If you also have some interesting findings on your side, we will be happy if you let us know about them.

Many thanks for your interest,
Sergey
Andrey_N_Intel
Employee
Hello Sdgkgp,

You seem to use the .C function in your R application:
t <- .C("get_mkl_log", dB = out_vec, Blen = as.integer(N), dA = in_vec, Alen = as.integer(N) ) # actual call

According to Section 5.2 of the document "Writing R Extensions", available at http://cran.r-project.org/doc/manuals/R-exts.html, the .C function can introduce additional per-argument overhead: "Unless formal argument NAOK is true, all the other arguments are checked for missing values NA and for the IEEE special values NaN, Inf and -Inf, and the presence of any of these generates an error."

You might want to try the .External or .Call functions as an alternative. Hope this helps.

Thanks,
Andrey
