I am a newbie so please bear with me if I provide irrelevant details.

I am trying to achieve the speeds reported in:

http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/_performanceall.html

for the log function vsLn()

My simple C script contains just one call to vsLn().

I compile it on windows using:

g++ -I"C:/PROGRA~1/R/R-212~1.1/include" -I"C:/Progra~1/Intel/ComposerXE-2011/mkl/include" -O2 -Wall -c MKLvml_main.cc -o MKLvml_main.o

g++ -shared -s -static-libgcc -o MKLvml.dll tmp.def MKLvml_main.o C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_intel_c_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_sequential_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_core_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_rt.lib -LC:/PROGRA~1/R/R-212~1.1/bin/i386 -lR

As you can see, I am using the sequential library. I also tried the parallel one and the results are the same.

Can someone please suggest what I can do to improve the speed?

Currently, 10^8 log operations (a loop of 10^3 iterations, each computing the log of a vector of length 10^5) take around 6 s. The expected time is less than 0.5 s.

(The results I am getting are just a 2x improvement over the default log calculation. I am working inside R, just FYI.)

Thanks.

1 Solution

Andrey

12 Replies

Could you provide a few more details?

1) Your sample program would be helpful for us.

2) Which CPU are you running your sample on?

Andrey

You can find it in the mklsupport.txt file ( \Documentation\ )

--Gennady

1) My C code is as follows:

#include

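#include "R.h"
#include "Rmath.h"
#include "mkl_vml.h"
#include "mkl.h"

extern "C" {

// Called from R via .C(): writes the elementwise natural log of fA into fB.
void get_mkl_log(float *fB, int *Blen, float *fA, int *Alen){
    vmlSetMode(VML_EP);          // VML enhanced-performance mode
    MKL_INT vec_len = Alen[0];   // .C() passes scalars as pointers
    vsLn(vec_len, fA, fB);       // single-precision vector natural log
    return;
}

}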

As you can see, there are some R header files, which enable R to talk to C++.

2) I am using Intel Core 2 Quad CPU Q9400 @ 2.66GHz

Please let me know what more details I can provide.

It is

Package ID: w_mkl_10.3.2.154 w_ccompxe_2011.2.154 w_fcompxe_2011.2.154

Thanks again for looking into this. Looking forward to your reply.

sdgkgp,

Could you provide a full example? It will help us give an exact and quick answer. We also need to know how you fill the input vector, how you do the performance measurements, etc.

Andrey

dyn.load("C:/RPackages/MKLvml/src/MKLvml.dll")
N = 1e3
in_vec = as.single( runif(N) ) # generates random uniform numbers between 0 and 1
out_vec = as.single( vector("numeric",N) ) # allocates memory for out_vec
system.time( # for performance measurement (time taken)
  for (i in 1:1e5)
  {
    t <- .C("get_mkl_log", dB = out_vec, Blen = as.integer(N), dA = in_vec, Alen = as.integer(N) ) # actual call
  }
)

The output I get is:

   user  system elapsed
   4.53    0.00    4.54

which means 4.54s were taken by the core process.

That is slow given that my CPU runs at 2.66 GHz (1e8 log operations, each consuming 3.01 cycles according to the performance docs for vsLn in EP mode).
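The expectation behind these figures can be written out as a rough estimate. It assumes the 3.01 cycles per element quoted above holds at the nominal 2.66 GHz clock, and it takes the 4.54 s and the 1e5 calls of length 1e3 from the R measurement; nothing outside vsLn is counted on the expected side.

```latex
t_{\text{expected}} = \frac{10^{8} \times 3.01\ \text{cycles}}{2.66 \times 10^{9}\ \text{cycles/s}} \approx 0.113\ \text{s}

t_{\text{per call, expected}} = \frac{10^{3} \times 3.01}{2.66 \times 10^{9}}\ \text{s} \approx 1.13\ \mu\text{s}
\qquad
t_{\text{per call, measured}} = \frac{4.54\ \text{s}}{10^{5}} = 45.4\ \mu\text{s}
```

On these assumptions the measured time per call is about 40 times the estimate, which suggests that most of each call is spent outside the log computation itself.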

Andrey

Your linking line:

g++ -shared -s -static-libgcc -o MKLvml.dll tmp.def MKLvml_main.o C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_intel_c_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/**mkl_sequential_dll.lib** C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_core_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/**mkl_rt**.lib -LC:/PROGRA~1/R/R-212~1.1/bin/i386 -lR

uses the sequential library together with mkl_rt :( You'd better use one linking model.

Please try the MKL Link Line Advisor.

Also, to eliminate the overhead of loading dynamic libraries, please use only static libraries if possible.

I was making a mistake in my performance evaluation.

It turns out R has a lot of overhead when communicating data to C and that is why it is so slow.

When I compute the timing from inside C, the numbers match with those reported in the performance docs.
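A minimal sketch of what timing "from inside C" could look like, so that the repeat loop runs in native code and the R-to-C call cost is paid only once. This is not the poster's code: `time_log_batches` is a hypothetical helper name, and `std::log` is used as a stand-in for vsLn so the sketch builds without MKL, which means the absolute numbers will not match the MKL performance tables.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `reps` passes of an elementwise natural log over
// `in`, writing into `out`, and returns the elapsed wall-clock time in seconds.
// Assumption: std::log stands in for MKL's vsLn so the sketch has no MKL dependency.
double time_log_batches(const std::vector<float>& in,
                        std::vector<float>& out,
                        int reps) {
    out.resize(in.size());
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = std::log(in[i]);  // with MKL this inner loop would be one vsLn call
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Dividing the returned time by `reps * in.size()` and multiplying by the clock frequency gives cycles per element, which is the unit used in the performance tables.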

It still makes sense to understand why the overhead of R calling a third-party DLL is so big. We will experiment on our side and report back. If you also have some interesting findings on your side, we will be happy if you let us know about them.

Many thanks for your interest,

Sergey

You seem to use the .C function in your R application:

t <- .C("get_mkl_log", dB = out_vec, Blen = as.integer(N), dA = in_vec, Alen = as.integer(N) ) # actual call

According to Section 5.2 of the document "Writing R extensions", available at http://cran.r-project.org/doc/manuals/R-exts.html, the .C function can introduce additional per-argument overhead: "Unless formal argument NAOK is true, all the other arguments are checked for missing values NA and for the IEEE special values NaN, Inf and -Inf, and the presence of any of these generates an error."

You might want to try the .External or .Call functions as alternatives. Hope this helps.

Thanks,

Andrey
