Hi,
I am a newbie so please bear with me if I provide irrelevant details.
I am trying to achieve the speeds reported in:
http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/_performanceall.html
for the log function vsLn()
My simple C script contains just one call to vsLn().
I compile it on Windows using:
g++ -I"C:/PROGRA~1/R/R-212~1.1/include" -I"C:/Progra~1/Intel/ComposerXE-2011/mkl/include" -O2 -Wall -c MKLvml_main.cc -o MKLvml_main.o
g++ -shared -s -static-libgcc -o MKLvml.dll tmp.def MKLvml_main.o C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_intel_c_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_sequential_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_core_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_rt.lib -LC:/PROGRA~1/R/R-212~1.1/bin/i386 -lR
As you can see, I am using the sequential library. I also tried the parallel one and the results are the same.
Can someone please suggest what I can do to improve the speed?
Currently, 10^8 log operations (a loop of 10^3 iterations, each computing the log of a vector of length 10^5) take around 6 s. I expected less than 0.5 s.
(The results I am getting are just a 2x improvement over the default log calculation. I am working inside R, just FYI.)
Thanks.
Hi sdgkgp!
Could you provide a few more details?
1) Your sample program would be helpful for us.
2) Which CPU are you running your sample on?
Andrey
We also need to know the exact version of MKL you are using. Could you please let us know the Package ID?
You can find it in the mklsupport.txt file ( \Documentation\ )
--Gennady
Hi Andrey,
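1) My C code is as follows:
#include "R.h"
#include "Rmath.h"
#include "mkl_vml.h"
#include "mkl.h"
extern "C" {
// Called from R via .C(): writes ln(fA[i]) into fB[i] for i < Alen[0].
void get_mkl_log(float *fB, int *Blen, float *fA, int *Alen) {
    vmlSetMode(VML_EP);         // select VML's enhanced-performance accuracy mode
    MKL_INT vec_len = Alen[0];  // .C() passes every argument as a pointer
    vsLn(vec_len, fA, fB);      // single-precision natural log of the whole vector
}
}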
As you can see there are some R header files which are for enabling R to talk with C++
2) I am using Intel Core 2 Quad CPU Q9400 @ 2.66GHz
Please let me know what more details I can provide.
Hello Gennady,
It is
Package ID: w_mkl_10.3.2.154 w_ccompxe_2011.2.154 w_fcompxe_2011.2.154
Thanks again for looking into this. Looking forward to your reply.
sdgkgp,
Could you provide a full example? It will help us give an exact and quick answer. We also need to know how you fill the input vector, how you do your performance measurements, and so on.
Andrey
As I mentioned, this is done inside R:
dyn.load("C:/RPackages/MKLvml/src/MKLvml.dll")
N = 1e3
in_vec = as.single( runif(N) ) # generates random uniform numbers between 0 and 1
out_vec = as.single( vector("numeric",N) ) # allocates memory for out_vec
system.time( # for performance measurement (time taken)
  for (i in 1:1e5)
  {
    t <- .C("get_mkl_log", dB = out_vec, Blen = as.integer(N), dA = in_vec, Alen = as.integer(N) ) # actual call
  }
)
The output I get is:
user system elapsed
4.53 0.00 4.54
which means 4.54s were taken by the core process.
The example above shows that the computation is effectively running at:
1e8 * 3.01 / 4.54 s = 6.6e7 cycles/s, or about 0.066 GHz,
while my CPU runs at 2.66 GHz.
(That is 1e8 log operations, each consuming 3.01 cycles, as given in the performance docs for vsLn in EP mode.)
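The same comparison can be expressed per element, which makes the gap easier to see. Below is a minimal sketch of that arithmetic; the two helper names are made up for illustration, and the inputs are only the figures quoted in this thread (1e8 elements, 4.54 s elapsed, a 2.66 GHz clock, 3.01 cycles per element from the performance docs).

```cpp
#include <cassert>

// Hypothetical helper: the clock rate (in GHz) that would explain finishing
// `elements` operations in `seconds` if each cost `cycles_per_element`.
double effective_ghz(double elements, double cycles_per_element, double seconds) {
    assert(seconds > 0.0);
    return elements * cycles_per_element / seconds / 1e9;
}

// Hypothetical helper: cycles actually spent per element, given the
// elapsed time and the real clock rate of the CPU.
double measured_cycles_per_element(double seconds, double clock_ghz, double elements) {
    assert(elements > 0.0);
    return seconds * clock_ghz * 1e9 / elements;
}
```

With the thread's numbers, effective_ghz(1e8, 3.01, 4.54) comes out near 0.066, and measured_cycles_per_element(4.54, 2.66, 1e8) comes out near 120.8, roughly 40 times the 3.01 cycles quoted in the docs. A gap that large suggests most of the time is spent outside the log computation itself.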
We will try to reproduce your situation. But I can say right now that you did not measure vsLn performance alone: your measurements also include the overhead of loading the MKL DLLs, the call to vmlSetMode, and possibly some other overheads.
Andrey
Hi,
Your linking line:
g++ -shared -s -static-libgcc -o MKLvml.dll tmp.def MKLvml_main.o C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_intel_c_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_sequential_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_core_dll.lib C:/Progra~1/Intel/ComposerXE-2011/mkl/lib/ia32/mkl_rt.lib -LC:/PROGRA~1/R/R-212~1.1/bin/i386 -lR
uses the sequential library together with mkl_rt :( You'd better stick to one linking model.
Please try the MKL Link Line Advisor.
Also, to eliminate the overhead of loading dynamic libraries, please use only static libraries if possible.
Thank you everyone for your answers.
I was making a mistake in my performance evaluation.
It turns out R has a lot of overhead when communicating data to C, and that is why it is so slow.
When I measure the timing from inside C, the numbers match those reported in the performance docs.
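For anyone who wants to repeat the measurement from the C/C++ side, here is a rough sketch of one way to do it. It is not the code I actually ran: the LogKernel signature, reference_ln and time_kernel are all names invented for this illustration, and reference_ln uses std::log as a stand-in so the sketch builds without MKL. To time the real thing, one would pass a thin wrapper around vsLn instead.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Signature shaped like vsLn(n, in, out); an assumption made for this sketch.
using LogKernel = void (*)(int n, const float* in, float* out);

// Stand-in kernel (plain std::log) so the sketch compiles without MKL.
void reference_ln(int n, const float* in, float* out) {
    for (int i = 0; i < n; ++i) {
        out[i] = std::log(in[i]);
    }
}

// Calls `kernel` `repeats` times on buffers that are allocated once up front,
// so only the kernel calls sit between the two clock reads.
// Returns the elapsed time in seconds for all timed calls together.
double time_kernel(LogKernel kernel, const std::vector<float>& in,
                   std::vector<float>& out, int repeats) {
    const int n = static_cast<int>(in.size());
    out.resize(in.size());
    kernel(n, in.data(), out.data());  // warm-up call, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        kernel(n, in.data(), out.data());
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_kernel(reference_ln, in, out, 1000) with a 1e5-element input mirrors the 10^8 operations from my original question. Because the library is already loaded and nothing is copied between R and C inside the timed loop, the result reflects the kernel itself rather than the calling overhead.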
Hi sdgkgp,
It still makes sense to understand why R's overhead when calling a third-party DLL is so big. We will experiment on our side and report back. If you also have some interesting findings on your side, we will be happy if you let us know about them.
Many thanks for your interest,
Sergey
Hello Sdgkgp,
You seem to use the .C function in your R application:
t <- .C("get_mkl_log", dB = out_vec, Blen = as.integer(N), dA = in_vec, Alen = as.integer(N) ) # actual call
According to Section 5.2 of the document "Writing R Extensions", available at http://cran.r-project.org/doc/manuals/R-exts.html, the .C function can introduce additional per-argument overhead: "Unless formal argument NAOK is true, all the other arguments are checked for missing values NA and for the IEEE special values NaN, Inf and -Inf, and the presence of any of these generates an error."
You might want to try the .External or .Call functions as an alternative. Hope this helps.
Thanks,
Andrey