I have for evaluation purposes downloaded the vector math library, where I am particularly interested in exp and log for vectors of length about 20. From the information I found on the Intel website I had expected a considerable speed-up, but I achieved only marginal gains compared to the conventional scalar compiler library functions (Intel Fortran v. 9.0) when the compiler was using SSE2 instructions (option -QxW). Can this be correct, or am I missing something?
I do get a significant speedup compared to compilation to default P4.
Michael
By default P4, do you mean compiling for processors earlier than the P4?
ifort xxx.for yyy.lib
whereas the fast version is
ifort xxx.for yyy.lib -QxW
Michael
I have for evaluation purposes downloaded the vector math library, where I am particularly interested in exp and log for vectors of length about 20.
I don't think that VML will bring you a real advantage at vector lengths of about 20. To see a real advantage, you should consider modifying your code or loops so that the VML functions are called on lengths of ~100 or more. (This is typically done by buffering.)
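To illustrate what buffering could look like, here is a minimal sketch in plain Fortran. It assumes the short vectors arrive one at a time; the names `vlen`, `nvec` and `vec_exp` are hypothetical, and `vec_exp` is only a stand-in for the point where a VML exponential call would go (see the MKL Reference Manual for the actual interface).

```fortran
program buffer_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: vlen = 20            ! length of one short vector
  integer, parameter :: nvec = 8             ! hypothetical number of short vectors
  integer, parameter :: bufsize = vlen*nvec  ! 160 elements per vector call
  real(dp) :: short(vlen), buf(bufsize), res(bufsize)
  integer :: i, j, nfill

  nfill = 0
  do j = 1, nvec
     ! Stand-in for the application code that produces one short vector.
     do i = 1, vlen
        short(i) = 0.01_dp * real(i + j, dp)
     end do
     ! Append the short vector to the buffer instead of calling exp now.
     buf(nfill+1:nfill+vlen) = short
     nfill = nfill + vlen
  end do

  ! One call on the whole buffer replaces nvec calls on short vectors.
  call vec_exp(nfill, buf, res)

  ! Sanity check against the element-by-element result.
  do j = 1, nvec
     do i = 1, vlen
        if (abs(res((j-1)*vlen + i) - exp(0.01_dp*real(i + j, dp))) &
            > 1.0e-12_dp) stop 'mismatch'
     end do
  end do
  print *, 'elements handled in one call:', nfill

contains

  ! Hypothetical stand-in: in real code this would be the vector
  ! library routine; here it just uses the intrinsic exp.
  subroutine vec_exp(n, x, y)
    integer, intent(in) :: n
    real(dp), intent(in) :: x(n)
    real(dp), intent(out) :: y(n)
    y = exp(x)
  end subroutine vec_exp

end program buffer_sketch
```

The cost of this approach is the extra copying and the restructuring of the code so that the results are consumed only after the buffer has been processed.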
From the information I found on the Intel website I had expected considerable speed-up
The information you are referring to is probably the peak VML performance compared with the conventional scalar math library. Peak performance is achieved only on sufficiently long vectors. For more information, I encourage you to look at http://www.intel.com/software/products/mkl/data/vml/functions/_listfunc.html
(click on the function of interest and see the graphs of the dependence of VML performance on vector length).
but achieved only marginal results compared to the conventional scalar compiler library functions
I believe the improvement is only marginal because the compiler was able to vectorize your loop. When the compiler vectorizes a loop containing a math function, it calls the internal vectorized math library SVML (which is VML-like) rather than the conventional scalar compiler library. The vectorizer (and SVML in particular) gives a substantial speedup over a conventional (scalar) loop. In your case, the vectorizer was invoked when you compiled with the /QxW switch.
With that in mind, I can comment on the differences between SVML and VML. Because of different design requirements, SVML and VML performance may be comparable at moderately small vector lengths (loop counts). Peak VML performance is clearly better than peak SVML performance (again because of the design requirements). For example, the high accuracy single precision VML logarithm takes 15.7 cycles per result, whereas the SVML logarithm takes 17 cycles. For reference, the VML low accuracy log takes 12.5 cycles. The VML low accuracy functions are comparable in accuracy with the SVML functions (the design requirement is 4 ulp, or roughly 2 incorrect least significant bits). Thus comparing 17 with 12.5 cycles is fairer. By default VML uses the high accuracy flavor. To change the accuracy flavor you have to call a special service routine. For details I refer you to the MKL Reference Manual http://www.intel.com/software/products/mkl/docs/mklman.htm.
So, to summarize: yes, in your particular case VML performance can be comparable with vectorized loop performance. You need to decide whether to modify your code so that the VML calls get close to peak performance, or to continue using SVML.
The vector length in my case is 10 to 30, and I am working on optimizing an application where around 20% of the time is spent on log or exp. Cost of MKL is not an issue, but the time expenditure to change the code is!
What I would like to know more about is where vectorization will be possible. In other words, will any exp or log be vectorized (provided the compilation options are set), or do I have to move these functions to a separate, short loop like
do i = 1,n
b(i) = exp(a(i))
enddo
in order to obtain the desired result?
Michael
Default vectorization options usually take the loop 8 iterations at a time, with scalar remainder loops to make up the difference at one or both ends. Adding the -O1 flag cuts unrolling back to the minimum consistent with vectorization, which may prove better for the loop lengths you mention. Also, with fairly short loops, it is important to help the compiler recognize when the data are aligned (on 16-byte boundaries), to avoid run-time alignment checks and adjustments.
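As a rough illustration of why the block size matters for short loops, the sketch below strip-mines a loop of 20 elements by hand into whole blocks of 8 plus a scalar remainder. This is only a model of the structure described above, not what the compiler literally emits; the names `blk` and `nmain` are hypothetical, and the sketch shows a remainder at the end only (no alignment peel at the start).

```fortran
program stripmine_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 20     ! loop length in the range discussed here
  integer, parameter :: blk = 8    ! iterations taken at a time
  real(dp) :: a(n), b(n)
  integer :: i, nmain

  do i = 1, n
     a(i) = 0.05_dp * real(i, dp)
  end do

  ! Largest multiple of blk that fits into n (16 for n = 20).
  nmain = (n / blk) * blk

  ! Main part: whole blocks, the portion that runs as vector code.
  do i = 1, nmain, blk
     b(i:i+blk-1) = exp(a(i:i+blk-1))
  end do

  ! Remainder: leftover iterations, handled one at a time.
  do i = nmain + 1, n
     b(i) = exp(a(i))
  end do

  if (maxval(abs(b - exp(a))) > 1.0e-12_dp) stop 'mismatch'
  print *, 'in whole blocks:', nmain, ' in remainder:', n - nmain
end program stripmine_sketch
```

With n = 20 and blocks of 8, 4 of the 20 iterations fall into the remainder; for n = 10 it is 2 of 10, and for n = 30 it is 6 of 30. A smaller block size leaves fewer iterations in the remainder loop, which is why reducing the unrolling can pay off at these lengths.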
The help you get from compiler diagnostics about effectiveness of vectorization is minimal. If you got a LOOP VECTORIZED report for a given loop, that assures you that vectorized code or short vector library calls have been generated for everything in that loop, including math functions. A PARTIAL LOOP VECTORIZED report indicates that the loop has been distributed, with at least one portion vectorized.
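To show what loop distribution means for the question about moving exp into a separate loop, here is a minimal hand-written sketch. The running sum is a made-up example of work that carries a dependence from one iteration to the next; the array names are hypothetical. Whether the compiler performs this split on its own for a given loop is something only its report can confirm.

```fortran
program distribute_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 20
  real(dp) :: a(n), b(n), s_fused(n), s_split(n)
  integer :: i

  do i = 1, n
     a(i) = 0.05_dp * real(i, dp)
  end do

  ! Fused form: exp sits inside a loop with a dependence on s_fused(i-1).
  s_fused(1) = exp(a(1))
  do i = 2, n
     s_fused(i) = s_fused(i-1) + exp(a(i))
  end do

  ! Distributed form, part 1: independent iterations, a candidate
  ! for vectorization including the exp call.
  do i = 1, n
     b(i) = exp(a(i))
  end do

  ! Distributed form, part 2: the dependent running sum stays scalar.
  s_split(1) = b(1)
  do i = 2, n
     s_split(i) = s_split(i-1) + b(i)
  end do

  if (maxval(abs(s_fused - s_split)) > 1.0e-12_dp) stop 'mismatch'
  print *, 'fused and distributed loops agree'
end program distribute_sketch
```

The distributed form needs the temporary array `b`, which is cheap at these lengths.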
If your information about the time spent in math functions comes from profiling, repeating the profiling with a vectorized build would show how much was gained by shifting work from scalar to short vector functions.
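Before investing in code changes, a back-of-the-envelope estimate can help. The sketch below applies Amdahl's law to the figures quoted earlier in this thread: about 20% of the run time in log or exp, and 17 (SVML), 15.7 (VML high accuracy) and 12.5 (VML low accuracy) cycles per result. It rests on strong assumptions: that the already vectorized build still spends 20% of its time in these functions, that the single precision log figures are representative, and that peak VML performance is actually reached, which the short vectors here make unlikely. Treat it as an upper bound, not a prediction.

```fortran
program gain_estimate
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Assumed fraction of run time spent in exp/log (figure from this thread).
  real(dp), parameter :: frac = 0.20_dp
  ! Cycles per result quoted in this thread (single precision log, peak).
  real(dp), parameter :: c_svml   = 17.0_dp
  real(dp), parameter :: c_vml_ha = 15.7_dp
  real(dp), parameter :: c_vml_la = 12.5_dp

  print '(a,f6.3)', 'overall speedup, VML high accuracy: ', &
        overall(c_svml / c_vml_ha)
  print '(a,f6.3)', 'overall speedup, VML low accuracy:  ', &
        overall(c_svml / c_vml_la)

contains

  ! Amdahl's law: only the fraction frac of the run time gets faster by s.
  pure function overall(s) result(g)
    real(dp), intent(in) :: s
    real(dp) :: g
    g = 1.0_dp / ((1.0_dp - frac) + frac / s)
  end function overall

end program gain_estimate
```

Under these assumptions the best case is an overall gain of roughly 1.6% with the high accuracy functions and roughly 5.6% with the low accuracy ones.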
A few more cents: if you want to build up your expertise in the IA-32/EM64T Intel compilers (the vectorizer in particular) and become more familiar with IA-32/EM64T optimizations, I recommend the book by Aart Bik, The Software Vectorization Handbook, http://www.intel.com/intelpress/sum_vmmx.htm. Aart is an author and ideologist of the Intel C/Fortran compiler vectorizer, so with this book you get the information first-hand.