using (Intel) AVX within vmdexp

parallelworker · 07-15-2012

Hey Intel folks,

Has anyone out there tested/used the vmdexp function in Fortran code with Intel AVX enabled at
compile time via the -mavx command-line switch???

I just implemented the vmdexp function in a library function of a Fortran combustion simulation code, but unfortunately
the wall time spent in that function (measured with omp_get_wtime) rose by a factor of 3 to 3.5 (compared to code
relying entirely on calls to the exp function from the Intel compiler math library) on a Sandy Bridge machine whose
processor implements the Intel AVX extensions. :-(

Has anyone encountered the same or a similar problem using the vmdexp function in a C++/Fortran program?

I do not think that misalignment of the elements of the hand-written vector of 156 double-precision elements should
cause such a massive performance loss when using the vmdexp function.
But I still have to test this in the near future (just a tedious task ;-} )...
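Checking the alignment hypothesis may be less tedious than it sounds. A minimal C sketch, assuming 32-byte alignment is the target (the width of one 256-bit AVX register, i.e. 4 doubles); `is_aligned` and `alloc_aligned_doubles` are hypothetical helper names, not MKL API:

```c
#include <stdint.h>
#include <stdlib.h>

/* One 256-bit AVX register holds 4 doubles = 32 bytes. */
#define AVX_ALIGN 32u

/* Return 1 if p sits on an `align`-byte boundary, 0 otherwise. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: allocate n doubles on a 32-byte boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the request is rounded up. Release the result with free(). */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t padded = (bytes + AVX_ALIGN - 1) / AVX_ALIGN * AVX_ALIGN;
    return aligned_alloc(AVX_ALIGN, padded);
}
```

Printing `is_aligned(v, 32)` for the existing 156-element array would settle the question before rewriting any allocation code.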

I also experimented with a vector generated by a do-loop, but the problem got even worse... :-(

@ Intel programmers: Did you implement any support for AVX in vmdexp, or does the function completely lack
support for (Intel) AVX???

Thanks in advance for your replies and help, Sebastian.

Nikita_A_Intel · 07-16-2012

Hi Sebastian,
thanks for your report. Short answer to your questions is yes, Intel MKL vmdexp() function supports Intel AVX.

However, in order for us to give you a complete and accurate answer, we'll need some important details.
1) What OS is it? What architecture, 32-bit or 64-bit?
2) What are the versions of Intel MKL and Intel Compiler that you used?
3) What are your arguments to the exp? Do they have a potential to cause over or underflow? E.g. good arguments are within approximately [-707, 707] interval for double precision.
4) What compiler switches do you use to compile your Fortran program? Does the call to the exp function get vectorized?
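Question 3 is easy to check before handing an array to exp. A minimal C sketch, assuming the approximate [-707, 707] window quoted above as the "good" double-precision range (the exact limits are ln(DBL_MAX) ≈ 709.78 for overflow and ln(DBL_MIN) ≈ -708.40 below which results become subnormal); `count_out_of_range` is a hypothetical helper, not part of MKL:

```c
#include <stddef.h>

/* Approximate "good" argument window for double-precision exp(),
 * as quoted in the reply. Outside it, exp(x) overflows (x > ~709.78)
 * or produces subnormal/zero results (x < ~-708.40), which may be
 * handled on slower special-case paths. */
#define EXP_SAFE_MIN (-707.0)
#define EXP_SAFE_MAX ( 707.0)

/* Count arguments outside [EXP_SAFE_MIN, EXP_SAFE_MAX]. */
static size_t count_out_of_range(const double *x, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Negated test so NaN (for which every comparison is false)
         * is counted as out of range too. */
        if (!(x[i] >= EXP_SAFE_MIN && x[i] <= EXP_SAFE_MAX))
            ++bad;
    }
    return bad;
}
```

A nonzero count on real simulation data would be worth reporting alongside the answers to questions 1, 2 and 4.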

Assuming that you have the latest software and your arguments are all OK, I would point out an important detail: Intel MKL VML functions are designed to work best with somewhat longer vectors, e.g. something like 500-10000 elements. See, e.g., the charts at http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/exp.html

Anyway a small code sample would help us to investigate your case.

Thanks,
Nikita