Re: transcendental speed in Intel® oneAPI Math Kernel Library

transcendental speed

mrentropy1 — Fri, 21 Aug 2009 19:59:06 GMT

This might not belong in the MKL forum, but I'm not sure where else to put it; sorry.

Does anybody know how transcendental function evaluation on Intel 64 compares with floating-point division for 64-bit floats? I have a transformation I could write with sines and cosines, or I could formulate it differently and use a simple divide, but doing it that way requires a lot more work on my part. This is in a numerically intensive code, in what I think may be a significant bottleneck, so faster is better. The operations will be applied in parallel over a very large array.
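Peter doesn't say what his transformation is. As a purely hypothetical illustration of trading sin()/cos() for a single divide, a 2-D rotation can be parameterized by t = tan(θ/2) instead of the angle θ, using cos θ = (1 − t²)/(1 + t²) and sin θ = 2t/(1 + t²). The sketch below is in Python only to show the arithmetic, and the function names are made up. It only pays off if t can be carried through the code in place of θ; computing t from θ with tan() would bring the transcendental call back.

```python
import math


def rotate_trig(x, y, theta):
    """Rotate (x, y) by angle theta using one cos and one sin call."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def rotate_halfangle(x, y, t):
    """Rotate (x, y) by the angle whose half-angle tangent is t.

    Uses cos = (1 - t^2)/(1 + t^2) and sin = 2t/(1 + t^2):
    one division, no transcendental calls.
    """
    inv = 1.0 / (1.0 + t * t)
    c = (1.0 - t * t) * inv
    s = 2.0 * t * inv
    return c * x - s * y, s * x + c * y


if __name__ == "__main__":
    theta = 0.7
    t = math.tan(theta / 2)  # in real code, t would be stored instead of theta
    print(rotate_trig(1.0, 2.0, theta))
    print(rotate_halfangle(1.0, 2.0, t))  # agrees to within rounding
```

Note that t grows without bound as θ approaches ±π, so this parameterization only suits angles that stay away from a half turn.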

Thanks,
Peter

P.S. Based on my own timing, I get relative times (cost) for mult : divide : sqrt() : sin() of
1 : 1.7 : 2.3 : 5.6
for 32-bit and
1 : 2.8 : 3.3 : 6.7
for 64-bit, on a Core 2 Duo. I'm not sure if or when that translates into raw clock-cycle ratios, or how other factors might have affected my measurement. That's using Intel Fortran with no compiler flags.

Re: transcendental speed

TimP — Fri, 21 Aug 2009 21:03:24 GMT

ifort enables auto-vectorization by default, and vectorized loops call math functions from SVML (the Short Vector Math Library). If you have vectorizable loops several thousand elements long, MKL's VML might do better; you could look up the published performance figures for VML. Anyway, the numbers you quote look reasonable as a rough guide for scalar code.

Re: transcendental speed

mrentropy1 — Fri, 21 Aug 2009 21:31:58 GMT

Great. Thanks very much!!!!

Re: transcendental speed

Shane_S_Intel — Fri, 21 Aug 2009 21:48:26 GMT

The following page (http://www.intel.com/software/products/mkl/data/vml/functions/_performanceall.htm) gives the per-element cycle counts for the Intel MKL vector math library (VML) functions. A quick review of it will likely give you the insight you need into the best way to code your algorithm. Regards, Shane
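With per-element cycle counts in hand, comparing the two formulations is simple arithmetic: multiply each function's cost by the number of calls per element and sum. A rough sketch in Python; the cycle figures are placeholders, not numbers from the VML performance page, and the operation mix assumed for each formulation is hypothetical:

```python
def total_cycles(n_elements, ops):
    """Estimate total cycles for n_elements.

    ops is a list of (cycles_per_element, calls_per_element) pairs.
    Ignores memory bandwidth and any overlap between operations.
    """
    return n_elements * sum(cpe * calls for cpe, calls in ops)


# PLACEHOLDER per-element costs: replace with the published figures
# for your CPU, precision, and accuracy mode.
SIN_CPE, COS_CPE, DIV_CPE, MUL_CPE = 20.0, 20.0, 5.0, 1.0

if __name__ == "__main__":
    n = 10_000_000
    # Hypothetical operation mixes for the two formulations.
    trig_version = total_cycles(n, [(SIN_CPE, 1), (COS_CPE, 1), (MUL_CPE, 4)])
    div_version = total_cycles(n, [(DIV_CPE, 1), (MUL_CPE, 6)])
    print(f"trig: {trig_version:.3g} cycles, divide: {div_version:.3g} cycles")
```

A model like this ignores memory bandwidth, which for very large arrays may end up limiting both versions.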