This might not belong in the MKL forum, but I'm not sure where else to put it - sorry.
Anybody know how transcendental function evaluation on Intel 64 compares with floating-point divide, for 64-bit floats? I have a transformation I could write with sines and cosines, or I could do it differently and use a simple divide - but doing it that way requires a lot more work on my part. This is in a numerically-intensive code, in what I think may be a significant bottleneck, so faster is better. It will be parallel operations on a very large array.
Thanks,
Peter
P.S. Based on my own timing I get mult : divide : sqrt() : sin() speed ratio of
1 : 1.7 : 2.3 : 5.6
for 32-bit and
1 : 2.8 : 3.3 : 6.7
for 64-bit, on a Core 2 Duo, but I'm not sure if/when that translates into raw clock-cycle ratios, or how other factors might have affected my measurement. That's using Intel Fortran with no compiler flags.
3 Replies
ifort defaults to enabling auto-vectorization, generating calls to the SVML (short vector math) library. If you have vectorizable loops several thousand elements long, the VML library in MKL might do better; you could look up the quoted performance figures for VML. Anyway, the numbers you quote look reasonable as a rough guide for scalar code.
Great. Thanks very much!
The following page gives the cycle counts (per element) for the Intel MKL vector math library functions: http://www.intel.com/software/products/mkl/data/vml/functions/_performanceall.htm. A quick review of it will likely give you the insight you need into the best way to code your algorithm.
Regards,
Shane
