This might not belong in the MKL forum, but I'm not sure where else to put it - sorry.
Anybody know how transcendental function evaluation on Intel 64 compares with floating-point divide, for 64-bit floats? I have a transformation I could write with sines and cosines, or I could do it differently and use a simple divide - but doing it that way requires a lot more work on my part. This is in a numerically-intensive code, in what I think may be a significant bottleneck, so faster is better. It will be parallel operations on a very large array.
Thanks,
Peter
P.S. Based on my own timing I get mult : divide : sqrt() : sin() speed ratio of
1 : 1.7 : 2.3 : 5.6
for 32-bit and
1 : 2.8 : 3.3 : 6.7
for 64-bit, on a Core 2 Duo, but I'm not sure if/when that translates into raw clock-cycle ratios, or how other factors might have affected my measurement. That's using Intel Fortran with no compiler flags.
3 Replies
ifort defaults to enabling auto-vectorization, generating calls to the SVML (short vector math) library. If you have vectorizable loops several thousand elements long, the VML library in MKL might do better; you could look up the quoted performance figures for VML. Anyway, the numbers you quote look reasonable as a rough guide for scalar code.
Great. Thanks very much!
The following page gives the cycle counts (per element) for the Intel MKL vector math library functions: http://www.intel.com/software/products/mkl/data/vml/functions/_performanceall.htm. A quick review of it will likely give you the insight you need into the best way to code your algorithm.
Regards,
Shane
