Hi All,
Recently, I ran a simple test to benchmark the performance of double-precision transcendental functions on both Xeon Phi and a Kepler GPU. Surprisingly, the GPU was about 20x faster, and it looks to me as though the vectorized double-precision calculations, on both the CPU and the Xeon Phi, are done in software. So I'm wondering whether there is any way to accelerate them.
-Thanks
Since we don't know what "simple" means here, several things to check:
Did you use icc rather than gcc?
Does -vec-report show vectorization? A successful vectorization report is a prerequisite, but not sufficient.
If you set -fp-model source, did you also set -ftz -fast-transcendentals?
If you are using complex arithmetic, did you set -complex-limited-range?
Did you set 64-byte data alignment?
Are the -imf- options for reduced accuracy applicable?
There is hardware support only for single-precision (float) math transcendentals.
Neither Intel(r) Xeon Phi(tm) nor a Kepler GPU is intended to be competitive without threading, but I suppose you could compare single-threaded performance. "Simple" suggests the test may exclude a realistic multi-threaded comparison.
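The checklist above can be summarized as compile-line variants to try. These are sketches, not a definitive recipe; flag spellings follow Intel Composer XE-era documentation, and exact availability may vary by compiler version (`bench.cpp` is a placeholder file name):

```shell
icpc -O3 -vec-report=2 bench.cpp           # confirm the loops actually vectorize
icpc -O3 -fp-model source -ftz -fast-transcendentals bench.cpp
icpc -O3 -fimf-precision=medium bench.cpp  # reduced-accuracy math, if acceptable
icpc -O3 -complex-limited-range bench.cpp  # only relevant for complex arithmetic
```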
Thanks for the reply. I'm using the icpc compiler, and it successfully vectorized everything.
For -fp-model I used the default settings, since precision is important to me. Also, I'm using double, and my alignment is 64 bytes.