Hi All,
Recently, I ran a simple test to benchmark the performance of double-precision transcendental functions on both Xeon Phi and a Kepler GPU. Surprisingly, the GPU was about 20x faster, and it looks to me as though the vectorized double-precision calculations, on both the CPU and the Xeon Phi, are done in software. So I'm wondering whether there is any way to accelerate them.
-Thanks
Since we don't know what "simple" means here, several things to check:
Did you use icc rather than gcc?
Does -vec-report show vectorization? A successful vectorization report is a prerequisite, but not sufficient.
If you set -fp-model source, did you also set -ftz -fast-transcendentals?
If you are using complex arithmetic, did you set -complex-limited-range?
Did you set 64-byte data alignment?
Are the -imf- options for reduced accuracy applicable?
There is hardware support only for single-precision (float) math transcendentals.
Neither Intel(r) Xeon Phi(tm) nor a Kepler GPU is intended to be competitive without threading, but I suppose you could compare single-threaded performance. "Simple" suggests the test may exclude a realistic multi-threaded comparison.
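The checklist above can be summarized as compile-line variants to try. These are sketches, not a definitive recipe; flag spellings follow Intel Composer XE-era documentation, and exact availability may vary by compiler version (`bench.cpp` is a placeholder file name):

```shell
icpc -O3 -vec-report=2 bench.cpp           # confirm the loops actually vectorize
icpc -O3 -fp-model source -ftz -fast-transcendentals bench.cpp
icpc -O3 -fimf-precision=medium bench.cpp  # reduced-accuracy math, if acceptable
icpc -O3 -complex-limited-range bench.cpp  # only relevant for complex arithmetic
```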
Thanks for the reply. I'm using the icpc compiler, and it successfully vectorized everything.
For -fp-model I used the default settings, since precision is important to me. Also, I'm using double, and my alignment is 64 bytes.