Intel® oneAPI Math Kernel Library

Terrible 64bit performance on VML 8.

spec
I benchmarked vsSqrt and vsInvSqrt against SSE loops that use sqrtps (for sqrt) and sqrtps followed by divps (for inverse sqrt). Each routine is passed a 2^20-element array and is run 256 times. My platform is a Rev E (SSE3) Athlon 64 running XP x64, and the compiler is MSVC 8 Beta 2.

The SSE code runs at the same speed in 32bit and 64bit builds, but the VML code is much slower in 64bit: 64bit vsInvSqrt is 4x slower than its 32bit counterpart, and 64bit vsSqrt is almost 8x slower than its 32bit counterpart. The errors are the same in 32bit and 64bit, so I'm guessing that both libraries implement the same algorithm. In both cases I am linking against the static libraries, not the DLLs. The timings listed here:
http://www.intel.com/software/products/mkl/data/vml/functions/_performanceall.html
don't show any significant difference between the 32bit and 64bit libraries. Has anyone else seen timings like these on 64bit?

32bit INVSQRT (sqrtps + divps)
SSE time: 2.801254s.
Standard deviation: 1.040377e-013.
VML time: 1.329339s.
Standard deviation: 7.497016e-014.

32bit SQRT (sqrtps)
SSE time: 1.513263s.
Standard deviation: 8.583913e-009.
VML time: 1.847221s.
Standard deviation: 9.348521e-009.


64bit INVSQRT (sqrtps + divps)
SSE time: 2.809789s.
Standard deviation: 1.050167e-013.
VML time: 5.318174s.
Standard deviation: 7.547861e-014.

64bit SQRT (sqrtps)
SSE time: 1.531398s.
Standard deviation: 8.583939e-009.
VML time: 14.243044s.
Standard deviation: 9.348549e-009.
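For reference, the 64bit/32bit ratios implied by the timings above (64bit time divided by 32bit time, rounded):

```latex
\begin{aligned}
\text{VML InvSqrt:}\quad & 5.318174 / 1.329339 \approx 4.00 \\
\text{VML Sqrt:}\quad & 14.243044 / 1.847221 \approx 7.71 \\
\text{SSE InvSqrt:}\quad & 2.809789 / 2.801254 \approx 1.003 \\
\text{SSE Sqrt:}\quad & 1.531398 / 1.513263 \approx 1.012
\end{aligned}
```

The VML slowdowns match the "4x" and "almost 8x" figures, while the SSE loops stay within about 1% across the two builds.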