Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

performance difference in 32 bit mode and 64 bit mode?


Someone told me he observed a 30% DGEMM performance difference with MKL on the SAME Core 2 Duo machine with 32-bit and 64-bit OSes installed. I am not sure if he used the latest version of MKL. I am really curious to know if it is true, since we are buying some 64-bit machines and we need to choose between 64-bit and 32-bit Linux. We don't plan to get more than 3 GB of memory. My understanding is that the difference here is mainly the additional XMM and general-purpose registers. But can they really help in a computation-intensive kernel such as DGEMM, which already gives close to peak performance on IA-32 machines? How about other kernels such as FFT and level-2 BLAS? Let's assume large array sizes.

I only have access to P4 and Itanium 2 machines right now, so I can't conduct any experiments. Does anyone have such experience?

2 Replies
Black Belt
I agree that such a large difference is unlikely to be due directly to the OS. There might, however, be such a difference between the 32-bit MKL and the 64-bit MKL that is optimized for Core 2 Duo (8.1 or later). These versions likely depend on the larger register set available in 64-bit mode to reach full performance on Core 2 Duo. With 3 GB of RAM on Core 2 Duo, you have plenty to get an advantage from the 64-bit OS, so you can try both 32- and 64-bit MKL builds, in both 8.1 and 9.0, if you care to investigate this.
Many variables would affect this. In some of my tests on Core 2 Duo, the different thresholds for invoking threading within MKL produced large performance differences between MKL 8.1 and 9.0, and between 32- and 64-bit builds. MKL functions that aren't threaded are evidently not subject to abrupt performance changes caused by thread-count selection.
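
For anyone who wants to run the 32- versus 64-bit comparison themselves, here is a rough sketch of a timing harness in Python. It is only an illustration, not anything shipped with MKL: the function names (`pointer_bits`, `dgemm_flops`, `naive_dgemm`, `best_time`, `gflops`) are made up for this post, and `naive_dgemm` is a plain triple-loop stand-in that you would replace with a real DGEMM call from whichever MKL build you are testing. The harness reports whether the running process is 32- or 64-bit and converts the best of several timings into GFLOP/s using the usual 2*m*n*k operation count.

```python
import struct
import time


def pointer_bits():
    """Return the pointer width of the running process (32 or 64)."""
    return struct.calcsize("P") * 8


def dgemm_flops(m, n, k):
    """Operation count for C = A*B with A (m x k) and B (k x n)."""
    return 2 * m * n * k


def naive_dgemm(a, b):
    """Hypothetical stand-in kernel: triple-loop product on lists of lists.

    Replace this with a call into the BLAS build under test.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        row_a, row_c = a[i], c[i]
        for p in range(k):
            a_ip, row_b = row_a[p], b[p]
            for j in range(n):
                row_c[j] += a_ip * row_b[j]
    return c


def best_time(kernel, a, b, repeats=3):
    """Run kernel(a, b) several times and return the fastest wall time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def gflops(flops, seconds):
    """Convert an operation count and a time in seconds to GFLOP/s."""
    return flops / seconds / 1e9


if __name__ == "__main__":
    n = 64  # small only because the stand-in kernel is slow
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    t = best_time(naive_dgemm, a, b)
    rate = gflops(dgemm_flops(n, n, n), t)
    print(f"{pointer_bits()}-bit process: {rate:.4f} GFLOP/s")
```

Run the same script under a 32-bit and a 64-bit environment on the same machine and compare the reported rates. Given the threading thresholds mentioned above, it is probably worth fixing the thread count to the same value for both runs (for example through the `MKL_NUM_THREADS` environment variable) and using matrix sizes large enough that the timings are stable.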
Valued Contributor I
The performance difference is most likely due to the FFT. I have noticed that FFT runs faster in the 64-bit version of IPP than in the 32-bit one. I am talking about the same library version and the same machine here.