Intel® oneAPI Math Kernel Library

Performance difference in 32-bit mode and 64-bit mode?

bigbearking
Beginner

Someone told me he observed a 30% DGEMM performance difference with MKL on the SAME Core 2 Duo machine with 32-bit and 64-bit OSes installed. I am not sure if he used the latest version of MKL. I am really curious to know if it is true, since we are buying some 64-bit machines and we need to choose between 64-bit and 32-bit Linux. We don't plan to get more than 3 GB of memory. My understanding is that the difference here is mainly the additional XMM and general-purpose registers. But can they really help in a computation-intensive kernel such as DGEMM, which already gave close to peak performance on IA-32 machines? How about other kernels such as FFT and Level 2 BLAS? Let's assume large array sizes.

I only have access to P4 and Itanium 2 machines right now, so I can't conduct any experiments. Does anyone have experience with this?

2 Replies
TimP
Honored Contributor III
I agree that such a large difference is unlikely to be due directly to the OS. Possibly, there might be such a difference between the 32-bit MKL and the 64-bit MKL which is optimized for Core 2 Duo (8.1 or later). These versions likely depend on the larger 64-bit mode register set to gain full performance on Core 2 Duo. With 3GB RAM on Core 2 Duo, you have plenty to get an advantage from the 64-bit OS, so you can try both 32- and 64-bit MKL builds, both 8.1 and 9.0, if you care to investigate this.
Many variables would impact this. On some of my tests on Core 2 Duo, the different thresholds for invoking threading within MKL brought about large performance differences between MKL 8.1 and 9.0, and between 32- and 64-bit. MKL functions which aren't threaded evidently aren't subject to abrupt changes in performance due to thread number selection.
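To make the suggested comparison concrete, here is a minimal timing sketch under stated assumptions, not a definitive benchmark. It uses the standard DGEMM operation count of 2·m·n·k and reports the best of several runs. `naive_matmul` is a hypothetical pure-Python stand-in so the script runs anywhere; for a real comparison you would pass in a wrapper around DGEMM from each MKL build you want to test and run the same script on the 32-bit and 64-bit installs.

```python
import time


def dgemm_flops(m, n, k):
    """Operation count for C = A*B with A (m x k) and B (k x n):
    one multiply and one add per inner-product step."""
    return 2 * m * n * k


def naive_matmul(a, b):
    """Hypothetical stand-in for DGEMM (pure Python, far slower than MKL)."""
    bt = list(zip(*b))  # columns of b, so each inner product walks two sequences
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def best_gflops(matmul, n, repeats=3):
    """Time matmul on n x n inputs and return the best observed GFLOP/s."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return dgemm_flops(n, n, n) / best / 1e9


if __name__ == "__main__":
    # Sweep a few sizes; with a threaded library, abrupt jumps between sizes
    # can indicate a threading threshold rather than a 32/64-bit effect.
    for n in (32, 64, 96):
        print(f"n={n:4d}  {best_gflops(naive_matmul, n):.4f} GFLOP/s")
```

Sweeping over several sizes matters here because, as noted above, threading thresholds can shift results abruptly. Fixing the thread count (for example through the `MKL_NUM_THREADS` environment variable) before comparing builds helps separate that effect from the register-set difference.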
levicki
Valued Contributor I
The performance difference is most likely due to the FFT. I have noticed that the FFT runs faster in the 64-bit version of IPP than in the 32-bit one, comparing the same library version on the same machine.