Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Xeon w3680 runs slower than i5-3320M

Bo_Q_
Beginner
1,061 Views

Hi,

I have a strange problem where the same program runs slower on an i5-3320M than on a Xeon W3680. The main part of the code is FFT. Could anyone shed some light on this? Thanks!

0 Kudos
1 Solution
TimP
Honored Contributor III
1,061 Views

I've spent a fair amount of time posting about why the new 2-core CPU might match performance of the old 6-core, but my posts all vanished.

1.  An MKL optimized for AVX could achieve double the performance per thread of the older CPU.

2.  The older CPU may be as dependent (or more so) on 32-byte data alignment, particularly as you are probably attempting more threads.

3.  The 6-core Westmere is likely to be more dependent on optimizing OMP_NUM_THREADS and affinity.  Although MKL should automatically attempt to use 1 thread per core in spite of HyperThreading, I haven't seen any software which deals automatically with the asymmetric arrangement of the Westmere 6-core.  Under the usual BIOS arrangement, cores 0 and 1 share cache access, as do cores 2 and 3, while cores 4 and 5 don't share paths to cache. In that case, try something such as:

set OMP_NUM_THREADS=4

set KMP_AFFINITY="proclist=[3,7,9,11],explicit,verbose"

(If you have disabled HT, [1,3,4,5] would select the same cores as the settings above do with HT enabled.)

The verbose keyword echoes the resulting affinity settings to the screen so you can confirm them.
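To make the reasoning behind that proclist easier to follow, here is a minimal Python sketch (not part of the original post). It assumes HT siblings are numbered adjacently, so that logical processors 2k and 2k+1 sit on core k. That assumption is inferred from the remark that [3,7,9,11] with HT matches [1,3,4,5] without it. The cache-sharing groups restate the BIOS arrangement described above; the names `CACHE_GROUPS`, `core_of` and `check_proclist` are hypothetical.

```python
# Cache-sharing groups of physical cores, as described in the post:
# cores 0/1 share a path to cache, cores 2/3 share one, cores 4 and 5 do not share.
CACHE_GROUPS = [{0, 1}, {2, 3}, {4}, {5}]


def core_of(logical_id, ht_enabled=True):
    """Map an OS logical processor number to a physical core index.

    Assumption: with HT on, logical processors 2k and 2k+1 belong to core k.
    """
    return logical_id // 2 if ht_enabled else logical_id


def check_proclist(proclist, ht_enabled=True):
    """Return (cores, ok): ok is True when every thread gets its own core
    and no two chosen cores fall in the same cache-sharing group."""
    cores = [core_of(p, ht_enabled) for p in proclist]
    chosen = set(cores)
    one_thread_per_core = len(chosen) == len(cores)
    one_core_per_group = all(len(group & chosen) <= 1 for group in CACHE_GROUPS)
    return cores, one_thread_per_core and one_core_per_group


print(check_proclist([3, 7, 9, 11]))                   # ([1, 3, 4, 5], True)
print(check_proclist([1, 3, 4, 5], ht_enabled=False))  # ([1, 3, 4, 5], True)
print(check_proclist([0, 2, 4, 6]))                    # ([0, 1, 2, 3], False)
```

The last call shows why a naive "first four cores" choice is worse under this arrangement: it places two threads in each sharing pair and leaves cores 4 and 5 idle. On a real machine, the verbose output of KMP_AFFINITY remains the authority on the actual numbering.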

By tacking on 2 additional cores, the Westmere (WSM) 6-core typically gains more than 20% performance over the equivalent 4-core CPU at the same clock speed, provided that the affinity requirements are observed.  You would have to read the ads carefully to see that a 50% gain isn't expected even for applications with good threaded scaling.  Many of the customers who took the trouble to understand the situation but didn't want to deal with the special affinities chose to buy the 4-core model.
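As a rough illustration of those numbers (a sketch, not a measurement): going from 4 to 6 cores at the same clock gives an ideal speedup of 6/4 = 1.5, i.e. the 50% gain mentioned above. The hypothetical helper below relates an observed gain to that ideal. The 20% figure is taken from the post as a lower bound.

```python
def scaling_efficiency(observed_gain, cores_before, cores_after):
    """Fraction of the ideal core-count speedup that was actually realized.

    observed_gain is the fractional improvement (0.20 means 20% faster).
    """
    ideal_speedup = cores_after / cores_before  # 6 / 4 = 1.5
    return (1.0 + observed_gain) / ideal_speedup


# A 20% gain from 4 -> 6 cores realizes about 80% of the ideal speedup.
print(round(scaling_efficiency(0.20, 4, 6), 2))
```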

Among the advantages of the newer CPUs are less arcane affinity requirements (although you might not feel that way about Intel® Xeon Phi™).

Perhaps you didn't find the web search references edifying, but you didn't even mention what you didn't understand.  I noticed that Google doesn't return nearly as many references as Bing or Yahoo.


0 Kudos
6 Replies
Zhang_Z_Intel
Employee
1,061 Views

Take a look at the comparison of these 2 processors here: http://ark.intel.com/compare/47917,64896

The i5-3320M is a much newer architecture generation than the Xeon W3680. However, the Xeon W3680 has 6 CPU cores while the i5-3320M has only 2, and the cache on the Xeon W3680 is 4x the size of the cache on the i5-3320M. It shouldn't be a surprise that the old server CPU can outperform the newer mobile CPU for some workloads.
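A back-of-the-envelope peak estimate can make the trade-off between core count and a newer instruction set more tangible. The sketch below is not from the thread. The base clocks (3.33 GHz and 2.6 GHz) and the double-precision FLOPs per cycle per core (4 for 128-bit SSE, 8 for 256-bit AVX) are assumptions taken from published specifications. Turbo frequencies, memory bandwidth and cache effects, which often dominate FFT performance, are ignored.

```python
def peak_gflops(cores, base_ghz, dp_flops_per_cycle):
    """Theoretical double-precision peak: cores x clock (GHz) x FLOPs per cycle."""
    return cores * base_ghz * dp_flops_per_cycle


# Assumed figures, labeled as such; check them against the ARK comparison.
xeon_w3680 = {"cores": 6, "base_ghz": 3.33, "dp_flops_per_cycle": 4}  # SSE
i5_3320m = {"cores": 2, "base_ghz": 2.6, "dp_flops_per_cycle": 8}     # AVX

xeon_peak = peak_gflops(**xeon_w3680)
i5_peak = peak_gflops(**i5_3320m)

# Per-core peak favors the newer chip; whole-chip peak favors the 6-core Xeon.
print("per-core ratio i5/Xeon:", round((i5_peak / 2) / (xeon_peak / 6), 2))
print("whole-chip ratio Xeon/i5:", round(xeon_peak / i5_peak, 2))
```

Under these assumptions the Xeon keeps a clear whole-chip advantage on paper, so a measured result that goes the other way points at threading, affinity or memory behavior rather than raw arithmetic throughput.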

 

0 Kudos
Bernard
Valued Contributor I
1,061 Views

>>>The main part of the code is FFT. Could anyone shine some light on me? Thanks!>>>

As @Zhang Z hinted, the older Xeon probably makes better use of its available cores (more execution units) and its larger cache.

By the way, @Bo Q, your thread's title states that the Xeon runs slower than the i5.

0 Kudos
Bernard
Valued Contributor I
1,061 Views

 >>> An MKL optimized for AVX could achieve double the performance per thread of the older CPU.>>>

I forgot to mention this in my post.

0 Kudos
Bo_Q_
Beginner
1,061 Views

Thanks, all the replies are very helpful! Also, I found that setting the /QxHost option seems to help as well.

0 Kudos
TimP
Honored Contributor III
1,061 Views

On the Westmere, /QxHost was not always as good as /arch:SSE4.1, but these compiler options affect only your own compiled code; they will not influence MKL.

0 Kudos