Hi,
I have a strange problem: the same program runs faster on an i5-3320M than on a Xeon W3680. The main part of the code is an FFT. Could anyone shed some light on this? Thanks!
Presumably you're using MKL, which gains a big advantage from AVX on the newer CPU.
Perhaps you haven't tried settings that might optimize thread placement across the 4 cache channels of the Westmere CPU, e.g. KMP_AFFINITY="proclist=[1,3-5],verbose,explicit" with MKL_NUM_THREADS=4 and HyperThreading disabled (just guessing, based on a typical BIOS numbering of the cores). With HT enabled, I suppose it might be proclist=[3,7,9,11] or some such. HT wasn't designed for simplicity, nor did early BIOSes follow a consistent pattern for it.
If you did so, some specifics of your results might help you get more intelligent responses. The quirky cache setup of the 6-core Westmere probably contributes to limited popularity (and the fact that I have one left to me as a retirement gift).
Also, you should check the MKL docs for any recommendations such as 32-byte data alignment (at least as important on the older Core i7 CPUs as on the newer ones).
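For what it's worth, here is a minimal sketch of how the affinity settings suggested above could be applied when launching a benchmark. It only sets the two environment variables named in this post and starts the program as a child process. The helper names `mkl_env` and `run_pinned` and the `./fft_benchmark` path are hypothetical, and the proclist values are the same guesses as above, so they would need checking against the actual core numbering on your machine.

```python
import os
import subprocess
import sys


def mkl_env(proclist="1,3-5", num_threads=4, base=None):
    """Return a copy of the environment with the suggested pinning settings.

    proclist and num_threads default to the guessed values for a W3680
    with HyperThreading disabled; they are assumptions, not measured values.
    """
    env = dict(os.environ if base is None else base)
    # Explicit affinity: bind the OpenMP threads to the listed logical CPUs.
    # "verbose" asks the Intel OpenMP runtime to print the resulting binding.
    env["KMP_AFFINITY"] = f"proclist=[{proclist}],verbose,explicit"
    env["MKL_NUM_THREADS"] = str(num_threads)
    return env


def run_pinned(cmd, **kwargs):
    """Run cmd (a list of arguments) with the pinning environment applied."""
    return subprocess.run(cmd, env=mkl_env(**kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical usage: python run_pinned.py ./fft_benchmark [args...]
    if len(sys.argv) > 1:
        run_pinned(sys.argv[1:])
```

With HT enabled, the other guess would be `run_pinned(cmd, proclist="3,7,9,11")`. Since `verbose` is set, the runtime should report which logical CPUs each thread was bound to, which is the quickest way to see whether the numbering guess was right.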
What is the problem size, and how large is the performance difference you are talking about?