several minutes up to some hours. In order to have shorter run times I moved
to a faster Intel processor: I had an Intel E5440 @ 2.83 GHz, now I have an
Intel E5520 @ 2.27 GHz (Nehalem). Both computers have 4 GB installed memory.
I expected to get a big performance improvement but I got a performance
degradation of about 20%.
My hypothesis is that my code does not make use of the advanced features
of the Nehalem processor, so I compiled my application with the Intel ICPC
compiler. (Before, it was compiled with GCC.) I tried several flags and
combinations of flags of the ICPC compiler, such as -fast,
-msse4. -64 -openmp -parallel -xSSE4.2, -ipo, -ip, -hio, but nothing made
the application faster. I suspect that my application still makes no use of
the advanced features of the Nehalem processor.
My question is: How should I compile my application to get a
performance gain on the E5520 processor?
By the way, my application makes scarcely any use of floating-point operations.
Well, of course it is difficult to say without seeing the application, but the first thing to try (if you're building on the same machine on which you're running) would be -xHost. That should make sure you're taking advantage of whatever features that processor provides. One other thing to try would be profile-guided optimization (PGO), but that's a little more complicated. Other than that, it's difficult to say without further information. You could try doing some performance analysis with VTune or PTU (assuming this is Linux; on a Mac you'd need Apple's performance analysis tools, such as Shark).
Dale
The Intel E5440 @ 2.83 GHz is a very nice machine, so you should be able to get an improvement over it. I think the slowdown comes from OpenMP.
Disable all the OpenMP-specific pragmas (if you can) and run a new test without them.
Kind regards
Do you have hyperthreading (SMT) enabled on your Nehalem machine?
Some applications run slower when it is on. Try to set OMP_NUM_THREADS to the number of physical cores you have.
If you are running older versions of Windows (even Vista), disabling HyperThreading (in the BIOS setup screen) may be the way to make this work. If you do keep HyperThreading enabled, it may be important to set KMP_AFFINITY values, along with OMP_NUM_THREADS, which keep threads assigned consistently to separate cores but on the same package/socket.
You can read about KMP_AFFINITY=scatter (and physical, which works on certain platforms) in the help menu.
Of course, it is possible that your application ran perfectly in cache on Xeon 54xx, does not benefit from the new Nehalem cache and memory organization, and so the reduction in clock speed is reflected in your performance.
I am also seeing some degradation when moving to Nehalem hex-core CPUs.
I am using VTune to analyze this. What measurement parameters and events do you suggest?
Thanks,
Ianir.