Re: Poor performance on Nehalem processor

Frits_Schalij · 01-13-2010

I use a Modelcheck application extensively. A typical Modelcheck run takes
anywhere from several minutes to several hours. To shorten run times, I moved
to a faster Intel processor: I had an Intel E5440 @ 2.83 GHz, and now I have
an Intel E5520 @ 2.27 GHz (Nehalem). Both computers have 4 GB of memory
installed.

I expected a big performance improvement, but instead I got a performance
degradation of about 20%.

My hypothesis is that my code does not make use of the advanced features of
the Nehalem processor, so I compiled my application with the Intel ICPC
compiler (previously it was compiled with GCC). I tried several flags and
combinations of flags of the ICPC compiler, such as -fast, -msse4, -64,
-openmp, -parallel, -xSSE4.2, -ipo, -ip and -hio, but none of them made the
application faster. I suspect that my application still makes no use of the
advanced features of the Nehalem processor.

My question is: how should I compile my application to get a performance
gain on the E5520 processor?

By the way, my application makes scarcely any use of floating-point operations.
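Before tuning compiler flags, it may be worth confirming that the new machine actually reports the instruction-set extensions that -xSSE4.2 targets. A minimal sketch, assuming Linux (where /proc/cpuinfo lists CPU features on "flags" lines); the flag list is an illustrative choice of two extensions Nehalem adds over the Penryn-based E5440 (SSE4.2 and POPCNT):

```python
# Minimal sketch, assuming Linux: /proc/cpuinfo lists CPU features on "flags" lines.
from __future__ import annotations

from pathlib import Path

# Feature flags (Linux spelling) that Nehalem reports but the Penryn-based E5440 does not.
NEHALEM_FLAGS = ("sse4_2", "popcnt")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line (empty set if none)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, wanted: tuple[str, ...] = NEHALEM_FLAGS) -> list[str]:
    """List the wanted flags that this CPU does not advertise, in order."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_flags(cpuinfo.read_text())
        print("missing:", ", ".join(missing) if missing else "none")
```

Having the flags only means the hardware supports them; whether the compiler can usefully emit these instructions for mostly integer code like this is a separate question.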

Dale_S_Intel · 01-13-2010

Well, of course it is difficult to say without seeing the application, but the first thing to try (if you're building on the same machine you're running on) would be -xHOST. That should ensure you're taking advantage of whatever features that processor provides. Another thing to try would be profile-guided optimization (PGO), but that's a little more complicated. Beyond that, it's difficult to say without further information. You could try some performance analysis with VTune or PTU (assuming this is Linux; on a Mac you'd need Apple's performance analysis tools, such as Shark).
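PGO is a three-step process: build an instrumented binary, run it on representative input so it records a profile, then rebuild using that profile. A minimal sketch that only assembles the command lines, assuming the Intel compiler's -prof-gen / -prof-use options (and -xHost, as the option is spelled in the Intel documentation); the source, binary and input names are hypothetical placeholders, and all steps are assumed to run from the same directory:

```python
# Minimal sketch: builds (but does not run) the command lines for a PGO build.
# Source, binary and input names are hypothetical placeholders.
from __future__ import annotations

import shlex


def pgo_build_steps(sources: list[str], output: str = "app",
                    train_args: tuple[str, ...] = ()) -> list[list[str]]:
    """Return the three PGO steps as argv lists."""
    common = ["icpc", "-O2", "-xHost", *sources, "-o", output]
    return [
        common + ["-prof-gen"],        # 1. instrumented build
        [f"./{output}", *train_args],  # 2. training run on a representative input
        common + ["-prof-use"],        # 3. rebuild using the recorded profile
    ]


if __name__ == "__main__":
    steps = pgo_build_steps(["model.cpp"], output="modelcheck",
                            train_args=("typical_input.txt",))
    for argv in steps:
        print(shlex.join(argv))
```

The training input matters: the optimized build is tuned for whatever the step-2 run exercised, so it should resemble a typical Modelcheck workload.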

Dale

aazue · 01-13-2010

Hi
The Intel E5440 @ 2.83 GHz is a very, very nice machine, so it is hard to beat. I think the cause of the slowdown is OpenMP.
Disable all the relevant pragmas (if you can) and run a new test without them.
Kind regards

Alexander_C_Intel · 01-13-2010

Do you have hyperthreading (SMT) enabled on your Nehalem machine?

Some applications run slower when it is on. Try to set OMP_NUM_THREADS to the number of physical cores you have.
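With HyperThreading on, the OS reports twice as many logical CPUs as physical cores, so the logical count is not the right value for OMP_NUM_THREADS here. A minimal sketch, assuming Linux, that counts physical cores by collecting distinct (physical id, core id) pairs from /proc/cpuinfo and falls back to the logical count if those fields are missing:

```python
# Minimal sketch, assuming Linux: count physical cores from /proc/cpuinfo.
from __future__ import annotations

import os
from pathlib import Path


def physical_core_count(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs.

    HyperThreading siblings report the same pair, so they are counted once.
    Returns 0 if the topology fields are missing (e.g. in some VMs).
    """
    cores: set[tuple[str, str]] = set()
    phys = core = None
    # A trailing "" guarantees the last record is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
        elif not line.strip():  # a blank line ends one logical-CPU record
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
    return len(cores)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    n = physical_core_count(cpuinfo.read_text()) if cpuinfo.exists() else 0
    n = n or os.cpu_count() or 1  # fall back to the logical CPU count
    print(f"export OMP_NUM_THREADS={n}")
```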

TimP · 01-14-2010

If you are running older versions of Windows (even Vista), disabling HyperThreading (in the BIOS setup screen) may be the way to make this work. If you do keep HyperThreading enabled, it may be important to set KMP_AFFINITY, along with OMP_NUM_THREADS, so that threads stay consistently assigned to separate cores, but on the same package/socket.

You can read about KMP_AFFINITY=scatter (and physical, which works on certain platforms) in the help menu.
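To compare settings, the simplest approach is to time the same binary and input under each combination of environment variables. A minimal sketch; "./modelcheck" and "typical_input.txt" are hypothetical placeholders, and the thread counts 4 and 8 are only an example (physical vs logical cores on one quad-core E5520 with HyperThreading on). KMP_AFFINITY is read by the Intel OpenMP runtime, so it has no effect on a GCC/libgomp build:

```python
# Minimal sketch: time one binary under different OpenMP settings.
# "./modelcheck" and "typical_input.txt" are hypothetical placeholders.
from __future__ import annotations

import os
import subprocess
import time
from collections.abc import Mapping


def affinity_env(num_threads: int, affinity: str = "scatter",
                 base: Mapping[str, str] | None = None) -> dict[str, str]:
    """Copy the environment and set OpenMP thread count and Intel affinity."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["KMP_AFFINITY"] = affinity  # read by the Intel OpenMP runtime, not GCC's libgomp
    return env


def timed_run(cmd: list[str], env: Mapping[str, str]) -> float:
    """Run cmd to completion with env and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    cmd = ["./modelcheck", "typical_input.txt"]
    if os.path.exists(cmd[0]):
        for n in (4, 8):
            secs = timed_run(cmd, affinity_env(n))
            print(f"OMP_NUM_THREADS={n} KMP_AFFINITY=scatter: {secs:.1f} s")
```

Wall-clock time is what matters for a long Modelcheck run, which is why the sketch uses time.perf_counter rather than CPU time.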

Of course, it is possible that your application ran entirely in cache on the Xeon 54xx, does not benefit from Nehalem's new cache and memory organization, and so the lower clock speed is reflected directly in your performance.
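The clock-speed explanation can be checked with simple arithmetic. A back-of-the-envelope sketch, assuming run time scales only with clock frequency and ignoring per-clock (IPC) differences and Turbo Boost:

```python
# Back-of-the-envelope check: assumes run time scales only with clock frequency
# and ignores per-clock (IPC) differences and Turbo Boost.
OLD_GHZ = 2.83  # Xeon E5440
NEW_GHZ = 2.27  # Xeon E5520 nominal clock


def clock_bound_speed_ratio(old_ghz: float, new_ghz: float) -> float:
    """Speed of the new machine relative to the old one, if only clock matters."""
    return new_ghz / old_ghz


ratio = clock_bound_speed_ratio(OLD_GHZ, NEW_GHZ)
print(f"speed ratio:     {ratio:.3f}")      # 0.802
print(f"speed loss:      {1 - ratio:.1%}")  # 19.8%
print(f"run-time factor: {1 / ratio:.2f}")  # 1.25
```

A roughly 20% loss in speed is close to the degradation reported in the original post, which is consistent with (though not proof of) a workload limited by core clock rather than memory.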

Om_S_Intel · 01-20-2010

You may try a profiler (e.g., VTune or Intel Thread Profiler) to analyze the issue. It would be helpful if you could share the test case with us.

Ianir_Ideses · 02-15-2011

Hi,
I am also seeing some degradation when moving to Nehalem hex-core CPUs.
I am using VTune to analyze this. What measurement parameters and events would you suggest?
Thanks,
Ianir.