Performance of OpenMP Critical Sections

Matt_F_1 · ‎09-22-2008

Adding another named OMP CRITICAL section seems to have slowed down my code by roughly 15% in serial (OMP_NUM_THREADS=1), even though the new critical section is never executed.

I spent some time tuning this parallel code and then, almost as an afterthought, added CRITICAL sections to the error handling routines to make sure the error/warning messages were output correctly, without interleaving between threads. The new critical sections all have the same name, as they all reference the same log file. Unfortunately, although these new critical sections are never executed during a normal run, they have a significant impact on runtime for a single thread. With the maximum number of threads the impact is currently negligible due to bus-related stalls, but this will hopefully be remedied by Nehalem.

Evidently adding another named section is slowing down all of the other named critical sections? Is this to be expected, and does anyone have any suggestions as to what can be done?

Thanks,
Matt

jimdempseyatthecove · ‎09-22-2008

Matt,

You may have a cache line alignment issue that is exposed when adding the additional named critical section.

VTune/PTU might be able to identify the cache line alignment issue.

Also, check that the same optimization settings and runtime checks are used for both runs.

Jim Dempsey

Matt_F_1 · ‎09-22-2008

Wouldn't any such issue be inside the Intel OpenMP runtime?

The only change I made to my code was to add (or remove) "!$OMP CRITICAL (MYNAME)" around a block of existing code, where the name for the critical section is new. This new section brings the total number of 'names' to 5. Same compile flags, etc.
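
For readers who want to see the shape of the change being described, here is a minimal sketch of a named critical section guarding an error path that is never taken in a normal run. It is written in C rather than Fortran (`#pragma omp critical(name)` is the C counterpart of `!$OMP CRITICAL (name)`), and the names `log_error`, `checked_sum`, and `logfile` are hypothetical, not taken from the original code.

```c
#include <stdio.h>

/* Hypothetical error logger. The named critical section serializes writes
   to the shared log so messages from different threads do not interleave. */
static int log_error(FILE *log, const char *msg)
{
    int written = 0;
    #pragma omp critical(logfile)
    {
        written = fprintf(log, "ERROR: %s\n", msg);
    }
    return written;
}

/* Sums the input; the logging path is only reached if a NaN is present,
   so for clean input the critical section is never entered. */
static double checked_sum(const double *x, int n, FILE *log, int *n_errors)
{
    double total = 0.0;
    int errors = 0;
    #pragma omp parallel for reduction(+:total, errors)
    for (int i = 0; i < n; ++i) {
        if (x[i] != x[i]) {          /* true only for NaN */
            log_error(log, "NaN in input");
            errors += 1;
        } else {
            total += x[i];
        }
    }
    *n_errors = errors;
    return total;
}
```

Built without OpenMP support the pragmas are ignored and the code runs serially, which makes it easy to compare timings with and without the directive.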

Or perhaps you're suggesting that the "hidden" code from the critical section is leading to an alignment issue with the existing code? Note that none of the code in the new critical section is ever executed. Due to the nature of OMP this might be difficult to analyze...

I'll try to find time to look at it with VTune.

Thanks,
Matt

jimdempseyatthecove · ‎09-25-2008

Matt,

It is not the amount of code added that I referred to. Instead, the alignment of the data within the application may shift by the size of the data structure used by the critical section. For example, prior to adding the new named critical section you may have had an array of REAL(8) aligned on an 8-byte boundary, but afterwards it is aligned on a 4-byte boundary that is not also an 8-byte boundary (an array within a user-defined type may experience this). Although the code is never executed, the data is now positioned on unfavorable boundaries. Another alignment issue is that SSE instructions can perform small vector operations at different rates depending on the alignment of the data. What used to take one memory read may now take two memory reads. You could try adding an additional named critical section (or 2 or 3, ...) to see how the code performs.
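
The "one read becomes two" effect can be illustrated with a little arithmetic. The following is only a sketch in C, assuming a 64-byte cache line; the helper names `is_aligned` and `lines_touched` are made up for this illustration and say nothing about how the OpenMP runtime lays out its lock data.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p is a multiple of boundary.
   boundary is assumed to be a power of two. */
static int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p & (uintptr_t)(boundary - 1)) == 0;
}

/* Number of cache lines covered by an object of `size` bytes that starts
   at byte `offset`, for a line size of `line` bytes (size must be > 0). */
static size_t lines_touched(size_t offset, size_t size, size_t line)
{
    size_t first = offset / line;
    size_t last  = (offset + size - 1) / line;
    return last - first + 1;
}
```

For example, an 8-byte value at offset 56 fits in one 64-byte line, but if something in front of it grows by 4 bytes and it moves to offset 60, it straddles two lines.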

If you can identify what variables are having the alignment problem then you may be able to use compiler directives to force alignment of those variables.
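
As a rough analogue of forcing alignment, here is how it might look in C11 using `alignas`. This is a sketch only: the Fortran directive syntax depends on the compiler, and `struct state`, `g_state`, `values_misalignment`, and the 64-byte `CACHE_LINE` value are assumptions made for the example.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Without alignas, the array would simply follow `flag` at the next
   8-byte boundary. alignas pins it to a cache line boundary, so members
   added in front of it cannot leave it on an unfavorable address. */
struct state {
    int flag;
    alignas(CACHE_LINE) double values[8];
};

static struct state g_state;

/* Distance in bytes of the array from the previous cache line boundary;
   0 means the array starts exactly on a boundary. */
static size_t values_misalignment(const struct state *s)
{
    return (size_t)((uintptr_t)s->values % CACHE_LINE);
}
```

The trade-off is padding: the struct grows so that `values` can start on a line boundary, which is usually acceptable for a few hot arrays but not something to apply everywhere.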

Jim Dempsey