Newbie problem with compiler options / OpenMP

marco_ganz · 01-20-2004

Hello!

I'm new to threading, Hyper-Threading and OpenMP, so please forgive me if my questions have "obvious" answers.

I'm trying to add multithreading support to my Monte Carlo application in order to exploit the 8 logical CPUs (4 physical, doubled by Hyper-Threading) in my new "CPU server". I'm compiling with the newest Intel C/C++ compiler within the VS.NET 2002 IDE.

My application relies almost entirely on thread-specific, dynamically allocated memory: path length and number of paths are selected at runtime, and different threads process different batches of paths, independently of one another. Global results are summarised by "reduction" clauses. Input parameters are shared between threads.

Question #1: in order to overcome heap-locking issues I've replaced all "new" and "delete" operators with calls to the special "kmp_calloc" and "kmp_free" functions. Is that correct / wise / suggested / useless?

Results seem quite promising in Debug mode (I see my 8 processors running at 100% and total execution time is roughly 1/5th of single-thread execution) BUT as soon as I recompile in "Release" mode performance gets MUCH WORSE: the CPUs are still at 100% but execution time explodes!

Question #2: are there optimizations which should be avoided when compiling multithreaded code?

Any suggestions?

Thank you very much!

-- Marco --

marco_ganz · 01-21-2004

Yesterday, after posting my message, I made a few changes to my code and suddenly the performance problem disappeared. Even after undoing the changes (at least those I'm aware of), performance is still good, so I can't reproduce the problem I observed yesterday.

Cheers

-- Marco --

ClayB · 01-22-2004

Marco -

Um... happy to be of service (I guess).

I'm not aware of any optimizations that might directly affect threading. At some point, it all boils down to how well code executes on a processor (even in distributed applications), so better optimized code should perform better overall. You might see some drop-off with Hyper-Threading, though. As the code becomes tighter with better or more aggressive optimizations, it tends to leave fewer of the processor's execution resources unused and, hence, less opportunity for the processor to execute instructions from a second logical thread. We've seen this with highly optimized math libraries, where adding threads on Hyper-Threading enabled systems yields little if any extra performance.

-- clay

dolom · 01-23-2004

Hi Clay!

Still me, but from a different account.

Thank you for your useful answer: I suspect my troubles were caused by a "dirty" recompilation since I didn't notice any glitches afterwards.

Your comments about HT sound reasonable and we've indeed noticed limited gains on simple highly optimised apps.

On the other hand, we're quite "lucky": most of our code is based on Monte Carlo simulations, which are parallel by nature since the paths are independent, and it is not (yet) highly optimised (e.g. we don't use intrinsic functions or other Intel-specific features). So it can easily profit from OpenMP directives while remaining portable to other platforms.

Cheers

-- Marco --