Hi Jim, Vladimir,
Sorry for the delay. It looks like I was probably affected by
DPD200134977 C++, Fortran issue with libiomp5md.lib, reduction -:
In rare cases my application hung at random inside one of the OpenMP constructs. This occurred when libiomp5md.dll 5.0.2011.325 was loaded, while exactly the same application worked fine with libiomp5md.dll 5.0.2011.606. In both cases the rest of the application remained untouched.
I performed the requested test with Intel VTune Amplifier XE 2011. I believe the results show a basic difference between these two versions of libiomp5md.dll, in particular in how aggressively the OpenMP threads utilize the available CPU cores.
For reference: the tests were performed on a computer with two X5690 CPUs, HT = On, SpeedStep and TurboBoost = Off, so 24 logical cores are available in this system. In both cases only libiomp5md.dll was replaced. OpenMP is used in a DLL, and this DLL is loaded by a GUI application written in Delphi. The GUI part is also multithreaded; typically the DLL functions are called from different secondary threads of the GUI in order not to block the UI during computations. I tested a rather complicated algorithm that calls many different DLL functions (with OpenMP parallel constructs) in an external loop. The test run takes about 20-30 sec.
A) libiomp5md.dll 5.0.2011.325 Summary of Lightweight Hotspots analysis
Elapsed Time: 22.565s
CPU Time: 336.113s
Instructions Retired: 599,580,000,000
CPI Rate: 1.938
Paused Time: 0s
B) libiomp5md.dll 5.0.2011.606 Summary of Lightweight Hotspots analysis
Elapsed Time: 29.971s
CPU Time: 88.789s
Instructions Retired: 143,308,000,000
CPI Rate: 1.874
Paused Time: 0s
We can immediately see that CPU Time is significantly lower in the second case (88.789s vs. 336.113s), even though Elapsed Time is longer (29.971s vs. 22.565s)!
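To put the two summaries on a common scale, CPU Time can be divided by Elapsed Time, which gives the average number of logical cores kept busy during the run. The following is only a small sketch of that arithmetic; the function names are made up for illustration and are not part of VTune or of the OpenMP runtime.

```cpp
#include <cassert>
#include <cstdio>

// Average number of logical cores that were busy during the run:
// total CPU time summed over all threads, divided by wall-clock time.
double average_busy_cores(double cpu_time_s, double elapsed_s) {
    assert(elapsed_s > 0.0);
    return cpu_time_s / elapsed_s;
}

// Same figure expressed as a fraction of the logical cores in the machine.
double core_utilization(double cpu_time_s, double elapsed_s, int logical_cores) {
    assert(logical_cores > 0);
    return average_busy_cores(cpu_time_s, elapsed_s) / logical_cores;
}

// Hypothetical helper that prints one line per VTune summary.
void print_summary(const char* label, double cpu_time_s, double elapsed_s,
                   int logical_cores) {
    std::printf("%s: %.2f busy cores on average (%.0f%% of %d)\n", label,
                average_busy_cores(cpu_time_s, elapsed_s),
                100.0 * core_utilization(cpu_time_s, elapsed_s, logical_cores),
                logical_cores);
}
```

Calling print_summary("325", 336.113, 22.565, 24) and print_summary("606", 88.789, 29.971, 24) with the numbers above gives roughly 14.9 busy cores for version 325 and roughly 3.0 for version 606. Note that CPU Time also counts time that threads spend spinning, so a higher figure does not by itself mean that more useful work was done.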
Do you still need more detailed information on particular functions inside libiomp5md.dll?
At the moment I have a feeling that version 606 doesn't like it when an OpenMP parallel region is called from different threads of the application, as happens in my case.
In my case OpenMP parallel regions never run concurrently. They are launched sequentially, one by one. The parent threads in the GUI application are often different, but they are synchronized so that the calls happen in sequential order. Thus I do not see any need to use KMP_AFFINITY.
I checked the values of omp_get_num_threads, omp_get_max_threads, omp_get_thread_limit, omp_get_dynamic, kmp_get_blocktime and omp_get_schedule.
All these functions return the same values for both versions of libiomp5md.dll.
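For anyone who wants to repeat this comparison, a check of this kind could look roughly like the sketch below. It is not the code I actually ran. The struct and function names are hypothetical; kmp_get_blocktime is left out because it is an Intel runtime extension rather than standard OpenMP; and the _OPENMP guard only lets the file compile when OpenMP is switched off.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical container for the values reported by the OpenMP runtime.
struct OmpSettings {
    int num_threads_serial = 1;  // omp_get_num_threads() outside a parallel region
    int max_threads = 1;
    int thread_limit = 1;
    int dynamic = 0;
    int schedule_kind = 0;
    int schedule_chunk = 0;
};

// Query the runtime from serial code; without OpenMP the defaults are kept.
OmpSettings query_omp_settings() {
    OmpSettings s;
#ifdef _OPENMP
    s.num_threads_serial = omp_get_num_threads();
    s.max_threads = omp_get_max_threads();
    s.thread_limit = omp_get_thread_limit();
    s.dynamic = omp_get_dynamic();
    omp_sched_t kind;
    int chunk = 0;
    omp_get_schedule(&kind, &chunk);
    s.schedule_kind = static_cast<int>(kind);
    s.schedule_chunk = chunk;
#endif
    assert(s.max_threads >= 1);
    return s;
}

// Print everything on one line so two runs can be compared with a text diff.
void print_omp_settings(const OmpSettings& s) {
    std::printf("num_threads(serial)=%d max_threads=%d thread_limit=%d "
                "dynamic=%d schedule=%d chunk=%d\n",
                s.num_threads_serial, s.max_threads, s.thread_limit,
                s.dynamic, s.schedule_kind, s.schedule_chunk);
}
```

One thing to keep in mind: omp_get_num_threads returns 1 when it is called outside a parallel region, so to compare real team sizes it has to be called from inside the region.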
>> Are you saying the GUI part is coded using say pthreads, or beginthread, or ... (non-OpenMP thread),
Yes, some wrapper around beginthread/endthread (non-OpenMP).
>> and which each of these threads may concurrently call the DLL,
Not at all! By design I avoid concurrent calls of DLL functions that contain OpenMP parallel constructs. These calls are serialized. Other functions without OpenMP inside may be called concurrently, of course.
>> and where each DLL called function will use OpenMP under the assumption that the OpenMP thread pool is entirely owned (by its call context)?
Yes, it is designed in this way. I was assuming that serialization (see answer #2) makes this possible.
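To make the serialization concrete, here is a minimal sketch of the pattern, assuming a single process-wide lock around every DLL entry point that contains a parallel region. All names (g_omp_gate, sum_of_squares, call_from_worker_threads) are invented for illustration; the real DLL and the Delphi thread wrapper look different. If OpenMP is not enabled the pragma is simply ignored and the loop runs serially.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

namespace {
// Hypothetical process-wide gate: only one caller at a time may be inside
// a function that opens an OpenMP parallel region.
std::mutex g_omp_gate;
}

// Stand-in for a DLL function with an OpenMP parallel construct inside.
double sum_of_squares(const std::vector<double>& data) {
    std::lock_guard<std::mutex> lock(g_omp_gate);  // serialize all callers
    double sum = 0.0;
    const long n = static_cast<long>(data.size());
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += data[i] * data[i];
    }
    return sum;
}

// Stand-in for the GUI side: several non-OpenMP threads call into the DLL.
// The calls may be issued at the same time, but the gate runs them one by one.
std::vector<double> call_from_worker_threads(const std::vector<double>& data,
                                             int callers) {
    std::vector<double> results(callers, 0.0);
    std::vector<std::thread> workers;
    for (int t = 0; t < callers; ++t) {
        workers.emplace_back([&results, &data, t] {
            results[t] = sum_of_squares(data);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

The lock only guarantees that two parallel regions are never active at the same moment. It does not change the fact that the regions are entered from different application threads, which is exactly the situation I am asking about.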
>> If so, you should understand that OpenMP is not designed to operate this way.
Some time ago I did have a problem when, by accident (a programming error), two DLL functions with OpenMP were called concurrently. It caused stability problems. After I serialized such calls the problem went away, and since then I have had no problems with correctness of results, stability, etc.
Are there any recommendations on how to use OpenMP in a DLL for my scenario? If yes, where can I find a description?
>> Externally, observed by your application, potentially you have one thread in each of these concurrent DLL calls at it's independent main level with omp_get_thread_num() == 0. Would this cause any programming errors?
Even with a single thread for every parallel region the program should work correctly, with an obvious impact on performance.
kmp_get_blocktime() returns 200 with both versions.
Thus there is no difference in this parameter, and it cannot be the reason for the slower execution.