Last year I developed a DLL with the Intel Fortran compiler 126.96.36.199 (XE2019 update 3), which is called from a C++ application (C++ EXE and DLLs). To get good performance and correct multithreaded (i.e. thread-safe) behavior, I found that I needed to add the /Qopenmp compiler switch along with /threads and /reentrancy:threaded. I was surprised to need the /Qopenmp switch because my DLL does not use any OpenMP feature, but it worked as expected anyway. I guess I needed it for multithreaded calls.
The compiler switches I used were:
/nologo /O3 /fpp /DMIXT /reentrancy:threaded /extend_source:132 /Qopenmp /Qauto /align:sequence /fpe:1 /fp:fast=2 /fpconstant /Qfp-speculation=strict /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc160.pdb" /libs:static /threads /c
Today, I switched to the oneAPI version (2021.1.1.191) and found that performance is now very poor (5-6x the calculation time in sequential calls) and the DLL is almost unusable in a multithreaded application (more than 20 minutes instead of a couple of seconds previously). This problem also occurs with other compiler versions: 188.8.131.52 and 19.1.311, i.e. XE2020 updates 2 and 4.
To solve part of the problem, I found that I have to remove the /Qopenmp compiler switch to get correct performance in sequential calls, but now the multithreaded calls no longer work (it seems that I am getting race conditions, as Intel Inspector reports).
Is there anything I can do to fix this problem, or do you have any information about such problems?
Try replacing /Qopenmp with /recursive
In the pre-oneAPI /Qopenmp also enabled /recursive.
I have no input regarding poor performance between versions.
You should be aware that oneAPI comes with two Fortran compilers: ifort (a.k.a. Classic) and ifx (the new LLVM-based oneAPI compiler).
>> I tried /recursive (in the command line window) as well as /assume:recursion without more success on multithreading.
Then something else is at play with your coding. Possibilities:
1) You have procedure variables with the SAVE attribute when they should not have SAVE. Please note, should you have a procedure that contains once-only code (e.g. to set up a table), then the entry into the once-only code must be written to be safely once-only. IOW, only one thread performs the initialization; all other threads (if necessary) wait for the initialization to complete.
2) The older code had coding errors that could produce race conditions, but they did not occur, or occurred rarely enough that you were never made aware of the condition, whereas the newer compiler generates a code path and sequencing such that the error condition is encountered.
3) Adding the /Qopenmp may cause a different library to be selected for some portions of your non-OpenMP code. IOW your code is sensitive to different versions of library functions.
4) Compiler bug? (It is easy to blame the compiler, but most of the time it is a programmer error.)
Finding the cause of the problem can be time consuming and is somewhat of an art. With practice you get better.
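For possibility 1, the once-only setup can be guarded so that only one thread builds the table while the others wait. A minimal sketch, assuming OpenMP is active; the module, variable names, and the setup work itself are hypothetical:

```fortran
! Minimal sketch of thread-safe once-only initialization; names are
! hypothetical. The named critical section ensures only one thread
! fills the table; late arrivals re-test the flag inside it.
module table_mod
   implicit none
   logical, save :: initialized = .false.
   real,    save :: table(100)
contains
   subroutine ensure_table()
      integer :: i
      if (.not. initialized) then
         !$omp critical (table_init)
         if (.not. initialized) then
            do i = 1, size(table)
               table(i) = real(i)      ! stand-in for the real setup work
            end do
            initialized = .true.
         end if
         !$omp end critical (table_init)
      end if
   end subroutine ensure_table
end module table_mod
```

Every thread entering the procedure calls ensure_table() first; only the first one pays the setup cost.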
Still no progress with this issue.
I have found the same problem under Linux. Last year, when I built the Windows DLL, I also built a dynamic library under Linux (.so) with the same source code and the XE2019 compiler (Ubuntu 18.04). Both libraries worked perfectly with test programs, but today, with the XE2020 version of the compiler for Linux, I get issues similar to the ones I got under Windows. I don't know if this is a problem linking against the right library or a setup problem (I found that setting the compiler variables with the command "source /opt/intel/bin/compilervars.sh intel64" generates bash error messages).
Some progress. We have found that compiling with /Qopenmp in oneAPI automatically makes the application run with 12 threads on our computer even if no OpenMP instructions/directives are used. It seems that the XE2019 version did not behave this way. Forcing the number of threads to 1 makes our performance and thread-safety test applications run almost as expected, although there still seems to be some overhead related to OpenMP. To be more complete, our thread-safety test application normally creates 12 threads, but we found that 156 threads (13*12) were created instead of only 12.
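Forcing the thread count to 1 as described above can be done either with the environment variable OMP_NUM_THREADS=1 or programmatically from the application. A minimal sketch using the standard OpenMP runtime routines (the program itself is illustrative):

```fortran
! Sketch: pinning the OpenMP runtime to a single thread before the
! DLL is exercised, equivalent to setting OMP_NUM_THREADS=1.
program force_one_thread
   use omp_lib
   implicit none
   call omp_set_num_threads(1)
   print *, 'max threads now:', omp_get_max_threads()
end program force_one_thread
```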
In your original post you listed /threads as an option. This may have also enabled /parallel, which enables auto-parallelization. Try removing that option and then testing the number of threads when compiling without /Qopenmp.
Auto-parallelization, which typically is not used in conjunction with OpenMP, uses the OpenMP runtime for its thread pool.
Without this switch .AND. using /Qopenmp, only the !$omp loops are parallelized.
>>our threadsafe test application is normally creating 12 threads, but we found that 156 threads were created (13*12) instead of only 12.
Then this implies
!$omp parallel
...
   !$omp parallel
   ...
   !$omp end parallel
...
!$omp end parallel
IOW you are using nested parallel regions.
A common (newbie) mistake is to have an
!$omp parallel do ... !$omp end parallel do
contained within an outer
!$omp parallel do ... !$omp end parallel do
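The thread multiplication can be reproduced with a small sketch; the thread counts below are illustrative, matching the 12-threads-per-level situation described above:

```fortran
! Sketch of nested parallel regions: each of the 12 outer threads can
! create its own inner team, multiplying the total thread count.
program nested_demo
   use omp_lib
   implicit none
   call omp_set_max_active_levels(2)   ! allow a second level of parallelism
   call omp_set_num_threads(12)
   !$omp parallel                       ! outer region: team of 12
   !$omp parallel                       ! inner region: one team per outer thread
   ! ... work ...
   !$omp end parallel
   !$omp end parallel
end program nested_demo
```

With nesting disabled (the default), the inner region runs serially inside each outer thread and no extra threads are created.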
The /threads switch is added automatically when the multithreaded library is selected on the settings page. Furthermore, as mentioned before, we never use any OpenMP directives, yet we find we need the /Qopenmp switch with the XE2019 compiler to make the DLL thread safe. We have also tried playing with the /parallel switch (with or without /Qopenmp), but we ultimately don't use it because it has an enormous impact on performance.
Finally, we've found that the problem is linked to the use of /Qopenmp together with DO CONCURRENT constructs, as already posted on this forum (Bug with do concurrent and openmp - Intel Community).
Removing the DO CONCURRENT constructs solves the problem when compiling with oneAPI.
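Since DO CONCURRENT asserts that iterations are independent, each such loop can usually be rewritten as an ordinary DO loop without changing results; with /Qopenmp it is then no longer auto-threaded by the runtime. A sketch with hypothetical names:

```fortran
! Sketch of the workaround: the DO CONCURRENT form (commented out) is
! replaced by a plain DO loop so /Qopenmp no longer threads it.
subroutine scale_array(a, n)
   implicit none
   integer, intent(in)    :: n
   real,    intent(inout) :: a(n)
   integer :: i
   ! do concurrent (i = 1:n)   ! original form that triggered the issue
   do i = 1, n
      a(i) = 2.0 * a(i)
   end do
end subroutine scale_array
```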