Intel® Fortran Compiler

Threadsafe and performance issues

netphilou31
New Contributor II

Dear all,

Last year I developed a DLL with Intel Fortran compiler 19.0.3.203 (XE2019 update 3), which is called from a C++ application (C++ EXE and DLLs). To get good performance and correct multithreaded (i.e. thread-safe) behavior, I found that I needed to add the /Qopenmp compiler switch along with /threads and /reentrancy:threaded. I was surprised to need /Qopenmp because my DLL does not use any OpenMP features, but it worked as expected. I guess I needed it for multithreaded calls.

The compiler switches I used were:

/nologo /O3 /fpp /DMIXT /reentrancy:threaded /extend_source:132 /Qopenmp /Qauto /align:sequence /fpe:1 /fp:fast=2 /fpconstant /Qfp-speculation=strict /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc160.pdb" /libs:static /threads /c

Today I switched to the oneAPI version (2021.1.1.191) and found that performance is now very poor (5-6x the calculation time in sequential calls) and the DLL is almost unusable in a multithreaded application (more than 20 minutes instead of a couple of seconds previously). The problem also occurs with other compiler versions: 19.1.2.254 and 19.1.3.311, i.e. XE2020 updates 2 and 4.

To solve part of the problem, I found that I have to remove the /Qopenmp compiler switch to get correct performance in sequential calls, but then the multithreaded calls no longer work (I seem to be getting race conditions, according to Intel Inspector).

Is there anything I can do to fix this problem, or do you have any information about such problems?

 Best regards,

 
 
14 Replies
jimdempseyatthecove
Honored Contributor III

Try replacing /Qopenmp with /recursive

In pre-oneAPI versions, /Qopenmp also enabled /recursive.

I have no input regarding poor performance between versions.

You should be aware that oneAPI comes with two Fortran compilers: ifort (a.k.a. classic) and ifx (oneAPI).

Jim Dempsey

netphilou31
New Contributor II

Hi Jim,

Thanks for the advice, I will try this (I am using the classic compiler).

Best regards,

netphilou31
New Contributor II

@jimdempseyatthecove , I tried /recursive (in the command line window) as well as /assume:recursion without more success on multithreading.

Best regards,

 
 
jimdempseyatthecove
Honored Contributor III

Which compiler are you using? ifort or ifx?

Jim

netphilou31
New Contributor II

Hi Jim,

I am using ifort (but the problem also shows up in previous versions, at least in versions 19.1.2.254 and 19.1.3.311).

Best regards,

 
jimdempseyatthecove
Honored Contributor III

>> I tried /recursive (in the command line window) as well as /assume:recursion without more success on multithreading.

Then something else is at play with your coding. Possibilities:

1) You have procedure variables attributed with SAVE when they should not be SAVE. Please note: if you have a procedure with once-only code (e.g. to set up a table), then entry into the once-only code must be written so that it really is once-only. IOW only one thread performs the initialization, and all other threads (if necessary) wait for the initialization to complete.

2) The older code had coding errors that could produce race conditions but did not, or did so rarely enough that you were not made aware of them, whereas the newer compiler generates a code path and sequencing in which the error condition is encountered.

3) Adding the /Qopenmp may cause a different library to be selected for some portions of your non-OpenMP code. IOW your code is sensitive to different versions of library functions.

4) Compiler bug? (It is easy to blame the compiler; most of the time it is programmer error.)

Finding the cause of the problem can be time consuming and is somewhat of an art. With practice you get better.
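
As an illustration of possibility 1, here is a minimal sketch. It does not implement the guarded once-only pattern itself; it swaps in a simpler alternative, assuming the host application can call a setup routine once before it starts any worker thread. All names (table_mod, init_table, lookup, count_calls_unsafe) are hypothetical and not taken from the original code. It also shows the implicit SAVE that Fortran applies to a local variable initialized in its declaration.

```fortran
module table_mod
  implicit none
  private
  public :: init_table, lookup, count_calls_unsafe

  integer, parameter :: n = 8
  real    :: table(n)          ! module data: one copy shared by all threads
  logical :: ready = .false.

contains

  ! Assumption: the host calls this exactly once, before any thread starts.
  subroutine init_table()
    integer :: i
    do i = 1, n
      table(i) = sqrt(real(i))
    end do
    ready = .true.
  end subroutine init_table

  ! Safe to call concurrently after init_table: it only reads shared
  ! data and writes only its own result variable.
  function lookup(i) result(v)
    integer, intent(in) :: i
    real :: v
    if (.not. ready) error stop "init_table was not called"
    v = table(i)
  end function lookup

  ! NOT thread-safe: initializing a local in its declaration implies
  ! SAVE, so "calls" is one shared counter, not a per-call variable.
  function count_calls_unsafe() result(c)
    integer :: c
    integer :: calls = 0
    calls = calls + 1
    c = calls
  end function count_calls_unsafe

end module table_mod

program demo
  use table_mod
  implicit none
  integer :: first, second
  call init_table()
  print *, lookup(4)
  first  = count_calls_unsafe()
  second = count_calls_unsafe()
  print *, first, second       ! the counter persists between calls
end program demo
```

If the initialization cannot be moved in front of thread creation, the once-only code needs a real lock or critical section instead, as described in item 1.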

Jim Dempsey

netphilou31
New Contributor II

Hi Jim,

Still no progress with this issue.

I have found the same problem under Linux. Last year, when I built the Windows DLL, I also built a shared library under Linux (.so) from the same source code with the XE2019 compiler version (Ubuntu 18.04). Both libraries worked perfectly with test programs, but today, with the XE2020 version of the compiler for Linux, I get issues similar to those I get under Windows. I don't know if this is a problem of linking against the right library or a setup problem (I found that setting the compiler variables with the command "source /opt/intel/bin/compilervars.sh intel64" generates bash error messages).

Best regards,

netphilou31
New Contributor II

Hi,

Some progress. We have found that compiling with -Qopenmp in oneAPI automatically makes the application run with 12 threads on our computer, even if no OpenMP instructions/directives are used. The XE2019 version does not seem to have behaved this way. Forcing the number of threads to 1 makes our performance and thread-safety test applications run almost as expected, although there still seems to be some overhead related to OpenMP. To be more complete, our threadsafe test application is normally creating 12 threads, but we found that 156 threads were created (13*12) instead of only 12.
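
For readers who want to reproduce the "force the number of threads to 1" step: the post does not say how it was done, so the following is only a sketch of one standard way, using the OpenMP runtime routines omp_get_max_threads and omp_set_num_threads from the omp_lib module. It has to be compiled with OpenMP enabled (e.g. /Qopenmp). Setting the environment variable OMP_NUM_THREADS=1 before starting the application is the usual alternative that needs no code change.

```fortran
program limit_threads
  use omp_lib                  ! OpenMP runtime library interface
  implicit none

  ! Number of threads a parallel region would use by default
  print *, "max threads before:", omp_get_max_threads()

  ! Restrict subsequent parallel regions to a single thread
  call omp_set_num_threads(1)

  print *, "max threads after: ", omp_get_max_threads()
end program limit_threads
```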

Best regards,

jimdempseyatthecove
Honored Contributor III

In your original post you listed /threads as an option. This may have also enabled /parallel, which enables auto-parallelization. Try removing that option and then testing the number of threads when compiling without /Qopenmp.

Auto-parallelization, which typically is not used in conjunction with OpenMP, uses OpenMP for thread pool.

Without this switch .AND. with /Qopenmp, only the !$omp loops are parallelized.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

>>our threadsafe test application is normally creating 12 threads, but we found that 156 threads were created (13*12) instead of only 12.

Then this implies

!$omp parallel ...
...
!$omp parallel ...
...
!$omp end parallel
...
!$omp end parallel

IOW you are using nested parallel regions.

A common (newbie) mistake is to have an

!$omp parallel do ... / !$omp end parallel do

contained within an

!$omp parallel do ... / !$omp end parallel do
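
A minimal sketch of the usual way to avoid that nesting, with a hypothetical two-level loop (the names outer_only, total, n and m are made up for illustration): only the outer loop carries the directive and the inner loop runs sequentially inside each thread. Compile with OpenMP enabled (e.g. /Qopenmp).

```fortran
program outer_only
  implicit none
  integer, parameter :: n = 4, m = 1000
  integer :: i, j
  real    :: total(n)

  ! Only the outer loop is parallelized. Wrapping the j loop in its
  ! own "!$omp parallel do" would open a nested parallel region
  ! inside every outer thread.
  !$omp parallel do private(j)
  do i = 1, n
    total(i) = 0.0
    do j = 1, m                ! plain sequential inner loop
      total(i) = total(i) + real(j)
    end do
  end do
  !$omp end parallel do

  print *, total
end program outer_only
```

Each iteration of i writes only its own element total(i), so no reduction or locking is needed here.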

Jim Dempsey

netphilou31
New Contributor II

Hi Jim,

The /threads switch is automatically added when selecting the multithreaded library on the settings page. Furthermore, as mentioned before, we don't use any OpenMP directives at any time, but we find we need to use the /Qopenmp switch in XE2019 version of the compiler to make the dll thread safe. We have also experimented with the /parallel switch (with and without /Qopenmp), but in the end we do not use it because it has an enormous impact on performance.

Best regards,

Steve_Lionel
Honored Contributor III

/threads simply tells the run-time library to act thread-safe - which it now does by default. It does not add any parallelism on its own.

FortranFan
Honored Contributor III
@netphilou31 wrote:
.. we find we need to use the /Qopenmp switch in XE2019 version of the compiler to make the dll thread safe. ..

This might be a misunderstanding.

I suggest you confirm with Intel Support staff about this.

netphilou31
New Contributor II

Hi all,

Finally, we've found that the problem is linked to the use of /Qopenmp together with do concurrent constructs, as already posted on this forum (Bug with do concurrent and openmp - Intel Community).

Removing the do concurrent constructs solves the problem when compiling with oneAPI.
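
For anyone hitting the same issue, here is a sketch of what such a replacement can look like. The loop below is a hypothetical example, not our actual code: a do concurrent whose iterations are independent is rewritten as an ordinary do loop that computes the same result.

```fortran
program replace_do_concurrent
  implicit none
  integer, parameter :: n = 5
  integer :: i
  real    :: a(n), b(n)

  b = [(real(i), i = 1, n)]

  ! Original form (hypothetical):
  !   do concurrent (i = 1:n)
  !     a(i) = 2.0 * b(i)
  !   end do

  ! Ordinary loop with the same result:
  do i = 1, n
    a(i) = 2.0 * b(i)
  end do

  print *, a
end program replace_do_concurrent
```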

Best regards,
