Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

/Qopenmp when not using OpenMP

netphilou31
New Contributor II

Hi,

Since I was not sure I had posted my message in the right place, this is a repost of the thread "/Qopenmp when not using OpenMP - Intel Community".

I'd like to know the effect of adding the /Qopenmp compiler switch (IFORT) even when I'm not using OpenMP directives at all.
To illustrate: I have a dll that needs to be compiled with this switch to be thread safe. The dll works the same with and without it. However, I'm pretty sure some parts of the code are NOT thread safe, and strangely they seem to behave safely with this option! The calling process does not use critical sections, and there are no critical sections in the source code.

Does anyone know what is happening behind the scenes when turning on this compiler switch?

Regards,

Phil.

1 Solution
Barbara_P_Intel
Employee

I checked with one of Intel's infamous Fortran compiler engineers and got this info:

The default is /reentrancy:threaded and has been for a couple of releases. This means that we always link against the threadsafe libraries now, for both ifx and ifort.

/Qopenmp sets /Qauto automatically, making all local variables stack (automatic) variables by default instead of placing them in static storage.
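To make the static versus stack distinction concrete, here is a minimal sketch. The module and procedure names are made up for illustration, and the SAVE attribute is written out explicitly so the behavior does not depend on any compiler default or switch.

```fortran
! Hypothetical illustration: names are not from the thread.
module storage_demo
  implicit none
contains
  ! Local array with SAVE: a single static copy exists, so its contents
  ! persist between calls and would be shared by all threads.
  function sum_static(x) result(total)
    integer, intent(in) :: x
    integer :: total
    integer, save :: work(4) = 0
    work(1) = work(1) + x
    total = sum(work)
  end function sum_static

  ! Local array without SAVE, fully assigned before use: no state is
  ! carried from one call to the next, wherever the compiler stores it.
  function sum_fresh(x) result(total)
    integer, intent(in) :: x
    integer :: total
    integer :: work(4)
    work = 0
    work(1) = work(1) + x
    total = sum(work)
  end function sum_fresh
end module storage_demo

program storage_demo_main
  use storage_demo
  implicit none
  integer :: a, b

  a = sum_static(1)
  a = sum_static(1)   ! second call sees the value left by the first
  b = sum_fresh(1)
  b = sum_fresh(1)    ! second call starts from a clean array

  if (a /= 2) error stop "expected the SAVE'd array to accumulate"
  if (b /= 1) error stop "expected the non-SAVE'd array to start fresh"
  print '(a,i0,a,i0)', "static: ", a, "  fresh: ", b
end program storage_demo_main
```

The persistent state in sum_static is exactly what becomes a hazard when two threads call the procedure at the same time.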

21 Replies
jimdempseyatthecove
Honored Contributor III

Earlier versions of ifort treated procedure local arrays and user defined types as SAVE when compiled without /Qopenmp and treated as automatic (stack based) when compiled with /Qopenmp. Newer versions treat these variables as automatic (stack based) regardless of /Qopenmp.

If you are building the application from within MS VS, this might include the OpenMP library at link time (from which nothing should be extracted).

It won't hurt to include this option, or /recursive, or /auto.

Jim Dempsey

Steve_Lionel
Honored Contributor III

It was my understanding that the current version of the compiler requires you to use -standard-semantics to get the default of recursive procedures (and thus stack allocation for all variables.)

netphilou31
New Contributor II

Hi Jim,

Thanks for the reply.

Since I am using the 2023 version of the compiler, I guess the last part of your comment applies to me.

So, my question changes to: "Does the thread safety rely on the /reentrancy:threaded switch I am using?"

And when using several dlls, if the master one (one dll to master them all ;-)) is compiled with this option, is thread safety ensured?

Regards,

Phil.

Barbara_P_Intel
Employee

According to the Fortran DGR (Developer Guide and Reference), when you compile and link with /Qopenmp, threadsafe and/or reentrant runtime libraries are linked in. That is another difference.

 

netphilou31
New Contributor II

Hi,

Thanks for your comments. I understand that using /reentrancy:threaded, with or without /Qopenmp, forces the build to link with threadsafe libraries. However, if one dll uses these settings, and assuming that its code does not contain global or saved variables (i.e., the dll itself is supposed to be thread-safe), what about other dlls it calls that are built without these settings? I have another dll which makes heavy use of global variables, is not built with either the /reentrancy:threaded or /Qopenmp switches, and is called from the first one. It seems to be thread-safe (while I assume it shouldn't be).

Best regards,

jimdempseyatthecove
Honored Contributor III

Any DLL that is not thread-safe may experience issues when called concurrently from a threaded program (when it uses static variables). If you must use those libraries from within a parallel region, then wrap the call in a critical section. Give each such dll a different critical section name.

 

!$omp critical(foo_critical)
call foo(bar)
!$omp end critical(foo_critical)

 

Doing so ensures that each such call is entered by only one thread at a time (possibly a different thread each time).
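As a fuller sketch of that pattern, the program below wraps a deliberately thread-unsafe routine in a named critical section. The module legacy_lib and the routine legacy_add are hypothetical stand-ins for a procedure in a non-thread-safe dll. Built without OpenMP, the directives are plain comments and the loop simply runs serially.

```fortran
! Hypothetical stand-in for a routine in a non-thread-safe library.
module legacy_lib
  implicit none
  integer, save :: tally = 0        ! static state shared by all threads
contains
  subroutine legacy_add(k)
    integer, intent(in) :: k
    integer :: tmp
    ! Unprotected read-modify-write on static data: unsafe if two
    ! threads execute it at the same time.
    tmp = tally
    tmp = tmp + k
    tally = tmp
  end subroutine legacy_add
end module legacy_lib

program critical_demo
  use legacy_lib
  implicit none
  integer, parameter :: n = 10000
  integer :: i

  !$omp parallel do
  do i = 1, n
    ! Only one thread at a time may be inside this named section.
    !$omp critical(legacy_add_critical)
    call legacy_add(1)
    !$omp end critical(legacy_add_critical)
  end do
  !$omp end parallel do

  if (tally /= n) error stop "lost updates: the call was not serialized"
  print '(a,i0)', "tally = ", tally
end program critical_demo
```

Using one name per unsafe procedure (or per dll) keeps unrelated calls from blocking each other, at the cost of serializing every call that shares a name.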

Jim Dempsey

 

netphilou31
New Contributor II

My "thread-safe" dll1 does not use any !$omp directives. However, the calling (test) application (not written in Fortran) creates several threads and calls dll1 with a different set of calculation conditions for each thread; dll1 in turn calls dll2 (not thread-safe). We are trying to find collisions or race conditions during this process but have been totally unsuccessful, even though dll2 should show this issue. Any idea?

Regards,

jimdempseyatthecove
Honored Contributor III

The use of /Qopenmp was a quick and easy way to ensure that all local variables default to being stack based (thus thread-safe). Using /auto would likely have had the same effect, as your code was void of !$omp ... directives.

If you do not know, for each procedure in dll2, whether it is thread-safe, then it would be best to encapsulate each call in a named critical section (name unique to each procedure). Running a test to see if the code fails cannot tell you that the code is thread-safe: your test may simply fail to fail (fail to expose a collision).

You could, say, produce an MD5 checksum of all memory exclusive of passed arguments and stack, call a procedure in dll2, then on return recompute the checksum and compare it with the earlier one. Even then, you could not assure yourself that some exception case not exercised by your test call would be thread-safe.

What do the functions in dll2 do?

Can you replace dll2 with something else or rewrite all/some of the procedures? 

Will encapsulating them in critical sections introduce excessive overhead?

Jim Dempsey

netphilou31
New Contributor II

Hi Jim,

Thanks for your comments.

I am using /Qauto and /Qopenmp in dll1 but only /Qauto in dll2. Does that amount to almost the same thing if neither of these dlls includes !$omp directives? And what about module variables (global variables): are they also put on the stack rather than on the heap because of /Qauto, meaning that dll2 should be thread-safe?

For dll2, I have already planned to define critical sections in dll1 when dll1 knows dll2 will be called but if dll2 is thread-safe they are not necessary, and to answer your question, dll1 and dll2 are performing thermodynamic calculations, dll1 contains a lot of different models and algorithms and dll2 contains another specific one we did not code ourselves (this dll is shared with some partners who are the original coders of it).

Regards,

Phil.

jimdempseyatthecove
Honored Contributor III

It is NOT the use of the !$omp directives. Rather, it is the placement of the local arrays and local user defined types. In the older versions of Intel Visual Fortran (and possibly other vendors' versions of Fortran), procedure (local) arrays and user defined types (which may be large) were by default made SAVE as opposed to being stack (aka automatic), while scalar local variables were stack based. Newer versions of Fortran may default these to stack. The reasoning for the old default is that locally declared arrays (and UDTs) can be quite large, and in a pre-64-bit/pre-OpenMP world, placing them on the stack would cause stack overflow. So, the helpful placement was SAVE, as it would not adversely affect the results while avoiding stack overflow in a serial program environment.

Therefore, IIF you do not know what compiled the code, or what compile options were used, you cannot assume the code is thread-safe. 

I assume that you do not have the source code available (as you would be able to recompile the code).

One recourse you have is to look at the disassembly of all procedures in dll2 to see if any memory write references do NOT use esp/rsp or ebp/rbp (.or. the registers used for the reference were not derived from esp/rsp or ebp/rbp), as these would be static variable references (out-of-stack read references are OK). You can see how this may be error prone.

 

NOTE

In many instances of old code written for a serial environment, intermediate results are passed in COMMON blocks as opposed to as arguments on CALL. These common block variables are static variables, and (when written to) would make the code thread-unsafe.

Jim Dempsey

 

jimdempseyatthecove
Honored Contributor III

NOTE 2

You do have an option to try. This will take some work, but not an insurmountable amount of work.

Write your application as an MPI application, where rank-0 runs what you coded as your main thread, and ranks 0:n run what you coded as the parallel regions. In this manner, each rank (formerly thread) is in a different process, and it would be safe for it to call dll2 as each rank has its own process address space.

Depending on where you induce the parallelization, this could be a simple job to do. IIF the parallelization is above the call to dll1 (which calls dll2), then the conversion might be trivial.
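A minimal sketch of the idea, assuming an MPI library with the standard `mpi` Fortran module is available (for example Intel MPI). The module legacy_state is a hypothetical stand-in for dll2 and its static data. Because every rank is a separate process, each rank gets its own copy of shared_input.

```fortran
! Hypothetical stand-in for dll2: state is passed through a module variable.
module legacy_state
  implicit none
  integer, save :: shared_input = 0
contains
  subroutine legacy_set(v)
    integer, intent(in) :: v
    shared_input = v
  end subroutine legacy_set

  function legacy_compute() result(r)
    integer :: r
    r = shared_input*shared_input
  end function legacy_compute
end module legacy_state

program mpi_sketch
  use mpi
  use legacy_state
  implicit none
  integer :: ierr, rank, nranks, local_result, total

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! Each rank works on its own data set. The static variable lives in
  ! this rank's own process, so other ranks cannot overwrite it.
  call legacy_set(rank + 1)
  local_result = legacy_compute()

  ! Collect the per-rank results on rank 0.
  call MPI_Reduce(local_result, total, 1, MPI_INTEGER, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)

  if (rank == 0) then
    print '(a,i0,a,i0)', "ranks: ", nranks, "  sum of squares: ", total
  end if

  call MPI_Finalize(ierr)
end program mpi_sketch
```

It would be launched through the MPI launcher (for example `mpiexec -n 4`), so the number of "threads" becomes the number of processes.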

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Note on MPI example:

In that example, only one variable of the boundary is passed between ranks. In your case, you may want to pass a slice of the adjacent cells in an array. You will have to determine how best to slice the 3D array to minimize the frequency and amount of data transferred between ranks. You would want to pass a contiguous section of the larger array.

Jim Dempsey

 

 

netphilou31
New Contributor II

Hi Jim,

Thanks a lot for your comments and advice. As I have absolutely no experience in MPI programming, this will require some training and trials. I will take a look at the links you provided. To come back to the dll2 comments, I have the source code, even if I just recompile it when needed, but the Fortran compiler settings used for this dll do not include /reentrancy:threaded or /Qauto, so I was wondering how it could be thread safe while using a lot of module global variables. Our thread safety test app runs several threads in parallel, each of them doing hundreds of calculation loops with different calculation sets (each time a calculation is performed, the dll global variables are almost fully reinitialized with each data set), and I was surprised that no race conditions happened.

Regards,

Phil.

jimdempseyatthecove
Honored Contributor III

As you have the source code, you will have to look at the code. Do NOT rely on any race condition detector for conflicts as it is not 100% effective.

The module data in dll2 may (you will have to look closely at this) be declared threadprivate.

Note, the example in the link provided shows COMMON block data, but threadprivate can apply to any static data, be it in COMMON, in a module, or a procedure SAVE variable. Read the text, not just the example.

Caution, blindly making everything in dll2 threadprivate may or may not work. You must look at the code.
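For module data, a threadprivate declaration might look like the sketch below. The module and procedure names are hypothetical, and whether this is appropriate for any given variable in dll2 still depends on how the code uses it.

```fortran
! Hypothetical module: one procedure stores an input in module data and
! another procedure reads it back.
module model_state
  implicit none
  integer, save :: scratch = 0
  ! Each OpenMP thread gets its own copy of scratch.
  !$omp threadprivate(scratch)
contains
  subroutine set_input(v)
    integer, intent(in) :: v
    scratch = v
  end subroutine set_input

  function compute() result(r)
    integer :: r
    r = 2*scratch
  end function compute
end module model_state

program threadprivate_demo
  use model_state
  implicit none
  integer :: i, mismatches

  mismatches = 0
  !$omp parallel do reduction(+:mismatches)
  do i = 1, 1000
    call set_input(i)
    ! With a private copy per thread, no other thread can overwrite
    ! scratch between set_input and compute.
    if (compute() /= 2*i) mismatches = mismatches + 1
  end do
  !$omp end parallel do

  if (mismatches /= 0) error stop "a thread saw another thread's input"
  print '(a,i0)', "mismatches: ", mismatches
end program threadprivate_demo
```

Note that this relies on OpenMP threads. When threads are created by a non-Fortran host application, as in this thread, whether threadprivate data behaves as expected is something to verify rather than assume.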

 

Barbara, if you read this: please add an example of module data to the documentation.

Also, the example given is somewhat confusing.

/BLK1/, not shown, is presumed to be .NOT. threadprivate outside the scope of the parallel region. A copy of it will be made private, local to the parallel region.

 

Jim Dempsey

Barbara_P_Intel
Employee

Regarding COMMON blocks and THREADPRIVATE, I just added this known issue to the Fortran Release Notes that will be published later this month with the release of oneAPI 2023.2. This applies to ifx and ifort.

Programs that pass a COMMON block, instead of individual COMMON block variables, to an OpenMP data sharing clause cause a runtime failure, i.e. a segmentation fault or incorrect result. The workaround is to pass the individual COMMON block variables.

 

netphilou31
New Contributor II

The dll2 source code contains no common blocks, only global variables declared in modules, and no threadprivate directives, but it is compiled with /reentrancy:threaded, so the opposite of what I thought. Is that enough to make it thread safe even with global variables in modules (even if it uses /Qsave, /Qinit:zero and /Qinit:arrays)?

 

jimdempseyatthecove
Honored Contributor III

IIF within dll2

     procedure foo, defines (sets) variables in module visible to foo, then calls procedure bar (also within dll2)

    .and. procedure bar uses said variables defined by foo (iow call data is passed via module variables as opposed to via arguments)

Then this code will not be thread safe as all threads will be using the same variables/addresses to pass intermediary data.

Making such variables threadprivate may resolve the sharing issue.

*** you must investigate as to if this may cause other issues that must be resolved with coding changes.

For example, dll2 may contain tally counts, or perform I/O, or ???
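The foo/bar pattern described above, and one possible coding change (passing the data as an argument instead), can be sketched as follows. All names are hypothetical, and this is only one of several ways such code could be restructured.

```fortran
! Hypothetical sketch of the two calling styles.
module dll2_sketch
  implicit none
  real, save :: pending_t = 0.0     ! module variable used to pass data
contains
  ! Pattern from the post: foo stores its input in a module variable,
  ! then bar reads it. All threads share pending_t.
  subroutine foo_shared(t, res)
    real, intent(in)  :: t
    real, intent(out) :: res
    pending_t = t
    call bar_shared(res)
  end subroutine foo_shared

  subroutine bar_shared(res)
    real, intent(out) :: res
    res = 2.0*pending_t
  end subroutine bar_shared

  ! Alternative: the value travels as an argument, so each call keeps
  ! its data in its own dummy arguments and locals.
  subroutine foo_args(t, res)
    real, intent(in)  :: t
    real, intent(out) :: res
    call bar_args(t, res)
  end subroutine foo_args

  subroutine bar_args(t, res)
    real, intent(in)  :: t
    real, intent(out) :: res
    res = 2.0*t
  end subroutine bar_args
end module dll2_sketch

program args_demo
  use dll2_sketch
  implicit none
  integer :: i, mismatches
  real :: r_shared, r_args

  ! Called serially, both styles give the same answer.
  call foo_shared(300.0, r_shared)
  call foo_args(300.0, r_args)
  if (r_shared /= r_args) error stop "serial results differ"

  ! Only the argument-passing version is called concurrently here.
  mismatches = 0
  !$omp parallel do private(r_args) reduction(+:mismatches)
  do i = 1, 1000
    call foo_args(real(i), r_args)
    if (r_args /= 2.0*real(i)) mismatches = mismatches + 1
  end do
  !$omp end parallel do

  if (mismatches /= 0) error stop "unexpected result from foo_args"
  print '(a,i0)', "mismatches: ", mismatches
end program args_demo
```

The shared version is intentionally left out of the parallel loop, since its result there would depend on thread timing.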

 

Jim Dempsey

netphilou31
New Contributor II

This is exactly what I thought, so it seems we were very lucky with our test app (or maybe the calculations performed in dll2 are too fast to show race conditions; there are no I/Os and, even if I am not absolutely sure, no tally counts). Side question: I noticed you often use "IIF" in your messages. At first I thought it was a typo, but I am not sure now.
