Intel® Fortran Compiler

OpenMP code works fine in gfortran, fails in ifort

Niemeyer__Kyle
Beginner
505 Views

I have an application that I recently parallelized using OpenMP, and have it working fine with gfortran (with and without optimization, the works). This code has lots of legacy f77 code with SAVE and COMMON blocks, so I needed to use THREADPRIVATE liberally.

It looks like switching to ifort would give a much faster runtime, but the code doesn't run properly when compiled with ifort. If I change the compiler back to gfortran, it runs exactly as it should (reproducing the serial results exactly), but for some reason the OpenMP-enabled ifort build doesn't work correctly (the serial results are fine).

The code is way too complex to post here, but does anyone have any suggestions on how to figure this out? Why would the same code work fine with gfortran? I'm not getting a segfault, a stack overflow, or anything like that; the results just aren't correct. It looks like the threads are interfering with each other somehow, even though this doesn't happen when using gfortran.

Thanks!

6 Replies
Niemeyer__Kyle
Beginner

I'm trying to use the Intel Inspector to figure out where there might be problems, but I'm getting some results that don't make sense to me.

As I mentioned, some of the legacy code has SAVE in various subroutines, so I made all the variables in those subroutines THREADPRIVATE. However, the inspector is telling me that there is a data race where a variable in one of these functions is being written and then read.

How could that be, if the variable was declared THREADPRIVATE? This particular variable is an integer (LABEL) that is used as follows: [fortran]ASSIGN 6500 TO LABEL[/fortran] and then later [fortran]GO TO LABEL[/fortran] (like I said, old code).

Steven_L_Intel1
Employee

Ouch - THREADPRIVATE on an assigned goto label? It would not surprise me that the compiler didn't consider that.... I wonder if the OpenMP standard even allows that. We'll check it out.

jimdempseyatthecove
Honored Contributor III

Can you change these to computed go to? (assuming you are not using assign for FORMAT statements)

Jim Dempsey

Niemeyer__Kyle
Beginner

Wow, getting rid of those assigned GO TOs (replacing them with SELECT CASE constructs, i.e. switch statements) absolutely fixed the problem. Now the ifort-compiled code generates the correct results. Thanks so much!
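For readers facing the same legacy pattern, here is a minimal sketch of what such a replacement can look like. It is not the code from this application: the names `legacy_dispatch`, `resume_point`, and `advance` are hypothetical, and the assumption is that the ASSIGNed label only ever selects among a small, known set of targets, so an ordinary integer plus SELECT CASE can stand in for it. The integer is SAVEd and THREADPRIVATE, as the original LABEL was.

```fortran
module legacy_dispatch
  implicit none
  ! Hypothetical stand-in for the old ASSIGNed LABEL variable:
  ! a plain integer that records which branch to take next.
  integer, save :: resume_point = 1
  !$omp threadprivate(resume_point)
contains
  subroutine advance(acc)
    integer, intent(inout) :: acc
    ! Replaces "GO TO LABEL": dispatch on an ordinary integer value.
    select case (resume_point)
    case (1)
      acc = acc + 1
      resume_point = 2      ! replaces "ASSIGN <label> TO LABEL"
    case (2)
      acc = acc + 10
      resume_point = 1
    case default
      error stop 'unknown resume point'
    end select
  end subroutine advance
end module legacy_dispatch

program demo_dispatch
  use legacy_dispatch
  implicit none
  integer :: acc
  acc = 0
  call advance(acc)   ! takes branch 1
  call advance(acc)   ! takes branch 2
  if (acc /= 11) error stop 'unexpected dispatch result'
  print '(a,i0)', 'acc = ', acc
end program demo_dispatch
```

Because `resume_point` is an ordinary integer, THREADPRIVATE applies to it in the usual way, and the sketch builds with or without OpenMP enabled.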

I am still getting a lot of data races reported by the inspector, but they seem to be focused on WRITE statements (and I have made sure that different threads access different files, based on their thread id).

jimdempseyatthecove
Honored Contributor III

Glad to see you used select/switch in place of computed goto.

>> based on their thread id).

Careful: omp_get_thread_num() "Returns the thread number of the calling thread, within the context of the current parallel region"

This means that if you have nested parallel regions, you will have duplicate sets of numbers. Effectively, omp_get_thread_num() means "get team member number". With non-nested regions you will have unique IDs; with nested regions you may have collisions if the ID is used from within the nested regions.

What I usually do in this circumstance is declare a thread-private variable for the ID number, with a default value of -1, and add a global volatile variable for the next ID number, initialized to 0. When a thread needs an ID number, it tests its thread-local ID number; if that is -1, it obtains the next ID number (use a critical section to copy the next ID number to the thread-local ID number and increment the next ID number). You can also use __sync_fetch_and_add or InterlockedExchangeAdd to atomically obtain the next ID and advance it.
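The scheme described above can be sketched in Fortran roughly as follows. The module and variable names (`unique_thread_id`, `my_id`, `next_id`, `get_unique_id`) are made up for illustration. The sketch assumes the runtime keeps threadprivate values alive between uses, and it relies on the flush implied by the critical section rather than on a VOLATILE attribute.

```fortran
module unique_thread_id
  implicit none
  ! Per-thread copy; -1 means "no ID assigned to this thread yet".
  integer, save :: my_id = -1
  !$omp threadprivate(my_id)
  ! Shared counter holding the next ID to hand out.
  integer, save :: next_id = 0
contains
  function get_unique_id() result(id)
    integer :: id
    if (my_id < 0) then
      ! Only one thread at a time may copy and advance the counter.
      !$omp critical (assign_unique_id)
      my_id = next_id
      next_id = next_id + 1
      !$omp end critical (assign_unique_id)
    end if
    id = my_id
  end function get_unique_id
end module unique_thread_id

program demo_unique_id
  use unique_thread_id
  implicit none
  integer :: first, second
  logical :: stable
  stable = .true.
  !$omp parallel private(first, second) reduction(.and.: stable)
  first = get_unique_id()
  second = get_unique_id()   ! a second call must return the same ID
  stable = stable .and. (first == second) .and. (first >= 0)
  !$omp end parallel
  if (.not. stable) error stop 'ID changed between calls'
  print '(a,i0)', 'IDs handed out: ', next_id
end program demo_unique_id
```

An `!$omp atomic capture` construct around `my_id = next_id; next_id = next_id + 1` would be the OpenMP counterpart of the atomic fetch-and-add mentioned above, on compilers that support OpenMP 3.1 or later.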

Jim Dempsey

Niemeyer__Kyle
Beginner

Thanks Jim—I should have been more precise in what I said.

What I have is a PARALLEL DO loop, and the ID I'm using for my files (so that files are named something like "file.1.inp", etc.) is based on the loop counter. I'm not actually using omp_get_thread_num(), but I will keep what you said in mind.
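As a rough illustration of that arrangement (not the actual application code), each iteration can build its own file name from the loop counter and open its own unit. The file name pattern `demo.N.out`, the loop bounds, and the variable names here are assumptions; the sketch also assumes a thread-safe Fortran runtime library, which OpenMP builds normally link.

```fortran
program per_iteration_files
  implicit none
  integer :: i, unit_no
  character(len=32) :: fname

  ! fname and unit_no must be private so threads do not share them.
  !$omp parallel do private(unit_no, fname)
  do i = 1, 4
    ! Build a name such as "demo.3.out" from the loop counter.
    write (fname, '(a,i0,a)') 'demo.', i, '.out'
    ! newunit= picks an unused unit number for this iteration.
    open (newunit=unit_no, file=trim(fname), status='replace', action='write')
    write (unit_no, '(a,i0)') 'iteration ', i
    ! status='delete' only keeps this demo from leaving files behind.
    close (unit_no, status='delete')
  end do
  !$omp end parallel do

  print '(a)', 'all iterations wrote to separate files'
end program per_iteration_files
```

Since the file name depends only on the loop counter, it stays unique no matter which thread executes a given iteration.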
