Intel® Fortran Compiler

Unusual OpenMP behavior on Intel 2017/2018

danielsue
Beginner
1,395 Views

Dear All,

I have a parallelized Fortran code that behaves unusually when compiled with the latest Intel compiler with OpenMP enabled. The code compiles, but when I run it, it is extremely slow and the simulation crashes at the first timestep. Something is definitely wrong.

However, the same code has been thoroughly tested on different platforms, so I wonder if I need some special configuration when using the latest Intel Fortran compiler.

Below is my test history:

Linux workstation-1      Intel 2014         Works

Linux workstation-2      GFortran 5         Works

Windows workstation-3    Intel 2013         Works

Linux cluster-4          GFortran 7         Works

Linux cluster-4          Intel 2017/2018    Fails

The makefile looks like the one below. For Intel 2017/2018, I changed -fopenmp to -qopenmp.

FC = ifort
FFLAGS = -fopenmp -O3

FPPFLAGS = -DLINUX -DRELEASE -DOPENMP

#Source code folder
SRC = ./../
SOURCES = $(SRC)usg/math_common.o\
          ...
          ...
          $(SRC)min3p/pitzer/closepitzer.o

executable: $(SOURCES)
	$(FC) $(FFLAGS) $(FPPFLAGS) -o executable-name $(SOURCES)
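For reference, a sketch of the adjusted flags block for Intel 2017/2018 (same defines as above; the only change is the OpenMP option, since recent ifort releases use the -qopenmp spelling):

```makefile
# Sketch for Intel 2017/2018: -qopenmp instead of -fopenmp
FC = ifort
FFLAGS = -qopenmp -O3
FPPFLAGS = -DLINUX -DRELEASE -DOPENMP
```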
 

This is the first time I have switched to the Intel 2018 compiler, and I just wonder if I need any special configuration.

Thanks,

Danyang

 

0 Kudos
11 Replies
Juergen_R_R
Valued Contributor II

Without the code it is really hard to say anything. What do you mean by "the simulation crashes"? Can it be that it runs out of memory?

 

TimP
Honored Contributor III

Among the possibilities are that your application violates OpenMP in some way that was not exposed by earlier compilers, or that there is actually a new bug.  If you can't figure it out, you could submit a case to the Online Service Center.

I already mentioned a case (using both reduction and lastprivate clauses in one directive) about which I'm uncertain.  It first failed with the new compiler.

danielsue
Beginner

Juergen R. wrote:

Without the code it is really hard to say anything. What do you mean by "the simulation crashes"? Can it be that it runs out of memory?

 

The code runs, but it is much slower than the sequential version and it does not converge. I think the OpenMP-related functions are actually not compiled as expected.

danielsue
Beginner

Tim P. wrote:

Among the possibilities are that your application violates OpenMP in some way that was not exposed by earlier compilers, or that there is actually a new bug.  If you can't figure it out, you could submit a case to the Online Service Center.

I already mentioned a case (using both reduction and lastprivate clauses in one directive) about which I'm uncertain.  It first failed with the new compiler.

Thanks, Tim. There is no directive that uses reduction and lastprivate at the same time. Actually, there is only one lastprivate in the code, and that part is not used in my test case. I will run more tests and then submit a case to the service center if the problem persists.

 

danielsue
Beginner

Tim P. wrote:

Among the possibilities are that your application violates OpenMP in some way that was not exposed by earlier compilers, or that there is actually a new bug.  If you can't figure it out, you could submit a case to the Online Service Center.

I already mentioned a case (using both reduction and lastprivate clauses in one directive) about which I'm uncertain.  It first failed with the new compiler.

Today I tested the latest 2019 version: the code compiles and runs without problems in release mode, but it still causes convergence problems in debug mode. It looks as if the OpenMP-related code is not parsed correctly when I include certain functions, even though those functions are never called at run time. I have reported this problem to the service center with source code and an example.

Thanks

jimdempseyatthecove
Honored Contributor III

>> but still cause convergence problem in debug mode.

While this can be explained by a compiler bug, it can more often be explained by a poorly written convergence test. By this I mean a convergence test that uses literal constants (e.g. a hard-coded epsilon) as opposed to a runtime determination of what the epsilon should be. This runtime determination can vary depending on the floating-point optimization levels.

It is possible that for all these years the convergence code worked by accident as opposed to by design.

Usually, convergence issues tend to be the obverse of what you are experiencing: IOW, convergence works in Debug but not in Release. Your experience is peculiar. The support center may be able to determine the underlying cause.
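The runtime-determined epsilon described above can be sketched as follows (an illustrative fragment written for this note, not taken from the code under discussion; the Fortran epsilon() intrinsic is scaled by the magnitude of the operands so the tolerance adapts to the data):

```fortran
! Sketch: convergence test with a runtime-scaled tolerance
! instead of a hard-coded literal epsilon.
program converge_sketch
  implicit none
  real(8) :: x_old, x_new, tol
  x_old = 1.0d6
  x_new = x_old + 1.0d-7
  ! Scale machine epsilon by the magnitude of the operands,
  ! rather than comparing against a fixed literal like 1.0d-12.
  tol = 100.0d0 * epsilon(x_old) * max(abs(x_old), abs(x_new))
  if (abs(x_new - x_old) <= tol) then
    print *, 'converged'
  else
    print *, 'not converged'
  end if
end program converge_sketch
```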

Jim Dempsey

danielsue
Beginner

jimdempseyatthecove wrote:

>> but still cause convergence problem in debug mode.

While this can be explained by a compiler bug, it can more often be explained by a poorly written convergence test. By this I mean a convergence test that uses literal constants (e.g. a hard-coded epsilon) as opposed to a runtime determination of what the epsilon should be. This runtime determination can vary depending on the floating-point optimization levels.

It is possible that for all these years the convergence code worked by accident as opposed to by design.

Usually, convergence issues tend to be the obverse of what you are experiencing: IOW, convergence works in Debug but not in Release. Your experience is peculiar. The support center may be able to determine the underlying cause.

Jim Dempsey

Hi Jim,

I am afraid this convergence problem is caused by incorrect parsing of the code. Take the following code section, for example.

=========Code section===========
#ifdef OPENMP
    !$omp do schedule(static, chunk)
#endif
      do ivol = 1,nngl                 !loop over control volumes

#ifdef OPENMP
        tid = omp_get_thread_num() + 1
#else
        tid = 1
#endif

      ...
!c    test code to check whether the OpenMP schedule is correct; output goes to temporary file 1000+tid
        write(1000+tid,*) "tid",tid,"ivol",ivol

#ifdef USG
        if (discretization_type > 0) then
            grad_locs(ivol) = gradient_dd_green_gauss_tri(ivol)
        end if
#endif

      end do

=========End of code Section===========

If the code is compiled with the "USG" part, the code still crashes because of a race condition, even though the simulation case does not use this part (discretization_type == 0). The race condition arises because the loop "do ivol = 1,nngl" is not scheduled as expected with Intel XE 2017/2018.
For example, if nngl is 100 and chunk is 25, and this part is run with 4 threads, the correct output for each thread (note tid = omp_get_thread_num() + 1, so tid runs from 1 to 4) from GFortran, XE2013 and XE2019 is
tid  1   ivol  1
tid  1   ivol  2
...
tid  1   ivol  25

tid  2   ivol  26
tid  2   ivol  27
...
tid  2   ivol  50

tid  3   ivol  51
tid  3   ivol  52
...
tid  3   ivol  75

tid  4   ivol  76
tid  4   ivol  77
...
tid  4   ivol  100

However, Intel XE 2017/2018 gives the following wrong results
tid  1   ivol  1
tid  1   ivol  2
...
tid  1   ivol  100

tid  2   ivol  1
tid  2   ivol  2
...
tid  2   ivol  100

tid  3   ivol  1
tid  3   ivol  2
...
tid  3   ivol  100

tid  4   ivol  1
tid  4   ivol  2
...
tid  4   ivol  100
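The scheduling behavior above can be checked with a small stand-alone sketch (written for this thread, not taken from the original code; with schedule(static, 25), 100 iterations, and 4 threads, each thread should receive one contiguous block of 25 iterations, not all 100):

```fortran
! Minimal reproducer sketch for the static-schedule check.
! Compile with: ifort -qopenmp ...  (or gfortran -fopenmp ...)
program schedule_check
  use omp_lib
  implicit none
  integer, parameter :: nngl = 100, chunk = 25
  integer :: ivol, tid
  !$omp parallel private(tid)
  tid = omp_get_thread_num() + 1
  !$omp do schedule(static, chunk)
  do ivol = 1, nngl
    ! With a correct static schedule, each thread writes one
    ! contiguous chunk of iterations to its own scratch file.
    write(1000+tid,*) "tid", tid, "ivol", ivol
  end do
  !$omp end do
  !$omp end parallel
end program schedule_check
```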

Thanks

jimdempseyatthecove
Honored Contributor III

In your parallel region you write to tid, which defaults to being a shared variable. You must identify the per-thread context variables.

!$omp do schedule(static, chunk) private(tid) ! Note, the do loop control variable is implicitly private.

If you have other private variables, add them to the private clause.
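In context, that clause might look like this (a sketch assembled from the loop posted earlier in the thread; any further thread-local temporaries would join tid in the private clause):

```fortran
!$omp parallel
!$omp do schedule(static, chunk) private(tid)
do ivol = 1, nngl
  tid = omp_get_thread_num() + 1   ! private: each thread sees its own tid
  write(1000+tid,*) "tid", tid, "ivol", ivol
end do
!$omp end do
!$omp end parallel
```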

Jim Dempsey

danielsue
Beginner

jimdempseyatthecove wrote:

In your parallel region you write to tid, which defaults to being a shared variable. You must identify the per-thread context variables.

!$omp do schedule(static, chunk) private(tid) ! Note, the do loop control variable is implicitly private.

If you have other private variables, add them to the private clause.

Jim Dempsey

Sorry for the confusion. I actually have added these variables to the private clause; I just forgot to copy those lines here. It is not because of this.

jimdempseyatthecove
Honored Contributor III

The behavior you gave for the second example is what you would see if the !$omp do were not processed.
BTW, presumably, prior to the sample code presented, you have an enclosing !$omp parallel.

You should be aware that:

!$...

statements do not need the conditional-compile #ifdef OPENMP enclosures. When compiled without the OpenMP option, they are comments.

My guess is, for some reason the compiler is performing:

...
!$omp parallel ...
...
! *** not seen *** !$omp do...
...

Try something like:

=========Code section===========
!$omp parallel
...
     !$omp do schedule(static, chunk) private(tid)
       do ivol = 1,nngl                 !loop over control volumes         
         tid = 1                        !in event of non-OpenMP
!$      tid = omp_get_thread_num() + 1  !overstrike in event of OpenMP
...

Note, with any compiler optimization enabled, the tid = 1 will be removed.

Jim Dempsey

danielsue
Beginner

jimdempseyatthecove wrote:

The behavior you gave for the second example is what you would see if the !$omp do were not processed.
BTW, presumably, prior to the sample code presented, you have an enclosing !$omp parallel.

You should be aware that:

!$...

statements do not need the conditional-compile #ifdef OPENMP enclosures. When compiled without the OpenMP option, they are comments.

My guess is, for some reason the compiler is performing:

...
!$omp parallel ...
...
! *** not seen *** !$omp do...
...

Try something like:

=========Code section===========
!$omp parallel
...
     !$omp do schedule(static, chunk) private(tid)
       do ivol = 1,nngl                 !loop over control volumes         
         tid = 1                        !in event of non-OpenMP
!$      tid = omp_get_thread_num() + 1  !overstrike in event of OpenMP
...

Note, with any compiler optimization enabled, the tid = 1 will be removed.

Jim Dempsey

Hi Jim,

Thanks anyway. Unfortunately, this does not solve my problem. I will update once I get a solution from the support center.

Danyang
