
Unusual OpenMP behavior on Intel 2017/2018

danielsue
Beginner

Dear All,

I have a parallelized Fortran code that shows unusual behavior when compiled with the latest Intel compiler with OpenMP enabled. The code compiles, but when I run it, it is extremely slow and the simulation crashes at the first timestep. Something is definitely wrong.

However, the same code has been thoroughly tested on different platforms, and I wonder whether I need some special configuration when using the latest Intel Fortran compiler.

Below is my test history:

Linux workstation-1      Intel 2014         Works

Linux workstation-2      GFortran 5         Works

Windows workstation-3    Intel 2013         Works

Linux cluster-4          GFortran 7         Works

Linux cluster-4          Intel 2017/2018    Fails

The makefile looks like the one below. For Intel 2017/2018, I changed -fopenmp to -qopenmp.

FC = ifort
FFLAGS = -fopenmp -O3
FPPFLAGS = -DLINUX -DRELEASE -DOPENMP

# Source code folder
SRC = ./../
SOURCES = $(SRC)usg/math_common.o \
          ... \
          ... \
          $(SRC)min3p/pitzer/closepitzer.o

executable: $(SOURCES)
	$(FC) $(FFLAGS) $(FPPFLAGS) -o executable-name $(SOURCES)
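
For the Intel 2017/2018 builds, the only change I made is the OpenMP flag, so the flags section becomes something like:

FC = ifort
FFLAGS = -qopenmp -O3
FPPFLAGS = -DLINUX -DRELEASE -DOPENMP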
 

This is the first time I have switched to the Intel 2018 compiler, and I wonder whether I need any special configuration.

Thanks,

Danyang

 

Juergen_R_R
Valued Contributor I

Without the code it is really hard to say anything. What do you mean by "the simulation crashes"? Could it be that it runs out of memory?

 

TimP
Honored Contributor III

Among the possibilities are that your application violates OpenMP in some way that was not exposed by earlier compilers, or that there is actually a new bug. If you can't figure it out, you could submit a case to the Online Service Center.

I already mentioned a case (using both reduction and lastprivate clauses in one directive) about which I'm uncertain. It first failed with the new compiler.
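
For illustration only (the names here are placeholders, not from any real code), the kind of directive I mean is roughly:

      total = 0.0d0
!$omp parallel do reduction(+:total) lastprivate(x)
      do i = 1, n
        x = a(i) * b(i)       ! lastprivate: x keeps the value from the last iteration
        total = total + x     ! reduction: per-thread partial sums are combined at the end
      end do
!$omp end parallel do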

danielsue
Beginner

Juergen R. wrote:

Without the code it is really hard to say anything. What do you mean by "the simulation crashes"? Could it be that it runs out of memory?

 

The code runs, but it is much slower than the sequential version, and it does not converge. I think the OpenMP-related parts of the code are actually not compiled as expected.

danielsue
Beginner

Tim P. wrote:

Among the possibilities are that your application violates OpenMP in some way that was not exposed by earlier compilers, or that there is actually a new bug. If you can't figure it out, you could submit a case to the Online Service Center.

I already mentioned a case (using both reduction and lastprivate clauses in one directive) about which I'm uncertain. It first failed with the new compiler.

Thanks, Tim. There is no directive that uses reduction and lastprivate at the same time. Actually, there is only one lastprivate in the code, and that part is not used in my test case. I will run more tests and then submit a case to the service center if the problem persists.

 

danielsue
Beginner

Tim P. wrote:

Among the possibilities are that your application violates OpenMP in some way that was not exposed by earlier compilers, or that there is actually a new bug. If you can't figure it out, you could submit a case to the Online Service Center.

I already mentioned a case (using both reduction and lastprivate clauses in one directive) about which I'm uncertain. It first failed with the new compiler.

Today I tested the latest 2019 version. The code compiles and runs without problems in release mode, but it still causes a convergence problem in debug mode. It looks as if the OpenMP-related code is not parsed correctly when I include certain functions, even though those functions are not called when the code runs. I have reported this problem to the service center with source code and an example.

Thanks

jimdempseyatthecove
Honored Contributor III

>> but still cause convergence problem in debug mode.

While this can be explained by a compiler bug, it can more often be explained by poorly written convergence code. By this I mean writing the convergence test against literal constants (e.g. a hard-coded epsilon) as opposed to using a runtime determination of what the epsilon should be. That runtime determination can vary depending upon floating-point optimization levels.

It is possible that for all these years the convergence code worked by accident as opposed to by design.

Usually, convergence issues tend to be the obverse of what you are experiencing, IOW convergence works in Debug but not in Release. Your experience is peculiar. The support center may be able to determine the underlying cause.
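
As a sketch only (the variable names are made up, not from your code), a runtime-determined tolerance might look something like:

! Scale the tolerance to the magnitude of the values being compared, using the
! machine epsilon of the working precision, instead of a hard-coded literal.
real(8) :: old_val, new_val, tol
logical :: converged

tol = 100.0d0 * epsilon(1.0d0) * max(abs(old_val), abs(new_val), 1.0d0)
converged = abs(new_val - old_val) <= tol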

Jim Dempsey

danielsue
Beginner

jimdempseyatthecove wrote:

>> but still cause convergence problem in debug mode.

While this can be explained by a compiler bug, it can more often be explained by poorly written convergence code. By this I mean writing the convergence test against literal constants (e.g. a hard-coded epsilon) as opposed to using a runtime determination of what the epsilon should be. That runtime determination can vary depending upon floating-point optimization levels.

It is possible that for all these years the convergence code worked by accident as opposed to by design.

Usually, convergence issues tend to be the obverse of what you are experiencing, IOW convergence works in Debug but not in Release. Your experience is peculiar. The support center may be able to determine the underlying cause.

Jim Dempsey

Hi Jim,

I am afraid this convergence problem is caused by the code not being parsed correctly. Take the following code section as an example.

=========Code section===========
#ifdef OPENMP
    !$omp do schedule(static, chunk)
#endif
      do ivol = 1,nngl                 !loop over control volumes         

#ifdef OPENMP
        tid = omp_get_thread_num() + 1
#else
        tid = 1
#endif

      ...
!c    test code to check whether the OpenMP schedule is correct; output goes to temporary file 1000+tid
        write(1000+tid,*) "tid",tid,"ivol",ivol      

#ifdef USG
        if (discretization_type > 0) then
            grad_locs(ivol) = gradient_dd_green_gauss_tri(ivol)
        end if
#endif     

      end do

=========End of code Section===========

If the code is compiled with the "USG" part, even though the simulation case does not use this part (discretization_type == 0), the code still crashes because of a race condition. The race condition occurs because the loop "do ivol = 1,nngl" does not run as expected when using Intel XE 2017/2018.
For example, if nngl is 100 and chunk is 25, and this part is run with 4 threads, the correct output for each thread from GFortran, XE 2013, and XE 2019 is:
tid  0   ivol  1
tid  0   ivol  2
...
tid  0   ivol  25

tid  1   ivol  26
tid  1   ivol  27
...
tid  1   ivol  50

tid  2   ivol  51
tid  2   ivol  52
...
tid  2   ivol  75

tid  3   ivol  76
tid  3   ivol  77
...
tid  3   ivol  100

However, Intel XE 2017/2018 gives the following wrong results:
tid  0   ivol  1
tid  0   ivol  2
...
tid  0   ivol  100

tid  1   ivol  1
tid  1   ivol  2
...
tid  1   ivol  100

tid  2   ivol  1
tid  2   ivol  2
...
tid  2   ivol  100

tid  3   ivol  1
tid  3   ivol  2
...
tid  3   ivol  100
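
For reference, the scheduling check above can be reduced to a small stand-alone test of roughly this form (the program name and unit numbers here are only for the test):

program schedule_check
  use omp_lib
  implicit none
  integer, parameter :: nngl = 100, chunk = 25
  integer :: ivol, tid

!$omp parallel private(tid)
  tid = omp_get_thread_num() + 1
!$omp do schedule(static, chunk)
  do ivol = 1, nngl
    ! each thread writes to its own scratch unit, 1000+tid, as in the real code
    write(1000+tid,*) "tid", tid, "ivol", ivol
  end do
!$omp end do
!$omp end parallel
end program schedule_check

With 4 threads, each unit should receive one contiguous block of 25 values of ivol, as in the first listing above.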

Thanks

jimdempseyatthecove
Honored Contributor III

In your parallel region you write to tid, which defaults to being a shared variable. You must identify the per-thread context variables.

!$omp do schedule(static, chunk) private(tid) ! Note, the do loop control variable is implicitly private.

If you have other private variables, add them to the private clause.

Jim Dempsey

danielsue
Beginner

jimdempseyatthecove wrote:

In your parallel region you write to tid, which defaults to being a shared variable. You must identify the per-thread context variables.

!$omp do schedule(static, chunk) private(tid) ! Note, the do loop control variable is implicitly private.

If you have other private variables, add them to the private clause.

Jim Dempsey

Sorry for the confusion. I actually have added these variables to the private clause; I just forgot to copy those lines here. It is not because of this.

jimdempseyatthecove
Honored Contributor III

The behavior you gave for the second example is what you would see if the !$omp do was not processed.
BTW, presumably, prior to the sample code presented, you have an enclosing !$omp parallel.

You should be aware that:

!$...

statements do not need the conditional-compile #ifdef OPENMP enclosures. When compiled without the OpenMP option, they are treated as comments.

My guess is, for some reason the compiler is performing:

...
!$omp parallel ...
...
! *** not seen *** !$omp do...
...

Try something like:

=========Code section===========
!$omp parallel
...
     !$omp do schedule(static, chunk) private(tid)
       do ivol = 1,nngl                 !loop over control volumes         
         tid = 1                        !in event of non-OpenMP
!$      tid = omp_get_thread_num() + 1  !overstrike in event of OpenMP
...

Note, with any compiler optimization enabled, the tid = 1 will be removed.

Jim Dempsey

danielsue
Beginner

jimdempseyatthecove wrote:

The behavior you gave for the second example is what you would see if the !$omp do was not processed.
BTW, presumably, prior to the sample code presented, you have an enclosing !$omp parallel.

You should be aware that:

!$...

statements do not need the conditional-compile #ifdef OPENMP enclosures. When compiled without the OpenMP option, they are treated as comments.

My guess is, for some reason the compiler is performing:

...
!$omp parallel ...
...
! *** not seen *** !$omp do...
...

Try something like:

=========Code section===========
!$omp parallel
...
     !$omp do schedule(static, chunk) private(tid)
       do ivol = 1,nngl                 !loop over control volumes         
         tid = 1                        !in event of non-OpenMP
!$      tid = omp_get_thread_num() + 1  !overstrike in event of OpenMP
...

Note, with any compiler optimization enabled, the tid = 1 will be removed.

Jim Dempsey

Hi Jim,

Thanks anyway. Unfortunately, this does not solve the problem I have. I will post an update once I get a solution from the support center.

Danyang
