Intel® Fortran Compiler

Qopenmp when not using OpenMP

cureshot
Beginner

Hi,

I am seeing the following discrepancy. I ran three experiments: (i) my code compiled with the Qopenmp compiler switch (IFORT) but with no OpenMP directives enabled, (ii) compiled without Qopenmp, and (iii) compiled with Qopenmp and with OpenMP directives enabled, using 96 threads.

The first and third results are identical but differ significantly from the second. I have been trying to figure out the source of this problem, but so far no luck.

Does anyone know what is happening behind the scenes and can point me in the right direction? Which result would be more reliable?

best, kursat

andrew_4619
Honored Contributor III

What do you mean by "results"? It may be helpful to add more detail to your question.

cureshot
Beginner

I am solving a dynamic optimization problem with multiple continuous choice variables. The maximizers of a function change substantially depending on whether or not I use the Qopenmp compiler switch. Please let me know if this clarifies "results".

andrew_4619
Honored Contributor III

So you are talking about numeric differences in the "results" rather than some other attribute such as run time. Yes, that does help.

cureshot
Beginner

Yes, any leads?

Steve_Lionel
Honored Contributor III

The difference I can think of is that /Qopenmp implies /recursive, using the stack for things that would otherwise be in static storage. If you have references to uninitialized memory that change would be a prime candidate for triggering bad results.

(Note: While Fortran 2018 makes all procedures recursive by default, the Intel compiler currently doesn't do that unless you specify /standard-semantics.)
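
To make the static-versus-stack point concrete, here is a minimal sketch of the kind of latent bug that a change in storage placement can expose. The program and the routine name accumulate are hypothetical and not taken from the code under discussion. It assumes a local array that is never initialized and has no SAVE attribute, so whether its contents survive between calls depends on where the compiler places it.

```fortran
program storage_demo
  implicit none
  integer :: k

  do k = 1, 3
    call accumulate(k)
  end do

contains

  subroutine accumulate(k)
    integer, intent(in) :: k
    ! BUG (deliberate): totals is never initialized and is not SAVEd.
    ! In static storage it may happen to start at zero and persist
    ! between calls; on the stack its contents are arbitrary.
    real :: totals(2)

    totals(1) = totals(1) + real(k)
    print *, 'call', k, ' running total =', totals(1)
  end subroutine accumulate

end program storage_demo
```

Building this once with -norecursive and once with -recursive (or -qopenmp) may print different totals. Neither output is "right", because the program reads undefined memory. Declaring the array as real, save :: totals(2) = 0.0 should remove the dependence on the switch.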

cureshot
Beginner

Dear Steve, thank you for your response. I am still working on the discrepancy. If I had to stick with one, which would be the more reliable: the build with /Qopenmp switched on or off?

best

TobiasK
Moderator

@cureshot 

1) How do you disable directives?
2) What does your compile / link line look like?
3) Do you have external dependencies? MKL uses TBB or OpenMP depending on the presence of qopenmp.
4) Do you use openmp simd anywhere in the code? -qopenmp enables -qopenmp-simd (in IFX, -qopenmp-simd is enabled at -O1 even without -qopenmp...)
5) Just to be sure, run your case (i) with OMP_DISPLAY_ENV=verbose or true.
6) Not relevant for IFORT, but relevant if you move to IFX:
In IFORT, if a parallel region with an if clause is encountered and the clause evaluates to false, the data-sharing clauses of that region were completely ignored. This was not compliant with the standard. In IFX the data-sharing clauses are still honored; the if clause is just another way to set num_threads(1) for that region.

E.g.:

i=1
!$omp parallel private(i) if(.false.)
write(*,*) i
!$omp end parallel

This prints 1 with IFORT; with IFX, i is uninitialized inside the region because it is declared private.

jimdempseyatthecove
Honored Contributor III

Tobias, I think your example does not illustrate the point you are trying to make.

Consider:

i=1
!$omp parallel private(i) if(.false.)
write(*,*) i
!$omp end parallel
write(*,*) Array(i)

According to your text for 6), i would be undefined. I don't think that is what you wanted to illustrate.

Perhaps if i were declared, defined only on your line 1, used only inside the !$omp region, and not used outside that parallel region (with optimizations enabled), then the "i=1" would be elided by the optimizer. Maybe that is what you meant to say in your description for point 6.

Jim Dempsey

cureshot
Beginner

1) How do you disable directives?

I simply comment them all out.

2) What does your compile / link line look like?

ifort XXX_v5.f90 -O2 -o XX.out -qopenmp

3) Do you have external dependencies? MKL uses TBB or OpenMP depending on the presence of qopenmp.

No, they are not turned on.

4) Do you use openmp simd anywhere in the code? -qopenmp enables -qopenmp-simd (in IFX, -qopenmp-simd is enabled at -O1 even without -qopenmp...)

No.

5) Just to be sure, run your case (i) with OMP_DISPLAY_ENV=verbose or true.

Thanks for that.

One small update. I noticed that I declared a variable as Double Precision in a routine but, when passing it to a function, declared the corresponding argument as a real variable. When I fixed that, the discrepancies got smaller. (It is still strange to me that -qopenmp / -qno-openmp makes a difference.)
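
For readers wondering how much damage such a kind mismatch can do, the following is a minimal sketch, not the original code. It assumes that Double Precision is a 64-bit real and default real is 32-bit (the usual ifort defaults), and it uses TRANSFER to imitate, in a standard-conforming way, what a callee effectively does when it treats an 8-byte actual argument as a 4-byte dummy. The program name and variable names are hypothetical.

```fortran
program kind_mismatch_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  real(real64) :: x
  real(real32) :: misread

  x = 0.1_real64

  ! With no explicit interface, a callee that declares its dummy as a
  ! 32-bit real simply reinterprets the leading 4 bytes of the 8-byte
  ! value it was handed. TRANSFER mimics that reinterpretation.
  misread = transfer(x, misread)

  print *, 'value passed by the caller  :', x
  print *, 'value seen as 32-bit real   :', misread
  print *, 'value after proper convert  :', real(x, real32)
end program kind_mismatch_demo
```

The misread value typically bears no resemblance to the original number. Placing the function in a module, or otherwise giving it an explicit interface, lets the compiler reject the mismatch at compile time; ifort's -warn interfaces option is intended to catch such cases for external procedures.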

TobiasK
Moderator

Jim, this is exactly what I intended to say: i is uninitialized inside the parallel region. Outside the parallel region it is initialized to 1, no matter what happens inside the parallel region.
Now, IFORT uses non-standard behavior and simply generates code as if !$omp parallel were completely absent, while IFX follows the standard, where if(.false.) just sets num_threads(1) and the privatization of i still happens.

program test_parallel_if
  implicit none
  integer :: i

  i = 1
  write(*,*) 'i before parallel region',i
  !$omp parallel private(i) if(.false.)
  write(*,*) 'i inside parallel 1 region',i
  i = 2
  write(*,*) 'i inside parallel 2 region',i
  !$omp end parallel
  write(*,*) 'i outside parallel 2 region',i

end program test_parallel_if

tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifx -O0 -qopenmp parallel_if_1.f90 
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out 
 i before parallel region           1
 i inside parallel 1 region           0
 i inside parallel 2 region           2
 i outside parallel 2 region           1
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifort -O0 -qopenmp parallel_if_1.f90 
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out 
 i before parallel region           1
 i inside parallel 1 region           1
 i inside parallel 2 region           2
 i outside parallel 2 region           2

tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifort -O0 parallel_if_1.f90 
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out 
 i before parallel region           1
 i inside parallel 1 region           1
 i inside parallel 2 region           2
 i outside parallel 2 region           2
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifx -O0 parallel_if_1.f90 
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out 
 i before parallel region           1
 i inside parallel 1 region           1
 i inside parallel 2 region           2
 i outside parallel 2 region           2

Edit:

Ok, since @cureshot sees a difference between -qopenmp / -qno-openmp, he should watch out for variables that are privatized and reused after parallel regions. With -qopenmp the variable still has the same value after the parallel region as before it; with -qno-openmp the value is changed.

The additional difference in the treatment of the if clause is maybe not as relevant, but still something to look for, e.g. one might have coded
!$omp parallel if(omp_get_max_threads().gt.1)

which makes -qopenmp / -qno-openmp equivalent for IFORT, while with IFX you still see the difference between -qopenmp / -qno-openmp for OMP_NUM_THREADS=1.

jimdempseyatthecove
Honored Contributor III

I think it might have been better to describe the nuance as:

For ifort, the private variables of the master thread are located at the same address as (or simply are) the variables outside the scope of the parallel region, while the other threads of the team have their own unique addresses (on the stack or allocated on the heap).

Whereas for ifx, private variables, including those of the master thread, have their own unique addresses (on the stack or allocated on the heap).

I can see how this difference is like walking on thin ice. Depending on unforeseen circumstances, you are either supported or fall through the ice.

RE: !$omp parallel if(omp_get_max_threads().gt.1)

I do not think it will make a difference in behavior, as the main thread in an ifx build will have a separate location for variable i.

Though this should be verified with a test case.
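
A possible test case might look like the sketch below. It is an untested assumption of how one could check this, not a verified result, and the program name is hypothetical. The conditional-compilation line starting with !$ pulls in omp_lib only when OpenMP is enabled, so the same source also builds with -qno-openmp.

```fortran
program test_if_max_threads
  !$ use omp_lib
  implicit none
  integer :: i

  i = 1
  write(*,*) 'i before parallel region', i
  !$omp parallel private(i) if(omp_get_max_threads() .gt. 1)
  ! i is private here, so reading it before assigning would be undefined;
  ! only assign to it inside the region.
  i = 2
  !$omp end parallel
  write(*,*) 'i after parallel region ', i
end program test_if_max_threads
```

Running each build (ifort and ifx, with and without -qopenmp) under OMP_NUM_THREADS=1 and comparing the last line would show whether the assignment inside the region reached the original i (prints 2) or only a private copy (prints 1).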

 

Jim Dempsey

andrew_4619
Honored Contributor III

I suggest comparing -recursive and -norecursive runs WITHOUT -qopenmp.

Steve_Lionel
Honored Contributor III

I mentioned before that the results differing in the two cases strongly suggests an error in your code.

cureshot
Beginner

Hi all,

thank you so much for your support. I believe I have found the sources of the discrepancies. One is that I declared a variable as Double Precision in a routine but declared it as a real variable in the function I passed it to. This apparently makes a difference. The second is related to calling one particular IMSL routine. I am not sure what is going on there. When I replaced it with an alternative IMSL routine, the discrepancies disappeared.

best

Barbara_P_Intel
Employee

Thanks for the update! Great that you found a coding error. Those kinds of errors are tricky to find.

cureshot
Beginner

One of my questions still remains unanswered. Which run is more reliable: a run with -qopenmp but no OMP directives turned on, or a run without -qopenmp? Running in Debug mode with VS is a good starting point, but that is not always possible with IFORT on the server. So I am curious what more experienced people would suggest.

cureshot
Beginner

Also, does machine precision change with or without -qopenmp? It seems so, but I wanted to confirm, and to find out why.

Steve_Lionel
Honored Contributor III

Using -qopenmp has no effect on machine precision, and if you aren't using any OpenMP directives it won't affect the code other than making all procedures recursive-capable.
