Hi,
I am seeing the following discrepancy. I ran three experiments: (i) with the Qopenmp compiler switch (IFORT) but with no OpenMP directives enabled, (ii) without Qopenmp, and (iii) with Qopenmp and OpenMP directives enabled, using 96 threads.
The first and third results are identical, but they differ significantly from the second one. I have been trying to find the source of the problem, but no luck so far.
Does anyone know what is happening behind the scenes and can point me in the right direction? Which result is more reliable?
best, kursat
What do you mean by "results"? It would help to add more detail to your question.
I am solving a dynamic optimization problem with multiple continuous choice variables. The maximizers of a function change substantially depending on whether I use the Qopenmp compiler switch or not. Please let me know if this clarifies "results".
So you are talking about numeric differences in "results" rather than some other attribute such as run time. Yes, that does help.
Yes, any leads?
The difference I can think of is that /Qopenmp implies /recursive, which puts on the stack things that would otherwise be in static storage. If you have references to uninitialized memory, that change would be a prime candidate for triggering bad results.
(Note: While Fortran 2018 makes all procedures recursive by default, the Intel compiler currently doesn't do that unless you specify /standard-semantics.)
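To make the storage point concrete, here is a minimal sketch of the safe pattern, not taken from the original code: the routine name `accumulate` and the variable `total` are hypothetical. A local variable that must keep its value between calls is given an explicit SAVE attribute and an initial value, so the result should not depend on whether locals end up in static storage or on the stack.

```fortran
program saved_state
  implicit none
  integer :: k
  do k = 1, 3
    call accumulate(k)
  end do
contains
  subroutine accumulate(step)
    integer, intent(in) :: step
    ! Explicit SAVE plus an initial value: the running total is kept
    ! between calls under any storage option.
    ! Declaring this as a plain "integer :: total" and relying on the old
    ! value still being there is the pattern that can silently change
    ! when locals move from static storage to the stack.
    integer, save :: total = 0
    total = total + step
    write(*,*) 'step', step, 'total', total
  end subroutine accumulate
end program saved_state
```

With the SAVE attribute the running totals are 1, 3 and 6 whichever flags are used. Without it, the value of `total` on entry is undefined by the language, and what you actually see can depend on the storage choice.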
Dear Steve, thank you for your response. I am still working on the discrepancy. If I had to stick to one, which would be the more reliable: the run with /Qopenmp switched on or off?
best
@cureshot
1) How do you disable directives?
2) What does your compile / link line look like?
3) Do you have external dependencies? MKL uses TBB or OpenMP depending on the presence of qopenmp.
4) Do you use openmp simd somewhere in the code? -qopenmp enables -qopenmp-simd (In IFX -qopenmp-simd is enabled at -O1 even without -qopenmp...)
5) Just to be sure, run your case (i) with OMP_DISPLAY_ENV=verbose or true
6) Not relevant for IFORT but for IFX:
In IFORT, if a parallel region with an if clause is encountered and the if clause evaluates to false, the data sharing clauses of the parallel region are completely ignored. This is not compliant with the standard. In IFX the data sharing clauses are still honored, i.e. the if clause is just another way to set num_threads(1) for that region.
E.g.:
i=1
!$omp parallel private(i) if(.false.)
write(*,*) i
!$omp end parallel
will print 1 with IFORT, while with IFX i is not initialized inside the region since it is declared private.
Tobias, I think your example does not illustrate the point you are trying to make.
Consider:
i=1
!$omp parallel private(i) if(.false.)
write(*,*) i
!$omp end parallel
write(*,*) Array(i)
According to your text for 6), i would be undefined. I don't think that is what you wanted to illustrate.
Perhaps if i were declared, defined only on your line 1, used only inside the !$omp region, and not used outside of that parallel region (with optimizations enabled), then the "i=1" would be elided by the optimizer. Maybe that is what you meant to say in your description for point 6.
Jim Dempsey
1) How do you disable directives?
I simply comment all out.
2) What does your compile / link line look like?
ifort XXX_v5.f90 -O2 -o XX.out -qopenmp
3) Do you have external dependencies? MKL uses TBB or OpenMP depending on the presence of qopenmp.
No, they are not turned on.
4) Do you use openmp simd somewhere in the code? -qopenmp enables -qopenmp-simd (In IFX -qopenmp-simd is enabled at -O1 even without -qopenmp...)
No.
5) Just to be sure, run your case (i) with OMP_DISPLAY_ENV=verbose or true
Thanks for that.
One small update. I noticed that I had declared a variable as Double Precision in one routine but, when passing it to a function, declared it there as a real variable. When I fixed that, the discrepancies got smaller. (It is still weird to me that -qopenmp / -qno-openmp makes a difference.)
Jim, this is exactly what I intended to say: i is uninitialized inside the parallel region. Outside the parallel region it is initialized to 1, no matter what happens inside the parallel region.
Now IFORT uses a non-standard behavior and simply generates code as if !$omp parallel was completely absent, while IFX follows the standard where if(.false.) is just setting num_threads(1) and the privatization of i still happens.
program test_parallel_if
implicit none
integer :: i
i = 1
write(*,*) 'i before parallel region',i
!$omp parallel private(i) if(.false.)
write(*,*) 'i inside parallel 1 region',i
i = 2
write(*,*) 'i inside parallel 2 region',i
!$omp end parallel
write(*,*) 'i outside parallel 2 region',i
end program test_parallel_if
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifx -O0 -qopenmp parallel_if_1.f90
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out
i before parallel region 1
i inside parallel 1 region 0
i inside parallel 2 region 2
i outside parallel 2 region 1
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifort -O0 -qopenmp parallel_if_1.f90
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out
i before parallel region 1
i inside parallel 1 region 1
i inside parallel 2 region 2
i outside parallel 2 region 2
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifort -O0 parallel_if_1.f90
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '-diag-disable=10448' to disable this message.
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out
i before parallel region 1
i inside parallel 1 region 1
i inside parallel 2 region 2
i outside parallel 2 region 2
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ifx -O0 parallel_if_1.f90
tkloeffe@ortce-skl22:~/TCE/JIRA/own/OpenMP/parallel_if$ ./a.out
i before parallel region 1
i inside parallel 1 region 1
i inside parallel 2 region 2
i outside parallel 2 region 2
Edit:
Ok, since @cureshot sees a difference between -qopenmp / -qno-openmp, he should watch out for variables that are privatized and reused after parallel regions. With -qopenmp the variable still has the same value as before the parallel region; with -qno-openmp the value is changed.
The additional difference in the treatment of the if clause is maybe not as relevant, but it is still something to look for, e.g. one might have coded
!$omp parallel if(omp_get_max_threads().gt.1)
which makes -qopenmp / -qno-openmp equivalent for IFORT, while with IFX you still see the difference between -qopenmp / -qno-openmp for OMP_NUM_THREADS=1.
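One way to avoid depending on this is to never read a privatized variable after the region, and to carry values out through a shared variable instead. The following is only a sketch under that assumption, and the variable name `outcome` is made up for illustration.

```fortran
program private_dataflow
  implicit none
  integer :: i, outcome
  i = 1
  outcome = 0
  ! firstprivate gives every thread a copy of i that starts at 1, so the
  ! value inside the region is defined whether or not OpenMP is enabled.
  !$omp parallel firstprivate(i) shared(outcome)
  i = i + 1
  ! Exactly one thread stores its value; the implicit barrier at
  ! "end single" makes it visible before the region ends.
  !$omp single
  outcome = i
  !$omp end single
  !$omp end parallel
  ! Only the shared variable is read here. The value of i itself after
  ! the region differs between the two builds (1 with OpenMP, 2 without).
  write(*,*) 'outcome =', outcome
end program private_dataflow
```

Both builds should print an outcome of 2, because the result no longer depends on whether the assignment to i inside the region hit a private copy or the original variable.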
I think it might have been better to describe the nuance as:
For ifort, private variables of the master thread are located at the same address as (or simply are) the variables outside the scope of the parallel region, while the other threads of the team have their own unique addresses (on the stack or allocated on the heap).
Whereas for ifx, private variables, including those of the master thread, have their own unique addresses (on the stack or allocated on the heap).
I can see where this difference is like walking on thin ice. Depending on unforeseen circumstances you either are supported or fall through the ice.
RE: !$omp parallel if(omp_get_max_threads().gt.1)
I do not think it will make a difference in behavior, as the main thread in an ifx build will have a separate location for variable i.
Though this should be verified with a test case.
Jim Dempsey
I suggest comparing -recursive and -norecursive runs WITHOUT -qopenmp
I mentioned before that the results differing in the two cases strongly suggests an error in your code.
Hi all,
Thank you so much for your support. I believe I have found the sources of the discrepancies. One is that I declared a variable as Double Precision in a routine and, when passing it to a function, declared it there as a real variable. This apparently makes a difference. The second is related to a call to one particular IMSL routine. I am not sure what is going on there. When I replaced it with an alternative IMSL routine, the discrepancies disappeared.
best
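For readers hitting the same REAL / DOUBLE PRECISION mismatch, here is a minimal sketch of how an explicit interface lets the compiler catch it. The module `objective_mod` and the function `objective` are hypothetical stand-ins, not the code from this thread. For external procedures that are not in a module, ifort's -warn interfaces option is meant to flag this kind of mismatch as well.

```fortran
module objective_mod
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Hypothetical objective function; the formula is illustrative only.
  function objective(x) result(f)
    real(dp), intent(in) :: x
    real(dp) :: f
    f = -(x - 0.3_dp)**2
  end function objective
end module objective_mod

program check_kinds
  use objective_mod
  implicit none
  real(dp) :: x
  x = 0.25_dp
  ! Actual and dummy argument have the same kind, so this is well defined.
  write(*,*) 'objective(x) =', objective(x)
  ! Calling objective(0.25) with a default REAL would be rejected at
  ! compile time, because the module provides an explicit interface.
end program check_kinds
```

With an implicit interface the mismatch compiles silently and the callee reinterprets the bits it receives, so the wrong values it produces can change with anything that alters memory layout, including the storage changes implied by -qopenmp.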
Thanks for the update! Great that you found a coding error. Those kinds of errors are tricky to find.
One of my questions still remains unanswered: which run is more reliable, one with -qopenmp but no OMP directives turned on, or one without -qopenmp? Running in Debug mode with VS is a good starting point, but that is not always possible with IFORT on the server. So I am curious what more experienced people would suggest.
Also, does machine precision change with or without -qopenmp? It seems so, but I wanted to confirm and find out why.
Using -qopenmp has no effect on machine precision, and if you aren't using any OpenMP directives it won't affect the code other than making all procedures recursive-capable.
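A quick way to check this yourself is to print the precision properties of the real kinds and compare the output of a build with -qopenmp against one without. This is a small sketch using only standard intrinsics; the program name is arbitrary.

```fortran
program show_precision
  use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
  implicit none
  ! epsilon() and precision() are properties of the kind itself,
  ! so they describe the type rather than any particular build.
  write(*,*) 'real32 epsilon:', epsilon(1.0_sp), ' digits:', precision(1.0_sp)
  write(*,*) 'real64 epsilon:', epsilon(1.0_dp), ' digits:', precision(1.0_dp)
end program show_precision
```

The large gap between the two kinds is also why a REAL dummy argument receiving a DOUBLE PRECISION actual argument can look like a change in machine precision.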