Intel® Fortran Compiler

Different behaviour depending on which thread computes which loop iteration (OpenMP PARALLEL DO)

aurora
Beginner
Hi,
I have a parallel loop like this one:
!$OMP PARALLEL DO NUM_THREADS(2) DEFAULT(SHARED)
DO J = 1, X
!$OMP CRITICAL
   aux = aux + f(...some parameters...)
!$OMP END CRITICAL
END DO
!$OMP END PARALLEL DO
In f(), there are some "double precision" and "integer" declarations, plus the parameter variables, and it computes a number.
The thing is that "aux" depends on which thread computes each f() call. That is, if thread 0 computes iterations J=1,2,3,4 and thread 1 computes iterations J=5,6,7,8, aux is always the same. But when the iterations are split differently between the threads, the results differ.
So, to get the same "aux" result between two executions of the program, thread 0 has to compute exactly the same iterations. What could produce this behaviour (THREADPRIVATE declarations, for example)?
Thanks in advance
8 Replies
TimP
Honored Contributor III
Variations in roundoff behavior are inherent in reduction operations. In the context you posted, it looks like you would need FIRSTPRIVATE(aux) LASTPRIVATE(aux) to accomplish it with a critical section.
Assuming your function is correctly parallelized and has no side effects (and is compiled with compatible options or a RECURSIVE declaration), numerical variations are to be expected due to the varying order of addition.
The OpenMP REDUCTION clause might be more efficient and might have better numerical properties, as well as being simpler to write, than the critical section; it would also avoid the need for FIRSTPRIVATE/LASTPRIVATE.
If the code is correctly parallelized but still shows excessive roundoff variation from the varying order of addition, the simplest remedy would be to declare aux as double precision, if it isn't already.
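
For example, the reduction form of your loop might look like this (a sketch only; I am assuming aux is a scalar accumulator initialized before the loop, and the actual arguments to f are elided as in your post):

aux = 0.0d0
!$OMP PARALLEL DO NUM_THREADS(2) DEFAULT(SHARED) REDUCTION(+:aux)
DO J = 1, X
   aux = aux + f(...some parameters...)   ! each thread accumulates privately; partial sums are combined at the end
END DO
!$OMP END PARALLEL DO

Note that a reduction still does not guarantee bitwise-identical results between runs, since the partial sums may be combined in a different order.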
aurora
Beginner
Hi,
This is a simplification of the problem. In my real code, "aux" is a matrix, and each position in the matrix is accessed only once in the whole loop, so there is no reduction here, only a shared matrix.
I've also tested the code sequentially with a random order of iterations, and it works fine.
PS: The CRITICAL is only there to show that there is no concurrency problem. I think the problem is in function f() and the variables it declares on the stack (maybe garbage values that persist between iterations, or something like that).
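Roughly, the real loop looks more like this (a sketch; idx(J) is a hypothetical stand-in for the real index mapping, and the arguments to f are elided):

!$OMP PARALLEL DO NUM_THREADS(2) DEFAULT(SHARED)
DO J = 1, X
   aux(idx(J)) = f(...some parameters...)   ! each element is written by exactly one iteration
END DO
!$OMP END PARALLEL DO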
Any ideas?
TimP
Honored Contributor III
I did suggest that you check the function f() to ensure that it is compiled with options that guarantee a private stack. You may recall that Steve Lionel suggested that such functions be declared RECURSIVE so as to avoid those dependencies on compile options. When that is done, the order of threaded completion should have no more effect than a change in the order of sequential iterations.
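
For example (a sketch only; the real signature of f is not shown in this thread, so the argument here is made up):

RECURSIVE FUNCTION f(p) RESULT(val)
   DOUBLE PRECISION, INTENT(IN) :: p
   DOUBLE PRECISION :: val
   DOUBLE PRECISION :: t   ! a local: with RECURSIVE it lives on the stack, private to each call
   t = p * p               ! assign locals before use; otherwise they hold whatever garbage is on the stack
   val = t
END FUNCTION f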
jimdempseyatthecove
Honored Contributor III
Consider using the ORDERED directive:

!$OMP PARALLEL DO ORDERED NUM_THREADS(2) DEFAULT(SHARED)
DO J = 1, X
   ... ! possibly other work here
!$OMP ORDERED
   aux = aux + f(...some parameters...)
!$OMP END ORDERED
   ... ! possibly other work here
END DO
!$OMP END PARALLEL DO

Jim Dempsey
aurora
Beginner
Hi,
I thought that /recursive was implied by /Qopenmp. Anyway, compiling with /recursive doesn't solve the problem :(
aurora
Beginner
Which compilation flags should I use to ensure that all variables (static/stack/heap) are zero-initialized?
I have no SAVE statements or COMMON blocks.
Thanks in advance!
Les_Neilson
Valued Contributor II
Whilst it is not a good idea to rely on such settings (it is always better to change the code, even if it is a long slog to do so, or you could write a script to help you), /Qsave and /Qzero are the ones you want.

Note that /Qzero only initialises saved scalar variables; arrays you will have to do yourself, as in the sketch below.
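
For example (a sketch with made-up names), the explicit initialisation you would add by hand looks like:

DOUBLE PRECISION :: work(100)
INTEGER :: counts(10)
work = 0.0d0   ! /Qzero will not do this for arrays
counts = 0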

See the help for more details.

Les
TimP
Honored Contributor III
The combination /Qsave /Qzero works by removing the affected variables from the stack, so it does nothing to initialize the stack. It won't help you with parallelization; if anything, SAVEd locals become shared between threads, which makes things worse.