Hi,
I have a parallel loop like this one:
!$OMP PARALLEL DO NUM_THREADS(2) DEFAULT(SHARED)
DO J=1, X
  !$OMP CRITICAL
  aux = aux + f(...some parameters...)
  !$OMP END CRITICAL
END DO
!$OMP END PARALLEL DO
In f(), there are some DOUBLE PRECISION and INTEGER declarations besides the parameter variables, and it computes a number.
The thing is that "aux" depends on which thread computes the f() function. That is, if thread 0 computes iterations 1,2,3,4 and thread 1 computes iterations J=5,6,7,8, aux is always the same. But when the iterations are distributed differently between the threads, the results differ.
So, to get the same "aux" result between two executions of the program, thread 0 has to compute exactly the same iterations. What could produce this behaviour? (THREADPRIVATE declarations, for example?)
Thanks in advance
8 Replies
Variations in roundoff are inherent in reduction operations. In the context you posted, it looks like you would need FIRSTPRIVATE(aux) and LASTPRIVATE(aux) in order to accomplish it with a critical section.
Assuming your function is correctly parallelized and has no side effects (and is compiled with compatible options or a RECURSIVE declaration), numerical variations are to be expected due to the varying order of addition.
The OpenMP REDUCTION clause might be more efficient, might have better numerical properties, and is simpler to write than the critical section. A REDUCTION clause would also avoid the need for FIRSTPRIVATE/LASTPRIVATE.
If the code is correctly parallelized but still shows excessive roundoff variation due to the varying order of addition, the simplest remedy would be to declare aux as DOUBLE PRECISION, if it isn't already.
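For illustration, a minimal sketch of the REDUCTION approach; f, X and the initial value of aux are stand-ins for the poster's actual function and loop bound, not their real code:

```fortran
! Sketch: accumulating with REDUCTION instead of a CRITICAL section.
! Each thread sums into a private copy of aux; the copies are
! combined at the end of the parallel region.
PROGRAM REDUCE_DEMO
  IMPLICIT NONE
  INTEGER :: J
  INTEGER, PARAMETER :: X = 8       ! placeholder loop bound
  DOUBLE PRECISION :: aux
  aux = 0.0D0
!$OMP PARALLEL DO NUM_THREADS(2) DEFAULT(SHARED) REDUCTION(+:aux)
  DO J = 1, X
     aux = aux + f(J)
  END DO
!$OMP END PARALLEL DO
  PRINT *, aux    ! exact here, since the stand-in f returns integers
CONTAINS
  DOUBLE PRECISION FUNCTION f(J)    ! stand-in for the real function
    INTEGER, INTENT(IN) :: J
    f = DBLE(J)
  END FUNCTION f
END PROGRAM REDUCE_DEMO
```

Note that REDUCTION does not by itself remove ordering-dependent roundoff for general floating-point sums; it only removes the need for CRITICAL and the private-copy bookkeeping.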
Hi,
This is a simplification of the problem. In my actual problem, "aux" is a matrix, and each position in the matrix is accessed only once in the whole loop, so there is no reduction here, only a shared matrix.
I've also tested the code sequentially with a random order of iterations, and it works fine.
PS: The CRITICAL is there only to show that this is not a concurrency problem. I think the problem is in the function f() and the variables it declares on the stack (maybe garbage values that persist between iterations, or something like that).
Any ideas?
I did suggest that you check the function f() to make sure it was compiled with options that guarantee a private stack. You may recall that Steve Lionel suggested declaring such functions RECURSIVE so as to avoid those dependencies on compile options. When that is done, the order of threaded completion should have no more effect than a change in the order of sequential iterations.
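As an illustration, a RECURSIVE declaration forces local variables into automatic (stack) storage regardless of compile options; the body below is a made-up stand-in, not the poster's actual f():

```fortran
! RECURSIVE guarantees that locals get automatic storage, so each
! thread has its own copies even without /Qauto or similar options.
RECURSIVE FUNCTION f(A, B) RESULT(R)
  IMPLICIT NONE
  DOUBLE PRECISION, INTENT(IN) :: A, B
  DOUBLE PRECISION :: R
  DOUBLE PRECISION :: TEMP   ! thread-private by virtue of RECURSIVE
  TEMP = A * B
  R = TEMP + A
END FUNCTION f
```

Without RECURSIVE (or an equivalent compile option), a compiler may give locals static storage, and a value left over from one thread's call can leak into another's.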
Consider using the ORDERED directive:
!$OMP PARALLEL DO ORDERED NUM_THREADS(2) DEFAULT(SHARED)
DO J=1, X
  ... ! possibly other work here
  !$OMP ORDERED
  aux = aux + f(...some parameters...)
  !$OMP END ORDERED
  ... ! possibly other work here
END DO
!$OMP END PARALLEL DO
Jim Dempsey
Hi,
I've read about this in http://software.intel.com/en-us/forums/showpost.php?p=117749
I thought that /recursive was implicit with /Qopenmp. Anyway, compiling with /recursive doesn't solve the problem :(
Which compilation flags should I use to make sure that all variables (static/stack/heap) are zero-initialized?
I have no SAVE statements or COMMON blocks.
Thanks in advance!
Whilst it is not a good idea to rely on such settings (it is always better to change the code, even if that is a long slog, or you could write a script to help you), /Qsave and /Qzero are the ones you want.
Note that /Qzero only initialises saved scalar variables; arrays you will have to do yourself.
See the help for more details.
Les
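A sketch of the "change the code" alternative: initialise locals explicitly so no compiler flag is needed. The names and the computation here are illustrative, not the poster's code:

```fortran
SUBROUTINE WORK(N, TOTAL)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: N
  DOUBLE PRECISION, INTENT(OUT) :: TOTAL
  DOUBLE PRECISION :: ACC(4)   ! local array: /Qzero would NOT zero this
  INTEGER :: I
  ACC = 0.0D0                  ! explicit zeroing, done in the code itself
  DO I = 1, N
     ACC(MOD(I, 4) + 1) = ACC(MOD(I, 4) + 1) + DBLE(I)
  END DO
  TOTAL = SUM(ACC)
END SUBROUTINE WORK
```

Explicit initialisation also keeps the routine thread-safe, whereas /Qsave gives variables static storage and so works against parallelisation.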
The combination /Qsave /Qzero works by removing the affected variables from the stack, so it does nothing to initialize the stack. It won't help you with parallelization.