C
C --- Matrix - Vector multiplication
C Sparse storage(CSR)
C
implicit real*8(a-h,o-z)
.............
!$OMP Parallel
!$OMP Do
      do 100 i = 1, n
        temp = 0.0d0
        do 50 j = ix(i), ix(i+1) - 1
          temp = temp + v(i) + ir(x(i))
   50   continue
        y(i) = temp
  100 continue
!$OMP End Parallel
When run in parallel with OpenMP, the result is different from the sequential code.
The testing system is:
Windows XP Professional 64-bit, Intel Fortran Compiler 10.x, Microsoft Visual Studio 2005, Intel Xeon 3.8 GHz
Compiler Options
Fortran -> Optimization -> Parallelization -> No
-> Preprocessor -> OpenMP Conditional Compilation -> Yes
-> Process OpenMP Directives -> Generate Parallel Code
-> Data -> Default Double Precision KIND -> 8
Others -> Default
My question: how should I change the compiler options to get the same values?
I read a similar question before, but I didn't understand the answer. (I'm a novice.)
Thanks in advance.
Bon, Koo
Bon, Koo
The variable temp is being used as read/modify/write by multiple threads.
The same is true of the variable j.
Make temp and j private to each thread.
Note, variable i is implicitly private because it is the indexing variable of an OMP DO.
Use
!$OMP Parallel private(temp, j)
!$OMP Do
      do 100 i = 1, n
        temp = 0.0d0
        do 50 j = ix(i), ix(i+1) - 1
          temp = temp + v(i) + ir(x(i))
   50   continue
        y(i) = temp
  100 continue
!$OMP End Parallel
Jim Dempsey
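Jim's fix can also be sketched in C, where declaring temp and j inside the loop body gives each thread its own copy automatically. This is an illustrative sketch, not code from the thread: the function name csr_matvec and the 0-based arrays row_ptr, col_idx, and val are my own naming for a CSR matrix-vector product, and it assumes an OpenMP-enabled compiler (without OpenMP the pragma is ignored and the loop simply runs serially, with the same result).

```c
#include <stddef.h>

/* CSR matrix-vector product y = A*x (0-based, unlike the 1-based
 * Fortran above).  row_ptr[i] .. row_ptr[i+1]-1 index the nonzeros of
 * row i; val holds their values, col_idx their column indices. */
void csr_matvec(size_t n, const size_t *row_ptr, const size_t *col_idx,
                const double *val, const double *x, double *y)
{
    /* temp and j are declared inside the loop body, so each thread gets
     * its own copy -- the C analogue of PRIVATE(temp, j). */
    #pragma omp parallel for
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; ++i) {
        double temp = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            temp += val[j] * x[col_idx[j]];
        y[i] = temp;
    }
}
```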
It is extremely important to specify temp as private, both to permit optimization and to avoid unpredictable results.
If the work isn't balanced among threads, schedule(guided) may be useful, after the correctness issues are dealt with.
Roundoff error may be controlled, and performance improved, if the order of addition is specified:
temp = temp + (v(i) + ir(x(i)))
ifort would require the option /assume:protect_parens in order to follow the Fortran standard here.
There could still be variations in roundoff error, for example between SSE2 and SSE4 code, if the SSE4 code is optimized by increasing the number of individual sums in the inner loop.
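The point about the order of addition can be seen even without OpenMP: floating-point addition is not associative, so summing the same data in a different order can change the result. A minimal C illustration of my own (not code from the thread):

```c
#include <stddef.h>

/* Sum the same data in two different orders.  Because floating-point
 * addition is not associative, the two results can differ -- the same
 * effect that makes a reordered OpenMP sum differ from the serial one. */
double sum_forward(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

double sum_backward(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = n; i-- > 0; )
        s += a[i];
    return s;
}
```

With data such as {1.0, 1e100, 1.0, -1e100}, the forward sum loses both 1.0 terms (absorbed by 1e100) and yields 0.0, while the backward sum yields 1.0.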
Not sure where this bit of wisdom is coming from, but it's not correct. The OpenMP 3.0 standard describes the syntax of the loop construct for C/C++ and defines very specific forms allowed for the associated for-loop. Part of that specification describes the loop iteration variable thus:
var -- one of the following:
- A variable of a signed or unsigned integer type.
- For C++, a variable of a random access iterator type.
- For C, a variable of a pointer type.
If this variable would otherwise be shared, it is implicitly made private in the loop construct....
Thus the OpenMP behavior for the loop-construct iteration variable is the same, whether Fortran or C/C++.
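That reading of the spec can be sketched in C: even when the iteration variable is declared outside the construct, the omp-for loop privatizes it. This is my own illustration (the name fill_squares is hypothetical); if the pragma is ignored by a non-OpenMP compiler, the code still runs serially and produces the same result.

```c
/* The for-loop iteration variable of an omp-for construct is implicitly
 * made private even when declared outside the construct (OpenMP 3.0,
 * same rule in C/C++ as in Fortran). */
void fill_squares(int *out, int n)
{
    int i;                /* would otherwise be shared ... */
    #pragma omp parallel for
    for (i = 0; i < n; ++i)   /* ... but is privatized by the loop construct */
        out[i] = i * i;
}
```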
If I read Robert's comment correctly, he disagrees with both Jim and me about the C defaults, but agrees with me about the Fortran defaults.
The textbook "Using OpenMP," by Chapman, Jost, and van der Pas, written less than a year ago, states that it is based on OpenMP 2.5. It appears to recommend not relying on the default private for the parallel loop iteration parameter, but doesn't explain to what extent default(none) may influence the situation. It mentions 2.5 features briefly, without mentioning the lack of implementation in compilers for x86. It further inclines me to second Jim's recommendation, as the distinction I pointed out between Fortran and C is not well explained.
I haven't seen any discussion of why the inner loop parameters were treated differently in Fortran and C, but one evident difference is the definition of Fortran, where the loop count is calculated before the loop starts, while C or C++ compilers typically look for tinkering with the loop count inside the loop before proceeding with optimization.
C99 has some convenient features for OpenMP, but those still aren't available in all C compilers, and they don't change the situation about permitting dynamic alteration of loop count, so I don't see why OpenMP would have changed the relative treatment of Fortran and C, without publicizing any such change.
It would also be interesting to know which level of OpenMP is implemented in Intel compilers, since there seems to be priority on 3.0 features, while adopting a wait-and-see attitude about 2.5. gfortran 4.5 is trying to implement OpenMP 2.5, so maybe the wait-and-seers will have something to see.
The marketing guys think Intel compilers should gain a competitive advantage by implementing 3.0 features without 2.5, but it seems reasonable to expect most people to try to stay with 2.0 until 2.5 is generally implemented. If 3.0 has made a change in the defaults, intentionally or not, that introduces a potential additional level of confusion.
Thank you, everyone!!! There were some errors in my question. Please see the code:
C
C --- Matrix - Vector multiplication
C Sparse storage(CSR)
C
      implicit real*8(a-h,o-z)
      real*8 ir(*)
      integer x(*)
      .............
C --- OpenMP
!$OMP Parallel private(temp, is, ie) !!!! variable 'j' -> private as default in fortran
!$OMP Do
      do 100 i = 1, n
        temp = 0.0d0
        is = ix(i)
        ie = ix(i+1) - 1
        do 50 j = is, ie
          temp = temp + v(i) * ir(x(i)) !!!! Error in question
   50   continue
        y(i) = temp
  100 continue
!$OMP End Parallel
C --- Serial Code
      do 200 i = 1, n
        temp = 0.0d0
        is = ix(i)
        ie = ix(i+1) - 1
        do 150 j = is, ie
          temp = temp + v(i) * ir(x(i))
  150   continue
        y1(i) = temp
  200 continue
C
      s = 0.0d0
      t = 0.0d0
      do 300 i = 1, n
        s = s + y(i)
        t = t + y1(i)
  300 continue
C
      print *, ' s - t ', s - t
After execution, s-t has a magnitude on the order of 1.0d-13 to 1.0d-21,
and sometimes s-t equals zero exactly.
I changed the options:
Optimization -> Use Intel Processor Extensions -> Intel Pentium 4 Processor with SSE3 ( /Qaxp )
But I got the same result!!!!
I want to know how to fix this error!!!!!
The inner loop (do 50) is NOT an OpenMP parallel DO, so its indexing variable J has default characteristics. In Fortran the default variable attribute is SHARED, so you must make J PRIVATE. Place the private clause on the !$OMP PARALLEL; that is, add J to it:
!$OMP Parallel private(temp, is, ie, j)
Note, your outer loop
!$OMP Do
do 100 i = 1, n
IS an OpenMP parallel DO, so its indexing variable I is PRIVATE by default. But you can add PRIVATE(i) to make it clear to the (novice) reader that the variable i is private to each thread.
Also, Fortran computes the iteration space of a DO loop once. Therefore, unless is and ie are used elsewhere (or unless they make the code easier to read), using the expression directly may be clearer:
do 50 j = ix(i), ix(i+1)-1
With respect to C++, you must consider the scope of the variable in addition to whether or not it is an indexing variable.
#pragma omp parallel for
for(int i=1;i<=n;++i)
...
for(int j=is;j<=ie;++j)
In the above, j is declared inside the scope of the for loop and therefore is private.
Whereas
int j;
#pragma omp parallel for
for(int i=1;i<=n;++i)
...
for(j=is;j<=ie;++j)
In the above, j is declared outside the scope of the for loop and therefore is shared.
So, in the first case, when you read an OpenMP specification example that simply states "the inner loop variable j is private", you MUST factor in the scope of j. That is, it is because j is declared inside the parallel region that "the inner loop variable j is private".
Unfortunately, many of the document writers assume the reader will know this (and thus omit what they think is obvious).
Jim Dempsey
"You MUST factor in the scope of j. That is, because j is declared inside the parallel region, 'the inner loop variable j is private.'"
This looks correct and is supported by the 3.0 specification:
- The loop iteration variable(s) in the associated do-loop(s) of a do or parallel do construct is(are) private.
- A loop iteration variable for a sequential loop in a parallel or task construct is private in the innermost such construct that encloses the loop.
Given the variability of the errors Koo reported (1.0e-13 to 1.0e-21), perhaps the problem is not the sharing or privatization of variables in parallel loops at all. Perhaps it's simply accumulated floating-point round-off error due to variability in the order of evaluation?
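If round-off is indeed the cause, then comparing s and t against exact zero is the wrong test: a tolerance-based comparison is more appropriate, since reordered parallel sums legitimately differ in the last few bits. A small C sketch of my own (nearly_equal is a hypothetical helper, not from the thread):

```c
/* Compare two floating-point results with a relative tolerance instead
 * of testing s - t against zero exactly.  Returns nonzero when the two
 * values agree to within rel_tol relative to the larger magnitude. */
int nearly_equal(double s, double t, double rel_tol)
{
    double diff = s - t;
    if (diff < 0.0) diff = -diff;
    double as = s < 0.0 ? -s : s;
    double at = t < 0.0 ? -t : t;
    double scale = as > at ? as : at;
    return diff <= rel_tol * scale;   /* also true when s == t == 0 */
}
```

For the sums in this thread, a relative tolerance around 1e-12 would accept the reported 1e-13-scale differences as normal round-off while still flagging a real privatization bug.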
Windows XP 32-bit, Intel Xeon X5355 Quad Core (I used two single-CPU systems before).
I have solved the problem.
But I'm confused!!!!