Hi,
I have a 4D array, and I parallelize a loop that works on it using OpenMP parallel regions.
(See the part of the code attached below.)
I used the TotalView software for memory debugging, compiling with -g -O0 and setting OMP_NUM_THREADS=4.
In TotalView, I see that the 'phi' array in the code below has 5 copies created in the parallel region.
Why do I get 5 copies even though the maximum number of threads is 4? Is it the same for other optimization flags, say the commonly used -O3?
Also in TotalView, each copy created for the array 'phi' is 4D and has the same dimensions as specified in the module.
Is that really the case?
How can I make the private array size phi_private(11,11,11,1) so that the code runs memory-efficiently on 4 threads?
Thanks.
[plain]      module data
      real, dimension(:,:,:,:), allocatable :: phi
      end module data

      program test
      USE data
      real counter
      integer i, j, k, m

      allocate(phi(11,11,11,4))
      counter = 0.0

c$omp parallel
c$omp do private(i,j,k,m)
      do m = 1, 4
        do k = 1, 11
          do j = 1, 11
            do i = 1, 11
              phi(i,j,k,m) = counter
              counter = counter + 1.2
            enddo
          enddo
        enddo
      enddo
c$omp enddo
      .....
      .....
c$omp end parallel
      end program test[/plain]
6 Replies
Hi, Amit
You don't need to declare the array phi(11,11,11,4) as private, since there is no dependence between loop iterations for the array elements phi(i,j,k,m). The compiler will handle this and parallelize those array elements automatically. If you do declare it private, however, then according to the OpenMP rules a separate copy of the array is made for each thread to access privately, in addition to the original one. That is why in your case 4 separate copies, 5 in total, of the array are created during the loop parallelization. When the loop ends, those private copies are destroyed.
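For illustration, here is a minimal free-form sketch (not from the original post; the program name is hypothetical, and it assumes an OpenMP 3.0 compiler, where a private copy of an already-allocated allocatable array is allocated with the same bounds). Writes made to the private copies never reach the shared array, and the copies are discarded at the end of the region:
[plain]      program private_copy_demo
          use data                            ! module from the original post, provides phi
          use omp_lib, only: omp_get_thread_num
          implicit none

          allocate(phi(11,11,11,4))
          phi = 0.0

          ! Each of the 4 threads gets its own 11x11x11x4 copy of phi here:
          ! 4 private copies plus the shared original = the 5 copies seen in TotalView.
      !$omp parallel private(phi)
          phi = real(omp_get_thread_num())    ! writes go only to the thread's private copy
      !$omp end parallel

          ! The private copies were destroyed at the end of the region,
          ! so the shared array is still all zeros.
          print *, 'phi(1,1,1,1) after the region:', phi(1,1,1,1)
      end program private_copy_demo[/plain]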
The code as written is correct in its usage of phi.
That is, there is only one instance of the array phi.
You can print out LOC(phi(1,1,1,1)) and see that all threads print the same address.
You do have a problem with
phi(i,j,k,m)=counter
counter = counter + 1.2
since counter is shared and updated concurrently by all threads, you will not fill the array phi with the values you expect.
use
phi(i,j,k,m) = ((m-1)*11*11*11*1.2) + ((k-1)*11*11*1.2) + ((j-1)*11*1.2) + (i-1)*1.2
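A minimal free-form sketch of that address check (not from the original post; the program name is hypothetical, and LOC is a common vendor extension in ifort and gfortran rather than standard Fortran):
[plain]      program check_shared
          use data                            ! module from the original post, provides phi
          use omp_lib, only: omp_get_thread_num
          implicit none

          allocate(phi(11,11,11,4))

          ! Every thread reports the same address, confirming there is a single
          ! shared instance of phi inside the parallel region.
      !$omp parallel
          write(*,*) 'thread', omp_get_thread_num(), ': phi(1,1,1,1) at', loc(phi(1,1,1,1))
      !$omp end parallel
      end program check_shared[/plain]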
Quoting - Yolanda Chen (Intel)
Hi Yolanda,
"Compiler will handle this and parallel those array elements automatically."
How do I get more information about this particular implementation?
My understanding of the implementation is as follows. Please correct me as necessary.
When a global array is accessed by a work-sharing construct inside a parallel region, the OpenMP implementation hands each thread the starting and ending memory locations of the part of the array it should work on, and once the work-sharing construct has finished (but still inside the same parallel region) each thread updates its part of the global array.
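To check what the DO work-sharing construct actually hands out, namely loop iterations rather than address ranges, here is a minimal free-form sketch (not from the original thread; the program name is hypothetical). With OMP_NUM_THREADS=4 and the default static schedule, each thread typically ends up with one value of m:
[plain]      program show_schedule
          use omp_lib, only: omp_get_thread_num
          implicit none
          integer :: m

          ! The work-sharing DO distributes the iterations m = 1..4 among the
          ! threads of the enclosing parallel region; m is private to each thread.
      !$omp parallel
      !$omp do
          do m = 1, 4
              write(*,*) 'thread', omp_get_thread_num(), 'executes m =', m
          end do
      !$omp end do
      !$omp end parallel
      end program show_schedule[/plain]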
Thanks.
Quoting - jimdempseyatthecove
Hi Jim,
In TotalView, I get different addresses for loc(phi(1,1,1,1)), but with a write to the screen inside the loop I get the same address. It seems that the TotalView debugger creates new address space for temporary copies of the array for debugging purposes.
Thanks.
Maybe TotalView is telling you the location of the descriptor for phi and not the location of the first element of phi
(and, for whatever reason, each thread gets its own copy of the descriptor). Or TotalView is broken with respect to OpenMP and arrays.
Trust the address produced by WRITE(*,*) LOC(PHI(1,1,1,1)).
Jim
Quoting - Yolanda Chen (Intel)
Hi, Amit
It is not really a question of low-level implementation details. The OpenMP directives were introduced to relieve the developer of the complicated threading work and let the compiler take over part of the job; in that sense I said the compiler will do the work automatically.
Specifically for the loop parallelization in your case, each iteration references a different array element: the subscript combination (i,j,k,m) is unique across iterations. If we have 4 threads in total, each thread can take an average share of those iterations to run, e.g. 11*11*11 of them (this also depends on how you specify the loop scheduling). Since the (i,j,k,m) tuples are distinct, there is no overlap in references to phi across threads, so phi can be shared.
The variable counter, however, is shared between loop iterations, and we need to express the assignment differently to make the loop parallelizable:
phi(i,j,k,m) = ((m-1)*11*11*11*1.2) + (k-1)*11*11*1.2 + ((j-1)*11*1.2) + (i-1)*1.2
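Putting this together, a minimal free-form sketch of the whole loop with phi shared and the running counter removed, so every iteration is independent (not from the original post; the program name is hypothetical):
[plain]      program test_fixed
          use data                            ! module from the original post, provides phi
          implicit none
          integer :: i, j, k, m

          allocate(phi(11,11,11,4))

          ! phi stays shared; only the inner loop indices need to be private
          ! (m, the work-shared loop index, is private automatically).
      !$omp parallel do private(i, j, k)
          do m = 1, 4
              do k = 1, 11
                  do j = 1, 11
                      do i = 1, 11
                          ! Same values the sequential counter version would produce,
                          ! up to rounding from the repeated additions.
                          phi(i,j,k,m) = ((m-1)*11*11*11*1.2) + ((k-1)*11*11*1.2) &
                                       + ((j-1)*11*1.2) + (i-1)*1.2
                      end do
                  end do
              end do
          end do
      !$omp end parallel do
      end program test_fixed[/plain]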
To learn more about OpenMP, there is a good article to get started:
http://software.intel.com/en-us/articles/getting-started-with-openmp/
