I think I have a hack to use USM in Fortran.
There may be a better way to do this, but I am at a loss for finding an official way.
Some GPUs support USM. The objective is for the virtual address spaces of the CPU and the GPU to map the same addresses, such that when the host dereferences a USM address and the GPU dereferences the same address, they access the same data, whether it resides in host RAM or GPU RAM. The driver may migrate the data over the PCIe bus, or access it directly over PCIe, when the data does not reside in the accessor's local memory.
The major benefit of this is less code to change when porting an app to use a GPU and, more importantly, having the same source code (with !$omp directives) run without a GPU (or with one that does not support USM).
The problem to overcome is to construct a way such that an entire array can reside in USM .AND. not be transferred in whole as you enter and leave an offload region.
I have been unable to locate an OpenMP 5.0 way of doing this (OpenMP 4.0 seemed to have this ability, but the directive has been removed from 5.0).
Now the hack:
```fortran
program TestGPU
    use myDPCPPlib
    use omp_lib
    use, intrinsic :: ISO_C_BINDING
    implicit none
    !$omp requires UNIFIED_SHARED_MEMORY
    ! Variables
    integer :: i, j
    integer, parameter :: nCols = 4
    integer :: nRows
    type(C_PTR) :: blob
    integer(C_INTPTR_T) :: x
    type boink
        real, pointer :: arrayShared(:,:)
    end type boink
    type(boink) :: theBoink
    real, pointer :: hack(:,:)
    real :: sum(nCols)

    nRows = 500
    ! Allocate the array in unified shared memory, then wrap the C address
    ! in a Fortran array pointer
    blob = omp_aligned_alloc(64, nRows*sizeof(sum), omp_target_shared_mem_alloc)
    call C_F_POINTER(blob, hack, [nCols, nRows])
    theBoink%arrayShared => hack

    do j = 1, size(theBoink%arrayShared, dim=2)
        do i = 1, nCols
            theBoink%arrayShared(i,j) = i*j
        end do
    end do

    ! Host summation
    do j = 1, nRows
        sum = sum + theBoink%arrayShared(:,j)
    end do
    print *, sum

    ! Offload summation
    !$omp target teams distribute parallel do map(theBoink,sum) reduction(+:sum)
    do j = 1, nRows
        sum = sum + theBoink%arrayShared(:,j)
    end do
    !$omp end target teams distribute parallel do
    print *, sum
end program TestGPU
```
Output:
125250.0 250500.0 375750.0 501000.0
250500.0 501000.0 751500.0 1002000.
The problem is, the reduction(+:sum) does not implicitly zero sum.
I can manually zero sum outside the offload region, but I am worried that if the reduction isn't doing the zeroing, it may also not be protecting against a race condition (especially if writes are involved).
Can anyone shed light on this?
Jim Dempsey
@jimdempseyatthecove the summation works as expected; however, it is unsafe if you do not initialize sum = 0 before your host summation (or rely on compiler options to zero-initialize). OpenMP reductions do not zero the original value of the variable, at least that's my understanding of the standard.
Oooo, I read the (+:var) initialization rules wrong (it is omp_priv, the private copy, that is initialized to 0, not the original variable).
Thanks for pointing this out.
Do you know of an easier way to have a module variable in USM (in particular an array descriptor, or pointer)?
Something similar to what is done with !$omp threadprivate, though with USM.
Jim Dempsey