Hi
I recently got a machine with two Xeon Phi cards, and I am running some simple tests to better understand the features I will need to use them in our production code. In the code below I just want to measure how many exponential functions the Phis can calculate per second. I first create one OpenMP thread per Phi device and then open a target section inside each of these threads. Inside the target section the code is further parallelized using OpenMP directives.
The strange thing is that DevNo never seems to make it to the device correctly. The DevNo value written by the print statement inside the target section appears to be a random, uninitialized value. In the version below NoDevices=1 and everything seems to work despite this issue, but if you set NoDevices=2 instead, it does not. It seems to me that the code attempts to do both offloads to the same device, which sometimes results in half the performance and sometimes in a crash.
Is this not how you are supposed to do multi-device offloading with OpenMP 4.0? I use the latest version of Parallel Studio XE on Windows.
Thank you in advance,
Casper
program Source1
  use ifport
  use omp_lib
  use mic_lib
  implicit none
  real*8 :: TimeBegin,TimeEnd,GOps
  real*8,allocatable,Dimension(:) :: ExpIn,ExpOut
  !DEC$ATTRIBUTES ALIGN: 64 :: ExpIn,ExpOut
  integer :: NumThreads,NInner,NOuter,i,j,DevNo,PhiNo,NoDevices=1,NExps=165189!NExps=1651898
  !First, fill a vector with random values to calculate exp for
  GOps=Random(18)
  allocate(ExpIn(NExps))
  do i=1,NExps
    ExpIn(i)=random(0)
  end do
  !Now we do the actual benchmark calculation of exp's in parallel using openMP distributed over multiple phis
  !Outer OMP parallel region - performing the same calculation on multiple phis in parallel
  !$OMP PARALLEL NUM_THREADS(NoDevices) DEFAULT(SHARED) PRIVATE(DevNo)
  DevNo=omp_get_thread_num()
  print *,'Entered OMP parallel region for device', DevNo
  !Initialize each target phi
  !$OMP TARGET DATA DEVICE(DEVNO) MAP(to:NExps,DevNo,ExpIn(1:NExps),NumThreads)
  !$OMP TARGET
  NumThreads=57*4-1
  !Somehow DevNo is not correct when we get here ...
  print *,'Running on Xeon Phi device ',DevNo,'using',NumThreads,'threads'
  NOuter=NumThreads*1000
  TimeBegin=omp_get_wtime()
  !Run parallel benchmark on each phi
  !$OMP PARALLEL SHARED(ExpIn,NOuter,NExps) PRIVATE(I,J,ExpOut) NUM_THREADS(NumThreads)
  allocate(ExpOut(NExps))
  !$OMP DO SCHEDULE(DYNAMIC)
  do i=1,NOuter
    !$OMP SIMD
    do j=1,NExps
      ExpOut(j)=exp(ExpIn(j))
    end do
  end do
  !$OMP END DO
  deallocate(ExpOut)
  !$OMP END PARALLEL
  TimeEnd=omp_get_wtime()
  GOps=1e-9*NOuter*NExps/(TimeEnd-TimeBegin)
  print *,'Result:',GOps,'G exponential functions/s'
  !$OMP END TARGET
  !$OMP END TARGET DATA
  !$OMP END PARALLEL
end program Source1
I believe you need a device(DevNo) clause on your inner target directive; otherwise, on a multi-card system, the offload defaults to device 0.
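A minimal sketch of that suggestion, separate from the benchmark itself: the device clause is placed on the target construct, and each thread copies the value seen inside the target region back to the host so it can be compared with the thread number. The program name DeviceClauseSketch and the variable SeenOnDevice are hypothetical names used only for this illustration, and the use of omp_get_num_devices() to pick the thread count is an assumption rather than something taken from the original code.

```fortran
program DeviceClauseSketch
  use omp_lib
  implicit none
  integer :: DevNo, SeenOnDevice, NoDevices

  ! Number of offload devices reported by the runtime. Use at least one
  ! thread so the sketch can still run when no card is reported (the
  ! target region is then expected to fall back to the host).
  NoDevices = max(1, omp_get_num_devices())

  !$OMP PARALLEL NUM_THREADS(NoDevices) DEFAULT(SHARED) PRIVATE(DevNo, SeenOnDevice)
  DevNo = omp_get_thread_num()
  SeenOnDevice = -1

  ! DEVICE(DevNo) is given on the TARGET construct itself, not only on an
  ! enclosing TARGET DATA region.
  !$OMP TARGET DEVICE(DevNo) MAP(to: DevNo) MAP(from: SeenOnDevice)
  SeenOnDevice = DevNo
  !$OMP END TARGET

  ! Printed on the host: the two numbers should match if DevNo arrived intact.
  print *, 'Thread/device', DevNo, 'saw DevNo =', SeenOnDevice
  !$OMP END PARALLEL
end program DeviceClauseSketch
```

If the two numbers printed by a thread differ, DevNo did not reach the device intact for that offload.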
The issue with DevNo being incorrect appears to be an optimization issue. With NoDevices=2, compiling at -O1, and adding device(DevNo) as mentioned, the code appears to run on multiple cards and produce the expected value for DevNo. At -O2, the value of DevNo is incorrect (some random value).
I have not been able to find a reasonable workaround, since de-optimizing is counter to your interests. Let me consult with our developers to see if there is a usable workaround.
Updated later on 06/18/2014: I reported this to Development (see internal tracking id below) for further investigation and will keep you updated on what I hear from them.
(Internal tracking id: DPD200357720)
Thanks a lot Kevin - looking forward to hearing from you again ...