According to the book "Intel Xeon Phi Coprocessor High-Performance Programming", we can move data from one variable into another during an offload. I followed the example and it worked:
Code:
program example
real , target :: a(5),b(10)
a(1)=1
a(2)=2
a(3)=3
a(4)=4
a(5)=5
print *,'*************************'
print *,'a:'
print *, a
!dir$ offload begin target (mic:0) in(a(1:5): into(b(1:5)) alloc_if(.true.) free_if(.false.) )
print *, 'b on the phi'
print *, b(1:5)
b=b+10
!dir$ end offload
!dir$offload_transfer target(mic:0) out(b(1:5) : into(a(1:5)) alloc_if(.false.))
print *,'*************************'
print *,'a:'
print *, a
end program example
I have an array a on the host and copy it into an array b on the Xeon Phi. I add 10 to every element of b and then transfer the elements of b on the Xeon Phi back into a on the host. The result is:
However, if I use pointers, I get an error.
Code 2:
program example
real , target :: a(5),b(10)
real , pointer :: a_p(:),b_p(:)
a(1)=1
a(2)=2
a(3)=3
a(4)=4
a(5)=5
a_p=>a
b_p=>b
print *,'*************************'
print *,'a:'
print *, a
!dir$ offload begin target (mic:0) in(a_p(1:5): into(b_p(1:5)) alloc_if(.true.) free_if(.false.) )
print *, 'b on the phi'
print *, b_p(1:5)
b_p=b_p+10
!dir$ end offload
!dir$offload_transfer target(mic:0) out(b_p(1:5) : into(a_p(1:5)) alloc_if(.false.))
print *,'*************************'
print *,'a:'
print *, a
end program example
Result 2:
It looks like something goes wrong when I try to copy the data back.
Does the into clause support pointers? We will need pointers to arrays in the real project.
I believe there may be a method to accomplish this. I'm double checking with Development about a possible solution that I created.
Can the arrays be allocatable?
Kevin Davis (Intel) wrote:
I believe there may be a method to accomplish this. I'm double checking with Development about a possible solution that I created.
Can the arrays be allocatable?
Yes, I've tried them as allocatable. The allocatable arrays work, but pointers to the allocatable arrays still do not. I also found that you can allocate the allocatable array on the host with a size of one while its counterpart on the Phi can still have whatever size you specify in the into clause.
Thank you.
Here's an example of how to use pointers to allocatable arrays.
[cpp]
program example
real , allocatable, dimension(:),target :: a,b
real , pointer :: a_p(:),b_p(:)
allocate(a(5))
allocate(b(10))
a(1)=1
a(2)=2
a(3)=3
a(4)=4
a(5)=5
b=0
a_p=>a
b_p=>b
print *,'*************************'
print *,'a:'
print *, a
! Allocate pointer and memory on coprocessor
!DIR$ OFFLOAD_transfer target(mic:0) in( b_p : length(10) free_if(.FALSE.) )
! Transfer a_p into (part of) b_p and only modify some values
!DIR$ OFFLOAD begin target(mic:0) in( a_p : length(5) into(b_p(1:5)) free_if(.FALSE.))
print *, 'b on the phi'
print *, b_p(1:5)
!b_p=b_p+10
! Update only some uploaded values
b_p(3:5)=b_p(3:5)+10
!dir$ end offload
! Zero a on CPU to demonstrate transfers above worked
a=0
!DIR$ OFFLOAD_transfer target(mic:0) out( b_p : length(5) into(a_p(1:5)) alloc_if(.false.) free_if(.FALSE.) )
print *,'*************************'
print *,'a:'
print *, a
end program example[/cpp]
[plain]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008
$ ifort example.F90
$ ./a.out
*************************
a:
1.000000 2.000000 3.000000 4.000000 5.000000
b on the phi
1.000000 2.000000 3.000000 4.000000 5.000000
*************************
a:
1.000000 2.000000 13.00000 14.00000 15.00000[/plain]
I apologize for the lousy looking post. I will try to correct it. Once again we've made forum changes and now methods I used before for posting code/text are no longer working.
Thank you! This works!
However, it still does not fully solve my problem. Let me explain why I tried to use "into" in the first place.
I'm working on a project that is mostly finished. What I am supposed to do is write a subroutine that offloads part of our data to the Phi and does some calculation there. Since all the data to be processed has already been allocated at an earlier stage, the best approach is a derived type whose pointers point to that data in memory.
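To make the derived-type idea concrete, here is a minimal host-only sketch. The type name offload_view and its components x and y are made up for illustration and are not from our project. It only shows that pointer components can refer to data that was allocated earlier without copying it; it says nothing about whether such components are accepted in an offload clause.

```fortran
program view_demo
  implicit none

  ! Hypothetical derived type: holds pointers to data owned elsewhere.
  type :: offload_view
    real, pointer :: x(:) => null()
    real, pointer :: y(:) => null()
  end type offload_view

  real, allocatable, target :: a(:), b(:)
  type(offload_view) :: v
  integer :: i

  ! Stand-in for data allocated at an earlier stage of the application.
  allocate(a(5), b(5))
  a = [(real(i), i = 1, 5)]
  b = 0.0

  ! Associate the pointer components with the existing arrays; nothing is copied.
  v%x => a
  v%y => b

  ! Writing through the pointer modifies the original array b.
  v%y = v%x + 10.0

  print *, 'v%x associated with a: ', associated(v%x, a)
  print *, 'b: ', b   ! b now holds a + 10
end program view_demo
```

Because v%y is only a view of b, no extra host memory is allocated for the data itself, which is the property we want to keep when offloading.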
Another thing we need to consider is data duplication. We don't want to allocate new memory on the host that does nothing, just in order to have arrays allocated on the Phi. After some experiments I came up with an idea:
program into
real,allocatable,target:: a(:), b(:), c(:)
integer :: m,i,j
real,pointer :: a_p(:),b_p(:),c_p(:)
m=10
allocate(a(m))
allocate(b(m))
allocate(c(1))
!a_p => a
!b_p => b
!c_p => c
do i=1, 10
a(i)=i
b(i)=40+i
end do
print *,'***********************************************'
print *,'a:'
print *,a
print *,'***********************************************'
print *,'b:'
print *,b
print *,'***********************************************'
print *, ' Start offload'
!dir$ offload_transfer target(mic:0) in(b(1:m): into(c(m+1:2*m) ) alloc_if(.true.) free_if(.false.))
!dir$ offload begin target(mic:0) in(a(1:m): into(c(1:m)) alloc_if(.true.) free_if(.false.))
print *, 'C on the Phi'
call calc(c(1:m),c(m+1:2*m),m)
print *,'***********************************************'
print *, c(1:20)
print *,'***********************************************'
!dir$ end offload
!dir$ offload_transfer target(mic:0) out(c(1:m) : into(b(1:m)) alloc_if(.false.) free_if(.false.))
!dir$ offload_transfer target(mic:0) out(c(m+1:2*m) : into(a(1:m)) alloc_if(.false.) free_if(.false.))
print *, ' End offload'
print *,'***********************************************'
print *,'a:'
print *,a
print *,'***********************************************'
print *,'b:'
print *,b
print *,'***********************************************'
contains
!dir$ attributes offload : mic :: calc
subroutine calc(a,b,m)
integer :: m
real :: a(m),b(m)
a=a+10
b=b*10
end subroutine calc
end program into
The result is:
***********************************************
a:
1.000000 2.000000 3.000000 4.000000 5.000000
6.000000 7.000000 8.000000 9.000000 10.00000
***********************************************
b:
41.00000 42.00000 43.00000 44.00000 45.00000
46.00000 47.00000 48.00000 49.00000 50.00000
***********************************************
Start offload
C on the Phi
***********************************************
11.00000 12.00000 13.00000 14.00000 15.00000
16.00000 17.00000 18.00000 19.00000 20.00000
410.0000 420.0000 430.0000 440.0000 450.0000
460.0000 470.0000 480.0000 490.0000 500.0000
***********************************************
End offload
***********************************************
a:
410.0000 420.0000 430.0000 440.0000 450.0000
460.0000 470.0000 480.0000 490.0000 500.0000
***********************************************
b:
11.00000 12.00000 13.00000 14.00000 15.00000
16.00000 17.00000 18.00000 19.00000 20.00000
***********************************************
First, the c array on the host doesn't ask for much space, just one element. Second, we can offload two arrays into a bigger array on the Phi and do some calculation there. Finally, we can copy things back. This solves the data duplication problem and gives us a way to have a complicated data structure on the Phi. Imagine we have 10 instances of the same problem to calculate: instead of doing 10 offloads, we can offload 10 arrays into one bigger array on the Phi.
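The slice arithmetic behind "many arrays in one bigger array" can be sketched on the host alone. This is only an illustration of the index layout, with made-up names (big, slice, ninst) and small sizes; it contains no offload directives, and unlike the program above it allocates the big array at full size on the host.

```fortran
program pack_layout
  implicit none
  integer, parameter :: m = 4        ! elements per instance (assumed for the sketch)
  integer, parameter :: ninst = 3    ! number of instances (assumed for the sketch)
  real, target  :: big(m*ninst)      ! one buffer holding all instances back to back
  real, pointer :: slice(:)
  integer :: k, lo, hi

  big = 0.0
  do k = 1, ninst
    lo = (k - 1)*m + 1     ! first element of instance k
    hi = k*m               ! last element of instance k
    slice => big(lo:hi)    ! pointer view of instance k, no copy
    slice = real(k)        ! fill instance k with its own index
    print *, 'instance', k, 'occupies', lo, ':', hi
  end do
  print *, big
end program pack_layout
```

With m=10 and two instances this gives the same sections used above, c(1:m) and c(m+1:2*m).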
However, if I change the code to pointers:
program into
real,allocatable,target:: a(:), b(:), c(:)
integer :: m,i,j
real,pointer :: a_p(:),b_p(:),c_p(:)
m=10
allocate(a(m))
allocate(b(m))
allocate(c(1))
a_p => a
b_p => b
c_p => c
do i=1, 10
a(i)=i
b(i)=40+i
end do
print *,'***********************************************'
print *,'a:'
print *,a
print *,'***********************************************'
print *,'b:'
print *,b
print *,'***********************************************'
print *, ' Start offload'
!dir$ offload_transfer target(mic:0) in(b_p(1:m): into(c_p(m+1:2*m) ) alloc_if(.true.) free_if(.false.))
!dir$ offload begin target(mic:0) in(a_p(1:m): into(c_p(1:m)) alloc_if(.true.) free_if(.false.))
print *, 'C on the Phi'
call calc(c_p(1:m),c_p(m+1:2*m),m)
print *,'***********************************************'
print *, c_p(1:20)
print *,'***********************************************'
!dir$ end offload
!dir$ offload_transfer target(mic:0) out(c_p(1:m) : into(b_p(1:m)) alloc_if(.false.) free_if(.false.))
!dir$ offload_transfer target(mic:0) out(c_p(m+1:2*m) : into(a_p(1:m)) alloc_if(.false.) free_if(.false.))
print *, ' End offload'
print *,'***********************************************'
print *,'a:'
print *,a
print *,'***********************************************'
print *,'b:'
print *,b
print *,'***********************************************'
contains
!dir$ attributes offload : mic :: calc
subroutine calc(a,b,m)
integer :: m
real :: a(m),b(m)
a=a+10
b=b*10
end subroutine calc
end program into
The result will be:
***********************************************
a:
1.000000 2.000000 3.000000 4.000000 5.000000
6.000000 7.000000 8.000000 9.000000 10.00000
***********************************************
b:
41.00000 42.00000 43.00000 44.00000 45.00000
46.00000 47.00000 48.00000 49.00000 50.00000
***********************************************
Start offload
C on the Phi
***********************************************
11.00000 12.00000 13.00000 14.00000 15.00000
16.00000 17.00000 18.00000 19.00000 20.00000
410.0000 420.0000 430.0000 440.0000 450.0000
460.0000 470.0000 480.0000 490.0000 500.0000
***********************************************
End offload
***********************************************
a:
410.0000 420.0000 430.0000 440.0000 450.0000
460.0000 470.0000 480.0000 490.0000 500.0000
***********************************************
b:
2.9426954E-38 12.00000 13.00000 14.00000 15.00000
16.00000 17.00000 18.00000 19.00000 20.00000
***********************************************
You can see that the random (uninitialized) value in c(1) on the host is copied to b(1).
Any solutions?
Thank you for your patience. I really appreciate your help.
In the pointer version of your program into, change the second-to-last offload_transfer from this:
!dir$ offload_transfer target(mic:0) out(c_p(1:m) : into(b_p(1:m)) alloc_if(.false.) free_if(.false.))
to this:
!dir$ offload_transfer target(mic:0) out(c_p : length(m) into(b_p(1:m)) alloc_if(.false.) free_if(.false.))
I'm still discussing with Development whether your original statement with c_p(1:m) exposes a defect; I think it does.
Also, I believe the user's method posted here which avoids INTO might be useful for your needs too. It avoids the additional CPU allocation for the pointer (array c in your case).
You can disregard my suggested change. Despite the apparently correct results, Development reaffirmed that you currently cannot allocate more memory on the coprocessor than on the CPU. There is an active feature request to support what you coded for c and c_p; for now, however, depending on the memory layout in a different or larger application, additional offloads will probably produce unpredictable results.
I will keep this thread updated on the status of that request (internal tracking id noted below).
(Internal tracking id: DPD200245090 - Offload "in( a(n) : into b)" clause should not need b to be allocated of size n)