Hi,
I have written my first offload program in Fortran; the offloaded part is shown below. The program gets past the end of the parallel region, where I was able to print out the values of the variables ener1 and ener2. However, it cannot get out of the omp target directive, which I checked with a print statement. On the command line I get the error message "offload error: process on the device 0 unexpectedly exited with code 0". I don't understand why the program is not able to finish this offload block.
```fortran
!$omp target map(to:coor,sigma_const,clase,eps_const) map(tofrom:ener1,ener2)
!$omp parallel private(i,j,fdummy1,k,l,fdummy2,fdummy3,fdummy4,fdummy5,dist)
!$omp do reduction(+:ener1)
do i=1,num_res-2
   do j=i+2,num_res
      fdummy1=coor(i,1,qk)-coor(j,1,qk)
      fdummy2=coor(i,2,qk)-coor(j,2,qk)
      fdummy3=coor(i,3,qk)-coor(j,3,qk)
      dist=sqrt(fdummy1*fdummy1+fdummy2*fdummy2+fdummy3*fdummy3)
      fdummy1=sigma_const(i,j)
      write(6,*) 'fdum',fdummy1
      k=clase(i)
      l=clase(j)
      fdummy2=fdummy1*fdummy1 ! 2
      fdummy3=fdummy2*fdummy2 ! 4
      fdummy4=fdummy2*fdummy3 ! 6
      fdummy5=fdummy4*fdummy4 ! 12
      fdummy1=fdummy5-fdummy4
      ener1=ener1+eps_const(k,l)*fdummy1
   enddo
enddo
!$omp end do
!$omp do reduction(+:ener2)
do i=1,num_res-1
   fdummy1=coor(i,1,qk)-coor(i+1,1,qk)
   fdummy2=coor(i,2,qk)-coor(i+1,2,qk)
   fdummy3=coor(i,3,qk)-coor(i+1,3,qk)
   dist=sqrt(fdummy1*fdummy1+fdummy2*fdummy2+fdummy3*fdummy3)
   fdummy1=(dist-r_cero)
   fdummy2=fdummy1*fdummy1
   ener2=ener2+fdummy2
enddo
!$omp end do
!$omp end parallel
!$omp end target
```
What version of ifort and MPSS are you using?
Can you set OFFLOAD_REPORT=3, re-run, and share the output with us?
Can you attach a complete reproducer?
Hi,
The Intel compiler is version 15.0.3.187. I will ask our sysadmin about the MPSS version. The output with OFFLOAD_REPORT=3 is:
```
[Offload] [HOST] [State] Initialize logical card 0 = physical card 0
[Offload] [HOST] [State] Initialize logical card 1 = physical card 1
[Offload] [HOST] [State] Initialize logical card 2 = physical card 2
[Offload] [MIC 0] [File] energy.f90
[Offload] [MIC 0] [Line] 221
[Offload] [MIC 0] [Tag] Tag 0
[Offload] [HOST] [Tag 0] [State] Start target
[Offload] [HOST] [Tag 0] [State] Setup target entry: __offload_entry_energy_f90_221energy_ifort01021243267585XRkw5
[Offload] [HOST] [Tag 0] [State] Host->target pointer data 0
[Offload] [HOST] [Tag 0] [Signal] signal : none
[Offload] [HOST] [Tag 0] [Signal] waits : none
[Offload] [HOST] [Tag 0] [State] Gather copyin data: base=0x7ffccdece100 length=128
[Offload] [HOST] [Tag 0] [State] Create target buffer: size=384 offset=256
[Offload] [HOST] [Tag 0] [State] Gather copyin data: base=0x7ffccdec44a0 length=40000
[Offload] [HOST] [Tag 0] [State] Create target buffer: size=41184 offset=1184
[Offload] [HOST] [Tag 0] [State] Gather copyin data: base=0x7ffc9e33b300 length=800000000
[Offload] [HOST] [Tag 0] [State] Create target buffer: size=800000768 offset=768
[Offload] [HOST] [Tag 0] [State] Gather copyin data: base=0x7ffccdece180 length=480000
[Offload] [HOST] [Tag 0] [State] Create target buffer: size=480384 offset=384
[Offload] [HOST] [Tag 0] [State] Gather copyin data: base=0x49a700 length=4
[Offload] [HOST] [Tag 0] [State] Create target buffer: size=1796 offset=1792
[Offload] [HOST] [Tag 0] [State] Gather copyin data: base=0x7ffccdf435ac length=4
[Offload] [HOST] [Tag 0] [State] Create target buffer: size=1456 offset=1452
[Offload] [HOST] [Tag 0] [State] Host->target pointer data 800520136
[Offload] [HOST] [Tag 0] [State] Host->target copyin data 420
[Offload] [HOST] [Tag 0] [State] Execute task on target
[Offload] [HOST] [Tag 0] [State] Target->host pointer data 8
[Offload] [MIC 0] [Tag 0] [State] Start target entry: __offload_entry_energy_f90_221energy_ifort01021243267585XRkw5
[Offload] [MIC 0] [Tag 0] [Var] eps_const IN
[Offload] [MIC 0] [Tag 0] [Var] var$53_dv_template_V$65 IN
[Offload] [MIC 0] [Tag 0] [Var] var$53_dv_template_V$65 IN
[Offload] [MIC 0] [Tag 0] [Var] var$88_dv_template_V$a6 IN
[Offload] [MIC 0] [Tag 0] [Var] var$88_dv_template_V$a6 IN
[Offload] [MIC 0] [Tag 0] [Var] var$69_dv_template_V$85 IN
[Offload] [MIC 0] [Tag 0] [Var] var$69_dv_template_V$85 IN
[Offload] [MIC 0] [Tag 0] [Var] ener2 INOUT
[Offload] [MIC 0] [Tag 0] [Var] ener1 INOUT
[Offload] [MIC 0] [Tag 0] [Var] i INOUT
[Offload] [MIC 0] [Tag 0] [Var] j INOUT
[Offload] [MIC 0] [Tag 0] [Var] fdummy1 INOUT
[Offload] [MIC 0] [Tag 0] [Var] k INOUT
[Offload] [MIC 0] [Tag 0] [Var] l INOUT
[Offload] [MIC 0] [Tag 0] [Var] fdummy2 INOUT
[Offload] [MIC 0] [Tag 0] [Var] fdummy3 INOUT
[Offload] [MIC 0] [Tag 0] [Var] fdummy4 INOUT
[Offload] [MIC 0] [Tag 0] [Var] fdummy5 INOUT
[Offload] [MIC 0] [Tag 0] [Var] dist INOUT
[Offload] [MIC 0] [Tag 0] [Var] var$49_V$52 INOUT
[Offload] [MIC 0] [Tag 0] [Var] num_res INOUT
[Offload] [MIC 0] [Tag 0] [Var] qk INOUT
[Offload] [MIC 0] [Tag 0] [Var] r_cero INOUT
[Offload] [MIC 0] [Tag 0] [Var] var$43_V$4e INOUT
```
Without seeing your entire program, I have one theory.
Use the compile-time offload report to see which variables are copied from the MIC back to the CPU at the end of the offload. If one of them is a subroutine parameter, check whether the subroutine was called with a constant argument. If so, copying the value back would write into read-only memory as part of offload completion. If this happens in your program, specify that variable as IN to the offload.
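In the OpenMP syntax used in the question, the counterpart of the offload IN clause is `map(to:)`. Under the OpenMP 4.0 rules, a scalar that is referenced in a target region but not listed in a map clause is mapped tofrom, which is consistent with num_res, qk and r_cero appearing as INOUT in the report above. The sketch below is a minimal illustration, not the poster's code: the module and subroutine names (`energy_mod`, `bond_energy`) are hypothetical, and only the second (bonded) loop of the question is reproduced, with num_res, qk and r_cero as scalar dummy arguments listed in `map(to:)`.

```fortran
! Sketch only: hypothetical names; reproduces the bonded loop of the question.
module energy_mod
  implicit none
contains
  subroutine bond_energy(coor, num_res, qk, r_cero, ener2)
    real(8), intent(in)    :: coor(:,:,:)   ! coor(residue, xyz, conformation)
    integer, intent(in)    :: num_res, qk
    real(8), intent(in)    :: r_cero
    real(8), intent(inout) :: ener2
    real(8) :: dx, dy, dz, dist
    integer :: i

    ! map(to:) plays the role of IN: the scalar dummy arguments are sent to
    ! the device and are not copied back when the target region ends.
    !$omp target map(to: coor, num_res, qk, r_cero) map(tofrom: ener2)
    !$omp parallel do private(dx, dy, dz, dist) reduction(+:ener2)
    do i = 1, num_res - 1
      dx = coor(i,1,qk) - coor(i+1,1,qk)
      dy = coor(i,2,qk) - coor(i+1,2,qk)
      dz = coor(i,3,qk) - coor(i+1,3,qk)
      dist = sqrt(dx*dx + dy*dy + dz*dz)
      ener2 = ener2 + (dist - r_cero)**2
    end do
    !$omp end parallel do
    !$omp end target
  end subroutine bond_energy
end module energy_mod
```

With the scalars mapped this way, a call such as `call bond_energy(coor, 4, 1, 1.0d0, ener2)`, where three actual arguments are literal constants, no longer has anything written back into them at the end of the offload.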
Hi,
Rajiv Deodhar, your suggestion worked. One of the variables inside the target region was a parameter of the subroutine; since I couldn't find how to use "IN" in this target directive, I copied it into a local variable, nn=num_res, and used that instead. It works now, but it is very slow.
Do you have any tips to reduce the execution time? Maybe I am copying several variables to the MIC each time the routine is called, while the only variable that actually needs to be copied is the "coor" array.
Best regards.
Fortran uses call-by-reference, so function/subroutine parameters appear to the compiler as pointers. Pointer transfers incur considerable overhead. My suggestion is this:
For each subroutine parameter that is a scalar, i.e., a simple non-array variable, declare a local variable of the same type within the function/subroutine. Assign the parameter to its corresponding local variable, and within the offloaded region use the local variable, not the parameter. Use the IN clause for these locally copied variables in the offload directive.
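A minimal sketch of this advice, written with the OpenMP target directives from the question rather than the Intel offload directives, so `map(to:)` stands in for IN. The names `energy_local_mod`, `bond_energy_local`, `n_loc`, `qk_loc`, `r0_loc` and `e_loc` are hypothetical. The scalar dummy arguments are copied into locals before the region, only the locals are used inside it, and the energy is accumulated in a local that is added to the dummy argument on the host afterwards, so the only dummy argument referenced on the device is the coor array.

```fortran
! Sketch only: hypothetical names; bonded loop of the question.
module energy_local_mod
  implicit none
contains
  subroutine bond_energy_local(coor, num_res, qk, r_cero, ener2)
    real(8), intent(in)    :: coor(:,:,:)
    integer, intent(in)    :: num_res, qk
    real(8), intent(in)    :: r_cero
    real(8), intent(inout) :: ener2
    integer :: n_loc, qk_loc, i
    real(8) :: r0_loc, e_loc, dx, dy, dz, dist

    ! Local copies of the scalar (passed-by-reference) dummy arguments.
    n_loc  = num_res
    qk_loc = qk
    r0_loc = r_cero
    e_loc  = 0.0d0

    ! Only locals are used on the device: they go in with map(to:),
    ! and the partial energy comes back through map(tofrom:).
    !$omp target map(to: coor, n_loc, qk_loc, r0_loc) map(tofrom: e_loc)
    !$omp parallel do private(dx, dy, dz, dist) reduction(+:e_loc)
    do i = 1, n_loc - 1
      dx = coor(i,1,qk_loc) - coor(i+1,1,qk_loc)
      dy = coor(i,2,qk_loc) - coor(i+1,2,qk_loc)
      dz = coor(i,3,qk_loc) - coor(i+1,3,qk_loc)
      dist = sqrt(dx*dx + dy*dy + dz*dz)
      e_loc = e_loc + (dist - r0_loc)**2
    end do
    !$omp end parallel do
    !$omp end target

    ! Update the caller's variable on the host only.
    ener2 = ener2 + e_loc
  end subroutine bond_energy_local
end module energy_local_mod
```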
For persistent data you can use !DIR$ OFFLOAD_TRANSFER clause[...].
If your sigma_const, clase and eps_const do not change from call to call, then this can save you some time.
Jim Dempsey
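OFFLOAD_TRANSFER belongs to Intel's own offload directives. Because the question uses OpenMP target directives, the sketch below shows the same idea with the OpenMP 4.0 `target data` construct instead; this is a swapped-in technique, not the directive named above. Arrays mapped by an enclosing `target data` region are already present on the device, so the `map(to:)` items on each target construct inside it do not copy sigma_const, clase and eps_const again, and only coor is transferred per call. All module and subroutine names are hypothetical, and the sketch assumes the intended nonbonded term is (sigma/dist)**12 - (sigma/dist)**6, since the original loop computes dist but never uses it.

```fortran
! Sketch only: hypothetical names.
! ASSUMPTION: the intended ratio is sigma_const(i,j)/dist.
module lj_mod
  implicit none
contains
  subroutine lj_energy(coor, qk, sigma_const, clase, eps_const, ener1)
    real(8), intent(in)    :: coor(:,:,:), sigma_const(:,:), eps_const(:,:)
    integer, intent(in)    :: clase(:), qk
    real(8), intent(inout) :: ener1
    integer :: n, q, i, j, k, l
    real(8) :: dx, dy, dz, x2, x6, e

    n = size(clase)
    q = qk
    e = 0.0d0
    ! If an enclosing target data region already mapped sigma_const, clase
    ! and eps_const, these map(to:) items find them present and skip the copy.
    !$omp target map(to: coor, sigma_const, clase, eps_const, n, q) map(tofrom: e)
    !$omp parallel do private(j, k, l, dx, dy, dz, x2, x6) reduction(+:e)
    do i = 1, n - 2
      do j = i + 2, n
        dx = coor(i,1,q) - coor(j,1,q)
        dy = coor(i,2,q) - coor(j,2,q)
        dz = coor(i,3,q) - coor(j,3,q)
        x2 = sigma_const(i,j)**2 / (dx*dx + dy*dy + dz*dz)  ! (sigma/dist)**2
        x6 = x2 * x2 * x2                                   ! (sigma/dist)**6
        k = clase(i)
        l = clase(j)
        e = e + eps_const(k,l) * (x6*x6 - x6)
      end do
    end do
    !$omp end parallel do
    !$omp end target
    ener1 = ener1 + e
  end subroutine lj_energy

  ! Driver: the constant arrays stay on the device for all nsteps calls.
  subroutine run_steps(nsteps, coor, qk, sigma_const, clase, eps_const, ener1)
    integer, intent(in)    :: nsteps, qk
    real(8), intent(inout) :: coor(:,:,:)
    real(8), intent(in)    :: sigma_const(:,:), eps_const(:,:)
    integer, intent(in)    :: clase(:)
    real(8), intent(inout) :: ener1
    integer :: step

    !$omp target data map(to: sigma_const, clase, eps_const)
    do step = 1, nsteps
      ! A real driver would update coor here, on the host, between calls.
      call lj_energy(coor, qk, sigma_const, clase, eps_const, ener1)
    end do
    !$omp end target data
  end subroutine run_steps
end module lj_mod
```

Rerunning with OFFLOAD_REPORT=3, as suggested earlier in the thread, is one way to check whether the per-call copy-in volume actually drops.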