Software Archive

Error while offloading a parallel region

Pedro_O_1
Beginner
366 Views

Hi,

I have written my first offload program in Fortran (the relevant part is shown below). The program gets past the end of the parallel region: I was able to print out the values of the variables ener1 and ener2 there. However, it never gets past the !$omp end target directive, which I checked with a print statement. On the command line I get the error message: offload error: process on the device 0 unexpectedly exited with code 0. I don't understand why the program cannot finish this offload block.

!$omp target map(to:coor,sigma_const,clase,eps_const) map(tofrom:ener1,ener2)
!$omp parallel private(i,j,fdummy1,k,l,fdummy2,fdummy3,fdummy4,fdummy5,dist)
!$omp do reduction(+:ener1)
do i=1,num_res-2
  do j=i+2,num_res

   fdummy1=coor(i,1,qk)-coor(j,1,qk)
   fdummy2=coor(i,2,qk)-coor(j,2,qk)
   fdummy3=coor(i,3,qk)-coor(j,3,qk)
   dist=sqrt(fdummy1*fdummy1+fdummy2*fdummy2+fdummy3*fdummy3)

   fdummy1=sigma_const(i,j)
   write(6,*) 'fdum',fdummy1
   k=clase(i)
   l=clase(j)
   fdummy2=fdummy1*fdummy1      ! 2
   fdummy3=fdummy2*fdummy2      ! 4
   fdummy4=fdummy2*fdummy3      ! 6
   fdummy5=fdummy4*fdummy4      ! 12

   fdummy1=fdummy5-fdummy4

   ener1=ener1+eps_const(k,l)*fdummy1

  enddo
enddo
!$omp end do

!$omp do reduction(+:ener2)
do i=1,num_res-1
   fdummy1=coor(i,1,qk)-coor(i+1,1,qk)
   fdummy2=coor(i,2,qk)-coor(i+1,2,qk)
   fdummy3=coor(i,3,qk)-coor(i+1,3,qk)
   dist=sqrt(fdummy1*fdummy1+fdummy2*fdummy2+fdummy3*fdummy3)
   fdummy1=(dist-r_cero)
   fdummy2=fdummy1*fdummy1
   ener2=ener2+fdummy2
enddo
!$omp end do
!$omp end parallel
!$omp end target

 

6 Replies
Kevin_D_Intel
Employee

What version of ifort and MPSS are you using?
Can you set OFFLOAD_REPORT=3 and re-run and share the output with us?
Can you attach the complete reproducer?

Pedro_O_1
Beginner

Hi,

The Intel compiler is 15.0.3.187. I will ask our sysadmin about the MPSS version. The output with the option you requested is:

 

[Offload] [HOST]          [State]           Initialize logical card 0 = physical card 0
[Offload] [HOST]          [State]           Initialize logical card 1 = physical card 1
[Offload] [HOST]          [State]           Initialize logical card 2 = physical card 2
[Offload] [MIC 0] [File]                    energy.f90
[Offload] [MIC 0] [Line]                    221
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [State]           Start target
[Offload] [HOST]  [Tag 0] [State]           Setup target entry: __offload_entry_energy_f90_221energy_ifort01021243267585XRkw5
[Offload] [HOST]  [Tag 0] [State]           Host->target pointer data 0
[Offload] [HOST]  [Tag 0] [Signal]          signal : none
[Offload] [HOST]  [Tag 0] [Signal]          waits  : none
[Offload] [HOST]  [Tag 0] [State]           Gather copyin data: base=0x7ffccdece100 length=128
[Offload] [HOST]  [Tag 0] [State]           Create target buffer: size=384 offset=256
[Offload] [HOST]  [Tag 0] [State]           Gather copyin data: base=0x7ffccdec44a0 length=40000
[Offload] [HOST]  [Tag 0] [State]           Create target buffer: size=41184 offset=1184
[Offload] [HOST]  [Tag 0] [State]           Gather copyin data: base=0x7ffc9e33b300 length=800000000
[Offload] [HOST]  [Tag 0] [State]           Create target buffer: size=800000768 offset=768
[Offload] [HOST]  [Tag 0] [State]           Gather copyin data: base=0x7ffccdece180 length=480000
[Offload] [HOST]  [Tag 0] [State]           Create target buffer: size=480384 offset=384
[Offload] [HOST]  [Tag 0] [State]           Gather copyin data: base=0x49a700 length=4
[Offload] [HOST]  [Tag 0] [State]           Create target buffer: size=1796 offset=1792
[Offload] [HOST]  [Tag 0] [State]           Gather copyin data: base=0x7ffccdf435ac length=4
[Offload] [HOST]  [Tag 0] [State]           Create target buffer: size=1456 offset=1452
[Offload] [HOST]  [Tag 0] [State]           Host->target pointer data 800520136
[Offload] [HOST]  [Tag 0] [State]           Host->target copyin data 420 
[Offload] [HOST]  [Tag 0] [State]           Execute task on target
[Offload] [HOST]  [Tag 0] [State]           Target->host pointer data 8
[Offload] [MIC 0] [Tag 0] [State]           Start target entry: __offload_entry_energy_f90_221energy_ifort01021243267585XRkw5
[Offload] [MIC 0] [Tag 0] [Var]             eps_const  IN
[Offload] [MIC 0] [Tag 0] [Var]             var$53_dv_template_V$65  IN
[Offload] [MIC 0] [Tag 0] [Var]             var$53_dv_template_V$65  IN
[Offload] [MIC 0] [Tag 0] [Var]             var$88_dv_template_V$a6  IN
[Offload] [MIC 0] [Tag 0] [Var]             var$88_dv_template_V$a6  IN
[Offload] [MIC 0] [Tag 0] [Var]             var$69_dv_template_V$85  IN
[Offload] [MIC 0] [Tag 0] [Var]             var$69_dv_template_V$85  IN
[Offload] [MIC 0] [Tag 0] [Var]             ener2  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             ener1  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             i  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             j  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             fdummy1  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             k  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             l  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             fdummy2  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             fdummy3  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             fdummy4  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             fdummy5  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             dist  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             var$49_V$52  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             num_res  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             qk  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             r_cero  INOUT
[Offload] [MIC 0] [Tag 0] [Var]             var$43_V$4e  INOUT

 

Rajiv_D_Intel
Employee

Without seeing your entire program, I have one theory.

Use the compile-time offload report to see which variables are sent from MIC to CPU at the end of the offload. If one of them is a subroutine parameter, check if the subroutine was called with a constant argument. If so, then that would lead to a write into read-only memory as part of offload completion. If this situation occurs in your program, specify that variable as IN to the offload.
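A minimal sketch of the situation described above, written with OpenMP directives rather than the Intel offload pragmas. Under the OpenMP 4.0 data-mapping rules, a scalar referenced in a target region without appearing in a map clause is treated as map(tofrom:), which is consistent with num_res, qk and r_cero showing up as INOUT in the report posted earlier (later OpenMP versions make such scalars firstprivate instead). Listing the scalar in map(to:), which plays roughly the role of IN, means it is copied to the device but never copied back. The names count_pairs and npairs are hypothetical; the loop bounds mirror the first loop of the original code.

```fortran
! Sketch (hypothetical names): count the (i, j) pairs visited by the first
! loop of the original kernel, with the scalar dummy n mapped "to" only.
module pair_count_mod
  implicit none
contains
  subroutine count_pairs(n, npairs)
    integer, intent(in)  :: n       ! may be a literal constant at the call site
    integer, intent(out) :: npairs
    integer :: i, j

    npairs = 0
    ! map(to: n): n is copied to the device but not copied back into the
    ! caller's storage, which may be read-only if a constant was passed.
    !$omp target map(to: n) map(tofrom: npairs)
    !$omp parallel do private(j) reduction(+:npairs)
    do i = 1, n - 2
      do j = i + 2, n
        npairs = npairs + 1
      end do
    end do
    !$omp end parallel do
    !$omp end target
  end subroutine count_pairs
end module pair_count_mod
```

With this version, call count_pairs(5, np) passes a literal constant and leaves np = 6, since the loop visits (n-1)(n-2)/2 pairs.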

Pedro_O_1
Beginner

Hi,

Rajiv Deodhar, your suggestion worked. One of the variables inside the target region was indeed a parameter of the subroutine. Since I couldn't find how to use "IN" in this target directive, I used a local variable, nn=num_res, instead. It works now, but it is very slow.

Do you have any tips for reducing the execution time? Maybe I am copying several variables to the MIC each time the routine is called, while the only variable that actually needs to be copied is the "coor" array.

Best regards.

Rajiv_D_Intel
Employee

Fortran uses call-by-reference, so function/subroutine parameters appear to the compiler as pointers. Pointer transfers incur considerable overhead. My suggestion is this:

For each subroutine parameter that is a scalar, i.e., a simple non-array variable, declare a local variable of the same type within the function/subroutine. Assign the parameter to its corresponding local variable, use the local variable (not the parameter) within the offloaded region, and list these local copies in the IN clause of the offload directive.
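A sketch of this pattern applied to the second loop of the original code, written with OpenMP directives instead of the Intel offload pragmas. It assumes coor is indexed as coor(residue, component, qk), as in the question; the module and subroutine names, the locals q and r0, and the assumed-shape declaration of coor are illustrative choices (nn is the name used in the reply above). The scalar dummy arguments are copied into locals before the target region, only the locals are referenced inside it, and the locals are listed in map(to:), the OpenMP counterpart of IN.

```fortran
! Sketch (hypothetical names): scalar dummies are copied into locals and
! only the locals are used inside, and mapped to, the target region.
module bond_energy_mod
  implicit none
contains
  subroutine bond_energy(coor, num_res, qk, r_cero, ener2)
    real(8), intent(in)  :: coor(:, :, :)   ! coor(residue, xyz, conformation)
    integer, intent(in)  :: num_res, qk
    real(8), intent(in)  :: r_cero
    real(8), intent(out) :: ener2
    integer :: nn, q, i
    real(8) :: r0, dx, dy, dz, dist

    ! Local copies of the by-reference scalar arguments.
    nn = num_res
    q  = qk
    r0 = r_cero
    ener2 = 0.0d0

    !$omp target map(to: coor, nn, q, r0) map(tofrom: ener2)
    !$omp parallel do private(dx, dy, dz, dist) reduction(+:ener2)
    do i = 1, nn - 1
      dx = coor(i, 1, q) - coor(i + 1, 1, q)
      dy = coor(i, 2, q) - coor(i + 1, 2, q)
      dz = coor(i, 3, q) - coor(i + 1, 3, q)
      dist = sqrt(dx*dx + dy*dy + dz*dz)
      ener2 = ener2 + (dist - r0)**2
    end do
    !$omp end parallel do
    !$omp end target
  end subroutine bond_energy
end module bond_energy_mod
```

Copying the scalars costs a few assignments on the host; in exchange the device receives plain values and nothing is written back to the caller's arguments.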

jimdempseyatthecove
Honored Contributor III

For persistent data you can use !DIR$ OFFLOAD_TRANSFER clause[...].

If your sigma_const, clase and eps_const do not change from call to call, then this can save you some time.

Jim Dempsey
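!DIR$ OFFLOAD_TRANSFER belongs to Intel's offload extensions. A rough OpenMP counterpart, sketched here under assumptions, is a !$omp target data region that maps sigma_const, clase and eps_const once around a series of calls. Per the OpenMP mapping rules, when the energy routine's target construct maps arrays that are already present on the device, no new storage is allocated and no copy is made, so only coor moves on each call. The module and routine names, the array shapes, the frames array standing in for coordinates produced on the host, and the pair term eps*((s/d)**12 - (s/d)**6) are all hypothetical; the pair term is a stand-in chosen so the kernel uses both the tables and coor, not necessarily the formula the original code intends.

```fortran
! Sketch (hypothetical names and shapes): keep constant tables on the
! device across calls with a target data region; only coor moves per call.
module persistent_tables_mod
  implicit none
contains
  ! Pair energy for one set of coordinates coor(residue, xyz).
  subroutine pair_energy(coor, sigma_const, clase, eps_const, ener1)
    real(8), intent(in)  :: coor(:, :), sigma_const(:, :), eps_const(:, :)
    integer, intent(in)  :: clase(:)
    real(8), intent(out) :: ener1
    integer :: nn, i, j
    real(8) :: dx, dy, dz, s6

    nn = size(coor, 1)
    ener1 = 0.0d0
    ! If an enclosing target data region already mapped the tables, they
    ! are reused on the device without a transfer; coor is copied each call.
    !$omp target map(to: coor, nn, sigma_const, clase, eps_const) map(tofrom: ener1)
    !$omp parallel do private(j, dx, dy, dz, s6) reduction(+:ener1)
    do i = 1, nn - 2
      do j = i + 2, nn
        dx = coor(i, 1) - coor(j, 1)
        dy = coor(i, 2) - coor(j, 2)
        dz = coor(i, 3) - coor(j, 3)
        s6 = (sigma_const(i, j)**2 / (dx*dx + dy*dy + dz*dz))**3   ! (s/d)**6
        ener1 = ener1 + eps_const(clase(i), clase(j)) * (s6*s6 - s6)
      end do
    end do
    !$omp end parallel do
    !$omp end target
  end subroutine pair_energy

  ! Evaluate many coordinate frames while the tables stay on the device.
  subroutine energy_trajectory(frames, sigma_const, clase, eps_const, energies)
    real(8), intent(in)  :: frames(:, :, :)   ! frames(residue, xyz, frame)
    real(8), intent(in)  :: sigma_const(:, :), eps_const(:, :)
    integer, intent(in)  :: clase(:)
    real(8), intent(out) :: energies(:)
    real(8), allocatable :: coor(:, :)
    integer :: f

    allocate(coor(size(frames, 1), 3))
    !$omp target data map(to: sigma_const, clase, eps_const)
    do f = 1, size(frames, 3)
      coor = frames(:, :, f)        ! new coordinates produced on the host
      call pair_energy(coor, sigma_const, clase, eps_const, energies(f))
    end do
    !$omp end target data
  end subroutine energy_trajectory
end module persistent_tables_mod
```

Whether this pays off depends on how large the tables are relative to coor and on how many calls share one target data region.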
