aketh_t_
Beginner
120 Views

Segfault with asynchronous offload

Hi

I am getting a segmentation fault when I perform an asynchronous offload. The crash happens at a later point, which I believe is after the function that started the offload has returned.

Here is the code:

subroutine functionA(iblock)

integer :: iblock
real :: WORKN(X,Y,Z)

!dir$ offload begin target(MIC:0) signal(1) out(WORKN)
   do k=1,km
      ! ... lots of code ...
   enddo
!dir$ end offload

end subroutine functionA


subroutine functionB(iblock)

integer :: iblock
real :: Dummy

! ... do whatever ...

end subroutine functionB




program run

!$OMP PARALLEL DO DEFAULT(SHARED) ! this OMP is a dummy (num_blocks is 1) and can be removed

do iblock=1,num_blocks

      call functionA(iblock) !here is the async offload

enddo

!$OMP END PARALLEL DO


!$OMP PARALLEL DO DEFAULT(SHARED)

do iblock=1,num_blocks

      call functionB(iblock) ! the segfault occurs while functionB is running

enddo

!$OMP END PARALLEL DO

end program run
    

 
 

 

4 Replies
Ravi_N_Intel
Employee

Where do you wait for the asynchronous offload to complete? I don't see a wait in the code you posted. Without one, my guess is that the offload completes after your main program has already exited.
It would be best if you could provide us with a small reproducer.
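A minimal sketch of where a wait can go, assuming the Intel Fortran offload extensions (`signal(...)` on the offload and `offload_wait ... wait(...)` on the host). The module `wait_sketch`, the routine `run_block`, the tag variable `sig`, and the fill values are hypothetical stand-ins for `functionA`. Compilers that do not recognize these directives generally treat `!dir$` lines as comments, in which case the sketch simply runs synchronously on the host.

```fortran
! Assumption: Intel Fortran offload directives; elsewhere !dir$ lines are
! typically ignored and the code runs synchronously on the host.
module wait_sketch
  implicit none
contains
  ! Hypothetical stand-in for functionA: starts an asynchronous offload and
  ! waits for it before returning, so workn is complete when control goes back.
  subroutine run_block(nx, ny, nz, workn)
    integer, intent(in) :: nx, ny, nz
    real, intent(out) :: workn(nx, ny, nz)
    integer :: sig, k

    sig = 1   ! hypothetical signal tag for this offload

    ! Start the offload; the host continues immediately.
!dir$ offload begin target(mic:0) signal(sig) out(workn)
    do k = 1, nz
       workn(:, :, k) = real(k)
    end do
!dir$ end offload

    ! Host work that does not touch workn could overlap with the offload here.

    ! Block until the offload tagged by sig has finished and workn is copied back.
!dir$ offload_wait target(mic:0) wait(sig)
  end subroutine run_block
end module wait_sketch
```

Waiting inside the routine is the simplest fix, but it only overlaps the offload with host work in that same routine. Moving the wait out to the caller means the destination array must still exist when the copy-back happens.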

Rajiv_D_Intel
Employee

Without the SAVE attribute, WORKN is a stack variable. You issue an offload that writes into it and then return from the function. When the offload later writes WORKN back to the host, that write can land outside the active stack or inside the stack frame of another function.

That is most likely the cause of the problem.
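If the offload should keep running after the routine returns, the destination needs static storage. A minimal sketch, assuming the Intel Fortran offload directives: `workn` is moved into a module (module variables are saved), and we assume, without having checked every compiler version, that a module variable referenced in an offload region needs the `attributes offload:mic` directive. The names `save_sketch`, `launch`, `finish`, and `sig` are hypothetical.

```fortran
! Assumption: Intel Fortran offload directives; elsewhere !dir$ lines are
! typically ignored and the code runs synchronously on the host.
module save_sketch
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz = 2
  ! Saved module storage, unlike a local array in a procedure, is still valid
  ! when the asynchronous copy-back lands after launch has returned.
  real, save :: workn(nx, ny, nz)
!dir$ attributes offload:mic :: workn
  integer, save :: sig = 1   ! hypothetical signal tag
contains
  subroutine launch()
    integer :: k
!dir$ offload begin target(mic:0) signal(sig) out(workn)
    do k = 1, nz
       workn(:, :, k) = real(k)
    end do
!dir$ end offload
    ! Returning here is safe: workn does not live in this stack frame.
  end subroutine launch

  subroutine finish()
    ! Wait for the offload tagged by sig before any host code reads workn.
!dir$ offload_wait target(mic:0) wait(sig)
  end subroutine finish
end module save_sketch
```

Because there is only one saved `workn`, two blocks launched at the same time would write into the same buffer; with several threads each block needs its own buffer and its own signal.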

jimdempseyatthecove
Black Belt

Also, you cannot use the same signal for multiple threads. Each thread should use a different signal value; consider using a reference to a variable on the thread's own stack, or a variable unique to the thread such as the thread ID. Note that OpenMP's omp_get_thread_num() does not return a unique number under nested parallelism, so be careful about relying on it for a unique value.

As others have mentioned, you also need to wait on the signal.
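A sketch of one buffer and one signal per block, launched from an OpenMP loop and waited on before the data is used. It assumes the Intel Fortran offload syntax accepts an array element such as `sig(iblock)` as a signal tag and an array section in `out(...)`; treat both as assumptions to check against the compiler documentation. The module, routine names, sizes, and fill values are hypothetical.

```fortran
! Assumption: Intel Fortran offload directives; elsewhere !dir$ lines are
! typically ignored and the code runs synchronously on the host.
module per_block_sketch
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz = 2, num_blocks = 3
  ! One result slice and one signal tag per block, both with static storage.
  real, save :: workn(nx, ny, nz, num_blocks)
!dir$ attributes offload:mic :: workn
  integer, save :: sig(num_blocks)
contains
  subroutine launch_all()
    integer :: iblock
!$omp parallel do default(shared)
    do iblock = 1, num_blocks
       sig(iblock) = iblock   ! distinct tag per block, independent of thread IDs
       call launch_block(iblock)
    end do
!$omp end parallel do
  end subroutine launch_all

  subroutine launch_block(iblock)
    integer, intent(in) :: iblock
    integer :: k
!dir$ offload begin target(mic:0) signal(sig(iblock)) out(workn(:, :, :, iblock))
    do k = 1, nz
       workn(:, :, k, iblock) = real(iblock * k)
    end do
!dir$ end offload
  end subroutine launch_block

  subroutine wait_all()
    integer :: iblock
    ! Wait on every block's tag before any host code reads workn.
    do iblock = 1, num_blocks
!dir$ offload_wait target(mic:0) wait(sig(iblock))
    end do
  end subroutine wait_all
end module per_block_sketch
```

Keying the tag on the block index rather than on omp_get_thread_num() keeps it unique even under nested parallelism. In the posted `program run`, the equivalent of `wait_all` would sit between the two parallel loops, so `functionB` only starts once every block's data has been copied back.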

Jim Dempsey

aketh_t_
Beginner

The problem was the missing wait statement. Thank you guys.
