
Segfault with asynchronous offload

aketh_t_
Beginner
638 Views

Hi

I am getting a segmentation fault when I use an asynchronous offload. The fault occurs at a later point, which I believe is after the function that issued the offload has returned.

Here is the code:

functionA

real :: WORKN(X,Y,Z)

!dir$ offload begin target(MIC:0) signal(1) out(WORKN)
   do k=1,km
           .
           !Lots of code
!dir$ end offload

end function A


functionB

real :: Dummy

.
.
! Do Whatever

.
.
.
end function B




program run

!$OMP PARALLEL DO DEFAULT(SHARED) ! this OMP is a dummy, i.e. num_blocks is 1; it can be removed

do iblock=1,num_blocks

      call functionA(iblock) !here is the async offload

enddo

!$OMP END PARALLEL DO


!$OMP PARALLEL DO DEFAULT(SHARED)

do iblock=1,num_blocks

      call functionB(iblock) ! the segfault occurs while B is running

enddo

!$OMP END PARALLEL DO

end program run
    

 
 

 

4 Replies
Ravi_N_Intel
Employee
638 Views

Where do you wait for the async offload to complete? I don't see a wait in the code you posted. I could speculate that the async offload completed after your main program exited, since nothing waits for it.
It would be best if you could provide us with a small reproducer.

Rajiv_D_Intel
Employee
638 Views

Without the save attribute, WORKN is a stack variable. You issue an offload which writes into it, and then you return from the function. At some later point the write to WORKN occurs, potentially outside the stack or into the stack frame of another function.

That is most likely the cause of the problem.
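
The lifetime issue described above can be sketched as follows. This is a minimal illustration, not the original poster's code: the module name `work_mod`, the array sizes, the tag variable `sig`, and the loop body are all hypothetical. It assumes that a module variable needs the `attributes offload` directive to be usable in an offload region, and that an integer variable is accepted as the signal tag. WORKN is given static storage so it is still valid when the asynchronous transfer writes to it, and the host waits on the tag before reading the result.

```fortran
module work_mod
  implicit none
  integer, parameter :: nx = 4, ny = 4, nz = 4   ! hypothetical sizes
  ! Static storage: WORKN outlives the call that starts the offload
  real, save    :: WORKN(nx, ny, nz)
  integer, save :: sig = 1                       ! hypothetical signal tag
  !dir$ attributes offload : mic :: WORKN
contains
  subroutine start_offload()
    integer :: k
    ! Asynchronous offload: control returns to the host immediately
    !dir$ offload begin target(mic:0) signal(sig) out(WORKN)
    do k = 1, nz
      WORKN(:, :, k) = real(k)                   ! stand-in for the real work
    end do
    !dir$ end offload
  end subroutine start_offload

  subroutine finish_offload()
    ! Block until the offload tagged with sig has completed
    !dir$ offload_wait target(mic:0) wait(sig)
  end subroutine finish_offload
end module work_mod

program run
  use work_mod
  implicit none
  call start_offload()
  ! ... other host work can overlap with the offload here ...
  call finish_offload()
  print *, WORKN(1, 1, nz)                       ! safe to read only after the wait
end program run
```

A compiler without offload support treats the `!dir$` lines as comments, so the same source simply runs the loop on the host.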

jimdempseyatthecove
Honored Contributor III
638 Views

Also, you cannot use the same signal for multiple threads. Each thread should use a different signal value. You might consider using a reference to a thread stack local variable, or a variable unique to the thread, such as the thread ID. Note: the OpenMP omp_get_thread_num() will not return a unique number in the event of nested parallelism, so be careful about using that as a unique number.

As mentioned by others, you should also have a signal wait.

Jim Dempsey
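
One way to follow the advice above is to give each block, rather than each thread, its own tag, which also sidesteps the omp_get_thread_num() caveat. The sketch below is only an illustration: the module `block_work`, the routines `start_block` and `wait_block`, the `tags` array, and the sizes are hypothetical names, and it assumes that an array element is accepted as a signal tag and that an array section is accepted in the `out` clause.

```fortran
module block_work
  implicit none
  integer, parameter :: n = 8, num_blocks = 4    ! hypothetical sizes
  real, save    :: WORK(n, num_blocks)           ! static storage, one column per block
  integer, save :: tags(num_blocks)              ! one signal tag per block
  !dir$ attributes offload : mic :: WORK
contains
  subroutine start_block(iblock)
    integer, intent(in) :: iblock
    integer :: i
    tags(iblock) = iblock                        ! distinct tag value for each block
    !dir$ offload begin target(mic:0) signal(tags(iblock)) out(WORK(:, iblock))
    do i = 1, n
      WORK(i, iblock) = real(i * iblock)         ! stand-in for the real work
    end do
    !dir$ end offload
  end subroutine start_block

  subroutine wait_block(iblock)
    integer, intent(in) :: iblock
    ! Wait for the offload that was started for this block
    !dir$ offload_wait target(mic:0) wait(tags(iblock))
  end subroutine wait_block
end module block_work

program run_blocks
  use block_work
  implicit none
  integer :: iblock

  !$omp parallel do default(shared)
  do iblock = 1, num_blocks
    call start_block(iblock)                     ! start one async offload per block
  end do
  !$omp end parallel do

  ! Every offload has been started by now; wait for each before using WORK
  do iblock = 1, num_blocks
    call wait_block(iblock)
  end do

  print *, WORK(1, :)
end program run_blocks
```

The waits are issued only after the parallel loop has finished, so no tag is waited on before its offload has been started.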

aketh_t_
Beginner
638 Views

The problem was the missing wait statement. Thank you guys.
