
offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)

zhou

Hi!

I am trying to use three threads (one per MIC coprocessor plus one for the host CPU) so that the CPU and the MICs work together.

But I get the error "offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)".

The compile command is "ifort mic.f90 -mkl -openmp". The code is as follows:

program mic
  use mic_lib
  use omp_lib
  implicit none
  integer::mics,idx

  DOUBLE PRECISION,allocatable::A(:)
  DOUBLE PRECISION,allocatable::B(:)
  DOUBLE PRECISION,allocatable::C(:)
  allocate(A(256*256))
  allocate(B(256*256))
  allocate(C(256*256))

  mics = offload_number_of_devices()

  !dir$ attributes offload:mic :: DGEMM

  !$OMP PARALLEL PRIVATE(idx) NUM_THREADS(mics+1)
  !$OMP DO SCHEDULE (static)
  do idx=0,mics
    if(idx==mics) then
      CALL DGEMM('N','N',256,256,256,1.d0,A,256,B,256,0.d0,C,256)
    else
      !dir$ offload target(mic:idx) in(A,B:length(256*256)) out(C:length(256*256))
      CALL DGEMM('N','N',256,256,256,1.d0,A,256,B,256,0.d0,C,256)
    end if
  end do
  !$OMP END DO
  !$OMP END PARALLEL

  deallocate(A)
  deallocate(B)
  deallocate(C)
end program mic

zhou

TaylorIoTKidd

Hi,

One of our experts looked at your problem.

He believes that this may be a bug and has submitted a bug report.

As experts are wont to do, he also made some recommendations: all threads write their results back into the same host array C, so they may overwrite each other; and it is a good idea to initialize your data when you allocate it. Neither of these causes the issue you are seeing.
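To illustrate both recommendations, here is a minimal, portable sketch (not code from this thread). It assumes you only want to see the pattern, so the offload directive and DGEMM are replaced by the MATMUL intrinsic and it builds with any OpenMP-capable Fortran compiler. The inputs get defined values before use, and each loop iteration writes into its own slab C(:,:,idx) instead of one shared C. The module and procedure names are hypothetical.

```fortran
module per_worker_output
  implicit none
contains
  ! Give the inputs defined values before they are used or transferred.
  subroutine init_inputs(A, B)
    double precision, intent(out) :: A(:,:), B(:,:)
    call random_number(A)
    call random_number(B)
  end subroutine init_inputs

  ! Each iteration stores only into its own slab C(:,:,idx), so no two
  ! threads ever write to the same memory. MATMUL stands in for the
  ! host or offloaded DGEMM call. C(:,:,idx) must have the shape of A*B.
  subroutine multiply_per_worker(A, B, C)
    double precision, intent(in)  :: A(:,:), B(:,:)
    double precision, intent(out) :: C(:,:,0:)
    integer :: idx

    !$omp parallel do schedule(static)
    do idx = 0, ubound(C, 3)
      C(:,:,idx) = matmul(A, B)
    end do
    !$omp end parallel do
  end subroutine multiply_per_worker
end module per_worker_output
```

In the offload version of the program, the same idea means giving each device its own output array in its OUT clause rather than having every thread copy back into the single host array C.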

Regards
--
Taylor
 

Dave_O_

I have exactly the same error. How do I fix it?

Kevin_D_Intel

In the case cited in the original post, the compiler mishandles the literal constants (256, 0.d0, 1.d0) in the offload region, treating them as read-write variables instead of read-only ones. That causes the segfault when, on exit from the offload region, the runtime tries to write their values back to the host.

The issue is expected to be fixed in the upcoming Update 3 later this month. To work around it, you unfortunately need to define variables to hold the constants and pass those to DGEMM in place of the literals. Do not declare them with PARAMETER, because named constants are not allowed in IN/OUT/INOUT clauses. Something like this lets the case run successfully:

integer :: size=256
double precision:: zero_Dbl=0.d0
double precision:: one_Dbl=1.d0
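A minimal sketch of the workaround applied to the offloaded call, assuming the Intel 14.0 compiler with MKL and at least one coprocessor (built with something like "ifort -mkl -openmp"). The module and subroutine names are hypothetical, n plays the role of size above (renamed so it does not shadow the SIZE intrinsic), and listing the scalars in an IN clause is an extra precaution, not part of the suggestion above, so that nothing is copied back to the host for them.

```fortran
module offload_gemm_workaround
  implicit none
  ! Matrix order, used only in declarations; it never appears in an
  ! offload clause, because PARAMETER constants are not allowed there.
  integer, parameter :: nn = 256
contains
  ! Hypothetical helper: computes C = A*B on coprocessor number dev.
  subroutine gemm_on_mic(dev, A, B, C)
    integer, intent(in) :: dev
    double precision, intent(in)  :: A(nn*nn), B(nn*nn)
    double precision, intent(out) :: C(nn*nn)
    ! Ordinary variables stand in for the literals 256, 1.d0 and 0.d0.
    integer :: n
    double precision :: one_dbl, zero_dbl
    !dir$ attributes offload:mic :: dgemm

    n = nn
    one_dbl = 1.d0
    zero_dbl = 0.d0

    ! Scalars are listed as IN so they are only copied to the device.
    !dir$ offload target(mic:dev) in(A, B) in(n, one_dbl, zero_dbl) out(C)
    call dgemm('N', 'N', n, n, n, one_dbl, A, n, B, n, zero_dbl, C, n)
  end subroutine gemm_on_mic
end module offload_gemm_workaround
```

With the Update 3 fix in place, the original literal arguments should no longer be needed, but passing variables remains valid.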
 

Martyn_C_Intel

The 14.0 Update 3 compiler (14.0.3.174), which contains the fix, has now been posted and is available for download from the Intel Registration Center.

zhou

Thanks very much for all of your replies.
The code runs correctly on our system now.
 
