Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Question regarding programming the Intel HD 4600 GPU on Haswell

Doru_Adrian_Thom_P_
357 Views

Hi! I would like to know whether it is possible to synchronize threads running on the GPU with threads running on the CPU.

To be more specific: I have a program with two threads, each pinned to a different CPU core. One thread runs only on the CPU, while the second thread offloads its work to the GPU. Is there a mechanism that provides barrier-like synchronization between the CPU thread and the GPU threads?

As a side note, when compiling code meant for GPU offloading with the Intel compiler, I got the following error:

catastrophic error: Can't deduce surface for instrinsic _sfiload_si32.

Can someone please tell me what that means?

Thank you very much.

Thom Popovici

4 Replies
Anoop_M_Intel
Employee

Hi Popovici

Regarding synchronization between CPU and GPU threads, there is currently no explicit mechanism. However, some simple cases can probably be handled with the existing syntax. #pragma offload is synchronous with respect to the CPU thread that uses it, so one can do the following:
1. Spawn another CPU thread to do the CPU work (e.g. with _Cilk_spawn).
2. Run #pragma offload in the current thread.
3. After #pragma offload completes, meaning the GPU work is also complete, wait for the spawned CPU thread to finish (_Cilk_sync, or the implicit synchronization at the end of the enclosing block {}).
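The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses std::thread in place of the Cilk keywords so that it builds with any C++11 compiler, and gpu_part, cpu_part and run_both are hypothetical names. gpu_part is an ordinary CPU loop standing in for the region that would be placed under #pragma offload.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the offloaded region. With the Intel compiler,
// this loop would sit under #pragma offload; here it simply runs on the CPU.
void gpu_part(std::vector<int>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] *= 2;
}

// Hypothetical CPU-only work.
void cpu_part(std::vector<int>& data) {
    for (int& x : data) x += 1;
}

// Runs both parts concurrently and returns the sum of all elements afterwards.
long run_both(std::vector<int>& cpu_data, std::vector<int>& gpu_data) {
    // Step 1: spawn another CPU thread for the CPU work.
    std::thread cpu_worker(cpu_part, std::ref(cpu_data));

    // Step 2: run the (stand-in) offload in the current thread. The call
    // returns only when the work is done, mirroring a synchronous offload.
    gpu_part(gpu_data);

    // Step 3: wait for the spawned CPU thread. After this line both parts
    // have finished, so it is safe to read both buffers.
    cpu_worker.join();

    return std::accumulate(cpu_data.begin(), cpu_data.end(), 0L) +
           std::accumulate(gpu_data.begin(), gpu_data.end(), 0L);
}
```

The join in step 3 is the only synchronization point: the two parts must not touch each other's data until it has returned.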

Regarding the error "catastrophic error: Can't deduce surface for intrinsic _sfiload_si32":

Most likely this results from unsupported pointer operations, e.g. pointers to pointers or complicated pointer arithmetic, which prevent the compiler from tracing a pointer back to a pointer-typed argument of the kernel. Could you please share a test case that reproduces this error, so that I can look into it and work with the development team?
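If a pointer-to-pointer layout does turn out to be the cause, one possible workaround is to copy the data into a single contiguous buffer and pass only a single-level pointer to the kernel. The sketch below is plain C++ with hypothetical names (flatten, scale); it shows the data layout idea only and is not confirmed to resolve this particular error.

```cpp
#include <cstddef>
#include <vector>

// Copy a row-pointer matrix (float** style) into one contiguous buffer,
// stored row by row.
std::vector<float> flatten(const float* const* rows,
                           std::size_t n_rows, std::size_t n_cols) {
    std::vector<float> flat(n_rows * n_cols);
    for (std::size_t r = 0; r < n_rows; ++r)
        for (std::size_t c = 0; c < n_cols; ++c)
            flat[r * n_cols + c] = rows[r][c];
    return flat;
}

// Kernel-style function: it receives one single-level pointer and uses only
// simple indexing, so every access traces back directly to the argument.
void scale(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}
```

A caller would flatten the matrix once on the CPU side and then hand flat.data() and flat.size() to the kernel.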

Thanks and Regards
Anoop

 

Doru_Adrian_Thom_P_

I see, thanks a lot. I had hoped there was some form of direct synchronization between the GPU and the CPU. I can synchronize at the CPU level, but that would mean offloading many times, because I have loops such as:

loop (1 < i < n)

sync

loop(1 < i < n)

I also have some more constraints at the code level. Anyway, thanks a lot for your answer.
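For illustration, the loop / sync / loop structure with CPU-level synchronization could look roughly like the sketch below: each phase spawns the CPU work, runs the offload in the current thread, and joins, so the join acts as the barrier between phases. std::thread is used here instead of Cilk, all names (cpu_step, gpu_step, run_phases) are hypothetical, and gpu_step is a plain CPU loop standing in for one offload per phase. Both vectors are assumed to be non-empty.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-phase CPU work.
void cpu_step(std::vector<int>& v, int carry) {
    for (int& x : v) x += carry;
}

// Hypothetical per-phase work that would be offloaded (one offload per phase).
void gpu_step(std::vector<int>& v, int carry) {
    for (int& x : v) x += 2 * carry;
}

// Runs `phases` rounds. After each round, a value computed from both buffers
// is carried into the next round, which is why a barrier is needed.
int run_phases(std::vector<int>& cpu_data, std::vector<int>& gpu_data,
               int phases) {
    int carry = 1;
    for (int p = 0; p < phases; ++p) {
        std::thread cpu_worker(cpu_step, std::ref(cpu_data), carry);
        gpu_step(gpu_data, carry);  // synchronous, like the offload
        cpu_worker.join();          // barrier: both sides are done here
        carry = cpu_data[0] + gpu_data[0];  // safe to read both buffers now
    }
    return carry;
}
```

The cost mentioned above is visible in the structure: every phase pays for one offload launch and one thread spawn.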

Thom

Doru_Adrian_Thom_P_

Follow-up question: I managed to rework my algorithms and found a way to use offload, but now I get a very strange result when compiling. I have attached a picture.

Thanks

Anoop_M_Intel
Employee

Hi Thom

Would it be possible to attach that program, or a minimal test case that reproduces this error?

Thanks and Regards
Anoop
