Hi! I would like to know whether it is possible to synchronize threads running on the GPU with threads running on the CPU.
To be more specific: I have a program with two threads, each pinned to a different CPU core. One thread runs entirely on the CPU, while the second offloads its work to the GPU. Is there a mechanism that provides barrier-like synchronization between the CPU thread and the GPU threads?
As a side note, when compiling code meant for GPU offloading with the Intel compiler, I got the following error:
catastrophic error: Can't deduce surface for instrinsic _sfiload_si32.
Can someone please tell me what that means?
Thank you very much.
Thom Popovici
Hi Popovici
Regarding synchronization between CPU and GPU threads: there is currently no explicit mechanism for it. However, some simple cases can probably be handled with the existing syntax. #pragma offload is synchronous with respect to the CPU thread that executes it, so one can do the following:
1. Spawn another CPU thread to do the CPU work (e.g. with _Cilk_spawn).
2. Run #pragma offload in the current thread.
3. After #pragma offload completes, which means the GPU work is also complete, wait for the spawned CPU thread to finish (_Cilk_sync, or the implicit synchronization at the end of the enclosing {} block).
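The three steps above can be sketched roughly as follows. This is only an illustration in standard C++ with std::thread, not Cilk Plus or the real offload pragma: `offloaded_work`, `cpu_work` and `run_both` are hypothetical names, and `offloaded_work` is an ordinary blocking function standing in for the region that would sit under #pragma offload.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the code under #pragma offload.
// Assumption: like the pragma, it blocks the calling thread until done.
long offloaded_work(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0L);
}

// Hypothetical CPU-side work that overlaps with the offload.
long cpu_work(const std::vector<int>& data) {
    long sum = 0;
    for (int v : data) sum += 2L * v;
    return sum;
}

long run_both(const std::vector<int>& gpu_data,
              const std::vector<int>& cpu_data) {
    long cpu_result = 0;
    // Step 1: spawn another CPU thread for the CPU work.
    std::thread cpu_thread([&] { cpu_result = cpu_work(cpu_data); });
    // Step 2: run the blocking "offload" in the current thread.
    long gpu_result = offloaded_work(gpu_data);
    // Step 3: the offload has returned; now wait for the CPU thread.
    cpu_thread.join();
    return gpu_result + cpu_result;
}
```

After `join()` returns, both halves of the work are complete, so this point behaves like a single barrier shared by the CPU thread and the offloaded work.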
Regarding the error "catastrophic error: Can't deduce surface for intrinsic _sfiload_si32":
Most likely this results from unsupported pointer operations, such as pointers to pointers or complicated pointer arithmetic, which prevent the compiler from tracing a pointer back to a pointer-typed argument of the kernel. Could you please share a test case that reproduces this error, so that I can look into it and work with the development team?
Thanks and Regards
Anoop
I see. Thanks a lot. I thought there might be some direct synchronization between the GPU and the CPU. I can synchronize at the CPU level, but that would mean offloading many times, because I have loops such as:
loop (1 < i < n)
sync
loop (1 < i < n)
I also have some more constraints at the code level. Anyway, thanks a lot for your answer.
Thom
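To make the cost described above concrete, here is a rough sketch of CPU-level synchronization applied to a loop / sync / loop structure. It is standard C++ with std::thread, not the Intel offload syntax; `offloaded_phase`, `cpu_phase` and `run_phases` are hypothetical names, and `offloaded_phase` is an ordinary blocking function standing in for one offloaded loop.

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for one loop that would run under #pragma offload.
void offloaded_phase(std::vector<int>& a) {
    for (int& v : a) v += 1;
}

// Hypothetical CPU loop that runs alongside the offloaded loop.
void cpu_phase(std::vector<int>& b) {
    for (int& v : b) v *= 2;
}

// Runs `phases` loop pairs with a sync point after each pair.
// Returns how many separate offloads were issued.
int run_phases(std::vector<int>& a, std::vector<int>& b, int phases) {
    int offloads = 0;
    for (int p = 0; p < phases; ++p) {
        std::thread cpu_thread([&b] { cpu_phase(b); });
        offloaded_phase(a);  // blocking, like a synchronous offload
        ++offloads;
        cpu_thread.join();   // sync: both halves of phase p are done
    }
    return offloads;
}
```

Every sync point ends one offload and the next phase starts a new one, so the number of offloads grows with the number of sync points. That is the overhead the post refers to.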
Hi Thom
Would it be possible to attach that program, or a minimal test case that reproduces this error?
Thanks and Regards
Anoop