Hi,
I'm targeting Intel Graphics Technology and using the API-based approach for asynchronous offloading. To begin, I'm trying to offload this loop:
for (int i = 0; i < size; i++){ A[i] = i; }
So I wrote this code:
__declspec(target(gfx_kernel))
void fill(int * A, int size)
{
    _Cilk_for (int i = 0; i < size; i++) {
        A[i] = i;
    }
}

int main()
{
    int N = 1024;
    int * A = malloc(sizeof(int) * N);
    _GFX_share(A, N);
    _GFX_offload((void*)fill, A, N);
    _GFX_wait(0, -1);
    _GFX_unshare(A);
    free(A);
    return 0;
}
This code compiles and runs, but only the first 780 elements of A are actually changed. I guess that's because of the maximum number of groups and threads, but the number seems odd to me (_GFX_get_device_hardware_thread_count() returns 336).
So I have two questions: why 780? And how can I write a kernel that I can call with
_GFX_offload((void *)fill, A, N);
that does what I want it to do?
Thanks, and have a nice day
Mathieu
_GFX_share takes a byte count, not an element count, so you should have written
_GFX_share(A,sizeof(int)*N);
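To make the element-count versus byte-count distinction concrete, here is a minimal sketch in plain C. The helpers shared_bytes and elements_covered are hypothetical names introduced for illustration only; they are not part of the GFX runtime, and the sketch does not call any GFX function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the GFX runtime): converts an element
 * count into the byte count that a size-in-bytes API expects. */
static size_t shared_bytes(size_t count, size_t elem_size)
{
    /* Guard against overflow in the multiplication. */
    assert(elem_size == 0 || count <= SIZE_MAX / elem_size);
    return count * elem_size;
}

/* Number of whole elements that a given byte count actually covers. */
static size_t elements_covered(size_t bytes, size_t elem_size)
{
    return elem_size ? bytes / elem_size : 0;
}

/* Intended use with the call from this thread (needs the Intel compiler):
 *     _GFX_share(A, shared_bytes(N, sizeof(int)));
 */
```

Passing N directly as the size covers only elements_covered(N, sizeof(int)) elements, which is fewer than N whenever sizeof(int) is larger than 1.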