Hello,
I've got a problem. I'm writing a motion estimation algorithm using OpenCL. I have nested structures like these:
HOST SIDE:
typedef struct ctu_info_t ctu_info_t;
typedef struct cu_info_t cu_info_t;
typedef struct thread_t thread_t;

struct cu_info_t {
    cl_int array[100];
    /*and so on*/
};

struct ctu_info_t {
    cu_info_t cu_info;
    cl_int array_1[1000];
    cl_int array_2[2000];
    /*and so on*/
};

struct thread_t {
    ctu_info_t ctu_rd;
    /*another fields*/
};
DEVICE SIDE:
typedef struct ctu_info_t ctu_info_t;
typedef struct cu_info_t cu_info_t;
typedef struct thread_t thread_t;

struct cu_info_t {
    int array[100];
    /*and so on*/
};

struct ctu_info_t {
    cu_info_t cu_info;
    int array_1[1000];
    int array_2[2000];
    /*and so on*/
};

struct thread_t {
    ctu_info_t ctu_rd;
    /*another fields*/
};
I don't have any trouble setting the kernel's arguments (the kernel function looks like this: "__kernel void my_func(ctu_info_t *ctu_info, thread_t *thread) { /*here is the code...*/ }"), but when I call "clEnqueueNDRangeKernel()" the program crashes.
Can anybody help me solve this? I don't know what I did wrong.
Hello,
The crash is coming from nvopencl.dll, which looks like the NVIDIA OpenCL driver.
Afaik, OpenCL is best served by plain arrays, with any structs built on the device. When structs have to be passed between host and device, both sides must use exactly the same alignment and size for each struct.
OpenCL versions <= 1.2 do not handle this for you; it is left to the mercy of the drivers and is the developer's responsibility.
Some of the rules I remember:
- struct size = a power of 2 (add dummy padding variables as needed)
- biggest fields (inner structs, arrays, vectors) at the top, smallest fields at the bottom
- arrays of structs aligned to an address that is a multiple of the struct size
- host-side structs packed the same way the device packs them
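As a minimal sketch of how the host side can check these rules at compile time: the struct below is hypothetical (the name `cu_info_padded_t` and the `count` and `pad` fields are made up for illustration, not taken from the original post), and `host_int` is a stand-in for `cl_int`, which is a 32-bit signed integer. It assumes a C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* int32_t */

/* Stand-in for cl_int so the sketch builds without the OpenCL headers. */
typedef int32_t host_int;

/* Hypothetical struct: biggest member first, smallest last,
   explicit padding up to a power-of-two size. */
typedef struct {
    host_int array[100]; /* 400 bytes */
    host_int count;      /*   4 bytes */
    host_int pad[27];    /* 108 bytes of dummy padding, 512 in total */
} cu_info_padded_t;

/* Fail the build if the host compiler lays the struct out differently
   from what the device-side declaration is expected to produce. */
static_assert(sizeof(cu_info_padded_t) == 512, "unexpected struct size");
static_assert(offsetof(cu_info_padded_t, count) == 400, "unexpected offset of count");
static_assert(offsetof(cu_info_padded_t, pad) == 404, "unexpected offset of pad");

/* Returns 1 when n is a power of two, 0 otherwise. */
static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

These checks only pin down the host layout. The device side still has to declare the same fields in the same order, and comparing `sizeof` results from a small test kernel against the host values is one way to confirm that both sides agree.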
Also, using pure arrays gives the best performance, since you read only the fields you need from the arrays instead of whole structs, which use the memory banks sub-optimally.
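To illustrate the pure-array approach, here is a rough host-side sketch that splits an array of structs into one array per field. All names (`block_aos_t`, `blocks_soa_t`, `mv_x`, `mv_y`, `cost`, `NUM_BLOCKS`) are hypothetical and not from the original post. Each per-field array could then go into its own buffer, so a kernel that needs only `cost` never touches the other fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_BLOCKS = 4 };   /* hypothetical element count */

/* Array-of-structs element: all fields of one block sit together. */
typedef struct {
    int32_t mv_x;
    int32_t mv_y;
    int32_t cost;
} block_aos_t;

/* Struct-of-arrays: each field lives in its own contiguous array. */
typedef struct {
    int32_t mv_x[NUM_BLOCKS];
    int32_t mv_y[NUM_BLOCKS];
    int32_t cost[NUM_BLOCKS];
} blocks_soa_t;

/* Copy n array-of-structs elements into the per-field arrays. */
static void aos_to_soa(const block_aos_t *in, blocks_soa_t *out, size_t n)
{
    assert(n <= NUM_BLOCKS);
    for (size_t i = 0; i < n; ++i) {
        out->mv_x[i] = in[i].mv_x;
        out->mv_y[i] = in[i].mv_y;
        out->cost[i] = in[i].cost;
    }
}

/* Index of the smallest cost; reads only the cost array (n must be > 0). */
static size_t best_block(const int32_t *cost, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (cost[i] < cost[best]) {
            best = i;
        }
    }
    return best;
}
```

Plain `int32_t` arrays also sidestep the struct layout question, because there is no padding or member ordering for the host and device compilers to disagree on.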