Hi everyone,
I am wondering whether it is possible to transfer data between a 2D NDRange kernel and a 1D NDRange kernel with loops using the channels extension while still preserving the order of the data. I wrote kernels in which each thread of the 2D NDRange kernel writes one data element into the channel, and each thread of the 1D NDRange kernel reads one data element from the channel. The block sizes of the two kernels are identical, so each thread in the second kernel uses a loop to read the data elements out. The kernels return correct results in emulation, but once compiled for the device the results are all wrong when the number of work-groups is greater than 1. Is there a limitation in the channels extension that prevents this kind of arrangement from working correctly, or did I do something wrong? Thanks!
3 Replies
Have you found a solution for this problem yet?
According to the AOCL programming guide (pages 1-20 and 1-21), ordering should be maintained across work-groups. A few conditions must be met for that ordering to hold, though. Do you get a warning stating that channels may not have well-defined ordering when you compile the kernels?
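To illustrate why those ordering conditions matter, here is a small host-side model in plain C. It is not OpenCL and not the original poster's code. The sizes (`ROWS`, `COLS`, `ITEMS`) and both read schedules are hypothetical assumptions chosen for illustration, not documented behavior of any toolchain. It only shows that the same FIFO contents reach different consumer work-items depending on the order in which the reads are issued.

```c
#include <stddef.h>

#define ROWS 2                 /* hypothetical 2D global size: ROWS x COLS */
#define COLS 4
#define N (ROWS * COLS)
#define ITEMS 2                /* hypothetical number of consumer work-items */
#define PER_ITEM (N / ITEMS)   /* elements each consumer reads in its loop */

/* Producer model: each 2D work-item writes its row-major index into the
   FIFO, with writes issued in row-major order (an assumption). */
void produce(int fifo[N])
{
    size_t tail = 0;
    for (int y = 0; y < ROWS; ++y)
        for (int x = 0; x < COLS; ++x)
            fifo[tail++] = y * COLS + x;
}

/* Schedule A (assumed): each consumer work-item finishes its whole loop
   before the next one starts, as a strictly sequential run would. */
void consume_sequential(const int fifo[N], int got[ITEMS][PER_ITEM])
{
    size_t head = 0;
    for (int wi = 0; wi < ITEMS; ++wi)
        for (int k = 0; k < PER_ITEM; ++k)
            got[wi][k] = fifo[head++];
}

/* Schedule B (assumed): work-items take turns, one read per loop
   iteration, so reads from different work-items interleave. */
void consume_interleaved(const int fifo[N], int got[ITEMS][PER_ITEM])
{
    size_t head = 0;
    for (int k = 0; k < PER_ITEM; ++k)
        for (int wi = 0; wi < ITEMS; ++wi)
            got[wi][k] = fifo[head++];
}
```

Under schedule A, work-item 0 receives elements 0 to 3. Under schedule B it receives 0, 2, 4, 6. If the consumer's indexing assumes one schedule and the reads happen in the other, the data ends up in the wrong place even though the FIFO itself never reorders anything.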
In my experience, you get much better consistency between the emulator and the actual device when you use only single-work-item kernels (no NDRange, no get_global_id, etc.). It is usually quite easy to take an existing NDRange kernel and wrap a loop around it to turn it into a single-work-item kernel. It's the first thing I'd try.
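As a rough sketch of that "wrap a loop around it" conversion, written in plain C rather than OpenCL so it can be compiled on a host: the function names `scale_item` and `scale_single` and the doubling operation are hypothetical, and the OpenCL qualifiers (`__kernel`, `__global`) are left out.

```c
#include <stddef.h>

/* NDRange-style body: one call handles one work-item. On a device, gid
   would come from get_global_id(0); here it is a plain parameter. */
static void scale_item(size_t gid, const float *in, float *out)
{
    out[gid] = 2.0f * in[gid];
}

/* Single-work-item form: one invocation loops over the whole index
   space, so elements are processed in a fixed, sequential order. */
void scale_single(const float *in, float *out, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid) {
        scale_item(gid, in, out);
    }
}
```

With the loop made explicit, any channel reads or writes placed in the body would happen in loop order, which is the property the single-work-item style is meant to give you.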
Thank you for the replies, guys! I found that if I change the order of the memory accesses, the channel works correctly, so the problem was due to a bug in my code. Interestingly, if the problem size is too large, the channel may get stuck for some reason. In the end I simply used global memory, which was reasonably fast anyway.
