I have a kernel that processes RGB images. Currently, I process the channels one at a time, running the same kernel on each channel.
The kernel input is a global memory buffer: data is moved in chunks from the global buffer into local memory for processing, then stored into another global buffer as output.
I was thinking of refactoring this to store all three channels in an RGBA buffer, and operate on all three channels at the same time, using vector operations. I understand that images have better spatial caching.
Is there any disadvantage to this refactor? I realize that I will have to reduce the number of pixels per chunk, because I will now have three times the amount of data.
Thanks!
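As a rough sketch of the chunk-size arithmetic raised in the question: the number of pixels that fit in a local-memory chunk scales inversely with the bytes stored per pixel. The helper names and the 32 KiB budget below are hypothetical, chosen only for illustration, and `float` channels are an assumption. One thing the numbers bring out is that an RGBA layout stores four lanes per pixel, so the chunk shrinks by a factor of four rather than three (in OpenCL C a 3-component vector such as `float3` also occupies the size of a `float4`).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store one pixel, given how many channels are kept
   per pixel and the size of one channel value. */
static size_t bytes_per_pixel(size_t channels_stored, size_t bytes_per_channel)
{
    return channels_stored * bytes_per_channel;
}

/* Number of whole pixels that fit in a local-memory chunk of
   `budget_bytes`. The budget is an assumed figure; the real limit
   depends on the device and on other local allocations in the kernel. */
static size_t pixels_per_chunk(size_t budget_bytes, size_t pixel_bytes)
{
    assert(pixel_bytes > 0);
    return budget_bytes / pixel_bytes;
}
```

With an assumed 32 KiB budget and `float` channels, a single planar channel gives `pixels_per_chunk(32 * 1024, bytes_per_pixel(1, sizeof(float)))` pixels per chunk, while the RGBA layout gives `pixels_per_chunk(32 * 1024, bytes_per_pixel(4, sizeof(float)))`, a quarter as many.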
I'm still new to Intel OpenCL, but it definitely seems like there could be performance advantages to working with local memory, especially if you can convert to vector types as you mentioned. According to the optimization guide, buffers have higher bandwidth than images. It goes on to state:
To improve performance on the Intel Processor Graphics, do the following:
Avoid images, except for irregular access patterns. For example, use buffers when processing in memory (in row-major) order.
Whether or not you see a significant improvement is likely to depend a lot on your algorithm. The main goal of course is to partition the work so that you get the maximum reuse from local memory.
The biggest disadvantage to the refactor will probably be the time to do it and debug it. Moving data in and out of images can be relatively simple compared to the complexity of managing local buffers, especially if border values are needed.
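To give a feel for the border bookkeeping that buffers require, here is a minimal host-side sketch in plain C, not an actual kernel. It copies a tile plus a surrounding halo from a row-major interleaved RGBA buffer into a scratch array, clamping out-of-range coordinates to the nearest edge pixel. The `Rgba` struct stands in for OpenCL's `float4`, and `load_tile_clamped`, the tile sizes, and the clamp-to-edge policy are all assumptions made for illustration. With images, a sampler using `CLK_ADDRESS_CLAMP_TO_EDGE` does this clamping for you, which is much of why they are simpler to work with.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an OpenCL float4 pixel (interleaved RGBA). */
typedef struct { float r, g, b, a; } Rgba;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy the tile whose top-left corner is (x0, y0), plus `halo` extra
   pixels on every side, from a row-major image `src` into `tile`.
   `tile` must hold (tile_w + 2*halo) * (tile_h + 2*halo) pixels.
   Coordinates outside the image are clamped to the nearest edge. */
static void load_tile_clamped(const Rgba *src, int width, int height,
                              int x0, int y0, int tile_w, int tile_h,
                              int halo, Rgba *tile)
{
    assert(width > 0 && height > 0 && halo >= 0);
    const int tw = tile_w + 2 * halo;
    const int th = tile_h + 2 * halo;
    for (int ty = 0; ty < th; ++ty) {
        const int sy = clamp_int(y0 - halo + ty, 0, height - 1);
        for (int tx = 0; tx < tw; ++tx) {
            const int sx = clamp_int(x0 - halo + tx, 0, width - 1);
            tile[(size_t)ty * tw + tx] = src[(size_t)sy * width + sx];
        }
    }
}
```

In a real kernel the two loops would be split across the work-items of a work-group and followed by a barrier before the tile is read, but the index arithmetic is the part that tends to need debugging.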