What is the advantage of vload4 over 4 single memory accesses?
Suppose I am loading data from local memory. Below are two kernels. The second kernel should exhibit no bank conflicts.
Does the first have bank conflicts? If one vload is executed per clock, then there should be conflicts within a half-wave.
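__kernel void kernel1(__local int *localBuffer) {
    // vload4(offset, p) reads from p + offset*4, so the offset counts int4 vectors, not ints
    int start = get_global_id(0);
    int4 test = vload4(start, localBuffer);
}
__kernel void kernel2(__local int *localBuffer) {
    int4 test;
    // index of the first of four consecutive ints for this work-item
    int start = get_global_id(0)*4;
    test.x = localBuffer[start];
    test.y = localBuffer[start+1];
    test.z = localBuffer[start+2];
    test.w = localBuffer[start+3];
}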
Sorry for the delayed reply. If you have not already seen it, the optimization guide has some good tips in its "Memory Access Considerations" section. In general, loading vector types is more efficient than loading single values, because vector types make it easier for the compiler to generate efficient memory-transfer code. If the compiler recognizes the pattern in kernel2, it might combine the loads into something close enough to kernel1 that performance is practically identical, but kernel2 is not expected to have any advantage over kernel1.
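To make the "same pattern" point concrete, here is a minimal plain-C sketch of the addressing involved. It is only a host-side model, not OpenCL code: `int4_model`, `vload4_model`, `scalar_loads_model` and `loads_match` are hypothetical names made up for illustration. The one fact it relies on is that OpenCL's `vload4(offset, p)` reads four elements starting at `p + offset*4`, so the offset counts whole vectors. Under that rule, the vector load covering the same elements as kernel2's `start = gid*4` is `vload4(gid, localBuffer)`.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for OpenCL's int4 (hypothetical name). */
typedef struct { int x, y, z, w; } int4_model;

/* Models vload4(offset, p): four ints starting at p + offset*4. */
static int4_model vload4_model(size_t offset, const int *p) {
    const int *base = p + offset * 4;
    int4_model v = { base[0], base[1], base[2], base[3] };
    return v;
}

/* Models the four scalar loads of kernel2 for work-item gid. */
static int4_model scalar_loads_model(size_t gid, const int *p) {
    size_t start = gid * 4;
    int4_model v;
    v.x = p[start];
    v.y = p[start + 1];
    v.z = p[start + 2];
    v.w = p[start + 3];
    return v;
}

/* Returns 1 if both models read identical values for gid in [0, num_items).
   p must hold at least num_items * 4 ints. */
static int loads_match(const int *p, size_t num_items) {
    for (size_t gid = 0; gid < num_items; gid++) {
        int4_model a = vload4_model(gid, p);
        int4_model b = scalar_loads_model(gid, p);
        if (a.x != b.x || a.y != b.y || a.z != b.z || a.w != b.w) {
            return 0;
        }
    }
    return 1;
}
```

Calling `loads_match(buf, 16)` on any buffer of at least 64 ints returns 1, which is the equivalence the compiler would have to recognize before it could merge kernel2's loads. The sketch says nothing about timing or bank behaviour; it only shows that the two forms touch the same elements.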
Thanks, Jeffrey. So vload may exhibit fewer bank conflicts than single loads? Is there ever a situation where vload performs worse than individual loads?
From asking around, my understanding is that vload is almost always better. The compiler may combine single loads, so in many cases there may not be a big difference, but vload is expected to be the best case. There may be a few corner cases with odd borders where single loads have a minor advantage, but I suspect these are rare. In the future, I hope we will be able to publish more memory-transfer optimization tutorials with deeper analysis, so that the guidelines do not need to be so high-level. Please watch for documentation and example improvements as we are able to prioritize them.
Hi Jeffrey,
Thanks for pursuing this. I am particularly interested in avoiding bank conflicts. Is there some way
of confirming that vload does not trigger more bank conflicts than separate loads?
Thanks so much,
Aaron
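One rough way to reason about this before measuring on hardware is to count bank hits under an assumed bank model. The sketch below is plain C, not OpenCL, and everything about the model is an assumption rather than a documented property of any device: 32-bit words, `bank = word_index % num_banks`, and a group of `group_size` work-items issuing the same load together. `max_conflict_degree` is a hypothetical helper name. It models only scalar accesses of the form `localBuffer[i*stride + component]`. How a given device splits a 128-bit vload4 into bank transactions is not established in this thread, so that part is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_BANKS = 64 };  /* upper bound for this sketch only */

/* Largest number of work-items in one group that land on the same bank
   when work-item i reads 32-bit word (i * stride + component).
   Assumed model: bank = word_index % num_banks.
   Returns -1 if num_banks is outside (0, MAX_BANKS]. */
static int max_conflict_degree(int stride, int component,
                               int group_size, int num_banks) {
    int hits[MAX_BANKS] = {0};
    int worst = 0;
    if (num_banks <= 0 || num_banks > MAX_BANKS) {
        return -1;
    }
    for (int i = 0; i < group_size; i++) {
        int bank = (i * stride + component) % num_banks;
        hits[bank]++;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

With the assumed parameters of 16 banks and 16 work-items per group, `max_conflict_degree(4, 0, 16, 16)` evaluates to 4 and `max_conflict_degree(1, 0, 16, 16)` to 1. So under this simple model, each of kernel2's stride-4 scalar loads would itself be a 4-way conflict, while a stride-1 layout would be conflict-free. Real hardware may behave differently, so treat this as a way to form a hypothesis. Actual confirmation would need a measurement, for example timing both kernels or using a profiler counter for local-memory bank conflicts if your tools expose one.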
