Hi,
The profiler reports the following for the read of the "K_contributors" global argument:
Bandwidth: 0.1 MB/s, 100% efficiency
Average Burst Size: 2.0 (Max Burst Size: 16)
// THIS IS A SINGLE WORK-ITEM KERNEL
#define MAX_CONTRIBUTORS 8128

__kernel void Krnl_IntraE(...,
                          __global const char3* restrict K_contributors)
{
    // Local copy of the contributors array, filled one element per iteration
    __local char3 localcache[MAX_CONTRIBUTORS];

    for (ushort i = 0; i < MAX_CONTRIBUTORS; i++) {
        localcache[i] = K_contributors[i];
    }
    ...
}
Since the loop index "i" increases consecutively, I expected burst sizes larger than 2. Is there an explanation for this?
2 Replies
You should unroll the loop so that the compiler infers a wider port to memory, which allows larger bursts. Single work-item kernels get little to no runtime coalescing, so consecutive accesses alone will not produce a large burst size; without unrolling, expect bursts to stay small.
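As rough arithmetic behind this advice, here is a small plain-C sketch (host-side arithmetic, not kernel code). It assumes an OpenCL char3 occupies 4 bytes, since the OpenCL C specification gives 3-component vectors the size of the 4-component type, and it uses an unroll factor of 16 purely as an illustrative choice; bytes_per_iteration and trip_count are hypothetical helper names. In the kernel itself, the unroll would be requested with "#pragma unroll 16" placed directly above the copy loop.

```c
/* Assumption: an OpenCL char3 occupies 4 bytes (3-component vectors are
   sized like the 4-component type in OpenCL C). */
#define CHAR3_BYTES 4u
#define MAX_CONTRIBUTORS 8128u

/* Bytes the load requests per loop iteration for a given unroll factor
   (the factor itself is an illustrative choice, not from the thread). */
static unsigned bytes_per_iteration(unsigned unroll)
{
    return unroll * CHAR3_BYTES;
}

/* Loop trip count after unrolling, rounded up if n is not a multiple. */
static unsigned trip_count(unsigned n, unsigned unroll)
{
    return (n + unroll - 1u) / unroll;
}
```

Under these assumptions, the original loop issues 8128 requests of 4 bytes each, while unrolling by 16 gives 508 iterations of 64 bytes each. Whether 64 bytes lines up with your board's global-memory width (512 bits is common but board-specific) is worth checking in the board documentation.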
1. Why is the bandwidth reported as 0.1 MB/s? Is something wrong with the profiler? I also see this problem in Quartus 17.0.
2. My kernel code looks like this:

typedef struct {
    float a[20];
} A;

__kernel void foo(__global const A* restrict data, const int index)
{
    A localdata[100];
    for (int i = 0; i < 100; i++) {
        localdata[i] = data[i + index];
    }
}

I expect each iteration to be burst-coalesced into a single global-memory read of 20 floats, so I expected a larger Average Burst Size, but the profiler shows only 4 to 6. How can I improve my access efficiency?
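The element size in question 2 can be checked with similar plain-C arithmetic. This sketch assumes a 64-byte (512-bit) global-memory word, which is board-specific, and a buffer that starts on a word boundary; MEM_WORD_BYTES and words_touched are hypothetical names. It only shows that one 80-byte element is larger than a memory word and, with elements packed back to back, each one spans two words; it does not by itself predict the profiler's burst size.

```c
#include <stddef.h>

/* Same layout as the struct in the question: 20 floats = 80 bytes. */
typedef struct {
    float a[20];
} A;

/* Assumption: 64-byte (512-bit) global-memory word; check your board. */
#define MEM_WORD_BYTES 64u

/* Number of memory words touched by element k of a densely packed A array,
   assuming the array starts on a word boundary. */
static size_t words_touched(size_t k)
{
    size_t first_byte = k * sizeof(A);
    size_t last_byte = first_byte + sizeof(A) - 1;
    return last_byte / MEM_WORD_BYTES - first_byte / MEM_WORD_BYTES + 1;
}
```

For example, element 0 covers bytes 0 to 79 (words 0 and 1) and element 1 covers bytes 80 to 159 (words 1 and 2), so adjacent elements share a word.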