Suppose there are 8 work-groups, each containing 8 work-items.
I declare local memory in the kernel function: __local float A[1000]; If I copy data into it from global memory, will that increase M20K RAM block usage? Isn't the total M20K usage "local memory size * number of work-groups", i.e. A[1000] * 8? The copy loop looks like this: for (...) { A[] = data from global memory } Thanks
1 Reply
Not all work-groups run fully in parallel on the FPGA, so the total is not simply the local memory size multiplied by the number of work-groups. The compiler decides how many work-groups can run in parallel. The M20K utilization depends on the number of accesses to the buffer per work-group (which depends on the code and can also be affected by the SIMD size), the number of work-groups running in parallel per compute unit (decided by the compiler), and the number of compute units (set by the user). The compiler report explicitly states why and how many times each local buffer is replicated, and what the total size will be.
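To make the dependence on those factors concrete, here is a rough back-of-the-envelope sketch in plain C. It assumes a simple multiplicative model (blocks per copy, times replication for accesses, times concurrent work-groups, times compute units), which is only an approximation of what the compiler does. The function names and the three factor parameters are hypothetical; the real values should be read from the compiler report, not guessed.

```c
/* Capacity of one M20K block in bits (20 Kbit). */
#define M20K_BITS 20480UL

/* Minimum number of M20K blocks needed to hold ONE copy of a buffer,
 * by capacity alone. The actual mapping also depends on the width/depth
 * configuration the compiler picks, so treat this as a lower bound. */
static unsigned long m20k_per_copy(unsigned long elems, unsigned long elem_bytes)
{
    unsigned long bits = elems * elem_bytes * 8UL;
    return (bits + M20K_BITS - 1UL) / M20K_BITS; /* ceiling division */
}

/* Rough total under an ASSUMED multiplicative model.
 * access_replication:    copies made to serve the buffer's accesses
 * concurrent_workgroups: work-groups in flight per compute unit
 * compute_units:         number of compute units
 * All three are hypothetical inputs taken from the compiler report. */
static unsigned long m20k_estimate(unsigned long elems,
                                   unsigned long elem_bytes,
                                   unsigned long access_replication,
                                   unsigned long concurrent_workgroups,
                                   unsigned long compute_units)
{
    return m20k_per_copy(elems, elem_bytes)
         * access_replication
         * concurrent_workgroups
         * compute_units;
}
```

For __local float A[1000], one copy is 32,000 bits, so it needs at least 2 blocks. With made-up factors of 2 replicas for accesses, 2 concurrent work-groups, and 1 compute unit, m20k_estimate(1000, 4, 2, 2, 1) gives 8 blocks. The point is that the multiplier comes from what the compiler reports, not from the 8 work-groups in the NDRange.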