Hi,
I am currently experimenting with a matrix-XOR kernel (similar to the Altera matrix-multiplication example, with the multiplication replaced by a bitwise exclusive-or). The loop in the code is fully unrolled. I find that the work-group size has a tremendous effect on the logic utilization shown in the report. For example, with a work-group size of (64, 64, 1), the reported logic utilization is 16%. With a work-group size of (128, 128, 1), it is 46%, which is easy to understand since more bitwise exclusive-or operations are performed in the fully unrolled loop. However, when I change the work-group size to (80, 80, 1), the logic utilization increases to 123%, which I cannot understand. Can anyone offer suggestions or an explanation for this behavior? Does it mean the compiler prefers work-group sizes that are powers of two? Thanks.
2 Replies
My guess is that the optimizer fails to do a good job with the size of (80, 80). Can the problem possibly be simplified for powers of two? Have you tried to implement the problem as a single work item kernel? Those tend to be more efficient and the compiler is more predictable.
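For reference, the computation written as one plain loop nest might look like the sketch below. This is ordinary C rather than kernel code, and it assumes that only the multiplication is replaced by XOR while the accumulation stays an addition; the function name `matrix_xor` and the row-major layout are also assumptions, not taken from the original kernel. A single work-item kernel would have roughly this shape, with the compiler pipelining the loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference implementation for n x n row-major matrices.
 * Assumption: c[i][j] = sum over k of (a[i][k] XOR b[k][j]),
 * i.e. the multiply of a matrix multiplication is swapped for XOR
 * and the accumulation is left as an addition. */
void matrix_xor(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            uint32_t acc = 0;
            /* This is the loop that the blocked kernel fully unrolls
             * over one tile; here it runs over the whole row. */
            for (size_t k = 0; k < n; ++k) {
                acc += a[i * n + k] ^ b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Because nothing in this form depends on a work-group shape, n does not have to be a power of two or a multiple of any block size.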
--- Quote Start --- My guess is that the optimizer fails to do a good job with the size of (80, 80). Can the problem possibly be simplified for powers of two? Have you tried to implement the problem as a single work item kernel? Those tend to be more efficient and the compiler is more predictable. --- Quote End ---
Thanks for the reply. What I actually want to know is whether it is OK to declare a local memory (with the same size as the work-group) whose dimensions are not powers of two. For example, with SIMD set to 8 for the matrix-XOR kernel, a 128 * 128 local memory per work-group uses more than 100% of the memory blocks on the FPGA. So I would like to know whether it is possible to use an 80 * 80 local memory while keeping SIMD at 8, so that more of the FPGA's memory blocks are used while staying below 100%.
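To put rough numbers on the tile sizes discussed here, the raw data held in local memory can be estimated with a small C sketch. It assumes two square tiles per work-group (one for each input matrix, as in the matrix-multiplication example) and 4-byte elements; both are assumptions, and the helper names `tile_bytes` and `is_power_of_two` are made up for illustration. It only counts data bytes, so it says nothing about how the compiler maps, banks, or replicates the memory for SIMD 8, which is what the utilization report actually reflects.

```c
#include <stddef.h>

/* Raw bytes needed for two square block x block tiles
 * (assumed: one tile for A and one for B). */
size_t tile_bytes(size_t block, size_t elem_size)
{
    return 2 * block * block * elem_size;
}

/* Returns 1 if x is a power of two, 0 otherwise. */
int is_power_of_two(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

With 4-byte elements this gives 32768 bytes for a 64 * 64 tile pair, 51200 bytes for 80 * 80, and 131072 bytes for 128 * 128, so the 80 * 80 case sits between the other two in raw size even though it is the only one whose dimension is not a power of two.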