Hi,
I am trying to apply the kernel vectorization optimization, but I get the following compiler warning:

Compiler Warning: Kernel is vectorized but there exist loads/stores that cannot be vectorized. This may reduce performance.

The details are as follows:
Global thread dimension: 240 x 540
Local work-group dimension: 240 x 1
Input dimension: 1920 x 1080

I used the following attributes:
__attribute__((num_simd_work_items(4)))
__attribute__((reqd_work_group_size(240,1,1)))

Input loading code snippet:
for(UInt32 i = 0 ; i < 8; i++)
{
    tempin[lidx + i * 240] = input[lidx + i * 240];
}

where lidx is the local work-item ID in the x direction, with a maximum value of 239 (since the local work-group dimension is 240 x 1), and tempin is a local memory buffer used for per-work-group computation.

Can anyone suggest a way to avoid this warning? Let me know if I need to provide any more details.

Thanks,
Neelakandan
3 Replies
Try setting reqd_work_group_size to a power of 2; 240 is an unusual value.
Hi
Even after specifying the required work-group size as a power of 2 (256 instead of 240), I get the same warning message. Could there be any other reason?
Thanks
The problem is the index expression "lidx + i * 240": AOC cannot analyze these accesses effectively, which leads to suboptimal performance.
You could try adding "#pragma unroll" before the for loop.
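To make the unrolling suggestion concrete, here is a rough host-side model in plain C, not kernel code. It compares the loop as posted with the eight statements that full unrolling would produce, where every offset is a compile-time constant added to lidx. The float element type and the function names are assumptions for illustration only; the thread does not give them. Whether unrolling actually removes the warning is not confirmed in this thread.

```c
/* Host-side model in plain C, not kernel code. lidx stands in for
   get_local_id(0); float is an assumed element type. */
#define WG_SIZE 240                 /* work-group width from the post */
#define CHUNKS  8                   /* loop trip count from the post */
#define TILE    (WG_SIZE * CHUNKS)  /* 1920 elements per work-group */

/* Loop form, as posted. In the kernel, the suggestion is to put
   "#pragma unroll" on the line directly before this for statement. */
void load_loop(const float *input, float *tempin, unsigned lidx)
{
    for (unsigned i = 0; i < CHUNKS; i++) {
        tempin[lidx + i * WG_SIZE] = input[lidx + i * WG_SIZE];
    }
}

/* What full unrolling amounts to: eight independent copies whose
   offsets (0, 240, ..., 1680) are constants known at compile time. */
void load_unrolled(const float *input, float *tempin, unsigned lidx)
{
    tempin[lidx + 0 * WG_SIZE] = input[lidx + 0 * WG_SIZE];
    tempin[lidx + 1 * WG_SIZE] = input[lidx + 1 * WG_SIZE];
    tempin[lidx + 2 * WG_SIZE] = input[lidx + 2 * WG_SIZE];
    tempin[lidx + 3 * WG_SIZE] = input[lidx + 3 * WG_SIZE];
    tempin[lidx + 4 * WG_SIZE] = input[lidx + 4 * WG_SIZE];
    tempin[lidx + 5 * WG_SIZE] = input[lidx + 5 * WG_SIZE];
    tempin[lidx + 6 * WG_SIZE] = input[lidx + 6 * WG_SIZE];
    tempin[lidx + 7 * WG_SIZE] = input[lidx + 7 * WG_SIZE];
}

/* Runs every work-item of one group through both forms and returns 1
   if both local buffers end up holding the complete input tile. */
int forms_agree(void)
{
    static float input[TILE], a[TILE], b[TILE];
    for (unsigned k = 0; k < TILE; k++) {
        input[k] = (float)(k + 1);  /* distinct, nonzero values */
        a[k] = 0.0f;
        b[k] = 0.0f;
    }
    for (unsigned lidx = 0; lidx < WG_SIZE; lidx++) {
        load_loop(input, a, lidx);
        load_unrolled(input, b, lidx);
    }
    for (unsigned k = 0; k < TILE; k++) {
        if (a[k] != input[k] || b[k] != input[k]) {
            return 0;
        }
    }
    return 1;
}
```

Calling forms_agree() checks that the two forms copy the same 8 x 240 = 1920 elements, which equals the input width given in the question. In the unrolled form, neighbouring work-items (lidx, lidx + 1, ...) touch consecutive addresses in each statement, which is the kind of access pattern a vectorized kernel needs.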