
Kernel Vectorization

Altera_Forum
Honored Contributor II

Hi,

 

==========================================================
void tempA( ...) {...};
void tempB( ...) {...};

void processing(__global int *a) {
    if (*a == 0)        // assumed intent: test the value, not the pointer
        tempA(a);
    else
        tempB(a);
}

__attribute__((num_simd_work_items(2)))
__attribute__((reqd_work_group_size(256,1,1)))
kernel void test(__global int *a)  // NDRange, global size = a / 2, a[0..N] initialized to 1
{
    int gid = get_global_id(0);

    for (int i = 0; i < 2; i++) {
        // The loop exit depends on a[gid + i], i.e. on the work-item ID
        while (a[gid + i] == 0)
            processing(&a[gid + i]);
    }
}
===========================================================

The code above is what I was trying.

The compiler reported: "Compiler Warning: Kernel Vectorization: branching is thread ID dependent ... cannot vectorize."

What does this warning mean, and how can I resolve it?

Also, a while loop with an unpredictable exit condition is unfriendly to vectorization and very inefficient, right?

Thanks.
3 Replies
Altera_Forum
Honored Contributor II

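It means that one of your branches depends on the thread (work-item) ID. The following section

while (a[gid + i] == 0) processing(&a[gid + i]);

is thread-ID dependent: the loop condition reads a[gid + i], so each work-item can need a different number of iterations. The Best Practices Guide recommends avoiding work-item ID-dependent backward branching.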
Altera_Forum
Honored Contributor II

Thanks okebz,

So if I write it as follows, is it the same thing?

===========================================
void tempA( ...) {...};
void tempB( ...) {...};

void processing(__global int *a, int *b) {
    if (*a == 0)        // assumed intent: test the value, not the pointer
        tempA(a, b);
    else
        tempB(a, b);
}

__attribute__((num_simd_work_items(2)))
__attribute__((reqd_work_group_size(256,1,1)))
kernel void test(__global int *a)  // NDRange
{
    int gid = get_global_id(0);
    int b = 0;  // initialized so the first loop test is well defined

    while (b == 0)
        processing(&a[gid], &b);
}
=================================

 

But if my program flow is as described earlier, how can I optimize this code?

Each work-item stays in the while loop until its condition is met.

Is it better to use a task instead of an NDRange kernel?

Regards,
Altera_Forum
Honored Contributor II

That works as long as b does not depend on the work-item ID. And yes, depending on what you are trying to do, a single task looks like the better fit. If your data set cannot be divided into independent sections and a work-item depends on other work-items, a single work-item kernel might be a good choice.
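
For reference, here is a minimal sketch of what a single work-item version of the first kernel might look like. It is only an illustration under several assumptions: the kernel name test_single, the element-count argument n, and the KERNEL/GLOBAL macros are hypothetical, and the body of processing() is a placeholder because tempA and tempB are elided in the thread. As far as I understand, the compiler treats a kernel that never queries work-item IDs as a single work-item kernel, so the loop below replaces get_global_id().

```c
/* Hypothetical shim: lets the same source build as OpenCL C or,
   for a quick host-side check, as plain C. */
#ifdef __OPENCL_VERSION__
  #define KERNEL __kernel
  #define GLOBAL __global
#else
  #define KERNEL
  #define GLOBAL
#endif

/* Placeholder for processing(): tempA/tempB are elided in the thread,
   so this stand-in just writes a nonzero value, which ends the loop. */
void processing(GLOBAL int *a)
{
    if (*a == 0)
        *a = 1;   /* stands in for tempA(a) */
    /* the tempB branch is omitted; the loop below never reaches it */
}

/* Single work-item kernel: one loop visits every element, so no branch
   depends on a work-item ID. n is an assumed element-count argument. */
KERNEL void test_single(GLOBAL int *a, int n)
{
    for (int i = 0; i < n; i++) {
        /* The trip count is still data dependent, but it now belongs to
           an ordinary loop instead of diverging across SIMD lanes. */
        while (a[i] == 0)
            processing(&a[i]);
    }
}
```

On the host this would be launched as a task (a global size of 1) rather than an NDRange. Whether it beats the NDRange version depends on how well the compiler can pipeline the inner while loop, so the compilation report is worth checking.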
