Re: Kernel Vectorization

Altera_Forum · 07-07-2015

Hi,

==========================================================

void tempA( ...) {...};

void tempB( ...) {...};

void processing(global int *a){
    if(*a == 0)   // test the value, not the pointer
        tempA( a );
    else
        tempB( a );
}

__attribute__((num_simd_work_items(2)))
__attribute__((reqd_work_group_size(256,1,1)))
kernel void test (__global int * a ) // NDRange, global size = size of a / 2, a[0..N] initially 1
{
    int gid = get_global_id(0);
    for(int i = 0 ; i < 2 ; i++){
        while(a[gid + i] == 0)
            processing(&a[gid + i]);
    }
}

===========================================================

The code above is what I was trying.

The compiler reported: "Compiler Warning: Kernel Vectorization: branching is thread ID dependent ... cannot vectorize."

How can I fix this, or what explains it?

Also, a while loop with an unpredictable exit condition is unfriendly to vectorization and very inefficient, right?

Thanks.

Altera_Forum · 07-08-2015

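It means that one of your branches is thread-ID dependent. The following section

while(a[gid + i] == 0)
  processing(&a[gid + i]);

is thread-ID dependent, because the loop condition reads an element selected by the work-item ID. The Best Practices Guide advises avoiding work-item-ID-dependent backward branching (that is, branching that occurs in a loop).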
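
For illustration, one way to remove the work-item-dependent backward branch is to give the loop a fixed trip count and turn the data-dependent test into an ordinary if inside it. This is only a sketch under assumptions: MAX_ITERS, processing_step and test_bounded are made-up names, the body of processing() is not shown in this thread so a trivial stand-in is used, each work-item handles a single element, and the approach only applies if a known upper bound on the iteration count exists. The shim at the top is not part of the kernel; it just lets the sketch compile as plain C for a quick functional check.

```c
/* Host-side shim (assumption, not part of the kernel): lets this OpenCL C
   sketch also compile as plain C. An OpenCL compiler skips this section. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t shim_gid;   /* work-item ID returned by the shim */
static size_t get_global_id(unsigned dim) { (void)dim; return shim_gid; }
#endif

#define MAX_ITERS 8   /* hypothetical upper bound on iterations per element */

/* Hypothetical stand-in for processing(): moves *v away from zero.
   The real body is elided in the original post. */
void processing_step(__global int *v)
{
    *v += 1;
}

/* The num_simd_work_items / reqd_work_group_size attributes from the
   original post would go here. */
__kernel void test_bounded(__global int *a)
{
    size_t gid = get_global_id(0);

    /* The trip count is a compile-time constant, so the backward branch
       is the same for every work-item. */
    for (int it = 0; it < MAX_ITERS; it++) {
        /* The data-dependent test is now a forward branch. */
        if (a[gid] == 0)
            processing_step(&a[gid]);
    }
}
```

Every work-item now runs MAX_ITERS iterations even if its element is finished early, so this trades some wasted work for uniform control flow. Whether the compiler then vectorizes the kernel still depends on the rest of the code, so check the compiler output.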

Altera_Forum · 07-08-2015

Thanks okebz,

So if I write it as follows, is it the same thing?

===========================================

void tempA( ...) {...};

void tempB( ...) {...};

void processing(global int *a , int *b){
    if(*a == 0)   // test the value, not the pointer
        tempA( a ,b);
    else
        tempB( a ,b);
}

__attribute__((num_simd_work_items(2)))
__attribute__((reqd_work_group_size(256,1,1)))
kernel void test (__global int * a ) // NDRange
{
    int gid = get_global_id(0);
    int b = 0;   // loop flag, updated by processing()
    while ( b == 0 )
        processing(&a[gid] , &b );
}

=================================

But if my program flow is as described earlier, how can I optimize this code?

Each work-item stays in the while loop until the condition is met.

Is it better to use a task instead of an NDRange kernel?

Regards,

Altera_Forum · 07-08-2015

That is fine as long as b does not depend on the work-item ID. And yes, depending on what you're trying to do, a single task seems like the better fit. If your data set cannot be divided into independent sections, so that work-items would depend on each other, a single work-item kernel might be a good choice.
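
As a rough sketch of the single work-item form, the kernel could loop over the elements itself instead of calling get_global_id(), so the data-dependent while loop no longer differs between work-items. The names test_task, task_step and the extra argument n are assumptions for illustration, and task_step is a trivial stand-in because the body of processing() is not shown in this thread. The two defines at the top are only a shim so the sketch also compiles as plain C.

```c
/* Host-side shim (assumption, not part of the kernel): lets this OpenCL C
   sketch also compile as plain C. An OpenCL compiler skips this section. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical stand-in for processing(): moves *v away from zero.
   The real body is elided in the original post. */
void task_step(__global int *v)
{
    *v += 1;
}

/* Single work-item kernel: no get_global_id(), one loop over all n elements.
   n is an assumed extra argument giving the number of elements in a. */
__kernel void test_task(__global int *a, int n)
{
    for (int i = 0; i < n; i++) {
        /* A data-dependent exit condition is acceptable here because only
           one work-item executes the loop. */
        while (a[i] == 0)
            task_step(&a[i]);
    }
}
```

On the host side such a kernel would be launched with a global size of 1, for example with clEnqueueTask.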