Fully pipelined sparse matrix-vector multiplication (SMVM)

Dear all,


I am trying to implement an application as efficiently as possible as a single work-item kernel, and I found that it is essentially the same as SMVM. The application has a doubly nested for loop: the outer loop iterates rowCount times, and the inner loop iterates #ofNonzeroElementsInRow times. With this structure, however, the compiler cannot fully pipeline the loop nest because of "Out-of-Order Loop Iterations"; the loop report is below:

The kernel is compiled for single work-item execution.

Loop Report:

 + Loop "Block1" (file line 34)
 |   NOT pipelined due to:
 |
 |     Loop exit condition unresolvable at iteration initiation.
 |     Simplify loop exit condition to fix this problem.
 |     See "Unable to Resolve Loop Exit Condition at Iteration Initiation" section of the Best Practices Guide for more information.
 |     Not pipelining this loop will most likely lead to poor performance.
 |
 |-+ Loop "Block2" (file line 43)
       Pipelined well. Successive iterations are launched every cycle.
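
For illustration, the loop nest described above can be sketched in plain C, assuming the matrix is stored in CSR (compressed sparse row) form. This is only a stand-in for the kernel body, not the actual kernel: the function name `smvm_csr` and the array names `row_ptr`, `col_idx`, `values`, `x` and `y` are hypothetical.

```c
#include <stddef.h>

/* Sketch: y = A * x for a matrix in CSR form (hypothetical names).
 * row_ptr has row_count + 1 entries; row r owns the nonzeros
 * values[row_ptr[r]] .. values[row_ptr[r + 1] - 1]. */
void smvm_csr(size_t row_count,
              const size_t *row_ptr,
              const size_t *col_idx,
              const float *values,
              const float *x,
              float *y)
{
    /* Outer loop: one iteration per row (rowCount iterations). */
    for (size_t row = 0; row < row_count; ++row) {
        float sum = 0.0f;

        /* Inner loop: the trip count is the number of nonzeros in this
         * row, so it is only known once row_ptr has been read. */
        for (size_t k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];
        }
        y[row] = sum;
    }
}
```

The inner loop bounds come from `row_ptr`, which is data, so the number of inner iterations differs from row to row. That data-dependent trip count is the property the question is about.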

I searched the forum for similar problems and found this question. The suggestion there is to iterate over all the elements, guarded by a condition, so that the number of iterations in each pass of the outer loop is constant. However, this causes a huge performance loss because of the empty cycles.
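
One possible reading of that workaround, sketched in plain C under the same CSR assumption: every row runs a fixed number of inner iterations (`max_row_nnz`, a hypothetical parameter holding the largest nonzero count of any row), and a condition masks out the slots a row does not use. The function name and the returned idle-slot count are illustrative additions, not part of the linked answer.

```c
#include <stddef.h>

/* Sketch: fixed-trip-count variant of CSR SMVM (hypothetical names).
 * Every row executes max_row_nnz inner iterations; iterations beyond
 * the row's real nonzero count do no useful work.
 * Returns the number of such idle iterations. */
size_t smvm_fixed_trip(size_t row_count,
                       size_t max_row_nnz,
                       const size_t *row_ptr,
                       const size_t *col_idx,
                       const float *values,
                       const float *x,
                       float *y)
{
    size_t idle = 0;

    for (size_t row = 0; row < row_count; ++row) {
        size_t start = row_ptr[row];
        size_t nnz = row_ptr[row + 1] - start;
        float sum = 0.0f;

        /* Constant trip count: the exit condition no longer depends
         * on per-row data. */
        for (size_t j = 0; j < max_row_nnz; ++j) {
            if (j < nnz) {
                sum += values[start + j] * x[col_idx[start + j]];
            } else {
                ++idle; /* empty cycle: slot spent on padding */
            }
        }
        y[row] = sum;
    }
    return idle;
}
```

The idle count equals `row_count * max_row_nnz` minus the total number of nonzeros, so a single dense row in an otherwise very sparse matrix makes almost every iteration an empty one. That is the performance loss described above.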


I would expect that, even if some applications are not easy to solve, a well-known application like SMVM can be implemented in the most efficient way.


I couldn't find any pointers to this problem, or to an implementation of SMVM, on the internet. My question is: is there a "most efficient" implementation of this application? Or can a loop structure with a variable number of iterations be completely pipelined with some trick?



Thank you in advance,

Kaan Akyol
