Fully pipelined sparse matrix-vector multiplication (SMVM)

KAkyo · 05-13-2019

Dear all,

I am trying to implement an application as efficiently as possible as a single work-item kernel, and I found that it is essentially the same as SMVM. The application has a doubly nested for loop: the outer loop iterates rowCount times, and the inner loop iterates #ofNonzeroElementsInRow times. However, with this structure the compiler cannot pipeline the outer loop because of "Out-of-Order Loop Iterations", as the report below shows:

The kernel is compiled for single work-item execution.
 
Loop Report:
 
 + Loop "Block1" (file compute_pagerank_single.cl line 34)
 | NOT pipelined due to: 
 |   Loop exit condition unresolvable at iteration initiation.
 |   Simplify loop exit condition to fix this problem.
 |   See "Unable to Resolve Loop Exit Condition at Iteration Initiation" section of the Best Practices Guide for more information.
 |   Not pipelining this loop will most likely lead to poor performance.
 | 
 | 
 |-+ Loop "Block2" (file compute_pagerank_single.cl line 43)
     Pipelined well. Successive iterations are launched every cycle.
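
For illustration, the loop nest described above looks roughly like the following sketch in plain C. It assumes the matrix is stored in CSR form; the function name and the arrays row_ptr, col_idx, val, x and y are hypothetical and are not taken from compute_pagerank_single.cl.

```c
/* Sketch only: y = A * x with A in CSR form (assumed layout).
 * row_ptr has row_count + 1 entries; row r owns val[row_ptr[r] .. row_ptr[r+1]-1]. */
void smvm_csr(int row_count, const int *row_ptr, const int *col_idx,
              const float *val, const float *x, float *y)
{
    /* Outer loop: one iteration per row (rowCount iterations). */
    for (int row = 0; row < row_count; ++row) {
        float sum = 0.0f;
        /* Inner loop: trip count is the number of nonzeros in this row,
         * so it is data dependent and differs from row to row. */
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
            sum += val[k] * x[col_idx[k]];
        }
        y[row] = sum;
    }
}
```

The inner loop bounds are read from row_ptr at run time, which is the variable trip count the question is about.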

I searched the forum for similar problems and found this question. There it is suggested to iterate over all the elements, guarded by a condition, in every iteration of the outer loop, so that the number of iterations becomes constant. However, this causes a huge performance loss because of the empty cycles.
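
One way to read that suggestion is the sketch below, written in plain C. It is an assumption about what the linked thread means, not a copy of it: the inner loop always runs max_nnz_per_row times (a hypothetical upper bound on the nonzeros per row) and a condition masks out the slots a row does not use. The function also counts the masked iterations, to make the "empty cycles" concrete.

```c
/* Sketch only: CSR SMVM with a fixed inner trip count.
 * max_nnz_per_row is an assumed bound; every row must have at most that many nonzeros.
 * Returns the number of inner iterations that did no useful work. */
long smvm_csr_fixed_trip(int row_count, int max_nnz_per_row,
                         const int *row_ptr, const int *col_idx,
                         const float *val, const float *x, float *y)
{
    long idle = 0;
    for (int row = 0; row < row_count; ++row) {
        int start = row_ptr[row];
        int len = row_ptr[row + 1] - start;   /* nonzeros in this row */
        float sum = 0.0f;
        /* Same trip count for every row, so the exit condition is constant. */
        for (int j = 0; j < max_nnz_per_row; ++j) {
            if (j < len) {
                sum += val[start + j] * x[col_idx[start + j]];
            } else {
                ++idle;                        /* masked slot: an empty iteration */
            }
        }
        y[row] = sum;
    }
    return idle;
}
```

The idle count equals row_count * max_nnz_per_row minus the total number of nonzeros, so a matrix with a few dense rows and many short ones spends most iterations in the masked branch.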

I would have thought that, even if some applications are hard to pipeline, a well-known one like SMVM would already have a most-efficient implementation.

I couldn't find any pointers on this problem, or any implementation of SMVM, on the internet. My question is: is there a "most efficient" implementation of this application? Or can a loop structure with a variable number of iterations be completely pipelined with some trick?

Regards,

Thank you in advance,

Kaan Akyol