: I do not see any significant
speedup with varying the number of cores!
The performance of any sparse matrix operation is much lower than that of dense BLAS, because the memory access patterns are irregular and the ratio of floating-point operations to memory accesses is lower than in some dense operations. That is why you don't see any significant speedup when you add cores.
So, if the matrices fit in RAM, it would be more efficient to use dense BLAS calculations.
In such cases it may make sense to convert from sparse to dense and then do the matrix-vector calculation with the dense routines. --Gennady
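To make the suggestion concrete, here is a minimal sketch in plain Python, assuming the matrix is stored in CSR format (the storage format is not stated above). The helper names `csr_matvec`, `csr_to_dense` and `dense_matvec` are hypothetical and are not library routines. The sketch only illustrates the indirect access through the column index array and the one-time conversion; it says nothing about actual timings. In a real program the dense step would be a call to a dense BLAS matrix-vector routine such as `dgemv`.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A*x with A in CSR form. Note that x is read through col_idx,
    so the accesses to x are indirect and irregular."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_to_dense(values, col_idx, row_ptr, n_cols):
    """Expand a CSR matrix into a dense row-major list of rows."""
    n_rows = len(row_ptr) - 1
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            dense[i][col_idx[k]] = values[k]
    return dense


def dense_matvec(dense, x):
    """y = A*x with A dense. Each row is read contiguously; this is the
    step a dense BLAS matrix-vector routine would replace."""
    return [sum(a * b for a, b in zip(row, x)) for row in dense]


# Small illustrative matrix (made up for this sketch):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y_sparse = csr_matvec(values, col_idx, row_ptr, x)
A = csr_to_dense(values, col_idx, row_ptr, n_cols=3)  # one-time conversion
y_dense = dense_matvec(A, x)
```

The conversion costs memory proportional to rows times columns, so it only pays off when the dense matrix fits in RAM and is reused for many multiplications.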