Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

How to measure the overhead to ensure the parallelization is worth to perform on existing code?

rudaho
Beginner
633 Views
Hello~

I used OpenMP to parallelize my sparse matrix solver. The idea is to parallelize the operations on each column. However, not every matrix shows a performance improvement. In fact, only one matrix, larger than 25000x25000, improved; for most of the sparse matrices the parallel version was much worse (two or three times slower). I have also tried dense matrices, and there the performance does get better.

I guess this is because each column involves only a few operations, since the matrix is sparse. Are there any suggestions or guides on how to measure the overhead of parallelizing existing code, so that I can tell in advance when the existing code for a specific application is not worth parallelizing? Thanks...

Best Regards

Yi-Ju
4 Replies
Dmitry_Vyukov
Valued Contributor I
Quoting rudaho
Are there any suggestions or guides on how to measure the overhead of parallelizing existing code, so that I can tell in advance when the existing code for a specific application is not worth parallelizing? Thanks...


The simplest and most reliable (and probably the fastest) method is to parallelize the code and see the results.

jimdempseyatthecove
Honored Contributor III
I might add to Dmitriy's comment that the performance change observed in your first attempt at parallelization should not be taken as a measure of what your second attempt at parallelization will produce.

Jim
kalloyd
Beginner
I might add to Dmitriy's comment that the performance change observed in your first attempt at parallelization should not be taken as a measure of what your second attempt at parallelization will produce.

Jim

Jim,

Truer words were never spoken. If I may add, many subsequent attempts may yield an improvement, no improvement, or even a decrease in performance. However, by going through these iterations, you will certainly learn a lot about performance and parallelization.

There are many geometries (topologies) of combined parallel and serial execution graphs.

Ken
jimdempseyatthecove
Honored Contributor III
>>There are geometries (topologies) regarding combinations of parallel and serial execution graphs.

The underlying algorithms have a lot to do with performance too. I will be addressing this in my upcoming ISN Blogs posting. Good algorithms for tough problems take into consideration how the cache(s) are distributed about the system.

Jim Dempsey