
The following code calculates y = A * x, like the MKL function

**mkl_?coogemv**:

```fortran
do i = 1, n        ! matrix A is n*n
   y(i) = 0.0d0
enddo

do i = 1, nnz      ! nnz is the number of nonzero entries of A
   ii = irow(i)    ! irow holds the row indices of the entries
   jj = jcol(i)    ! jcol holds the column indices of the entries
   y(ii) = y(ii) + A(i)*x(jj)
enddo
```

Could anyone give me some suggestions on how to parallelize this code with OpenMP?

Thanks,

Zhanghong Tang


3 Replies


```fortran
do i = 1, n        ! matrix A is n*n
   y(i) = 0.0d0
enddo

do i = 1, nnz      ! nnz is the number of nonzero entries of A
   ii = irow(i)    ! irow holds the row indices of the entries
   jj = jcol(i)    ! jcol holds the column indices of the entries
   y(ii) = y(ii) + A(i)*x(jj)
   if (issymmetric .and. ii /= jj) y(jj) = y(jj) + A(i)*x(ii)
enddo
```

How can this code be parallelized with OpenMP?

Thanks,

Zhanghong Tang


I suppose you might use such a scheme for packed storage of a symmetric matrix, where the submatrices associated with the various values of i don't overlap. Then you might start out with

```fortran
!$omp parallel do private(ii,jj) schedule(guided)
do i = 1, nnz
   ...
```

You would likely find that it doesn't scale to all cores on a NUMA machine such as Nehalem or Opteron. Assuming you keep the storage in a sane order (no skipping around with irow and jcol), you could add a statically scheduled outer loop over the number of available threads, determining which range of i each thread should process so as to balance the work.
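That manual partitioning might be sketched as below. This is only an illustration under the stated assumptions: the entries are sorted by row, and the hypothetical bounds arrays lo(:) and hi(:) have been filled beforehand so that each thread gets roughly nnz/nthreads entries and no row is split between two threads. Each thread then writes to a disjoint set of y entries, so no atomics are needed:

```fortran
! Sketch only -- lo(:) and hi(:) are hypothetical per-thread bounds,
! precomputed so that irow(hi(t)) < irow(lo(t+1)) for every thread t.
! omp_get_thread_num requires "use omp_lib".
!$omp parallel private(t, i, ii, jj)
t = omp_get_thread_num() + 1
do i = lo(t), hi(t)
   ii = irow(i)
   jj = jcol(i)
   y(ii) = y(ii) + A(i)*x(jj)
enddo
!$omp end parallel
```

If a row could straddle a thread boundary (or for the symmetric update of y(jj), which can touch any row), each accumulation would instead need an `!$omp atomic` directive, at a noticeable cost per entry.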


Thanks for your kind reply. I know that sparse matrices stored in CSR format can be parallelized this way. However, since the MKL function

**mkl_?coogemv** is itself parallelized, I am wondering how MKL parallelizes this function.

Thanks,

Zhanghong Tang
