Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

How to parallelize the following code by OpenMP?

Zhanghong_T_
Novice
405 Views
Dear all,

The following code is used to calculate y = A * x as the function mkl_?coogemv:

do i=1,n ! matrix A is the size of n*n
y(i)=0.0d0
enddo
do i=1,nnz ! nnz is the number of entries of A
ii = irow(i) ! irow is the row number index of entries
jj = jcol(i) ! jcol is the column number index of entries
y(ii) = y(ii) + A(i)*x(jj)
enddo

Could anyone give me some suggestion about how to parallelize the code by OpenMP?


Thanks,
Zhanghong Tang
0 Kudos
3 Replies
Zhanghong_T_
Novice
405 Views
Furthermore, if the matrix is symmetric, additional operations are needed. The sequential version of code is as follows:

do i=1,n ! matrix A is the size of n*n
y(i)=0.0d0
enddo
do i=1,nnz ! nnz is the number of entries of A
ii = irow(i) ! irow is the row number index of entries
jj = jcol(i) ! jcol is the column number index of entries
y(ii) = y(ii) + A(i)*x(jj)
if(issymmetric.and.ii/=jj)y(jj)=y(jj)+A(i)*x(ii)
enddo

How to parallelize the code by OpenMP?

Thanks,
Zhanghong Tang
0 Kudos
TimP
Honored Contributor III
405 Views
Your question is too general to give a definitive answer.
I suppose it's likely you might use such a scheme for packed storage of a symmetric matrix, where the submatrices associated with various values of i don't overlap. Then, you might start out with
!omp$ do parallel private(ii,jj) schedule(guided)
do i=1,nnz
...
You would likely find that it doesn't scale to all cores on a NUMA machine such as Nehalem or Opteron. Assuming that you keep the storage in a sane order (no skipping around with irow and jcol), you could throw on a static scheduled outer loop over the number of available threads, where you determine which range of i should be processed by each thread in order to balance the work.
0 Kudos
Zhanghong_T_
Novice
405 Views
Hi Tim,

Thanks for your kindly reply. I know that the sparse matrices are stored by CSR format and then the calculation was parallelized. However, since the MKL function mkl_?coogemv is parallelized, I am wondering how does MKL parallelize this function.


Thanks,
Zhanghong Tang

0 Kudos
Reply