Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Using OpenMP to build Matrix

Heo__Jun-Yeong
820 Views

Hi, I'm working on building mapping matrix between 2 meshes

Elements of mapping matrix is just a portion of intersecting volume of the mesh elements.

So, if the elements of mesh1 contain velocity info,  velocity of the elements in mesh2 can be calculated by matrix-vector product 

    v2 = M1*v1 ( also,  v1 = M2*v2)

As the number of elements of the meshes are very large, I want to apply openMP to parallelize the algorithm.

Here is my pseudo code

 

 

module tmp
    use mMesh
    use mIntersect
    use mCSR
    implicit none
    integer :: n1, n2
    type(CSR) :: M1, M2
 contains
  subroutine Set(No1, No2)
    integer, intent(in) :: No1, No2
    
    n1 = No1; n2 = No2
  end subroutine Set

  subroutine Map(mesh1, mesh2)
    type(mesh), intent(in) :: mesh1, mesh2
    integer :: i, j
    integer :: icount1, icount2
    real(8) :: Vol
    type(compCSR) :: comp
    type(listCSR), dimension(n2), target :: heads1
    type(listCSR), dimension(n1), target :: heads2

    !$OMP Parallel default(private) shared(icount1,icount2,heads1,heads2)
    !$OMP DO
    icount1 = 0; icount2 = 0
    do i = 1, n1
      do j = 1, n2
        Vol = IntersectVolume(mesh1%Ele(i), mesh2%Ele(j))
        if (Vol.GT.0.d0) then
          comp%col = i; comp%val = Vol/mesh2%EleVol(j)
          call Insert(comp, heads1(j))
          icount1 = icount1 + 1
          comp%col = j; comp%val = Vol/mesh1%EleVol(i)
          call Insert(comp, heads2(i))
          icount2 = icount2 + 1
        end if
      end do
    end do
    !$OMP END DO
    !$OMP END Parallel

    call MakeCSR(icount1, heads1, M1)
    call MakeCSR(icount2, heads2, M2)
  end subroutine Map
end module tmp

program test
  use tmp
  use mMesh
  implicit none
  character(len=80) :: file1='mesh1.dat'
  character(len=80) :: file2='mesh2.dat'
  type(mesh) :: mesh1, mesh2

  call ReadMesh(file1, mesh1)
  call ReadMesh(file2, mesh2)

  call Set(mesh1%nele, mesh2%nele)
  call Map(mesh1, mesh2)
end program test

 

 

As the mapping matrix is sparse matrix, I used CSR format to store mapping matrix. 

'module mCSR' is the module that I made to use CSR arrays

'heads1, heads2' are array of linked list to build CSR arrays

( from topic https://software.intel.com/en-us/comment/1918497#  (several routines are modified))

 

I want to parallelize the loop 'i' and 'icount1, icount2, heads1, head2' should be shared to make CSR array after the loop.

But the above OMP method didn't work...

Is there any idea to make that possible...?

Labels (1)
0 Kudos
1 Solution
jimdempseyatthecove
Honored Contributor III
807 Views

Sharing a variable in a parallel region that each thread performs (as an example)

icount1 = icount1 + 1

exhibits a race condition amongst threads (you may lose counts). To correct for counting-like race conditions, use the OpenMP reduction clause.

The $OMP DO must immediately precede the Fortran DO (place icount's initialiation before the $OMP parallel .AND. use the reduction clause on those variables (I will let you read up on and investigate reduction as an exercize).

An additional issue you have here, more subtle, is

While the second Insert using heads2(i) will be thread-safe, due to each thread having different i index, the first Insert using heads1(j) will have potential collisions with different threads using the same j.

I will let you think about how to fix that (be creative).

Jim Dempsey

View solution in original post

0 Kudos
1 Reply
jimdempseyatthecove
Honored Contributor III
808 Views

Sharing a variable in a parallel region that each thread performs (as an example)

icount1 = icount1 + 1

exhibits a race condition amongst threads (you may lose counts). To correct for counting-like race conditions, use the OpenMP reduction clause.

The $OMP DO must immediately precede the Fortran DO (place icount's initialiation before the $OMP parallel .AND. use the reduction clause on those variables (I will let you read up on and investigate reduction as an exercize).

An additional issue you have here, more subtle, is

While the second Insert using heads2(i) will be thread-safe, due to each thread having different i index, the first Insert using heads1(j) will have potential collisions with different threads using the same j.

I will let you think about how to fix that (be creative).

Jim Dempsey

0 Kudos
Reply