[fortran]
!Parallel matrix-vector multiplication, row packed matrix
!$OMP PARALLEL DEFAULT(SHARED) NUM_THREADS(4)
!$OMP DO SCHEDULE(DYNAMIC) PRIVATE(i,j,itn,Sum)
do i=1,n
   Sum = 0.
   itn = (i-1)*n
   do j=1,n
      Sum = Sum + A(itn+j)*x(j)
   end do
   y(i) = Sum
end do
!$OMP END DO NOWAIT
!$OMP END PARALLEL
[/fortran]
[bash]
Parallel Algebra Performance Test
1400x1400 Matrix multiplied with a vector 100000 times
Serial row packed matrix-vector multiplication took 163.55469 seconds to complete.
Parallel row packed matrix-vector multiplication took 24.86914 seconds to complete.
IMKL parallel matrix-vector multiplication took 59.45898 seconds to complete.
[/bash]
I think the Intel Fortran compiler automatically aligns arrays whenever possible. Since I wasn't entirely sure, I also tried aligning the array manually with the !DEC$ ATTRIBUTES ALIGN directive. After trying all possible alignment boundaries without any difference in performance, I'm pretty sure it's not an alignment issue.
Thanks for the suggestion anyway.
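For anyone who wants to repeat the alignment test, here is a minimal sketch of what such a declaration can look like. The 64-byte boundary, the double precision kind and the static declaration of A are assumptions for illustration only, not necessarily the settings used in the timings above. The directive is specific to Intel Fortran; other compilers treat it as an ordinary comment.

[fortran]
program align_sketch
   implicit none
   ! Matrix order taken from the performance test above
   integer, parameter :: n = 1400
   ! Assumption: double precision, row packed storage in a 1-D array
   real(kind(1.0d0)) :: A(n*n)
   ! Assumption: 64-byte boundary; any supported power of two can be tried here
   !DEC$ ATTRIBUTES ALIGN: 64 :: A

   A = 0.0d0
   print *, 'elements in A:', size(A)
end program align_sketch
[/fortran]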
I'm nearly certain by now that it's a cache issue, as I see a dramatic drop in performance at the point where the array no longer fits into the L2 cache.
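To put numbers on the cache argument, the sketch below estimates the memory footprint of the 1400x1400 matrix. The element size is an assumption, since it depends on how A is declared: at 4 bytes per element the matrix takes about 7.8 MB, at 8 bytes about 15.7 MB. Either figure can be compared against the L2 size of the machine in question.

[fortran]
program working_set
   implicit none
   ! Matrix order taken from the performance test above
   integer, parameter :: n = 1400
   ! Integer kind wide enough for the byte counts
   integer, parameter :: i8 = selected_int_kind(12)
   integer(i8) :: nelem

   nelem = int(n, i8) * int(n, i8)

   print '(a,i0)', 'elements in A: ', nelem
   ! Assumption: A is either default real (4 bytes) or double precision (8 bytes)
   print '(a,f6.2,a)', 'single precision (4 B): ', real(nelem*4)/1.0e6, ' MB'
   print '(a,f6.2,a)', 'double precision (8 B): ', real(nelem*8)/1.0e6, ' MB'
end program working_set
[/fortran]

The vectors x and y add only a few kilobytes each at this size, so the matrix dominates the working set.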