- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
do k=1,nz do j=1,ny do i=1,nx A(i,j,k)=A(i,j,k)*B(i,j,k)*0.5 enddo enddo enddo
#2
do ijk=1,nx*ny*nz A(ijk,1,1)=A(ijk,1,1)*B(ijk,1,1)*0.5 enddo
#3
A(:,:,:)=A(:,:,:)*B(:,:,:)*0.5
#4
A=A*B*0.5
- Tags:
- Parallel Computing
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
#4 "should" be best and equivalent to #1
To improve performance you would want to align the arrays (A and B) to cache line address
!DIR$ ATTRIBUTES ALIGN: 64:: A, B
#2 won't work if array bounds checking is enabled. If array dimensions for aligned allocations do not produce rows (1st index) of multiples of cache line, for the example above #1 can have the loops collapsed (see IVF index)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
At full optimization, ifort would perform an optimization equivalent to #2 if possible. #3 is prone to generation of temporary arrays. ifort optimization reports should show when either of those happen.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FWIW I use a similar technique to #2. In a module I have:
subroutine ArrayDivide6x6(a,d) implicit none real(i8), intent(inout) :: a(6*6) real(i8), intent(in) :: d a = a / d end subroutine ArrayDivide6x6 ... REAL(i8),DIMENSION(6,6), INTENT(OUT) :: qq_Transpose ... call ArrayDivide6x6(qq_Transpose, exp(xsi*z))
You should be able to use a similar technique with your example code.
YMMV
I noticed in IVF V17u1 an optimization issue with
qq_Transpose = qq_Transpose / exp(xsi*z)
Jim Dempsey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page