- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi all,

I have got a speedup of approximately 4x with just rewriting a few where statements in my loop to explicit do loops as illustrated below.

Unchanged code

where ( LMASK ) WORK1(:,:,kk) = KAPPA_THIC(:,:,kbt,k,bid) & * SLX(:,:,kk,kbt,k,bid) * dz(k) WORK2(:,:,kk) = c2 * dzwr(k) * ( WORK1(:,:,kk) & - KAPPA_THIC(:,:,ktp,k+1,bid) * SLX(:,:,kk,ktp,k+1,bid) & * dz(k+1) ) WORK2_NEXT = c2 * ( & KAPPA_THIC(:,:,ktp,k+1,bid) * SLX(:,:,kk,ktp,k+1,bid) - & KAPPA_THIC(:,:,kbt,k+1,bid) * SLX(:,:,kk,kbt,k+1,bid) ) WORK3(:,:,kk) = KAPPA_THIC(:,:,kbt,k,bid) & * SLY(:,:,kk,kbt,k,bid) * dz(k) WORK4(:,:,kk) = c2 * dzwr(k) * ( WORK3(:,:,kk) & - KAPPA_THIC(:,:,ktp,k+1,bid) * SLY(:,:,kk,ktp,k+1,bid) & * dz(k+1) ) WORK4_NEXT = c2 * ( & KAPPA_THIC(:,:,ktp,k+1,bid) * SLY(:,:,kk,ktp,k+1,bid) - & KAPPA_THIC(:,:,kbt,k+1,bid) * SLY(:,:,kk,kbt,k+1,bid) ) endwhere

Changed code

do j=1,ny_block do i=1,nx_block if ( LMASK(i,j) ) then WORK1(i,j,kk) = KAPPA_THIC(i,j,kbt,k,bid) & * SLX(i,j,kk,kbt,k,bid) * dz(k) WORK2(i,j,kk) = c2 * dzwr(k) * ( WORK1(i,j,kk) & - KAPPA_THIC(i,j,ktp,k+1,bid) * SLX(i,j,kk,ktp,k+1,bid) & * dz(k+1) ) WORK2_NEXT(i,j) = c2 * ( & KAPPA_THIC(i,j,ktp,k+1,bid) * SLX(i,j,kk,ktp,k+1,bid) - & KAPPA_THIC(i,j,kbt,k+1,bid) * SLX(i,j,kk,kbt,k+1,bid) ) WORK3(i,j,kk) = KAPPA_THIC(i,j,kbt,k,bid) & * SLY(i,j,kk,kbt,k,bid) * dz(k) WORK4(i,j,kk) = c2 * dzwr(k) * ( WORK3(i,j,kk) & - KAPPA_THIC(i,j,ktp,k+1,bid) * SLY(i,j,kk,ktp,k+1,bid) & * dz(k+1) ) WORK4_NEXT(i,j) = c2 * ( & KAPPA_THIC(i,j,ktp,k+1,bid) * SLY(i,j,kk,ktp,k+1,bid) - & KAPPA_THIC(i,j,kbt,k+1,bid) * SLY(i,j,kk,kbt,k+1,bid) ) endif enddo enddo

We are unable to try Vtune as the code ran for eternity, like 1 day. Also no info as to why it showed such high speedup was not explained. Just a few red bars with time taken. They were in agreement to the time showed by omp timers added b/w the loops.

The compilation of unchanged code was with -O3 while changed code was with -O2. I can clearly rule out loop fusion as a reason(fusion of loops of work1, work2 which are hidden in : style language). As per the opt-report loops we fused for original code.

I can guarantee no Openmp was implemented. All flags are similar in both cases(except O3).Any explanation why is speedup as high as 4X.

UPDATE

Hi I did check the optrpt and found the following for unchanged and changed code

unchanged code

remark #15448: unmasked aligned unit stride loads: 13

remark #15449: unmasked aligned unit stride stores: 3

remark #15450: unmasked unaligned unit stride loads: 3

remark #15455: masked aligned unit stride stores: 6

remark #15456: masked unaligned unit stride loads: 16

changed code

remark #15448: unmasked aligned unit stride loads: 1

remark #15454: masked aligned unit stride loads: 2

remark #15455: masked aligned unit stride stores: 6

remark #15456: masked unaligned unit stride loads: 16

- Tags:
- Parallel Computing

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

It does look like the fusion of your WHERE version is incomplete, possibly as a consequence of the rank 2 array assignments, or possibly because that style is not so frequently used in critical performance situations which the compiler has been trained to optimize. You must recognize that the syntax of WHERE requires the compiler to start out by distributing the conditional to each individual assignment, so there is a lot more work to be done to get back to full sharing of operands by loop fusion.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

What I suspect to be happening is, in the 4x test case, the array LMASK is sparsely (or at least not densely) populated with .true..

Under this circumstance, the code may have been performing masked load/store operations where the entire mask is .false..

Jim Dempsey

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page