Hello everybody,
I am trying to reduce the execution time of an existing 2D flow simulation program that was originally written in Fortran 77. Since I am quite new to Fortran programming, I first tried to gather some information about efficient coding. I was often advised to use the "new" intrinsic functions, since they tell the compiler what is going on and allow it to perform optimizations (e.g. vectorization). Although these functions, and the use of vector expressions instead of scalar ones, made my code much more compact and elegant, the run time increased, to my surprise.
Here are some code snippets in the old and new versions, with the respective CPU times (3 runs each):
===========================================================================
1. Example CSHIFT: performs a shift of the elements in a matrix by one element along all directions (including diagonals)
===========================================================================
old: t = 8.777; 8.553; 8.789 s
-----------------------------------
do j=1,nj
  do i=1,ni
    ! periodic neighbour indices (east, west, north, south)
    ie = mod(i,ni) + 1
    iw = ni - mod(ni+1-i,ni)
    jn = mod(j,nj) + 1
    js = nj - mod(nj+1-j,nj)
    fn(ie,j ,1) = f(i,j,1)
    fn(i ,jn,2) = f(i,j,2)
    fn(iw,j ,3) = f(i,j,3)
    fn(i ,js,4) = f(i,j,4)
    fn(ie,jn,5) = f(i,j,5)
    fn(iw,jn,6) = f(i,j,6)
    fn(iw,js,7) = f(i,j,7)
    fn(ie,js,8) = f(i,j,8)
    fn(i ,j ,0) = f(i,j,0)
  enddo
enddo
----------------------------------------
new: t = 11.009; 11.241; 11.033 s
-----------------------------------------
fn(:,:,0) = f(:,:,0)
fn(:,:,1) = cshift(f(:,:,1),-1,1)
fn(:,:,2) = cshift(f(:,:,2),-1,2)
fn(:,:,3) = cshift(f(:,:,3),1,1)
fn(:,:,4) = cshift(f(:,:,4),1,2)
fn(:,:,5) = cshift(cshift(f(:,:,5),-1,1),-1,2)
fn(:,:,6) = cshift(cshift(f(:,:,6),1,1),-1,2)
fn(:,:,7) = cshift(cshift(f(:,:,7),1,1),1,2)
fn(:,:,8) = cshift(cshift(f(:,:,8),-1,1),1,2)
=====================================================
2. Example WHERE: assigns new values to a matrix where condition (obst) is fulfilled
=====================================================
old: t = 2.488; 2.460; 2.712 s
-----------------------------------
do j=1,nj
  do i=1,ni
    if (obst(i,j)) then
      f(i,j,1) = fn(i,j,3)
      f(i,j,2) = fn(i,j,4)
      f(i,j,3) = fn(i,j,1)
      f(i,j,4) = fn(i,j,2)
      f(i,j,5) = fn(i,j,7)
      f(i,j,6) = fn(i,j,8)
      f(i,j,7) = fn(i,j,5)
      f(i,j,8) = fn(i,j,6)
      f(i,j,0) = fn(i,j,0)
    endif
  enddo
enddo
------------------------------------
new: t = 5.404; 5.628; 5.456 s
------------------------------------
where(obst)
  f(:,:,1) = fn(:,:,3)
  f(:,:,2) = fn(:,:,4)
  f(:,:,3) = fn(:,:,1)
  f(:,:,4) = fn(:,:,2)
  f(:,:,5) = fn(:,:,7)
  f(:,:,6) = fn(:,:,8)
  f(:,:,7) = fn(:,:,5)
  f(:,:,8) = fn(:,:,6)
  f(:,:,0) = fn(:,:,0)
endwhere
=====================
3. Example WHERE,SUM,various:
=====================
old: t = 6.020; 6.056; 6.160 s
-----------------------------------
do j=1,nj
  do i=1,ni
    if(.not.obst(i,j))then
      rho(i,j) = fn(i,j,0)+fn(i,j,1)+fn(i,j,2)+fn(i,j,3)+fn(i,j,4)+fn(i,j,5)+fn(i,j,6)+fn(i,j,7)+fn(i,j,8)
      u(i,j) = (fn(i,j,1)+fn(i,j,5)+fn(i,j,8)-fn(i,j,6)-fn(i,j,3)-fn(i,j,7))/rho(i,j)
      v(i,j) = (fn(i,j,5)+fn(i,j,2)+fn(i,j,6)-fn(i,j,7)-fn(i,j,4)-fn(i,j,8))/rho(i,j)
    else
      rho(i,j) = rho_in
      u(i,j) = 0.d0
      v(i,j) = 0.d0
    endif
  enddo
enddo
-----------------------------------------
new: t = 12.421; 11.897; 11.645 s
-----------------------------------------
where (.not.obst)
  rho(:,:) = sum(fn,3)
  u(:,:) = (fn(:,:,1)+fn(:,:,5)+fn(:,:,8)-fn(:,:,6)-fn(:,:,3)-fn(:,:,7))/rho
  v(:,:) = (fn(:,:,5)+fn(:,:,2)+fn(:,:,6)-fn(:,:,7)-fn(:,:,4)-fn(:,:,8))/rho
elsewhere
  rho = rho_in
  u(:,:) = 0.d0
  v(:,:) = 0.d0
endwhere
-------------------------------------------
I did not expect a significant performance boost, but also not a drop. Has anybody made similar observations or can explain the results? Any help would be greatly appreciated.
With best regards,
Eric
My setup:
- OS: Kubuntu 11.10
- Compiler Version: Fortran Intel(R) 64 Compiler XE, Version 12.1.5.339
- Compilation flags: none
- CPU: Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz (cache size: 1024 KB)
- Memory: 2 GB
- Time measurement: using subroutine CPU_TIME; the code is looped 20000 times (variables change on every loop)
Used variables:
real*8, dimension(1:100,1:100,0:8) :: f, fn
logical, dimension(100,100) :: obst (10% of the elements are true, arranged as a sphere)
integer :: ni, nj
real*8, dimension(100,100) :: u, v, rho
real*8 :: rho_in
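Before comparing timings, it can help to confirm that the loop and CSHIFT variants of Example 1 produce the same array. The following is a minimal sketch of such a check, not part of the original program: the program name and the arrays fn_loop and fn_shift are made up for illustration, and random data stands in for the real distribution functions. By my reading the index arithmetic matches the CSHIFT calls, so the check is expected to pass, but running it is the way to be sure.

```fortran
program check_streaming
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: ni = 100, nj = 100
  ! fn_loop and fn_shift are hypothetical names for the two results
  real(dp) :: f(ni,nj,0:8), fn_loop(ni,nj,0:8), fn_shift(ni,nj,0:8)
  integer :: i, j, ie, iw, jn, js

  call random_number(f)   ! arbitrary test data, not a physical state

  ! "old" variant: explicit loops with periodic neighbour indices
  do j = 1, nj
    do i = 1, ni
      ie = mod(i,ni) + 1
      iw = ni - mod(ni+1-i,ni)
      jn = mod(j,nj) + 1
      js = nj - mod(nj+1-j,nj)
      fn_loop(ie,j ,1) = f(i,j,1)
      fn_loop(i ,jn,2) = f(i,j,2)
      fn_loop(iw,j ,3) = f(i,j,3)
      fn_loop(i ,js,4) = f(i,j,4)
      fn_loop(ie,jn,5) = f(i,j,5)
      fn_loop(iw,jn,6) = f(i,j,6)
      fn_loop(iw,js,7) = f(i,j,7)
      fn_loop(ie,js,8) = f(i,j,8)
      fn_loop(i ,j ,0) = f(i,j,0)
    end do
  end do

  ! "new" variant: circular shifts along dimension 1 and/or 2
  fn_shift(:,:,0) = f(:,:,0)
  fn_shift(:,:,1) = cshift(f(:,:,1),-1,1)
  fn_shift(:,:,2) = cshift(f(:,:,2),-1,2)
  fn_shift(:,:,3) = cshift(f(:,:,3), 1,1)
  fn_shift(:,:,4) = cshift(f(:,:,4), 1,2)
  fn_shift(:,:,5) = cshift(cshift(f(:,:,5),-1,1),-1,2)
  fn_shift(:,:,6) = cshift(cshift(f(:,:,6), 1,1),-1,2)
  fn_shift(:,:,7) = cshift(cshift(f(:,:,7), 1,1), 1,2)
  fn_shift(:,:,8) = cshift(cshift(f(:,:,8),-1,1), 1,2)

  ! Both variants only copy values, so exact comparison is appropriate
  if (any(fn_loop /= fn_shift)) error stop 'loop and cshift variants differ'
  print *, 'loop and cshift variants agree'
end program check_streaming
```

The same pattern (run both variants on identical input, then compare) applies to Examples 2 and 3, where a small tolerance may be more appropriate than exact equality if the order of the floating-point additions differs.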
Steve Lionel (Intel) wrote:
The semantics of the "old" and "new" code are not the same, and some of your cases, such as the nested cshift, create array temporaries. Using WHERE will create a temporary for the mask. You're right that sometimes the newer usages may be more compact and readable but have hidden performance issues. Generally they work well, but CSHIFT and WHERE are probably not as highly optimized as some other aspects of the language.
I would also discourage you from using (:,:) to mean a whole array. Most of the time this is harmless, but it can sometimes cause unwanted side effects.
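One concrete way in which a section such as f(:) differs from the whole array f is its bounds: an array section always has lower bound 1, whereas the whole array keeps its declared bounds. The sketch below only illustrates that one difference; whether it is the side effect meant in the quote above is an assumption. The program name is made up, and the array mirrors the 0:8 extent used for f and fn in this thread.

```fortran
program section_vs_whole
  implicit none
  real :: f(0:8)

  f = 0.0

  ! The whole array reports its declared lower bound ...
  print *, 'lbound(f, 1)    =', lbound(f, 1)
  ! ... while the section f(:) is a new array object with lower bound 1
  print *, 'lbound(f(:), 1) =', lbound(f(:), 1)

  if (lbound(f, 1) /= 0 .or. lbound(f(:), 1) /= 1) error stop 'unexpected bounds'
end program section_vs_whole
```

For plain assignments such as u(:,:) = 0.d0 this makes no difference to the result, which is consistent with the remark that the notation is harmless most of the time.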
This is somewhat disappointing, but WHERE/ENDWHERE followed by WHERE/ENDWHERE is easy enough compared to WHERE/ELSEWHERE/ENDWHERE.
As for (:,:), (:), etc., I have always found that they make code easier to read, since they show that the variable is an array and what its rank is.
Nevertheless, the information is useful.
Thanks, RH
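The alternative mentioned above, two separate WHERE constructs instead of WHERE/ELSEWHERE, can be sketched as follows for a reduced form of Example 3. This is an illustration under assumptions, not a measured recommendation: the program name, the result arrays with suffixes _a and _b, the helper array r and the random mask are all made up, and no claim is made about which form a given compiler optimizes better. The split is only equivalent because the mask obst is not modified between the two blocks.

```fortran
program where_split
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: ni = 100, nj = 100
  real(dp), parameter :: rho_in = 1.0_dp   ! placeholder inlet density
  real(dp) :: fn(ni,nj,0:8), r(ni,nj)
  real(dp) :: rho_a(ni,nj), u_a(ni,nj)     ! results of variant A
  real(dp) :: rho_b(ni,nj), u_b(ni,nj)     ! results of variant B
  logical  :: obst(ni,nj)

  call random_number(fn)
  fn = fn + 0.1_dp          ! keep the density safely away from zero
  call random_number(r)
  obst = r < 0.1_dp         ! roughly 10% obstacle cells, randomly placed

  ! Variant A: single construct with ELSEWHERE
  where (.not. obst)
    rho_a = sum(fn, 3)
    u_a = (fn(:,:,1)+fn(:,:,5)+fn(:,:,8)-fn(:,:,6)-fn(:,:,3)-fn(:,:,7))/rho_a
  elsewhere
    rho_a = rho_in
    u_a = 0.0_dp
  end where

  ! Variant B: two constructs with complementary masks
  where (.not. obst)
    rho_b = sum(fn, 3)
    u_b = (fn(:,:,1)+fn(:,:,5)+fn(:,:,8)-fn(:,:,6)-fn(:,:,3)-fn(:,:,7))/rho_b
  end where
  where (obst)
    rho_b = rho_in
    u_b = 0.0_dp
  end where

  if (any(rho_a /= rho_b) .or. any(u_a /= u_b)) error stop 'variants differ'
  print *, 'WHERE/ELSEWHERE and split WHERE agree'
end program where_split
```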
holmz wrote:
This is somewhat disappointing, but WHERE/ENDWHERE followed by WHERE/ENDWHERE is easy enough compared to WHERE/ELSEWHERE/ENDWHERE.
I submitted a problem report on a case where the 15.0 beta compiler loses optimization of where(condition)/where(.not. condition). None of the compilers optimize that case with elsewhere. Other such cases continue to perform well. Over the range of my test cases, where varies from slightly better performance than f77 code down to 50% of the performance of f77 with vector directives added (possibly due to the lack of similar directives for where). I don't entirely understand the effort to obsolete where, but it makes me wonder if there is a consensus against attempting to optimize elsewhere.
In comparison with cshift, all my Xeon examples show a way to get at least double the performance, sometimes requiring use of legacy ifort directives. It seems that peeling ought to be able to achieve full performance for cases with a fixed small shift, so it may be a case of not wanting to devote resources to syntax which doesn't have a C counterpart. I've seen references to something similar in cilk, but Intel(r) Cilk(tm) Plus doesn't appear to work with it, even without optimization. Itanium seemed better suited for cshift, but compilers didn't begin to optimize it until the platform was already doomed.
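For a fixed shift of one element, the peeling idea mentioned in this reply can be written out by hand: the wrap-around element is handled separately and the rest is a plain contiguous copy, so no intermediate array from a nested CSHIFT is needed. The sketch below does this for the diagonal shift of direction 5 from Example 1 and checks it against the nested CSHIFT. The program and array names are made up for illustration, and whether this is actually faster on a given compiler is untested here.

```fortran
program diag_shift_peeled
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: ni = 100, nj = 100
  real(dp) :: f5(ni,nj), fn_nested(ni,nj), fn_peeled(ni,nj)
  integer :: j, jw

  call random_number(f5)   ! arbitrary test data

  ! Reference: nested CSHIFT, which may create an array temporary
  fn_nested = cshift(cshift(f5, -1, 1), -1, 2)

  ! Peeled version: fn(i,j) = f5(i-1,j-1) with periodic wrap-around
  do j = 1, nj
    jw = merge(nj, j-1, j == 1)              ! source column, wrapped at j = 1
    fn_peeled(1, j)    = f5(ni, jw)          ! peeled wrap-around element
    fn_peeled(2:ni, j) = f5(1:ni-1, jw)      ! contiguous copy for the rest
  end do

  if (any(fn_nested /= fn_peeled)) error stop 'peeled shift differs from cshift'
  print *, 'peeled shift matches nested cshift'
end program diag_shift_peeled
```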