Hi all,
I need a little help with my Fortran code.
The code itself is very simple, but the loops take a long time.
I'd be grateful if anybody could suggest ways to reorganize the loops to make them faster.
I use the Intel compiler, but the coding is mostly F77-style.
Below is the subroutine code that is very slow. I am really interested in whether it can be made faster by changing the order of the loops, by using other types of loops, or by using capabilities of Intel Fortran.
do i=1,nth1/2
th0=th(i)
i2=nth1-i+1
call do_cur_plm(th0,lmax1,dummy_plm)
do l=0,lmax1
is=1
if(real(l/2.)-int(l/2).ne.0.) is=-1
do m=0,l
m2=m+1
th1=dummy_plm(l,m)
th2=th1*is
is=-is
do min=0,59
k=min+1
z1=ft_images(k,i,m2)*th1
z2=ft_images(k,i2,m2)*th2
ss(min,l,m)=ss(min,l,m)+z1+z2
enddo
enddo
enddo
enddo
You might get an improvement in cache behavior if you arranged the arrays so that the next-to-innermost loop (here the m loop, which steps the last subscript of ss(min,l,m) and ft_images(k,i,m2)) doesn't increment the last subscript, but you don't give enough information to show that.
Your scheme for setting the variable is alternately to 1 or -1 is more complicated than it needs to be, but you don't show enough to tell whether that is a problem. Even if you simplify it, I don't see how the compiler could be permitted to optimize by swapping the l and m loops.
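One way to simplify the sign initialization mentioned above is to stay in integer arithmetic and use mod. The following is a minimal sketch, not the poster's code: the names is_orig and is_simple are hypothetical, and the program only checks that the integer form gives the same sign as the original floating-point test for a few values of l.

```fortran
program parity_sign
  implicit none
  integer :: l, is_orig, is_simple

  do l = 0, 6
     ! Original test from the question: odd l gives -1, even l gives 1.
     is_orig = 1
     if (real(l/2.) - int(l/2) .ne. 0.) is_orig = -1

     ! Integer-only form: mod(l,2) is 0 for even l and 1 for odd l.
     is_simple = 1 - 2*mod(l, 2)

     ! Stop with a nonzero code if the two forms ever disagree.
     if (is_orig /= is_simple) stop 1
     print '(a,i2,a,i3)', 'l =', l, '  sign =', is_simple
  end do
end program parity_sign
```

In the original loop nest this would replace the two lines that set is before the m loop; the is=-is flip inside the m loop stays as it is.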
Shukur,
When you check the code (Disassembly window), is the do min=0,59 loop vectorized? Is it also unrolled to some degree?
If vectorization and some level of unrolling are not both (completely) present, then try helping the compiler out by removing the temporary variables.
do min=0,59
ss(min,l,m)= ss(min,l,m) &
& + ft_images(min+1,i,m2)*th1 &
& + ft_images(min+1,i2,m2)*th2
enddo
Granted, the compiler's optimizer should be able to figure this out for you, but if you use k, z1, or z2 outside the loop (e.g. in a later loop), then the compiler might not optimize the temporaries out of the generated code.
Jim Dempsey
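As a variation on the temporary-free loop above, the same update can be written with Fortran 90 array sections, which states directly that the 60 elements are independent. This is a sketch under assumptions: the array sizes (nth, lmax), the test values, and the names ss_loop and ss_sec are hypothetical, and whether this form runs any faster than the explicit loop depends on the compiler.

```fortran
program array_section_update
  implicit none
  integer, parameter :: nth = 4, lmax = 2   ! hypothetical sizes
  real :: ft_images(60, nth, lmax+1)
  real :: ss_loop(0:59, 0:lmax, 0:lmax), ss_sec(0:59, 0:lmax, 0:lmax)
  real :: th1, th2
  integer :: i, i2, l, m, m2, min

  call random_number(ft_images)             ! arbitrary test data
  ss_loop = 0.0
  ss_sec  = 0.0
  i = 1;  i2 = nth - i + 1
  l = 2;  m = 1;  m2 = m + 1
  th1 = 0.25;  th2 = -0.25

  ! Explicit loop without temporaries, as in the reply above.
  do min = 0, 59
     ss_loop(min,l,m) = ss_loop(min,l,m) &
        + ft_images(min+1,i,m2)*th1 &
        + ft_images(min+1,i2,m2)*th2
  end do

  ! Same update written as a whole-section assignment.
  ss_sec(:,l,m) = ss_sec(:,l,m) &
     + ft_images(:,i,m2)*th1 &
     + ft_images(:,i2,m2)*th2

  ! Stop with a nonzero code if the two forms disagree.
  if (maxval(abs(ss_sec - ss_loop)) > 1.0e-5) stop 1
  print *, 'array-section update matches the explicit loop'
end program array_section_update
```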
After the changes you mentioned, it is about 5-10% faster.
I am using
-g -w -c -O4
options to compile. Are there other options that might help?
Sorry for the stupid question, I am really a beginner.
Thank you in advance.
Shukur,
There are no stupid questions - only stupid answers...
Now for your next lesson in optimization. Experiment with the following:
! insert this in your main code
! up where you declare variables
interface
subroutine do_min(ss, ft_images1, th1, ft_images2, th2)
real :: ss(0:59), ft_images1(0:59), ft_images2(0:59)
real :: th1, th2
end subroutine do_min
end interface
...
! replace the do min=0,59 loop with
call do_min(ss(0:59,l,m), ft_images(1:60,i,m2), th1, ft_images(1:60,i2,m2), th2)
...
! create a new subroutine (stick it at the bottom of the source file with the main code)
subroutine do_min(ss, ft_images1, th1, ft_images2, th2)
real :: ss(0:59), ft_images1(0:59), ft_images2(0:59)
real :: th1, th2
integer :: min
do min=0,59
! ss is the 1-D slice passed in, so it takes a single subscript here
ss(min)= ss(min) &
& + ft_images1(min)*th1 &
& + ft_images2(min)*th2
enddo
end subroutine do_min
Jim Dempsey
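To try the do_min idea in isolation, a small self-checking driver along the following lines may help. It is only a sketch: the array sizes, the fill values, and the names ss_ref and ss_new are hypothetical, the data is assumed to be default real as in the interface above, and do_min is written as a contained procedure so no separate interface block is needed.

```fortran
program check_do_min
  implicit none
  integer, parameter :: nk = 60, nth = 4, nm = 3   ! hypothetical sizes
  real :: ft_images(nk, nth, nm)
  real :: ss_ref(0:59), ss_new(0:59)
  real :: th1, th2
  integer :: i, i2, m2, k, min

  ! Fill ft_images with arbitrary test values.
  do m2 = 1, nm
     do i = 1, nth
        do k = 1, nk
           ft_images(k, i, m2) = real(k) + 0.1*real(i) + 0.01*real(m2)
        end do
     end do
  end do

  i = 1;  i2 = nth - i + 1;  m2 = 2
  th1 = 0.5;  th2 = -0.5
  ss_ref = 1.0
  ss_new = 1.0

  ! Reference: the inline loop from the original question.
  do min = 0, 59
     k = min + 1
     ss_ref(min) = ss_ref(min) + ft_images(k,i,m2)*th1 + ft_images(k,i2,m2)*th2
  end do

  ! Same update through do_min; the first-index sections are contiguous.
  call do_min(ss_new, ft_images(1:60,i,m2), th1, ft_images(1:60,i2,m2), th2)

  ! Stop with a nonzero code if the results differ beyond rounding.
  if (maxval(abs(ss_new - ss_ref)) > 1.0e-4) stop 1
  print *, 'do_min matches the inline loop'

contains

  subroutine do_min(ss, ft_images1, th1, ft_images2, th2)
    real, intent(inout) :: ss(0:59)
    real, intent(in)    :: ft_images1(0:59), ft_images2(0:59)
    real, intent(in)    :: th1, th2
    integer :: min
    do min = 0, 59
       ss(min) = ss(min) + ft_images1(min)*th1 + ft_images2(min)*th2
    end do
  end subroutine do_min

end program check_do_min
```

Because the sections passed to do_min vary only in the first subscript, they are contiguous in memory, so the subroutine sees plain fixed-length 1-D arrays with a known trip count of 60.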