I am wondering whether, in the following DO loop, introducing the new INTEGER variable upper_index in place of the two identical calculations x_index(i,l) + N benefits performance. I have already timed this loop against the original loop, and the runtime ratio indicates a speedup of about 1.40.
    DO i = 1,N
       upper_index = x_index(i,l)+N
       coef(i) = 0.5 * vz(x_index(i,l),l)
       g2(i) = 2. * ( a(upper_index,k,l) - a(x_index(i,l),k,l) ) / ( dz(x_index(i,l)) * (1.0 + az1(x_index(i,l)) ) )
       IF (s(x_index(i,l),l) <= p1 .OR. s(upper_index,l) <= p1) g2(i)=0.0
    END DO
This loop is nested in loops over K and L.
However, when I tested the same idea in the following simple loop, I did not see a meaningful difference in the speed of the code.
    do i = 1,nloop1
       do j = 1,nloop2
          k = i+j
          variable(k) = k
       end do
    end do
Which of the two observations above can I trust?
Also, I noticed that an IF-ELSEIF construct is noticeably less efficient than multiple IF statements in a DO loop. Is this observation correct? I am using the Intel Fortran compiler 2015 with the O2 optimization flag.
Thank you in advance,
It's not unusual, in my experience, that a compiler can benefit from your help in simplifying any but the simplest loops.
Particularly in cases of auto-vectorization, replacing
    if (condition) then
       ...
    else
       ...
    end if
by
    if (condition) ...
    if (.not. condition) ...
may be helpful (assuming that you don't insist on short-circuit evaluation but are willing to have multiple cases evaluated speculatively). I wouldn't normally go so far as to replace an if block by multiple ifs.
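As a minimal sketch of the two forms described above, the following program fills an array with both the IF/ELSE block and the pair of complementary one-line IFs and checks that they agree. The array names, the condition `x(i) > 0.0` and the size `n` are hypothetical, chosen only for illustration. Whether the second form actually vectorizes better depends on the compiler and options, so it would still need to be measured.

```fortran
program if_forms
  implicit none
  integer, parameter :: n = 8          ! hypothetical array size
  real :: x(n), y_block(n), y_split(n)
  integer :: i

  ! Hypothetical input: values that alternate in sign.
  do i = 1, n
     x(i) = real(i) * (-1.0)**i
  end do

  ! Form 1: a single IF/ELSE block.
  do i = 1, n
     if (x(i) > 0.0) then
        y_block(i) = 2.0 * x(i)
     else
        y_block(i) = 0.0
     end if
  end do

  ! Form 2: two independent one-line IFs with complementary conditions.
  do i = 1, n
     if (x(i) > 0.0) y_split(i) = 2.0 * x(i)
     if (.not. (x(i) > 0.0)) y_split(i) = 0.0
  end do

  ! Both forms must produce the same result for every element.
  if (any(y_block /= y_split)) error stop "forms disagree"
  print *, "both forms agree:", all(y_block == y_split)
end program if_forms
```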
The second loop does a lot of repeated assignments to the same locations. It could be replaced by
    do i=2,nloop1+nloop2
       variable(i)=i
    end do
For nloop1 = nloop2 = 100000, the first version of the loop took 4.7 s of CPU time with an i7-2720. The second version took less time than the resolution of the cpu_time() function (the compiler could have even done away with the loop -- I did not check).
Perhaps you could come up with more realistic sections of an algorithm that are more worthy of optimization. Secondly, compilers are so adept at optimizing these days that one should think twice before trying to help them with loop invariants by doing manual code rewrites.
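The equivalence of the nested loop and the single loop can be checked with a short sketch like the one below. The sizes are small hypothetical values and the array names `v_nested` and `v_single` are introduced here only for the comparison.

```fortran
program collapse_loops
  implicit none
  integer, parameter :: nloop1 = 50, nloop2 = 70   ! small hypothetical sizes
  integer :: v_nested(nloop1+nloop2), v_single(nloop1+nloop2)
  integer :: i, j, k

  v_nested = 0
  v_single = 0

  ! Nested form: nloop1*nloop2 assignments, many of them to the same element.
  do i = 1, nloop1
     do j = 1, nloop2
        k = i + j
        v_nested(k) = k
     end do
  end do

  ! Single-loop form: each element 2..nloop1+nloop2 is written exactly once.
  do i = 2, nloop1 + nloop2
     v_single(i) = i
  end do

  ! Element 1 is never written by either form, so it stays 0 in both arrays.
  if (any(v_nested /= v_single)) error stop "results differ"
  print *, "assignments, nested form:", nloop1 * nloop2
  print *, "assignments, single form:", nloop1 + nloop2 - 1
end program collapse_loops
```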
Thanks Tim & mecej4. I think I was not clear enough in my question, and so it created some confusion. The question was to compare the performance of this code:
    call cpu_time(tstart)
    do i = 1,nloop1
       do j = 1,nloop2
          variable(i+j) = i+j
       end do
    end do
    call cpu_time(tend)
    write(*,*) tend-tstart
with the following,
    call cpu_time(tstart)
    do i = 1,nloop1
       do j = 1,nloop2
          k = i+j
          variable(k) = k
       end do
    end do
    call cpu_time(tend)
    write(*,*) tend-tstart
The only difference between the two is the assignment k = i + j in place of i + j wherever it appears in the loop. So, even though both loops do repeated assignments to the same locations, they should only differ in performance where K is used in place of I+J. I did this simple test to see if this change can have any benefits to performance, in more complicated codes like the very first code snippet that I posted above.
In your latest examples, the value of `variable` is never used. If the compiler is clever enough (and often with these sorts of examples ... it is) it will recognise that and then decide that it doesn't actually need to execute the assignment to `variable`. It may then realize that the loops aren't doing anything useful, and with progressive analysis replace them with simple assignments to i, j [and k]. It may then recognize that i, j and k aren't being used, and get rid of them too. All up, your code could end up being...
    call cpu_time(tstart)
    call cpu_time(tend)
    write (*,*) tend-tstart
which is possibly not all that interesting to test.
As stated elsewhere - different code will result in different optimization outcomes. From an outsider's point of view it is very difficult to predict what the compiler's optimizer will do, beyond simple cases, and what it does may change with version and other compile options. So if a bit of code is so important that you care deeply that it be optimized most effectively, you need to measure that specific code, in a manner that tests that specific code in a realistic fashion. In your opening post you say you've done that measurement... so you know the answer to your question.
But I would be getting pretty desperate to be embarking on hand optimization like this. As a very good general rule, write the code that is easiest to write/easiest to understand/easiest to maintain, and make optimization a problem for the optimizer. Is your code easier to write/understand/maintain with the intermediate variable? That's your call.
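One common way to make such a timing test less likely to be optimized away is to use the result after the timed region, for example by printing a checksum. The sketch below applies that to the two variants from the previous post. The loop sizes and the names `sum_a` and `sum_b` are hypothetical, and a sufficiently clever optimizer may still transform the loops, so this is an illustration of the idea rather than a rigorous benchmark.

```fortran
program time_with_checksum
  implicit none
  integer, parameter :: nloop1 = 2000, nloop2 = 2000   ! hypothetical sizes
  integer :: variable(nloop1+nloop2)
  integer :: i, j, k, sum_a, sum_b
  real :: tstart, tend

  ! Variant A: i+j written out at each use.
  variable = 0
  call cpu_time(tstart)
  do i = 1, nloop1
     do j = 1, nloop2
        variable(i+j) = i + j
     end do
  end do
  call cpu_time(tend)
  sum_a = sum(variable)      ! consume the result so the stores are needed
  print *, "variant A time (s):", tend - tstart, " checksum:", sum_a

  ! Variant B: i+j stored once in the temporary k.
  variable = 0
  call cpu_time(tstart)
  do i = 1, nloop1
     do j = 1, nloop2
        k = i + j
        variable(k) = k
     end do
  end do
  call cpu_time(tend)
  sum_b = sum(variable)      ! consume the result here as well
  print *, "variant B time (s):", tend - tstart, " checksum:", sum_b

  ! The two variants must compute the same array contents.
  if (sum_a /= sum_b) error stop "checksums differ"
end program time_with_checksum
```

With loops this small the elapsed time may be below the resolution of cpu_time(), so a realistic comparison would repeat the timed region many times or use the real application loop.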
You should look again at mecej4's comment, as his approach reduces the number of loop trips in your second loop by a factor on the order of N.
I am suggesting a change to your first loop, only to make it a bit clearer to read. (Most/all optimising compilers would do this anyway.) I also changed the IF test so that g2 is only calculated when the test is false. Without knowing the probability of the IF test being true, I don't know whether this is a significant improvement or may hinder auto-vectorisation.
    DO i = 1,N
       ix = x_index(i,l)
       coef(i) = 0.5 * vz(ix,l)
       IF (s(ix, l) <= p1 .OR. &
           s(ix+N,l) <= p1) then
          g2(i) = 0.0
       else
          g2(i) = 2.0 * ( a(ix+n,k,l) - a(ix,k,l) ) &
                  / ( dz(ix) * (1.0 + az1(ix) ) )
       end if
    END DO
I am not sure if Tim's comment on auto-vectorisation applies, as I am not sure that multiple calculations of g2 could be vectorised, given that ix can vary for each cycle of DO I. If the g2 calculation can be vectorised and there is a low probability that the IF test is true, then any coding that assists the vectorisation would help.
John
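Before timing a rewrite like this, it may be worth confirming that it produces the same values as the loop from the opening post. The sketch below does that on synthetic data. The array extents, the random fill, the threshold value `p1 = 0.3` and the `_ref` arrays are all assumptions made for the test and do not come from the original application.

```fortran
program compare_g2
  implicit none
  integer, parameter :: n = 6, nk = 2, nl = 2   ! hypothetical extents
  real, parameter :: p1 = 0.3                   ! hypothetical threshold
  integer :: x_index(n,nl), i, k, l, ix, upper_index, mismatches
  real :: vz(n,nl), a(2*n,nk,nl), s(2*n,nl), dz(n), az1(n)
  real :: coef(n), g2(n), coef_ref(n), g2_ref(n)

  ! Synthetic data in [0,1); dz is shifted to keep the denominator nonzero.
  call random_number(vz)
  call random_number(a)
  call random_number(s)
  call random_number(dz)
  call random_number(az1)
  dz = dz + 1.0

  ! Any index map with values in 1..n works; here a simple reversal.
  do l = 1, nl
     do i = 1, n
        x_index(i,l) = n + 1 - i
     end do
  end do

  mismatches = 0
  do l = 1, nl
     do k = 1, nk
        ! Loop as posted in the question.
        do i = 1, n
           upper_index = x_index(i,l) + n
           coef_ref(i) = 0.5 * vz(x_index(i,l),l)
           g2_ref(i) = 2. * ( a(upper_index,k,l) - a(x_index(i,l),k,l) ) &
                       / ( dz(x_index(i,l)) * (1.0 + az1(x_index(i,l))) )
           if (s(x_index(i,l),l) <= p1 .or. s(upper_index,l) <= p1) g2_ref(i) = 0.0
        end do

        ! Rewritten loop with the temporary ix and the IF/ELSE block.
        do i = 1, n
           ix = x_index(i,l)
           coef(i) = 0.5 * vz(ix,l)
           if (s(ix,l) <= p1 .or. s(ix+n,l) <= p1) then
              g2(i) = 0.0
           else
              g2(i) = 2.0 * ( a(ix+n,k,l) - a(ix,k,l) ) &
                      / ( dz(ix) * (1.0 + az1(ix)) )
           end if
        end do

        ! Count elements where the two versions differ.
        mismatches = mismatches + count(g2 /= g2_ref) + count(coef /= coef_ref)
     end do
  end do

  if (mismatches /= 0) error stop "loops disagree"
  print *, "mismatches:", mismatches
end program compare_g2
```

The comparison is exact because both loops evaluate the same arithmetic expression; with aggressive floating-point options a small tolerance might be needed instead.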
