Source optimization question

Dogbite · 03-22-2011

I am currently going through the periodic (4+ years) process of updating program documentation. I maintain the program, including the integration of code written by a few others. During integration, I'm always rushed to make sure the code works, and it's not until the documentation phase that I have the opportunity to really examine the new code.

So I find that the other programmer threw three statement functions into one of our costing routines. Here's one as an example:

      EQUATION310(X,Y,I) =
     &   P(I)                   +  ! A
     &   P(I + 1) * X           +  ! B
     &   P(I + 2) * Y           +  ! C
     &   P(I + 3) * X * X       +  ! D
     &   P(I + 4) * Y * Y       +  ! E
     &   P(I + 5) * X * Y       +  ! F
     &   P(I + 6) * X * X * X   +  ! G
     &   P(I + 7) * Y * Y * Y   +  ! H
     &   P(I + 8) * X * Y * Y   +  ! I
     &   P(I + 9) * X * X * Y      ! J

Of course, "P" is a table of values, initialized in a BLOCK DATA routine and indexed by "I". Here is how the statement function is called:

      SELECT CASE (IVT)
      CASE (1)
        CSFC = MAX(0.0,EQUATION310(AVS,GRADEB,POFFSET(IVT)))
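For reference, here is a sketch of the same calculation written as a module function instead of a statement function (Fortran 95 and later list statement functions as obsolescent). It keeps the A to J term order and uses default REAL to match the `0.0` in the MAX call above. The module name `eq310_table` and the 40-element size of `p` are hypothetical; in the real program P is filled by BLOCK DATA.

```fortran
! Sketch: EQUATION310 as a module function.
! ASSUMPTIONS: module name and the size of p are hypothetical;
! p stands in for the BLOCK DATA table P.
module eq310_table
  implicit none
  real :: p(40) = 0.0              ! hypothetical stand-in for the P table
contains
  real function equation310(x, y, i)
    real, intent(in)    :: x, y
    integer, intent(in) :: i       ! start of the 10-coefficient group in p
    equation310 = p(i)                   &  ! A
                + p(i + 1) * x           &  ! B
                + p(i + 2) * y           &  ! C
                + p(i + 3) * x * x       &  ! D
                + p(i + 4) * y * y       &  ! E
                + p(i + 5) * x * y       &  ! F
                + p(i + 6) * x * x * x   &  ! G
                + p(i + 7) * y * y * y   &  ! H
                + p(i + 8) * x * y * y   &  ! I
                + p(i + 9) * x * x * y      ! J
  end function equation310
end module eq310_table
```

The call site would stay the same, e.g. `CSFC = MAX(0.0, EQUATION310(AVS, GRADEB, POFFSET(IVT)))` after a `USE eq310_table`.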

So I'm wondering how efficient the compiler-generated code will be. Will it recognize that IVT must have the value 1 inside CASE (1) and generate instructions that reference that specific element of POFFSET? Or do I need to hard-code the value of POFFSET(1) into the call? Will that really save execution time?

The extreme case would be to hard-code the equation itself, including the values from POFFSET, instead of calling the statement function. Is that likely to cut execution time?
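A middle ground between the variable index and a hard-coded literal is sketched below: make the CASE decision once, copy POFFSET(IVT) into a local before the loop, and loop over the points inside the case. Optimizers often hoist loop-invariant loads like this on their own, so the gain may be nil; it is something to time, not a guaranteed win. The module and subroutine names, the arrays `avs(:)` and `gradeb(:)`, and the offset values are hypothetical, and the thread shows only CASE (1).

```fortran
! Sketch: hoist the SELECT CASE and the POFFSET(IVT) lookup out of a loop.
! ASSUMPTIONS: names, array sizes and offset values are hypothetical.
module hoist_demo
  implicit none
  real    :: p(40) = 0.0
  integer :: poffset(4) = [1, 11, 21, 31]   ! hypothetical offsets
contains
  real function equation310(x, y, i)
    real, intent(in)    :: x, y
    integer, intent(in) :: i
    equation310 = p(i) + p(i+1)*x + p(i+2)*y + p(i+3)*x*x + p(i+4)*y*y &
                + p(i+5)*x*y + p(i+6)*x*x*x + p(i+7)*y*y*y             &
                + p(i+8)*x*y*y + p(i+9)*x*x*y
  end function equation310

  subroutine cost_all(ivt, avs, gradeb, csfc)
    integer, intent(in)  :: ivt
    real,    intent(in)  :: avs(:), gradeb(:)
    real,    intent(out) :: csfc(:)
    integer :: k, ioff
    select case (ivt)            ! decided once per call, not once per point
    case (1)
      ioff = poffset(ivt)        ! table lookup done once, outside the loop
      do k = 1, size(csfc)
        csfc(k) = max(0.0, equation310(avs(k), gradeb(k), ioff))
      end do
    case default
      csfc = 0.0                 ! hypothetical: other cases are not shown
    end select
  end subroutine cost_all
end module hoist_demo
```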

My top-o-the-head estimate is that these statement functions (the others are very similar) are invoked some 58 million times during a standard program run, and maybe 250 million times in an extended scenario. OpenMP got the extended runs down to about an hour, but I'm still looking to cut more if I can.

Thanks,

Greg

Steven_L_Intel1 · 03-22-2011

I would be quite astonished if the compiler noticed that you were selecting on IVT, that there was only one value in the CASE, and that IVT was used as an index. But one never knows.

My advice here is to write what is clearest for someone reading the code and not to worry about such nano-optimizations.

TimP · 03-22-2011

I'd guess you might gain more by factoring out some common factors:
P(I) + X* (P(I+1) + X*(P(I+3) + X*P(I+6) + Y*P(I+9))) + Y*(P(I+2) + X*P(I+5) + Y*(P(I+4) + Y*P(I+7) + X*P(I+8)))

with several variations, possibly some better than others for numerical stability. Not that you want to lose readability while poking around for better performance or stability.
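Here is a sketch that places the factored form next to the expanded form so the two can be checked against each other. As written, the nested form uses 9 multiplications against 20 for the expanded form, though an optimizer may already share subexpressions such as X*X. Because the two orderings round differently, they agree only to within a tolerance. Passing the coefficients as a 10-element slice `c = P(I:I+9)` (so `c(k+1)` is `P(I+k)`) is an assumption made here for self-containment, and the names are hypothetical.

```fortran
! Sketch: expanded vs. factored (nested) evaluation of EQUATION310.
! ASSUMPTION: c(1:10) holds P(I:I+9); names are hypothetical.
module nested_form
  implicit none
contains
  real function eq_expanded(x, y, c)
    real, intent(in) :: x, y, c(10)
    eq_expanded = c(1) + c(2)*x + c(3)*y + c(4)*x*x + c(5)*y*y + c(6)*x*y &
                + c(7)*x*x*x + c(8)*y*y*y + c(9)*x*y*y + c(10)*x*x*y
  end function eq_expanded

  real function eq_nested(x, y, c)
    real, intent(in) :: x, y, c(10)
    ! P(I) + X*(P(I+1) + X*(P(I+3) + X*P(I+6) + Y*P(I+9)))
    !      + Y*(P(I+2) + X*P(I+5) + Y*(P(I+4) + Y*P(I+7) + X*P(I+8)))
    eq_nested = c(1) + x*(c(2) + x*(c(4) + x*c(7) + y*c(10))) &
              + y*(c(3) + x*c(6) + y*(c(5) + y*c(8) + x*c(9)))
  end function eq_nested
end module nested_form
```

A call would look like `eq_nested(AVS, GRADEB, P(IOFF:IOFF+9))`.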

Statement functions are out of fashion, of course, but I don't see that in itself costing efficiency.

jimdempseyatthecove · 03-23-2011

In the equation you presented, you have three input parameters: X, Y, and I.
You do not show how these inputs vary in the enclosing code.
In particular, does I vary while X and/or Y remain relatively constant?
or
Do X and/or Y vary while I remains relatively constant?
or
Do X, Y, and I all vary together?

This relationship (or lack thereof) will affect the choice of optimization.
Additionally, the SELECT CASE may be eliminated by lifting the selection out of the function and writing a separate function/subroutine for each case number.
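To illustrate why the variation pattern matters: if, hypothetically, I and X stay fixed while an inner loop sweeps Y (the thread does not say whether this is so), the ten terms regroup into a cubic in Y. Its four coefficients can be computed once per X, and each Y then costs three multiplications and three additions via Horner's rule. The subroutine and function names are hypothetical, and `c(1:10)` is assumed to hold `P(I:I+9)`.

```fortran
! Sketch: EQUATION310 regrouped as a cubic in Y for fixed I and X.
! ASSUMPTIONS: loop pattern and names are hypothetical; c(1:10) = P(I:I+9).
module cubic_in_y
  implicit none
contains
  subroutine y_coeffs(x, c, a)
    real, intent(in)  :: x, c(10)
    real, intent(out) :: a(0:3)                  ! a(k) multiplies Y**k
    a(0) = c(1) + x*(c(2) + x*(c(4) + x*c(7)))   ! A + B*X + D*X**2 + G*X**3
    a(1) = c(3) + x*(c(6) + x*c(10))             ! C + F*X + J*X**2
    a(2) = c(5) + x*c(9)                         ! E + I*X
    a(3) = c(8)                                  ! H
  end subroutine y_coeffs

  real function eval_y(a, y)
    real, intent(in) :: a(0:3), y
    eval_y = a(0) + y*(a(1) + y*(a(2) + y*a(3)))  ! Horner's rule in Y
  end function eval_y
end module cubic_in_y
```

In use, `y_coeffs` would be called once outside the Y loop and `eval_y` once per Y.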

A similar issue arises with MAX. Does MAX have to be called 58 million times or 58 times?
If you can let the compiler do more of the work for you at compile time than at run time, you will often get faster code.

We would be able to offer better advice if you would expand the scope one or two more call levels.

Jim Dempsey

John_Campbell · 03-23-2011

As Jim indicates, it all depends on how often the function is being called and what arguments are being varied.

By expressing EQUATION310 in the following form, you may be able to save time by recomputing the elements of vector f less often:

      ! f is dimensioned f(0:9), so f(k) multiplies p(I+k)
      !
      f(0) = 1               ! A
      !
      f(1) = X               ! B   X
      f(3) = X * X           ! D   X * X
      f(6) = f(3) * X        ! G   X * X * X
      !
      f(2) = Y               ! C   Y
      f(4) = Y * Y           ! E   Y * Y
      f(7) = f(4) * Y        ! H   Y * Y * Y
      !
      f(5) = X * Y           ! F   X * Y
      f(8) = f(5) * Y        ! I   X * Y * Y
      f(9) = f(5) * X        ! J   X * X * Y
      !
      EQUATION310 = dot_product (p(I:I+9), f)

It's a different approach to the calculation that could better suit the use of EQUATION310.
Also, if the calculation of EQUATION310 were placed inside the X, Y, I DO loops, the compiler may be better able to identify repeated calculations.
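A sketch of the monomial-vector form as reusable procedures: `monomials` builds f(0:9) once per (X, Y) pair, and that vector can then be dotted with any coefficient group P(I:I+9). This pays off when, as in one of Jim's cases, I varies while X and Y stay fixed. The names `monomials` and `eq310_dot` are hypothetical, and `p(:)` is assumed to start at index 1 like P.

```fortran
! Sketch: compute the monomials once per (X, Y), reuse them for several I.
! ASSUMPTIONS: procedure names are hypothetical; p(:) is indexed from 1.
module monomial_form
  implicit none
contains
  function monomials(x, y) result(f)
    real, intent(in) :: x, y
    real :: f(0:9)          ! f(k) multiplies p(I+k)
    f(0) = 1.0              ! A
    f(1) = x                ! B
    f(3) = x * x            ! D
    f(6) = f(3) * x         ! G
    f(2) = y                ! C
    f(4) = y * y            ! E
    f(7) = f(4) * y         ! H
    f(5) = x * y            ! F
    f(8) = f(5) * y         ! I  (X*Y*Y)
    f(9) = f(5) * x         ! J  (X*X*Y)
  end function monomials

  real function eq310_dot(p, i, f)
    real, intent(in)    :: p(:), f(0:9)
    integer, intent(in) :: i
    eq310_dot = dot_product(p(i:i+9), f)   ! one coefficient group
  end function eq310_dot
end module monomial_form
```

For example, `f = monomials(AVS, GRADEB)` once, then `eq310_dot(P, IOFF, f)` for each offset of interest.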

Unless this function is called millions of times, I would not expect a lot of benefit from changing the code in this way, and I would prefer to express the calculation in a form that is closer to the way EQUATION310 is documented. Is there a better way to document the possible values of P(I:I+9)?
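One possible answer to the documentation question, offered only as a sketch: give each slot of a coefficient group a named PARAMETER offset, so the layout of P(I:I+9) is spelled out in one place and the formula reads term by term. The constant names `OFF_A` through `OFF_J` and `NTERMS` are hypothetical labels that follow the A to J comments in EQUATION310.

```fortran
! Sketch: named offsets documenting the layout of P(I:I+9).
! ASSUMPTION: all constant and procedure names here are hypothetical.
module eq310_layout
  implicit none
  integer, parameter :: OFF_A = 0   ! constant term
  integer, parameter :: OFF_B = 1   ! X
  integer, parameter :: OFF_C = 2   ! Y
  integer, parameter :: OFF_D = 3   ! X**2
  integer, parameter :: OFF_E = 4   ! Y**2
  integer, parameter :: OFF_F = 5   ! X*Y
  integer, parameter :: OFF_G = 6   ! X**3
  integer, parameter :: OFF_H = 7   ! Y**3
  integer, parameter :: OFF_I = 8   ! X*Y**2
  integer, parameter :: OFF_J = 9   ! X**2*Y
  integer, parameter :: NTERMS = 10 ! coefficients per group
contains
  real function equation310(p, x, y, i)
    real, intent(in)    :: p(:), x, y
    integer, intent(in) :: i
    equation310 = p(i + OFF_A) + p(i + OFF_B)*x + p(i + OFF_C)*y          &
                + p(i + OFF_D)*x*x + p(i + OFF_E)*y*y + p(i + OFF_F)*x*y  &
                + p(i + OFF_G)*x*x*x + p(i + OFF_H)*y*y*y                 &
                + p(i + OFF_I)*x*y*y + p(i + OFF_J)*x*x*y
  end function equation310
end module eq310_layout
```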

John

mecej4 · 03-23-2011

To reinforce Steve's reply, here is a famous quote:

"...about 97% of the time: premature optimization is the root of all evil." -- Don Knuth

and another, from another person:

"The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet."