How to optimize complicated algebra computations

benjamin_gu · ‎03-28-2006

Hi,

I have a question about how to optimize some code I am working on. The heavy part of the code is a function that computes some quantities: no loops, just lots of floating-point computations (over 2000 lines). This function will be called millions of times with different inputs, so it is critical to overall performance. The computation also has quite a lot of if-else checks.

I ran my program in VTune, and the clock ticks per instruction figure I got was 4.5. According to VTune's manual, that seems to be a very bad number. So I wonder whether there are any tips on how to improve it?

One idea I have now is to break long expressions into a bunch of short ones, for example,

a=b*c*e*f;

becomes,

a1=b*c;
a2=e*f;
a=a1*a2;

so that the CPU is more likely to execute more instructions per clock cycle. But is this true? I would really appreciate any advice.
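To make the question concrete, here is a minimal C sketch of the two forms. The function names product_chained, product_split and forms_agree_on_exact_inputs are made up for illustration and are not from the original code. In C, b*c*e*f is evaluated left to right as ((b*c)*e)*f, and a compiler is generally not free to regroup floating-point multiplies unless relaxed floating-point options are enabled, so the split form does shorten the dependency chain from three multiplies to two. Whether that shows up in the timings depends on the compiler and the CPU, and for general inputs the two forms can differ in the last bits.

```c
#include <assert.h>

/* Chained form: parsed as ((b*c)*e)*f, so each multiply has to wait
   for the result of the previous one (three dependent multiplies). */
static double product_chained(double b, double c, double e, double f)
{
    return b * c * e * f;
}

/* Split form: a1 and a2 do not depend on each other, so a CPU with a
   pipelined multiplier can overlap them before the final multiply. */
static double product_split(double b, double c, double e, double f)
{
    double a1 = b * c;
    double a2 = e * f;
    return a1 * a2;
}

/* Sanity check on inputs whose products are exactly representable,
   so both groupings must give the same value. */
static void forms_agree_on_exact_inputs(void)
{
    assert(product_chained(1.5, 2.0, 4.0, 0.25) ==
           product_split(1.5, 2.0, 4.0, 0.25));
}
```

Timing both versions on the real inputs is the only reliable way to tell whether the rewrite pays off.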

Best,
Ben

jeffrey-gallagher · ‎03-29-2006

BEN,

Be sure to look at the Intel Math Kernel Library (MKL): highly optimized, thread-safe math routines for High-Performance Computing (HPC) science, engineering, and financial applications that require maximum performance on Intel processors.

If you go to our URL you can purchase them or kick the tires on an eval copy. It sounds like you might find a definite use for them, based on your posting!

http://www.intel.com/software/products/mkl

cheers

jdg

Message Edited by jdgallag on 03-28-2006 04:47 PM

TimP · ‎03-29-2006

According to what you have said, you should be using VTune to look for mispredicted branches, as well as finding the sections of code responsible for high ticks per instruction. Of course, reading the docs should have told you that, and given you hints about how to narrow down your search for problems. By not allowing looping, you severely limit your options for improving performance.
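As an illustration of the kind of if-else that can hurt when the condition is hard to predict, here is a minimal C sketch. The functions scale_branchy and scale_select and their parameters are hypothetical and are not taken from the original code. The second form selects the operand first and then does a single multiply. Compilers can often turn such a conditional expression into a conditional move or blend instead of a jump, but that is not guaranteed, so it is worth checking in VTune whether the mispredicted-branch counts actually drop.

```c
#include <assert.h>

/* Branchy form: if the sign of x is hard to predict, each call risks
   a pipeline flush from a mispredicted branch. */
static double scale_branchy(double x, double pos_gain, double neg_gain)
{
    double y;
    if (x >= 0.0) {
        y = x * pos_gain;
    } else {
        y = x * neg_gain;
    }
    return y;
}

/* Select form: choose the gain, then multiply once. The arithmetic is
   the same as above; only the control flow differs. */
static double scale_select(double x, double pos_gain, double neg_gain)
{
    double gain = (x >= 0.0) ? pos_gain : neg_gain;
    return x * gain;
}

/* Both forms must return identical results for either sign of x. */
static void select_matches_branchy(void)
{
    assert(scale_branchy(2.0, 3.0, 5.0) == scale_select(2.0, 3.0, 5.0));
    assert(scale_branchy(-2.0, 3.0, 5.0) == scale_select(-2.0, 3.0, 5.0));
}
```

Branches that almost always go the same way are predicted well and are usually not worth rewriting.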

benjamin_gu · ‎03-29-2006

Hi Jdg,

Thanks a lot for the reply. Yes, I did take a look at the Intel Math Kernel Library. But what I found is that those functions are designed exclusively for vector operations, whereas in my case all the computations are scalar. :-( Is my understanding correct?

Best,

Ben