I have a question about how to optimize some code I am working on. The heavy part of the code is a function that computes some quantities: no loops, just a lot of floating-point computation (over 2000 lines). This function will be called millions of times with different inputs, so it is critical to the overall performance. The computation also contains quite a lot of if-else checks.
I ran my program in VTune, and the clockticks per instruction (CPI) I got was 4.5. According to the VTune manual, that seems to be a very bad number. Are there any tips on how to improve it?
One idea I have now is to break long expressions into a bunch of short ones, so that the CPU is more likely to execute more instructions per clock cycle. But is this true? I would really appreciate any advice.
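To make the idea concrete, here is a minimal sketch in C. The polynomial and the names `poly_serial` and `poly_split` are hypothetical and not taken from the original code. Simply naming intermediate results usually does not change what an optimizing compiler generates. What can matter is whether the shorter expressions are independent of each other, so the CPU can overlap them. In `poly_serial` every multiply-add waits for the previous one, while `poly_split` computes two shorter chains that do not depend on each other.

```c
/* Hypothetical example: evaluate c[0] + c[1]*x + ... + c[7]*x^7. */

/* One long expression (Horner form): a single serial dependency chain,
   each multiply-add needs the result of the one nested inside it. */
double poly_serial(const double c[8], double x)
{
    return c[0] + x * (c[1] + x * (c[2] + x * (c[3] + x * (c[4]
         + x * (c[5] + x * (c[6] + x * c[7]))))));
}

/* The same polynomial split into two shorter expressions.
   `even` and `odd` do not depend on each other, so their
   instructions can be in flight at the same time. */
double poly_split(const double c[8], double x)
{
    double x2   = x * x;
    double even = c[0] + x2 * (c[2] + x2 * (c[4] + x2 * c[6]));
    double odd  = c[1] + x2 * (c[3] + x2 * (c[5] + x2 * c[7]));
    return even + x * odd;
}
```

Because the split form changes the order of the floating-point operations, its result can differ from the serial form in the last bits for general inputs. Whether it is actually faster depends on the CPU and the compiler, so it would need to be measured.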
Message Edited by jdgallag on 03-28-2006 04:47 PM
Thanks a lot for the reply. Yes, I did take a look at the Intel Math Kernel Library, but what I found is that those functions are designed exclusively for vector operations. In my case, all the computations are scalar. :-( Is my understanding correct?
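One possible way around the scalar/vector mismatch, sketched here under assumptions: since the function is called millions of times with different inputs, the calls could be grouped so that one call processes an array of inputs. The kernel below is a made-up stand-in for the real 2000-line computation, and `kernel_scalar` and `kernel_batch` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical scalar kernel: one input, one result per call. */
double kernel_scalar(double x)
{
    return x / (1.0 + x * x);
}

/* The same computation applied to n inputs in one call.
   With the data laid out as arrays, the loop can be vectorized
   by the compiler, or individual steps can be handed to
   array-oriented library routines. */
void kernel_batch(size_t n, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = x[i] / (1.0 + x[i] * x[i]);
}
```

This only fits if the inputs are available in groups and each result is independent of the others. Heavy if-else logic inside the kernel can also limit how much of it vectorizes.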