mownby

I've got my function analyzed, but how do I know what to change?

I've written a function in assembly language using SSE2 instructions, and to my dismay it performed slightly slower than the original C function. So I downloaded and installed the trial version of the VTune Performance Analyzer to see if it could help. I was able to make some minor improvements, so my function is now about 1.2 times as fast as the original C function. I am hoping to get it up to 1.5 to 2 times as fast, but I am not sure what to change at this point.
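The original function isn't shown, so as a point of reference, here is a minimal sketch of the kind of comparison being described: a plain C kernel and an SSE2 version of the same computation, written with compiler intrinsics rather than hand assembly. The dot-product kernel and the names `dot_scalar` and `dot_sse2` are hypothetical stand-ins, not the poster's code. Checking that both produce the same result on exact inputs is a useful first step before timing or profiling either one.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#include <stddef.h>

/* Hypothetical reference kernel: plain C dot product. */
double dot_scalar(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Same computation with SSE2: two doubles per multiply/add.
   Unaligned loads (_mm_loadu_pd) avoid any alignment assumption. */
double dot_sse2(const double *a, const double *b, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);      /* horizontal sum of the two lanes */
    double sum = lanes[0] + lanes[1];
    for (; i < n; i++)              /* scalar tail when n is odd */
        sum += a[i] * b[i];
    return sum;
}
```

Note that every `_mm_add_pd` in the loop depends on the previous one through `acc`, so this version is only a straightforward translation of the C loop, not a tuned one.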

According to the analysis, certain instructions are taking tons of cycles to execute so I assume this means I have pipeline stalls somewhere. But how do I tell _why_ these instructions (which normally should execute pretty fast) are taking so long? I've tried re-ordering them and it doesn't seem to be making much of a difference.
TimP

Quoting - mownby

According to the analysis, certain instructions are taking tons of cycles to execute so I assume this means I have pipeline stalls somewhere. But how do I tell _why_ these instructions (which normally should execute pretty fast) are taking so long? I've tried re-ordering them and it doesn't seem to be making much of a difference.
The usual answer is to investigate events (various categories of cache misses, resource allocation stalls, and so on) to see if they answer your "why." As you may have figured out, non-precise event cycle counts tend to pile up on instructions that are waiting for some earlier stall to be resolved, rather than on the instruction that caused the stall.
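As a hypothetical illustration of why cycle counts can land on "innocent" instructions: the two functions below do identical arithmetic on a row-major matrix but walk memory differently. The column-first walk jumps `cols` doubles per step, which for large matrices tends to miss in cache; a sampling profiler will often charge those cycles to an instruction at or just after the add that waits for the loaded value, not to the load that missed. The functions and their names are assumptions made for illustration, not taken from the thread; cache-miss events are what would distinguish the two cases.

```c
#include <stddef.h>

/* Walks a row-major matrix in storage order: consecutive addresses. */
double sum_rows_first(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}

/* Same sum, but each step jumps `cols` elements ahead; for large
   matrices this access pattern tends to cause many more cache misses. */
double sum_cols_first(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}
```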
Many types of instruction re-ordering are done at least as effectively by out-of-order hardware execution, so it's not surprising that re-ordering makes little difference, unless you are running on Atom (an in-order core), where it might well help.
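One thing out-of-order hardware cannot do is overlap operations that truly depend on each other. If a loop funnels every result through a single accumulator, each add has to wait for the previous add to finish, no matter how the instructions are ordered. A common remedy is to split the work across independent accumulators. The sketch below is a hypothetical two-accumulator SSE2 dot product (the name `dot_sse2_2acc` and the kernel are assumptions, not the poster's code). Because it changes the order of the floating-point additions, results can differ from a single-accumulator loop in the last bits on general inputs.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Two independent accumulators: the adds into acc0 and acc1 do not
   depend on each other, so the hardware can keep both in flight. */
double dot_sse2_2acc(const double *a, const double *b, size_t n)
{
    __m128d acc0 = _mm_setzero_pd();
    __m128d acc1 = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 = _mm_add_pd(acc0, _mm_mul_pd(_mm_loadu_pd(a + i),
                                           _mm_loadu_pd(b + i)));
        acc1 = _mm_add_pd(acc1, _mm_mul_pd(_mm_loadu_pd(a + i + 2),
                                           _mm_loadu_pd(b + i + 2)));
    }
    __m128d acc = _mm_add_pd(acc0, acc1);  /* combine the two chains */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1];
    for (; i < n; i++)                      /* scalar tail */
        sum += a[i] * b[i];
    return sum;
}
```

Whether this helps on a given CPU depends on the add latency and on whether the loop is limited by memory instead; the profiler's stall and cache-miss events are the way to tell which case applies.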