
I've got my function analyzed, but how do I know what to change?

mownby
I've written a function in assembly language using SSE2 instructions and, to my dismay, it performed slightly slower than the original C function. So I downloaded and installed the trial version of VTune Performance Analyzer to see if it could help, and I was able to make some minor improvements, so now my function is about 1.2 times as fast as the original C function. I am hoping to get it up to 1.5-2 times as fast, but I am not sure what to change at this point.

According to the analysis, certain instructions are taking tons of cycles to execute so I assume this means I have pipeline stalls somewhere. But how do I tell _why_ these instructions (which normally should execute pretty fast) are taking so long? I've tried re-ordering them and it doesn't seem to be making much of a difference.
TimP
Quoting - mownby

According to the analysis, certain instructions are taking tons of cycles to execute so I assume this means I have pipeline stalls somewhere. But how do I tell _why_ these instructions (which normally should execute pretty fast) are taking so long? I've tried re-ordering them and it doesn't seem to be making much of a difference.
The usual answer is to investigate events (various categories of cache misses, resource allocation stalls, and so on) to see if this answers your "why." As you may have figured out, non-precise event cycle counts tend to pile up on instructions which are waiting for some previous stall to be resolved, so the instruction showing the high count is often not the one that caused the stall.
Many types of instruction re-ordering are done at least as effectively by out-of-order hardware execution, so it's not surprising that re-ordering makes little difference, unless you are running on Atom, where it might well help.
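To see the effect described above in isolation, here is a minimal C sketch (not from the original posts; the function names `sum_sequential` and `sum_strided` are made up for illustration). Both functions add up exactly the same elements and return the same result, but the second one visits them `stride` elements apart. On an array much larger than the last-level cache, with a stride of at least one cache line, the strided loop will typically run far slower, and a profiler will tend to charge the extra cycles to the add that consumes the load rather than to anything that looks expensive in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum all n elements in address order.
 * This access pattern is friendly to the cache and hardware prefetcher. */
uint64_t sum_sequential(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Hypothetical helper: sum the same n elements, each exactly once,
 * but jump `stride` elements between consecutive accesses.
 * The arithmetic is the same; only the memory access order changes. */
uint64_t sum_strided(const uint32_t *data, size_t n, size_t stride)
{
    assert(stride > 0);
    uint64_t sum = 0;
    for (size_t start = 0; start < stride && start < n; start++)
        for (size_t i = start; i < n; i += stride)
            sum += data[i];
    return sum;
}
```

If you call both from a small test program on a large buffer and profile it, comparing clock-tick samples against cache-miss event samples for the two loops should show whether the "slow" instructions are really waiting on memory. Note that an optimizing compiler may vectorize the sequential loop, so the instruction mix of the two versions is only roughly comparable.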