Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

I've got my function analyzed, but how do I know what to change?

mownby
I've written a function in assembly language using SSE2 instructions and, to my dismay, it ran slightly slower than the original C function. I downloaded and installed the trial version of VTune Performance Analyzer to see if it could help, and I was able to make some minor improvements, so now my function is about 1.2 times as fast as the original C function. I am hoping to get it up to 1.5-2 times as fast, but I am not sure what to change at this point.

According to the analysis, certain instructions are taking tons of cycles to execute so I assume this means I have pipeline stalls somewhere. But how do I tell _why_ these instructions (which normally should execute pretty fast) are taking so long? I've tried re-ordering them and it doesn't seem to be making much of a difference.
TimP
Quoting - mownby

According to the analysis, certain instructions are taking tons of cycles to execute so I assume this means I have pipeline stalls somewhere. But how do I tell _why_ these instructions (which normally should execute pretty fast) are taking so long? I've tried re-ordering them and it doesn't seem to be making much of a difference.
The usual answer is to investigate events (the various categories of cache misses, resource allocation stalls, and so on) to see whether they answer your "why." As you may have figured out, cycle counts from non-precise events tend to pile up on instructions that are waiting for some previous stall to be resolved, so the instruction that looks expensive is often not the one that caused the stall.
Many types of instruction re-ordering are done at least as effectively by the out-of-order execution hardware, so it's not surprising that re-ordering makes little difference, unless you are running on Atom, where it might well help.
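To make the cache-miss point concrete, here is a minimal sketch in C, not taken from the original function. It sums the same buffer twice with identical arithmetic, once sequentially and once in cache-line-sized jumps. The names `make_buffer`, `sum_strided`, and `time_walk` are hypothetical helpers invented for this illustration, and the stride of 16 elements assumes 64-byte cache lines and 4-byte elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: allocate n elements and fill them with a simple
   repeating pattern (0..255) so the expected sum is easy to check.
   Returns NULL if allocation fails; the caller frees the buffer. */
uint32_t *make_buffer(size_t n)
{
    uint32_t *buf = malloc(n * sizeof *buf);
    if (buf != NULL) {
        for (size_t i = 0; i < n; ++i)
            buf[i] = (uint32_t)(i & 0xff);
    }
    return buf;
}

/* Sum every element exactly once, visiting them in stride-sized jumps.
   stride == 1 is a plain sequential walk. A larger stride does the same
   number of loads and adds, but touches a different cache line on almost
   every load once the buffer is larger than the cache. */
uint64_t sum_strided(const uint32_t *buf, size_t n, size_t stride)
{
    assert(stride > 0);
    uint64_t sum = 0;
    for (size_t start = 0; start < stride; ++start) {
        for (size_t i = start; i < n; i += stride)
            sum += buf[i];
    }
    return sum;
}

/* Time one walk in CPU seconds using clock(); the sum is stored through
   out_sum so the compiler cannot discard the loop. */
double time_walk(const uint32_t *buf, size_t n, size_t stride,
                 uint64_t *out_sum)
{
    clock_t t0 = clock();
    *out_sum = sum_strided(buf, n, stride);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

To try it, call `time_walk` on a buffer much larger than the last-level cache (say, a few hundred MB) with a stride of 1 and then of 16, and compare the two times. Both calls return the same sum, so any difference comes from memory behavior rather than from the instructions themselves. Profiling the two runs with cache-miss events enabled is one way to see what those events look like when they are the cause, though how large the gap is will depend on the machine and its prefetchers.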