Does VTune do assembly code simulation, like Quexal?
I don't know if you are familiar with a product called Quexal, but it lets you type in MMX/SSE assembly, then analyzes it, computes the cycle time for each instruction, and draws graphs to help you maximize parallelism and speed.
I like how VTune looks as a profiler, but I was wondering whether it can also do something like that. Or does Intel have another product like this? Thanks, Brian
It looks like old VTune versions such as 4.5 and 5.0 had the feature you are talking about, but it was dropped. The reason is that this kind of analysis is not very precise, since it does not take into account factors such as data availability (was the data in cache, in L1 or L2, or not?), branch mispredictions, pipeline flushes, etc.
The real-time event collection done by VTune can give you a more realistic picture of the flow. It takes into consideration all possible factors influencing your flow (e.g. other processes/threads using the same cache lines), so you can analyze and decide how to improve your application in the "real world", not in a "theoretical incubator" with simple arithmetic.
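To illustrate the data availability point, here is a minimal C sketch (the function names are hypothetical, not taken from VTune or Quexal). Both functions perform the same additions over the same array, so a static per-instruction cycle count would rate their inner loops almost identically. Only the memory access order differs, and on large arrays that alone can change the measured run time considerably, depending on the cache hierarchy.

```c
#include <assert.h>
#include <stddef.h>

/* Sum all n elements in memory order: consecutive accesses
 * tend to fall in the same cache line. */
long sum_sequential(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Sum the same n elements, but visit them in jumps of `stride`.
 * Every element is still read exactly once, so the result is
 * identical; only the access order (and so the cache behaviour)
 * changes. */
long sum_strided(const int *a, size_t n, size_t stride)
{
    assert(stride > 0);
    long total = 0;
    for (size_t start = 0; start < stride; start++)
        for (size_t i = start; i < n; i += stride)
            total += a[i];
    return total;
}
```

Timing both calls on an array much larger than the last-level cache, with a stride of a few thousand elements, is one way to see the gap that a static simulator cannot predict.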
Profile-guided data collection is needed for anything which depends strongly on compile-time knowledge about preferred branches.
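A small C sketch of why the preferred branch direction may not be knowable at compile time (the function name is hypothetical). The loop has one data-dependent branch, and nothing in the source says how often it is taken. That depends entirely on the input, which is the kind of information a profiling run can record.

```c
#include <stddef.h>

/* Count the elements that are at or above `threshold`.
 * How often the `if` is taken depends only on the run-time data:
 * it can be always, never, or anything in between. */
size_t count_at_least(const int *a, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] >= threshold)   /* taken rate unknown at compile time */
            count++;
    }
    return count;
}
```

With values 0..99, a threshold of 0 makes the branch always taken, 100 makes it never taken, and 50 splits it evenly. A compiler may also turn such a simple branch into branch-free code, so this is only meant to show the dependence on data.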
I'm not sure what you mean about data availability. P4/Xeon compilation assumes that hardware prefetch will be effective. The vectorizer is beginning to do some loop splitting analysis which may help with data buffering.
The IA64 compilers, at -O3, schedule prefetch and do versioning to take care of bank conflicts. They don't look at preceding loops to see when prefetch isn't needed.
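For reference, a hand-written approximation of scheduled prefetch might look like the C sketch below. It is not what the IA64 compiler actually emits. It assumes the `__builtin_prefetch` builtin available in GCC-compatible compilers, and both the function name and the prefetch distance of 16 elements are made up for illustration and would need tuning. It also shows the limitation mentioned above: the prefetch is issued on every pass, even when an earlier loop has already brought the data into cache.

```c
#include <stddef.h>

/* Assumed prefetch distance in elements; a real value needs tuning. */
#define PREFETCH_DISTANCE 16

/* Sum n elements while requesting data a fixed distance ahead.
 * The prefetch is only a hint and does not change the result. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* Stay inside the array when forming the address. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE]);
#endif
        total += a[i];
    }
    return total;
}
```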