I don't know if you are familiar with a product called Quexal, but it lets you type in MMX/SSE assembly, then analyzes it, computes the cycle time of each instruction, and draws graphs to help you maximize parallelism and speed.
I like how VTune looks as a profiler, but I was wondering whether it can also do something like that. Or does Intel have another product like this?
Thanks,
Brian
4 Replies
Sorry, we don't have anything like that.
Hi Brian,
It looks like old VTune versions such as 4.5 and 5.0 had the feature you are talking about, but it was dropped. The reason is that this kind of analysis is not very precise, since it does not take into account factors such as:
- data availability (was the data in cache, in L1 or L2, or not?)
- branch mispredictions
- pipeline flushes
etc.
The real-time event collection done by VTune can give you a more realistic picture of the flow. It takes into account all the factors that influence your flow (e.g. other processes/threads using the same cache lines), so you can analyze and decide how to improve your application in the "real world", not in a "theoretical incubator" with simple arithmetic.
-Daniel
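To make the data-availability point concrete, here is a minimal sketch in C. It is not from the thread, and `sum_strided` and its parameters are hypothetical names chosen for illustration. Every call performs the same number of additions, so a static per-instruction cycle count would rate a stride of 1 and a large stride about the same. On real hardware the large stride usually runs slower because it touches a new cache line on almost every access. How big the gap is depends on the machine, which is the argument for measuring events and not only counting cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Sum all n elements of a, visiting them in stride-sized jumps.
 * stride == 1 walks memory sequentially; a large stride lands on a
 * different cache line for almost every access. The arithmetic work
 * (n additions) is identical in both cases. */
long sum_strided(const int *a, size_t n, size_t stride)
{
    assert(stride > 0);
    long total = 0;
    /* Each starting offset covers one "column" of the array, so every
     * index in [0, n) is visited exactly once. */
    for (size_t start = 0; start < stride; ++start) {
        for (size_t i = start; i < n; i += stride) {
            total += a[i];
        }
    }
    return total;
}
```

Timing `sum_strided(a, n, 1)` against `sum_strided(a, n, 1024)` on an array much larger than the last-level cache is one way to see the difference; the results are the same, only the memory behavior changes.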
How much does the Intel optimizing compiler try to account for things like data availability, branch prediction, etc.?
Brian
Profile-guided data collection is needed for anything that depends strongly on compile-time knowledge of preferred branches.
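As a rough illustration of what "knowledge about preferred branches" means, the sketch below states a branch preference by hand. It assumes a GCC- or Clang-compatible compiler, where `__builtin_expect` is available as an extension; the `LIKELY`/`UNLIKELY` macros and `count_negative` are hypothetical names, and the claim that negatives are rare is an assumption about the input, not something measured. A profile-guided build would derive this kind of information from real runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macros. __builtin_expect is a GCC/Clang
 * extension, so other compilers get a plain expression. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Count negative entries, hinting (as an assumption about the data)
 * that negative values are the uncommon case. The hint affects code
 * layout only; the result is the same with or without it. */
size_t count_negative(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(a[i] < 0)) {
            ++count;
        }
    }
    return count;
}
```

A hand-written hint like this is only as good as the guess behind it, which is why collecting a profile is the more reliable route when branch behavior matters.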
I'm not sure what you mean about data availability. P4/Xeon compilation assumes that hardware prefetch will be effective. The vectorizer is beginning to do some loop splitting analysis which may help with data buffering.
The IA64 compilers, at -O3, schedule prefetch and do versioning to take care of bank conflicts. They don't look at preceding loops to see when prefetch isn't needed.
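For readers who have not seen scheduled prefetch, here is a hand-written sketch of the idea in C. It is not what any Intel compiler emits; it assumes a GCC- or Clang-compatible compiler for `__builtin_prefetch`, and `PREFETCH_READ`, `PREFETCH_DISTANCE`, and `sum_with_prefetch` are hypothetical names. The distance of 16 elements is an arbitrary placeholder that would need tuning on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: request a read prefetch with high temporal
 * locality where the builtin exists, otherwise do nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* How many elements ahead to prefetch. Arbitrary illustrative value. */
enum { PREFETCH_DISTANCE = 16 };

/* Sum the array while asking for data a fixed distance ahead of the
 * element currently being used. */
long sum_with_prefetch(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Stay inside the array when forming the prefetch address. */
        if (i + PREFETCH_DISTANCE < n) {
            PREFETCH_READ(&a[i + PREFETCH_DISTANCE]);
        }
        total += a[i];
    }
    return total;
}
```

On a simple sequential walk like this one, hardware prefetch typically covers the access pattern already, so the explicit prefetch may bring no gain. That matches the remark above that P4/Xeon compilation assumes hardware prefetch will be effective.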