What compiler and linker options are required for a C++ program built with Microsoft Visual Studio 2008 so that VTune can measure instructions required and L1 cache reads?
Quoting - andy57
What compiler and linker options are required for a C++ program built with Microsoft Visual Studio 2008 so that VTune can measure instructions required and L1 cache reads?
Quoting - tim18
You would normally set /Zi (debug information) so that VTune can attribute events to source lines, and set your preferred optimization options explicitly, such as /fp:fast /Ox, so that they aren't cut back by the debug settings. You could collect events with normal release options set, but you wouldn't be able to associate them with anything below function level. You may also wish to set /GL- so that whole program optimization (WPO) doesn't confuse the location of events, but that depends on your situation.
What is the /GL option?
Quoting - andy57
What is the /GL option?
Quoting - tim18
/GL ("Whole Program Optimization") has been in Microsoft C++ at least since VS2003, and it's on by default in VS2008 (VC9) release builds. Any such interprocedural or inlining option can make it difficult to work out which source code is associated with VTune events, so you may have to fall back on overall timing, rather than profiling, to judge whether the option is worth having.
Tim,
I completely agree that inlining sometimes makes it really hard to determine which line of code is causing a micro-architectural issue. However, the overhead of an additional function call in an inner loop might easily hide the latency of an L1 cache miss. It might therefore be better to first determine with the fully optimized binary whether L1 cache misses have an impact, and if they do, disable inlining to determine exactly where they happen (if you can't get it from the assembly code :)).
Kind regards
Thomas
