What Visual Studio 2008 compiler and linker options are required for VTune?

andy57 · 07-18-2009

What compiler and linker options does a C++ program built with Microsoft Visual Studio 2008 need so that VTune can measure the instructions required and L1 cache reads?

TimP · 07-18-2009

You would normally set /Zi (debug information) so that VTune can attribute events to source lines, and set your preferred optimization levels explicitly, such as /fp:fast /Ox, so that the debug settings don't cut them back. You could collect events with the normal release options set, but you wouldn't be able to associate them with anything finer than function level. You may also want to set /GL- so that whole program optimization (WPO) doesn't confuse the location of events, but that depends on your situation.
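To make the options concrete, here is a rough sketch of how they might look on a command line, written as a comment above a small made-up test kernel. The file name kernel.cpp and the function strided_sum are hypothetical, not anything from the original program. The /link /DEBUG /OPT:REF /OPT:ICF part is an addition to the options above: as far as I know, the linker needs /DEBUG to write the PDB, and /DEBUG switches the /OPT defaults to NOREF and NOICF unless you set them back explicitly.

```cpp
// Hypothetical build line for a VS2008 command prompt (file name is made up):
//   cl /Zi /Ox /fp:fast /GL- /EHsc kernel.cpp /link /DEBUG /OPT:REF /OPT:ICF
//
//   /Zi        emit debug information so events can be mapped to source lines
//   /Ox        request full optimization explicitly
//   /fp:fast   relaxed floating-point model
//   /GL-       turn whole program optimization off
//   /DEBUG     (linker) write the PDB file
//   /OPT:...   (linker) restore the release defaults that /DEBUG changes

#include <cstddef>
#include <vector>

// Illustrative kernel only: adds up every stride-th element of the array.
// Varying the stride changes how many cache lines the loop touches, which
// makes it a convenient thing to look at by source line in a profiler.
double strided_sum(const std::vector<double>& data, std::size_t stride) {
    if (stride == 0) {
        stride = 1;  // guard against an endless loop
    }
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); i += stride) {
        total += data[i];
    }
    return total;
}
```

Calling strided_sum on a large vector from main and profiling that build should let events land on the loop's source lines rather than only on the function.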

andy57 · 07-18-2009

What is the /GL option?

TimP · 07-18-2009

/GL, "Whole Program Optimization", has been in Microsoft C++ at least since VS2003. It's on by default in VS2008 (VC9) release builds. Any such interprocedural or inlining option can make it difficult to figure out which source code is associated with VTune events, so you may have to fall back on overall timing, rather than profiling, to determine whether the option is worth having.

Thomas_W_Intel · 08-03-2009

Tim,

I completely agree that inlining sometimes makes it really hard to determine which code line is causing a micro-architectural issue. However, the overhead of an additional function call in an inner loop might easily hide the latency of an L1 cache miss. It might therefore be better to first determine with the fully optimized binary whether L1 cache misses have an impact and, if they do, disable inlining to determine exactly where they happen (if you can't get it from the assembly code :)).
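For the second step, one possible way to disable inlining only where it matters is to mark the suspect function rather than the whole build. This is just a sketch: scale and scaled_sum are hypothetical names, and the NOINLINE macro is my own wrapper around __declspec(noinline) on the Microsoft compiler (with a GCC-style fallback so the snippet builds elsewhere). Compiling with /Ob0 would instead switch inline expansion off everywhere.

```cpp
#include <cstddef>

// Wrapper macro (an assumption of this sketch, not a VTune requirement).
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical helper called from the inner loop. Keeping it out of line
// means events collected inside it are reported against this function
// instead of being folded into the caller.
NOINLINE double scale(double x) {
    return 2.0 * x;
}

// Inner loop under investigation: each iteration now pays for a real call,
// which is the overhead to keep in mind when reading the results.
double scaled_sum(const double* values, std::size_t count) {
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        total += scale(values[i]);
    }
    return total;
}
```

Comparing the run time of this build with the fully inlined one gives a feel for how much the extra calls distort the picture.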

Kind regards
Thomas