What compiler and linker options are required for a C++ program built with Microsoft Visual Studio 2008 so that VTune can measure the instructions required and L1 cache reads?
4 Replies
Quoting - andy57
What compiler and linker options are required for a C++ program built with Microsoft Visual Studio 2008 so that VTune can measure the instructions required and L1 cache reads?
Quoting - tim18
You would normally set /Zi (debug information) so that VTune can attribute events to source lines, and set your preferred optimization levels explicitly, such as /fp:fast /Ox, so that the debug options don't cut them back. You could collect events with the normal release options, but you wouldn't be able to attribute them below function level. You may also want to set /GL- so that whole-program optimization (WPO) doesn't confuse the location of events, but that depends on your situation.
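For anyone trying this, the options in the quoted advice might be combined roughly as in the sketch below. The file name `sample.cpp`, the function `strided_sum`, and the exact command line are assumptions for illustration, not something taken from the posts. The `/link /DEBUG` part is an addition of mine, on the understanding that the linker needs it to write the PDB that `/Zi` prepares.

```cpp
// Assumed file name: sample.cpp (hypothetical).
// Assumed build line from a Visual Studio command prompt:
//   cl /Zi /Ox /fp:fast /GL- /EHsc sample.cpp /link /DEBUG
//
//   /Zi      emit debug information so events can be mapped to source lines
//   /Ox      request full optimization explicitly
//   /fp:fast relaxed floating-point model, as mentioned in the post
//   /GL-     turn off whole-program optimization
//   /EHsc    standard C++ exception handling (needed for std::vector)
//   /DEBUG   linker option (my assumption) so the PDB is written at link time
#include <cstddef>
#include <vector>

// Hypothetical workload: a strided sum whose loads should show up
// as L1 cache reads when the binary is profiled.
double strided_sum(const std::vector<double>& data, std::size_t stride) {
    double total = 0.0;
    if (stride == 0) {
        return total;  // a zero stride would never advance the loop
    }
    for (std::size_t i = 0; i < data.size(); i += stride) {
        total += data[i];
    }
    return total;
}
```

With this kind of build the loop stays optimized, while events can still be attributed to individual source lines inside `strided_sum`.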
What is the /GL option?
Quoting - andy57
What is the /GL option?
Quoting - tim18
/GL ("Whole Program" optimization) has been in Microsoft C++ at least since VS2003. It's on by default in VS2008 (VC9) release builds. Any such interprocedural or inlining option can make it difficult to figure out which source code is associated with VTune events, and you may have to fall back on overall timing rather than profiling to determine its value.
Tim,
I completely agree that inlining sometimes makes it really hard to determine which line of code is causing a micro-architectural issue. However, the overhead of an additional function call in an inner loop might easily hide the latency of an L1 cache miss. It might therefore be better to first determine, with the fully optimized binary, whether L1 cache misses have an impact and, if they do, to disable inlining to determine exactly where they happen (if you can't get it from the assembly code :)).
Kind regards
Thomas
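A minimal sketch of the second step Thomas describes, assuming you want to keep a single function out of line rather than disable inlining globally. The macro names, `PROFILE_SPLIT`, `load_element`, and `gather_sum` are all hypothetical. `__declspec(noinline)` is the MSVC spelling and `__attribute__((noinline))` the GCC/Clang one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical macro: keeps one function out of line on MSVC or GCC/Clang.
#if defined(_MSC_VER)
#define PROFILE_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#define PROFILE_NOINLINE __attribute__((noinline))
#else
#define PROFILE_NOINLINE
#endif

// Define PROFILE_SPLIT (hypothetical) only for the second, diagnostic build.
// The first build leaves it undefined so the helper can be inlined as usual.
#ifdef PROFILE_SPLIT
#define MAYBE_NOINLINE PROFILE_NOINLINE
#else
#define MAYBE_NOINLINE
#endif

// Hypothetical inner-loop helper: an indexed load that may miss in L1.
// The caller must pass a non-empty table.
MAYBE_NOINLINE double load_element(const std::vector<double>& table,
                                   std::size_t index) {
    return table[index % table.size()];
}

// Sums the table entries selected by `indices`; returns 0 for an empty table.
double gather_sum(const std::vector<double>& table,
                  const std::vector<std::size_t>& indices) {
    double total = 0.0;
    if (table.empty()) {
        return total;
    }
    for (std::size_t i = 0; i < indices.size(); ++i) {
        total += load_element(table, indices[i]);
    }
    return total;
}
```

Profile the normal build first. If L1 misses matter, rebuild with `PROFILE_SPLIT` defined so that events land on `load_element` as a separate function, keeping in mind that the extra call changes the timing. Compiling with `/Ob0` would be the global alternative for disabling inline expansion.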
