Hi. I'm wondering if there are any tricks for speeding up finalization. My jobs usually run for 4-6 hours, of which maybe one or two hours have profiling enabled. However, finalization can take 2-4 days. I've tried limiting the sampling rate and the total amount of data stored, but even then it still usually takes 10x longer to finalize than to profile. (I have a sense that this wasn't the case when I last used VTune a few years ago; perhaps I was using an older version?) I'm currently using VTune 2019. If it would make a big difference I could try to get it upgraded, but the tools are managed centrally, so that's not always easy. I'm hoping there are some things I can do to bring the finalization time down without losing too much profiling coverage.
Thanks,
Ben
Hi,
Thanks for reaching out to us.
Could you please share the following additional information:
1. Which analysis type are you running?
2. Which application are you using?
- I'm just doing a regular hotspots analysis.
- As for which application, I'm not sure whether you are asking which VTune application (I'm not aware of more than one), but if that is the question, it's "amplxe-cl".
Hi,
We are forwarding this case to the concerned team.
Thanks
Can you provide details of the system on which you are running, how you are running your collection, and what application you are analyzing?
I'm not sure exactly what kind of information you are looking for.
- The system is a Red Hat 6 Linux system with a Xeon processor, using VTune 2019
- The command is usually run like this: "amplxe-cl --start-paused --collect hotspots <command>"
- The program internally turns on and off profiling around the specific code that needs to be profiled
- The application being profiled is a compute-intensive CAD tool
I apologize for this being dropped. Here are a few suggestions for improving your finalize performance:
- Try limiting the collection to a shorter run.
- Set a CPU mask to only collect data from specific cores. See https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/command-line-interface/command-line-interface-reference/cpu-mask.html for details.
- Use the ittnotify API in your application to restrict collection to only certain regions of your application. See https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis/basic-usage-and-configuration/instrumenting-your-application.html for details.
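As a rough illustration of the last suggestion, here is a minimal sketch of wrapping one region in the ITT collection-control calls `__itt_resume()` and `__itt_pause()` from `ittnotify.h`. The `HAVE_ITTNOTIFY` macro, the `PROFILE_*` wrappers, and the `hot_kernel` function are hypothetical names made up for this example; they are not part of VTune or of the poster's application.

```c
/* Sketch only: restrict collection to one region with the ITT API.
 * Assumption: building with -DHAVE_ITTNOTIFY and linking libittnotify
 * enables the real calls; otherwise the wrappers compile to no-ops so
 * the file still builds on machines without the VTune headers. */
#ifdef HAVE_ITTNOTIFY
#include <ittnotify.h>
#define PROFILE_RESUME() __itt_resume()  /* start/resume data collection */
#define PROFILE_PAUSE()  __itt_pause()   /* pause data collection */
#else
#define PROFILE_RESUME() ((void)0)
#define PROFILE_PAUSE()  ((void)0)
#endif

/* Hypothetical stand-in for the compute-intensive code of interest:
 * computes the partial harmonic sum 1 + 1/2 + ... + 1/n. */
static double hot_kernel(int n) {
    double sum = 0.0;
    for (int i = 1; i <= n; ++i) {
        sum += 1.0 / (double)i;
    }
    return sum;
}

/* Collection is active only while hot_kernel runs. */
static double run_profiled_region(int n) {
    PROFILE_RESUME();
    double result = hot_kernel(n);
    PROFILE_PAUSE();
    return result;
}
```

With the collector launched in a paused state (as in the `--start-paused` command quoted earlier in this thread), only the code between the resume and pause calls should contribute samples, which keeps the amount of data to finalize proportional to the region of interest rather than to the whole run.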
Hi,
Has the solution provided by James helped? Do you have any updates?
Regards,
Janani Chandran
I have observed that finalization time is highest when a large amount of call stack information is recorded. If you do not need the stacks of the sampled threads, you can omit stack collection and get a very fast finalization (in my case, roughly 25-35 seconds). Of course, I then have to reconstruct the stacks "manually" using the IDA disassembler. To reduce the number of collected samples, you can also enable collection of user-mode hardware events only.
P.S.
"However, finalization can take 2-4 days."
How large is the result file (*.perf or *.tb7)?
Hi,
We assume that your issue is resolved. If you need any additional information, please submit a new question as this thread will no longer be monitored.
Regards,
Janani Chandran
