Hi Community!
I'm a beginner with VTune. How can I measure the performance of each function separately when several functions run at the same time? Here is an example:
1. We have functions A and B. When each one runs standalone, we can easily monitor its LLC/memory bandwidth behavior from the vtune command line.
2. Now functions A and B run together: say function A runs on CPU 0 of NUMA node 0, and function B runs on CPUs 1-23 of NUMA node 0. The profiling result then shows the combined performance of the two functions, not separate results for A and for B.
How can I check the performance degradation of function A? Attaching by PID does not work here because function A is short-lived: it exits in less than 1 second.
Thanks in advance for any help!
Oh, I guess maybe the ITT API calls __itt_pause() and __itt_resume() could work. I'll give it a try and then report back.
I think there are a few methods to try:
1. Yes, the ITT API can control the collection, as you mentioned.
2. You can restrict the collection to specific CPU(s):
https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2024-0/cpu-mask.html
3. You can set a small value for the sampling-interval.
https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2024-0/sampling-interval.html
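For method 1, a minimal sketch of wrapping only function A in a resume/pause pair could look like the following. `__itt_resume()` and `__itt_pause()` come from the ITT API header `ittnotify.h`; everything else here is an assumption for illustration: `function_a` is a hypothetical stand-in for the real short-lived function, and the `USE_ITT` build flag and the `COLLECT_*` macro names are made up so that the sketch still compiles and runs without the ITT library.

```c
/* Build with -DUSE_ITT and link against the ITT library to call the real API.
   Without it, the calls become counting stubs so the sketch still runs. */
#ifdef USE_ITT
#include <ittnotify.h>
#define COLLECT_RESUME() __itt_resume()
#define COLLECT_PAUSE()  __itt_pause()
#else
static int resume_calls = 0;
static int pause_calls = 0;
#define COLLECT_RESUME() ((void)resume_calls++)
#define COLLECT_PAUSE()  ((void)pause_calls++)
#endif

/* Hypothetical stand-in for the short-lived function A. */
static long function_a(long n) {
    long sum = 0;
    for (long i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/* Resume collection right before function A and pause it right after,
   so the recorded samples cover only the time function A is running. */
static long profiled_function_a(long n) {
    COLLECT_RESUME();
    long result = function_a(n);
    COLLECT_PAUSE();
    return result;
}
```

For this to isolate function A, the collection would need to start in a paused state (the vtune command line has a `-start-paused` option for that), and it can be combined with the CPU mask and sampling interval settings from methods 2 and 3. The exact option syntax for a given analysis type is best checked on the pages linked above.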
Got it! By the way, should I run the VTune profiler on CPUs of another NUMA node, such as CPUs 24-47? Functions A and B already use all the CPUs on NUMA node 0, so if I also run VTune on NUMA node 0, it could interfere with their performance, right?
Generally, VTune runs together with the workload. You could try pinning VTune to cpu0 with the taskset tool, and run function A on cpu1, function B on cpu2, and so on. However, VTune sampling will still have some impact on the other functions.
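If the functions are started from your own code rather than through taskset, the same kind of pinning can be done programmatically. This is only a sketch, assuming Linux with glibc; the helper names `first_allowed_cpu`, `pin_to_cpu`, and `pinned_cpu_count` are made up for illustration, while `sched_setaffinity` and `sched_getaffinity` are the standard Linux calls that taskset itself relies on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}

/* Pin the calling thread to a single logical CPU.
   Returns 0 on success and -1 on error, like sched_setaffinity. */
static int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Number of CPUs in the calling thread's affinity mask, or -1 on error. */
static int pinned_cpu_count(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    return CPU_COUNT(&set);
}
```

Calling `pin_to_cpu(1)` at the start of function A's thread, for example, would keep it on cpu1 in the layout described above. Pinning only controls where each piece runs; it does not remove the sampling overhead itself.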
