I was hired by a company fairly recently and need to prepare to profile their software, which runs on a cluster of Intel® Xeon® Silver 4210 processors.
I have used Intel VTune/Parallel Studio before but I have never profiled the CPU utilization in a live cluster.
I have full access to the software, which is written in C++, and would primarily like to identify lock contention.
The claim is that the software currently has issues with parts of its asynchronous model, and my job is to identify the worst offender(s).
Ideally this should be done without installing a "new" product such as Parallel Studio, since that would simply be rejected: the cluster is used to serve our customers. However, if that makes the task much harder, I would be glad to hear about other options.
Do you have any suggestions on what I should read up on to:
1) Primarily find lock contentions.
2) Inject code from one of your APIs that might improve the success rate of finding them.
3) Limit the load of the process to be analyzed to be within reasonable limits. "Reasonable limits" will be considered as a factor of 2 with an absolute max of 4. We will perform the profiling on live data feeds from existing customer usage.
Thank you for posting in Intel Communities. Answering your questions:
1. Primarily find lock contentions
A: To find lock contention, please read through this cookbook, which explains in detail how to identify contended locks and how the results are displayed.
To learn more about lock contention, please refer to: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/openmp-potential-gain/lock-contention.html
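Since installing a new tool on the production cluster may not be possible, contention can also be counted inside the process itself. The following is only a minimal sketch and not a VTune feature: `CountingMutex`, `DemoResult` and `run_contention_demo` are hypothetical names introduced for illustration. The wrapper tries the lock first; if that fails, it records a contended acquisition and the time spent waiting.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper (not part of VTune or any Intel API): counts how often
// lock() could not take the mutex immediately and how long callers waited.
class CountingMutex {
public:
    void lock() {
        acquisitions_.fetch_add(1, std::memory_order_relaxed);
        // Fast path. std::mutex::try_lock may fail spuriously, so the
        // contended count is an upper bound rather than an exact figure.
        if (mutex_.try_lock()) {
            return;
        }
        const auto start = std::chrono::steady_clock::now();
        mutex_.lock();  // slow path: block until the lock is free
        const auto waited = std::chrono::steady_clock::now() - start;
        contended_.fetch_add(1, std::memory_order_relaxed);
        wait_ns_.fetch_add(
            static_cast<std::uint64_t>(
                std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count()),
            std::memory_order_relaxed);
    }
    void unlock() { mutex_.unlock(); }

    std::uint64_t acquisitions() const { return acquisitions_.load(); }
    std::uint64_t contended() const { return contended_.load(); }
    std::uint64_t wait_ns() const { return wait_ns_.load(); }

private:
    std::mutex mutex_;
    std::atomic<std::uint64_t> acquisitions_{0};
    std::atomic<std::uint64_t> contended_{0};
    std::atomic<std::uint64_t> wait_ns_{0};
};

struct DemoResult {
    long counter;
    std::uint64_t acquisitions;
    std::uint64_t contended;
};

// Several threads increment one shared counter under the instrumented mutex.
DemoResult run_contention_demo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    CountingMutex m;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<CountingMutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return {counter, m.acquisitions(), m.contended()};
}
```

Calling `run_contention_demo(4, 10000)` from `main` and logging `contended` relative to `acquisitions` gives a rough ranking signal per lock. The extra `try_lock` and the atomic counters add some overhead of their own, so this would need to be measured against your load limits.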
2. Inject code from one of your APIs that might improve the success rate of finding them.
A: Could you please elaborate on the question? In the meantime, please explore the ITT APIs provided by VTune Profiler. The Instrumentation and Tracing Technology API (ITT API) provided by the Intel® VTune™ Profiler enables your application to generate and control the collection of trace data during its execution. To learn more about the ITT APIs, please refer to this.
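As an illustration of how the ITT API might be used here, the sketch below annotates a user-defined lock with the ITT synchronization calls so that the profiler can attribute wait time to it. It rests on several assumptions: `AnnotatedMutex`, the `trace` namespace, `run_annotated_demo` and the `USE_ITTNOTIFY` build macro are hypothetical names, and the `__itt_sync_*` signatures are written as I understand them from `ittnotify.h`, so please verify them against your installed header. Without `USE_ITTNOTIFY` the annotations compile to no-ops, which keeps the production build free of any dependency.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Assumption: USE_ITTNOTIFY is a macro you define yourself when the ITT
// header and static library (libittnotify) are available to the build.
#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#endif

namespace trace {
#ifdef USE_ITTNOTIFY
// Thin forwarding layer over the ITT sync API (non-Unicode build assumed).
inline void sync_create(void* addr, const char* type, const char* name) {
    __itt_sync_create(addr, type, name, __itt_attr_mutex);
}
inline void sync_prepare(void* addr)   { __itt_sync_prepare(addr); }
inline void sync_acquired(void* addr)  { __itt_sync_acquired(addr); }
inline void sync_releasing(void* addr) { __itt_sync_releasing(addr); }
inline void sync_destroy(void* addr)   { __itt_sync_destroy(addr); }
#else
// No-op fallbacks so the code builds and runs without the ITT library.
inline void sync_create(void*, const char*, const char*) {}
inline void sync_prepare(void*) {}
inline void sync_acquired(void*) {}
inline void sync_releasing(void*) {}
inline void sync_destroy(void*) {}
#endif
}  // namespace trace

// Hypothetical lock type that reports its wait and hold phases.
class AnnotatedMutex {
public:
    explicit AnnotatedMutex(const char* name) {
        trace::sync_create(this, "AnnotatedMutex", name);
    }
    ~AnnotatedMutex() { trace::sync_destroy(this); }

    void lock() {
        trace::sync_prepare(this);   // about to wait for the lock
        mutex_.lock();
        trace::sync_acquired(this);  // wait finished, lock is held
    }
    void unlock() {
        trace::sync_releasing(this); // about to release the lock
        mutex_.unlock();
    }

private:
    std::mutex mutex_;
};

// Several threads increment a shared counter under the annotated lock.
long run_annotated_demo(int threads, int iterations) {
    AnnotatedMutex m("demo_counter_lock");
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<AnnotatedMutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Giving each lock a descriptive name at construction is the main design point: it lets the contended objects be told apart in the results instead of appearing as anonymous addresses.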
3. Limit the load of the process to be analyzed to be within reasonable limits. "Reasonable limits" will be considered as a factor of 2 with an absolute max of 4. We will perform the profiling on live data feeds from existing customer usage.
A: To limit data collection by specifying a predefined amount of data to collect, either as an expected result size or as a collection time, please refer to: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance/control-data-collection/data-collection-limit.html
To limit data collection to a single process, Attach to Process should suit your requirement of profiling on live data feeds. The Attach to Process option attaches a collection to a running process specified by its process ID. To learn more, please refer to: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/command-line-interface/command-line-interface-reference/target-pid.html .
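Collection can also be narrowed from inside the application with the ITT collection-control calls `__itt_resume()` and `__itt_pause()`. This is a different technique from the result-size and time limits described above. The sketch below is hedged accordingly: `SampledWindow`, the `collection` namespace, `count_sampled_windows` and the `USE_ITTNOTIFY` macro are hypothetical names, and the idea of collecting for only one request out of every N is my assumption about how you might keep the load within your limits.

```cpp
#include <atomic>
#include <cstdint>

// Assumption: USE_ITTNOTIFY is a macro you define yourself when the ITT
// header and library are available; otherwise the calls are no-ops.
#ifdef USE_ITTNOTIFY
#include <ittnotify.h>
#endif

namespace collection {
inline void resume() {
#ifdef USE_ITTNOTIFY
    __itt_resume();  // ask the profiler to start collecting
#endif
}
inline void pause() {
#ifdef USE_ITTNOTIFY
    __itt_pause();   // ask the profiler to stop collecting
#endif
}
}  // namespace collection

// Hypothetical RAII helper: opens a collection window for one out of every
// `period` constructions and closes it again on scope exit.
// Caveat: pause/resume apply to the whole process, so windows opened from
// several threads at once would cut each other short. This sketch assumes
// a single thread drives the windows.
class SampledWindow {
public:
    explicit SampledWindow(std::uint64_t period)
        : active_(period != 0 && sequence().fetch_add(1) % period == 0) {
        if (active_) {
            collection::resume();
        }
    }
    ~SampledWindow() {
        if (active_) {
            collection::pause();
        }
    }
    SampledWindow(const SampledWindow&) = delete;
    SampledWindow& operator=(const SampledWindow&) = delete;

    bool active() const { return active_; }

private:
    // Process-wide sequence number shared by all windows.
    static std::atomic<std::uint64_t>& sequence() {
        static std::atomic<std::uint64_t> value{0};
        return value;
    }
    bool active_;
};

// Simulates handling `requests` requests and returns how many of them ran
// inside an open collection window.
int count_sampled_windows(int requests, std::uint64_t period) {
    int sampled = 0;
    for (int i = 0; i < requests; ++i) {
        SampledWindow window(period);
        if (window.active()) {
            ++sampled;
        }
        // ... handle the request here ...
    }
    return sampled;
}
```

For this to have an effect, the collection would presumably need to be started in a paused state so that only the resumed windows are recorded. Please check the VTune documentation for the exact option on your version.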
Hope this helps
We have not heard from you in a while. Has the solution provided helped? If yes, can we discontinue monitoring this thread? If not, please let us know, we would be glad to help.
We assume that your query is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.