Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 624050
Ignored warnings: ['To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.', 'To enable hardware event-based sampling, PRODUCT_LEGAL_SHORT_NAME has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.']
Check of files: Ok
================================================================================
Context values:
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/amplxe-runss --context-value-list
Stdout:
targetOS: Linux
OS: Linux
OSBuildNumber: 0
OSBitness: 64
RootPrivileges: false
isPtraceScopeLimited: false
isCATSupportedByCPU: true
isL3CATAvailable: true
L3CATDetails: COS=16;ways=11
isL2CATAvailable: false
isL3MonitoringSupportedByCPU: true
LLCSize: 28835840
cacheMonitoringUpscalingFactor: 81920
isL3CacheOccupancyAvailable: true
isL3TotalBWAvailable: true
isL3LocalBWAvailable: true
isTSXAvailable: true
isPTAvailable: true
isHTEnabled: false
fpgaOnBoard: None
omniPathOnBoard: None
genArchOnBoard: 0
pciClassParts:
tidValuesForIO: 0x1d8;0x1f0;0x1f8
populatedIoParts:
populatedIoUnits:
populatedTidValuesForIO:
isSGXAvailable: false
LinuxRelease: 3.10.0-1062.52.2.el7.x86_64
is3DXPPresent: false
is3DXP2LMMode: false
is3DXPAppDirectMode: false
IsNUMANodeWithoutCPUsPresent: false
Hypervisor: None
PerfmonVersion: 4
isMaxDRAMBandwidthMeasurementSupported: true
preferedGpuAdapter: none
isEHFIAvailable: false
isPtraceAvailable: true
i915Status: MissingDriver
isFtraceAvailable: ftraceAccessError,debugfsNotAccessible
isMdfEtwAvailable: false
isCSwitchAvailable: no
isGpuBusynessAvailable: unsupportedHardware
isGpuWaitAvailable: no
isFunctionTracingAvailable: no
isIowaitTracingAvailable: no
isVSyncAvailable: no
HypervisorType: None
isDeviceOrCredentialGuardEnabled: false
isSEPDriverAvailable: false
isPAXDriverLoaded: false
platformType: 127
CPU_NAME: Intel(R) Xeon(R) Processor code named Cascadelake
PMU: cascadelake_server
referenceFrequency: 2100000000
isPStateAvailable: true
isVTSSPPDriverAvailable: false
isNMIWatchDogTimerRunning: true
LinuxPerfCredentials: Cpu
LinuxPerfCapabilities: breakpoint:raw;cpu:raw,format,events,ldlat,frontend;intel_pt:raw,format;kprobe:raw,format;msr:raw,format,events;power:raw,format,events;software:raw;tracepoint:raw;uncore_cha:20,raw,format;uncore_iio:6,raw,format;uncore_iio_free_running:6,raw,format,events;uncore_imc:6,raw,format,events;uncore_irp:6,raw,format;uncore_m2m:2,raw,format;uncore_m2pcie:3,raw,format;uncore_m3upi:3,raw,format;uncore_pcu:raw,format;uncore_ubox:raw,format;uncore_upi:3,raw,format;uprobe:raw,format
LinuxPerfStackCapabilities: fp,dwarf
areKernelPtrsRestricted: no
LinuxPerfMuxIntervalMs: 1
isPerfPCIeMappingAvailable: false
isAOCLAvailable: false
isTPSSAvailable: true
isPytraceAvailable: true
forceShowInlines: false
isEnergyCollectionSupported: true
isSocwatchDriverLoaded: false
isCPUSupportedBySocwatch: true
isCpuThrottlingAvailable: false
isIPMWatchReady: false
osCountersCollectorAvailability: available
l0LoaderStatus: LibNotFound
l0DevicesAvailable: false
l0VPUDevicesAvailable: false
l0GPUDevicesAvailable: false
Getting context values: OK
================================================================================
Check driver:
isSEPDriverAvailable: false
isPAXDriverLoaded: false
Command line: lsmod
Is SEP in lsmod: False
The SEP driver is not available.
================================================================================
SEP version:
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/sep -version
Stdout:
Sampling Enabling Product Version: 5.34 built on Mar 28 2022 23:39:00
SEP Driver Version:
PAX Driver Version:
Platform type: 127
CPU name: Intel(R) Xeon(R) Processor code named Cascadelake
PMU: cascadelake_server
Stderr:
Error retrieving SEP driver version
Error retrieving PAX driver version
Check driver with sep -version: Fail
================================================================================
HW event-based analysis (counting mode)...
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -collect performance-snapshot -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ps -data-limit 0 -finalization-mode none -source-search-dir /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/src -- /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix
Stdout:
Addr of buf1 = 0x7efea0d95010
Offs of buf1 = 0x7efea0d95180
Addr of buf2 = 0x7efe9ed94010
Offs of buf2 = 0x7efe9ed941c0
Addr of buf3 = 0x7efe9cd93010
Offs of buf3 = 0x7efe9cd93100
Addr of buf4 = 0x7efe9ad92010
Offs of buf4 = 0x7efe9ad92140
Threads #: 16 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 5.048 seconds
Stderr:
vtune: Peak bandwidth measurement started.
vtune: Peak bandwidth measurement finished.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ps -command stop.
vtune: Collection stopped.
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ps' vtune: Executing actions 0 % vtune: Executing actions 100 % vtune: Executing actions 100 % done HW event-based analysis (counting mode) (Perf) Example of analysis types: Performance Snapshot Collection: Ok -------------------------------------------------------------------------------- Running finalization... Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -finalize -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ps Stderr: vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ps' vtune: Executing actions 0 % vtune: Executing actions 0 % Finalizing results vtune: Executing actions 0 % Finalizing the result vtune: Executing actions 0 % Clearing the database vtune: Executing actions 14 % Clearing the database vtune: Executing actions 14 % Loading raw data to the database vtune: Executing actions 14 % Loading 'systemcollector-27289-i63.sc' file vtune: Executing actions 25 % Loading 'systemcollector-27289-i63.sc' file vtune: Executing actions 25 % Loading '27300.stat.perf' file vtune: Executing actions 25 % Updating precomputed scalar metrics vtune: Executing actions 28 % Updating precomputed scalar metrics vtune: Executing actions 28 % Processing profile metrics and debug information vtune: Executing actions 39 % Processing profile metrics and debug information vtune: Executing actions 39 % Setting data model parameters vtune: Executing actions 39 % Resolving module symbols vtune: Executing actions 39 % Resolving thread name information vtune: Executing actions 43 % Resolving thread name information vtune: Executing actions 43 % Resolving call target names for dynamic code vtune: Executing actions 48 % Resolving call target names for dynamic code vtune: Executing actions 48 % Resolving interrupt name information vtune: Executing actions 53 % Resolving interrupt name information vtune: Executing actions 53 
% Processing profile metrics and debug information vtune: Executing actions 56 % Processing profile metrics and debug information vtune: Executing actions 57 % Processing profile metrics and debug information vtune: Executing actions 58 % Processing profile metrics and debug information vtune: Executing actions 59 % Processing profile metrics and debug information vtune: Executing actions 60 % Processing profile metrics and debug information vtune: Executing actions 62 % Processing profile metrics and debug information vtune: Executing actions 63 % Processing profile metrics and debug information vtune: Executing actions 63 % Preparing output tree vtune: Executing actions 63 % Parsing columns in input tree vtune: Executing actions 64 % Parsing columns in input tree vtune: Executing actions 64 % Creating top-level columns vtune: Executing actions 65 % Creating top-level columns vtune: Executing actions 65 % Creating top-level rows vtune: Executing actions 67 % Creating top-level rows vtune: Executing actions 67 % Preparing output tree vtune: Executing actions 67 % Parsing columns in input tree vtune: Executing actions 67 % Creating top-level columns vtune: Executing actions 69 % Creating top-level columns vtune: Executing actions 69 % Creating top-level rows vtune: Executing actions 70 % Creating top-level rows vtune: Executing actions 70 % Preparing output tree vtune: Executing actions 70 % Parsing columns in input tree vtune: Executing actions 71 % Parsing columns in input tree vtune: Executing actions 71 % Creating top-level columns vtune: Executing actions 72 % Creating top-level columns vtune: Executing actions 72 % Creating top-level rows vtune: Executing actions 74 % Creating top-level rows vtune: Executing actions 74 % Preparing output tree vtune: Executing actions 74 % Parsing columns in input tree vtune: Executing actions 74 % Creating top-level columns vtune: Executing actions 76 % Creating top-level columns vtune: Executing actions 76 % Creating 
top-level rows vtune: Executing actions 77 % Creating top-level rows vtune: Executing actions 78 % Creating top-level rows vtune: Executing actions 78 % Preparing output tree vtune: Executing actions 78 % Parsing columns in input tree vtune: Executing actions 78 % Creating top-level columns vtune: Executing actions 79 % Creating top-level columns vtune: Executing actions 79 % Creating top-level rows vtune: Executing actions 81 % Creating top-level rows vtune: Executing actions 81 % Preparing output tree vtune: Executing actions 81 % Parsing columns in input tree vtune: Executing actions 81 % Creating top-level columns vtune: Executing actions 83 % Creating top-level columns vtune: Executing actions 83 % Creating top-level rows vtune: Executing actions 84 % Creating top-level rows vtune: Executing actions 85 % Creating top-level rows vtune: Executing actions 85 % Setting data model parameters vtune: Executing actions 85 % Precomputing frequently used data vtune: Executing actions 85 % Precomputing frequently used data vtune: Executing actions 86 % Precomputing frequently used data vtune: Executing actions 87 % Precomputing frequently used data vtune: Executing actions 89 % Precomputing frequently used data vtune: Executing actions 90 % Precomputing frequently used data vtune: Executing actions 91 % Precomputing frequently used data vtune: Executing actions 92 % Precomputing frequently used data vtune: Executing actions 93 % Precomputing frequently used data vtune: Executing actions 94 % Precomputing frequently used data vtune: Executing actions 95 % Precomputing frequently used data vtune: Executing actions 96 % Precomputing frequently used data vtune: Executing actions 97 % Precomputing frequently used data vtune: Executing actions 97 % Updating precomputed scalar metrics vtune: Executing actions 99 % Updating precomputed scalar metrics vtune: Executing actions 99 % Discarding redundant overtime data vtune: Executing actions 99 % Saving the result vtune: Executing 
actions 100 % Saving the result vtune: Executing actions 100 % done Finalization: Ok -------------------------------------------------------------------------------- Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -R summary -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ps Stdout: Elapsed Time: 5.071s IPC: 0.332 | The IPC may be too low. This could be caused by issues such as memory | stalls, instruction starvation, branch misprediction or long latency | instructions. Explore the other hardware-related metrics to identify what | is causing low IPC. | DP GFLOPS: 3.400 x87 GFLOPS: 0.000 Average CPU Frequency: 2.793 GHz Logical Core Utilization: 36.7% (14.678 out of 40) | The metric value is low, which may signal a poor logical CPU cores | utilization. Consider improving physical core utilization as the first step | and then look at opportunities to utilize logical cores, which in some cases | can improve processor throughput and overall performance of multi-threaded | applications. | Physical Core Utilization: 36.7% (14.679 out of 40) | The metric value is low, which may signal a poor physical CPU cores | utilization caused by: | - load imbalance | - threading runtime overhead | - contended synchronization | - thread/process underutilization | - incorrect affinity that utilizes logical cores instead of physical | cores | Run the HPC Performance Characterization analysis to estimate the | efficiency of MPI and OpenMP parallelism or run the Locks and Waits | analysis to identify parallel bottlenecks for other parallel runtimes. | Microarchitecture Usage: 8.3% of Pipeline Slots | You code efficiency on this platform is too low. | | Possible cause: memory stalls, instruction starvation, branch misprediction | or long latency instructions. | | Next steps: Run Microarchitecture Exploration analysis to identify the cause | of the low microarchitecture usage efficiency. 
| Retiring: 8.3% of Pipeline Slots Front-End Bound: 0.2% of Pipeline Slots Bad Speculation: 0.1% of Pipeline Slots Back-End Bound: 91.4% of Pipeline Slots | A significant portion of pipeline slots are remaining empty. When | operations take too long in the back-end, they introduce bubbles in the | pipeline that ultimately cause fewer pipeline slots containing useful | work to be retired per cycle than the machine is capable to support. This | opportunity cost results in slower execution. Long-latency operations | like divides and memory operations can cause this, as can too many | operations being directed to a single execution port (for example, more | multiply operations arriving in the back-end per cycle than the execution | unit can support). | Memory Bound: 80.7% of Pipeline Slots | The metric value is high. This can indicate that the significant | fraction of execution pipeline slots could be stalled due to demand | memory load and stores. Use Memory Access analysis to have the metric | breakdown by memory hierarchy, memory bandwidth information, | correlation by memory objects. | L1 Bound: 0.0% of Clockticks FB Full: 100.0% of Clockticks L2 Bound: 0.0% of Clockticks L3 Bound: 49.6% of Clockticks | This metric shows how often CPU was stalled on L3 cache, or | contended with a sibling Core. Avoiding cache misses (L2 | misses/L3 hits) improves the latency and increases performance. | L3 Latency: 100.0% of Clockticks | This metric shows a fraction of cycles with demand load | accesses that hit the L3 cache under unloaded scenarios | (possibly L3 latency limited). Avoiding private cache misses | (i.e. L2 misses/L3 hits) will improve the latency, reduce | contention with sibling physical cores and increase | performance. Note the value of this node may overlap with its | siblings. | DRAM Bound: 27.9% of Clockticks | This metric shows how often CPU was stalled on the main memory | (DRAM). Caching typically improves the latency and increases | performance. 
| Memory Bandwidth: 94.3% of Clockticks | Issue: A significant fraction of cycles was stalled due to | approaching bandwidth limits of the main memory (DRAM). | | Tips: Improve data accesses to reduce cacheline transfers | from/to memory using these possible techniques: | - Consume all bytes of each cacheline before it is | evicted (for example, reorder structure elements and | split non-hot ones). | - Merge compute-limited and bandwidth-limited loops. | - Use NUMA optimizations on a multi-socket system. | | Note: software prefetches do not help a bandwidth-limited | application. | Memory Latency: 5.2% of Clockticks | Issue: A significant fraction of cycles was stalled due to | the latency of the main memory (DRAM). | | Tips: Improve data accesses or interleave them with compute | using such possible techniques as data layout re-structuring | or software prefetches (through the compiler). | Local DRAM: 15.7% of Clockticks | The number of CPU stalls on loads from the local memory | exceeds the threshold. Consider caching data to improve | the latency and increase the performance. | Remote DRAM: 10.2% of Clockticks | The number of CPU stalls on loads from the remote memory | exceeds the threshold. This is often caused by non- | optimal NUMA memory allocations. | Remote Cache: 20.5% of Clockticks | The number of CPU stalls on loads from the remote cache | exceeds the threshold. This is often caused by non- | optimal NUMA memory allocations. | Store Bound: 0.1% of Clockticks Core Bound: 10.7% of Pipeline Slots | This metric represents how much Core non-memory issues were of a | bottleneck. Shortage in hardware compute resources, or dependencies | software's instructions are both categorized under Core Bound. Hence | it may indicate the machine ran out of an OOO resources, certain | execution units are overloaded or dependencies in program's data- or | instruction- flow are limiting the performance (e.g. FP-chained long- | latency arithmetic operations). 
| Memory Bound: 80.7% of Pipeline Slots | The metric value is high. This can indicate that the significant fraction of | execution pipeline slots could be stalled due to demand memory load and | stores. Use Memory Access analysis to have the metric breakdown by memory | hierarchy, memory bandwidth information, correlation by memory objects. | Cache Bound: 49.6% of Clockticks | A significant proportion of cycles are being spent on data fetches from | caches. Check Memory Access analysis to see if accesses to L2 or L3 | caches are problematic and consider applying the same performance tuning | as you would for a cache-missing workload. This may include reducing the | data working set size, improving data access locality, blocking or | partitioning the working set to fit in the lower cache levels, or | exploiting hardware prefetchers. Consider using software prefetchers, but | note that they can interfere with normal loads, increase latency, and | increase pressure on the memory system. This metric includes coherence | penalties for shared data. Check Microarchitecture Exploration analysis | to see if contested accesses or data sharing are indicated as likely | issues. | DRAM Bound: 27.9% of Clockticks | The metric value is high. This indicates that a significant fraction of | cycles could be stalled on the main memory (DRAM) because of demand loads | or stores. | | The code is memory bandwidth bound, which means that there are a | significant fraction of cycles during which the bandwidth limits of the | main memory are being reached and the code could stall. Review the | Bandwidth Utilization Histogram to estimate the scale of the issue. 
| Improve data accesses to reduce cacheline transfers from/to memory using | these possible techniques: 1) consume all bytes of each cacheline before | it is evicted (for example, reorder structure elements and split non-hot | ones); 2) merge compute-limited and bandwidth-limited loops; 3) use NUMA | optimizations on a multi-socket system. | Average DRAM Bandwidth, GB/s: 0.000 NUMA: % of Remote Accesses: 62.1% | A significant amount of DRAM loads were serviced from remote DRAM. | Wherever possible, try to consistently use data on the same core, or at | least the same package, as it was allocated on. | Vectorization: 0.1% of Packed FP Operations | A significant fraction of floating point arithmetic instructions are scalar. | This indicates that the code was not fully vectorized. Use Intel Advisor to | see possible reasons why the code was not vectorized. | Instruction Mix SP FLOPs: 0.0% of uOps Packed: 0.0% from SP FP 128-bit: 0.0% from SP FP 256-bit: 0.0% from SP FP 512-bit: 0.0% from SP FP Scalar: 0.0% from SP FP DP FLOPs: 24.8% of uOps Packed: 0.1% from DP FP 128-bit: 0.1% from DP FP 256-bit: 0.0% from DP FP 512-bit: 0.0% from DP FP Scalar: 99.9% from DP FP | A significant fraction of floating point arithmetic instructions | are scalar. This indicates that the code was not fully | vectorized. Use Intel Advisor to see possible reasons why the | code was not vectorized. | x87 FLOPs: 0.0% of uOps Non-FP: 75.2% of uOps FP Arith/Mem Rd Instr. Ratio: 0.996 FP Arith/Mem Wr Instr. 
Ratio: 1.990
Collection and Platform Info
Application Command Line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix
Operating System: 3.10.0-1062.52.2.el7.x86_64 NAME="Red Hat Enterprise Linux Server" VERSION="7.7 (Maipo)" ID="rhel" ID_LIKE="fedora" VARIANT="Server" VARIANT_ID="server" VERSION_ID="7.7" PRETTY_NAME="Red Hat Enterprise Linux Server 7.7 (Maipo)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:redhat:enterprise_linux:7.7:GA:server" HOME_URL="https://www.redhat.com/" BUG_REPORT_URL="https://bugzilla.redhat.com/" REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7" REDHAT_BUGZILLA_PRODUCT_VERSION=7.7 REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" REDHAT_SUPPORT_PRODUCT_VERSION="7.7"
Computer Name: i63
Result Size: 3.8 MB
Collection start time: 13:27:27 12/09/2022 UTC
Collection stop time: 13:27:32 12/09/2022 UTC
Collector Type: Driverless Perf per-process counting
CPU Name: Intel(R) Xeon(R) Processor code named Cascadelake
Frequency: 2.095 GHz
Logical CPU Count: 40
Max DRAM Single-Package Bandwidth: 120.000 GB/s
Cache Allocation Technology
Level 2 capability: not detected
Level 3 capability: available
Recommendations:
Hotspots: Start with Hotspots analysis to understand the efficiency of your algorithm.
| Use Hotspots analysis to identify the most time consuming functions.
| Drill down to see the time spent on every line of code.
Memory Access: The Memory Bound metric is high (80.7%). A significant fraction of execution pipeline slots could be stalled due to demand memory load and stores.
| Use Memory Access analysis to measure metrics that can identify memory
| access issues.
HPC Performance Characterization: Vectorization (0.1%) is low. A significant fraction of floating point arithmetic instructions are scalar. This indicates that the code was not fully vectorized. Use Intel Advisor to see possible reasons why the code was not vectorized.
| Use HPC Performance Characterization analysis to examine the performance
| of compute-intensive applications. Understand CPU/GPU utilization and get
| information about OpenMP efficiency, memory access, and vectorization.
Threading: There is poor utilization of logical CPU cores (36.7%) in your application.
| Use Threading to explore more opportunities to increase parallelism in
| your application.
If you want to skip descriptions of detected performance issues in the report, enter: vtune -report summary -report-knob show-issues=false -r . Alternatively, you may view the report in the csv format: vtune -report -format=csv.
Stderr:
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ps'
vtune: Executing actions 0 %
vtune: Executing actions 0 % Finalizing results
vtune: Executing actions 50 % Finalizing results
vtune: Executing actions 50 % Generating a report
vtune: Executing actions 50 % Setting data model parameters
vtune: Executing actions 75 % Setting data model parameters
vtune: Executing actions 75 % Generating a report
vtune: Executing actions 100 % Generating a report
vtune: Executing actions 100 % done
Report: Ok
================================================================================
Instrumentation based analysis check...
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -collect hotspots -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_tpss -data-limit 0 -finalization-mode none -source-search-dir /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/src -- /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix
Stdout:
Addr of buf1 = 0x7f6b64965010
Offs of buf1 = 0x7f6b64965180
Addr of buf2 = 0x7f6b62964010
Offs of buf2 = 0x7f6b629641c0
Addr of buf3 = 0x7f6b60963010
Offs of buf3 = 0x7f6b60963100
Addr of buf4 = 0x7f6b5e962010
Offs of buf4 = 0x7f6b5e962140
Threads #: 16 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 5.306 seconds
Stderr:
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_tpss -command stop.
vtune: Collection stopped.
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_tpss'
vtune: Executing actions 0 %
vtune: Executing actions 100 %
vtune: Executing actions 100 % done
Instrumentation based analysis check
Example of analysis types: Hotspots and Threading with user-mode sampling
Collection: Ok
--------------------------------------------------------------------------------
Running finalization...
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -finalize -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_tpss Stderr: vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_tpss' vtune: Executing actions 0 % vtune: Executing actions 0 % Finalizing results vtune: Executing actions 0 % Finalizing the result vtune: Executing actions 0 % Clearing the database vtune: Executing actions 14 % Clearing the database vtune: Executing actions 14 % Loading raw data to the database vtune: Executing actions 14 % Loading 'systemcollector-27452-i63.sc' file vtune: Executing actions 25 % Loading 'systemcollector-27452-i63.sc' file vtune: Executing actions 25 % Loading '27462.stat.perf' file vtune: Executing actions 25 % Loading '27452-27462.0.trace' file vtune: Executing actions 25 % Updating precomputed scalar metrics vtune: Executing actions 28 % Updating precomputed scalar metrics vtune: Executing actions 28 % Processing profile metrics and debug information vtune: Executing actions 39 % Processing profile metrics and debug information vtune: Executing actions 39 % Setting data model parameters vtune: Executing actions 39 % Resolving module symbols vtune: Executing actions 39 % Resolving information for `libpthread.so.0' vtune: Executing actions 39 % Resolving information for `matrix' vtune: Warning: Cannot locate debugging information for file `/lib64/libpthread.so.0'. vtune: Executing actions 39 % Resolving information for `libc.so.6' vtune: Warning: Cannot locate debugging information for file `/lib64/libc.so.6'. 
vtune: Executing actions 40 % Resolving information for `libc.so.6' vtune: Executing actions 42 % Resolving information for `libc.so.6' vtune: Executing actions 43 % Resolving information for `libc.so.6' vtune: Executing actions 43 % Resolving information for `libtpsstool.so' vtune: Warning: Cannot locate debugging information for file `/work/scitas-ge/orliac/intel/vtune/2022.3.0/lib64/libtpsstool.so'. vtune: Executing actions 45 % Resolving information for `libtpsstool.so' vtune: Executing actions 45 % Resolving bottom user stack information vtune: Executing actions 46 % Resolving bottom user stack information vtune: Executing actions 46 % Resolving thread name information vtune: Executing actions 47 % Resolving thread name information vtune: Executing actions 48 % Resolving thread name information vtune: Executing actions 48 % Resolving call target names for dynamic code vtune: Executing actions 50 % Resolving call target names for dynamic code vtune: Executing actions 50 % Resolving interrupt name information vtune: Executing actions 53 % Resolving interrupt name information vtune: Executing actions 53 % Processing profile metrics and debug information vtune: Executing actions 56 % Processing profile metrics and debug information vtune: Executing actions 57 % Processing profile metrics and debug information vtune: Executing actions 58 % Processing profile metrics and debug information vtune: Executing actions 60 % Processing profile metrics and debug information vtune: Executing actions 60 % Setting data model parameters vtune: Executing actions 60 % Precomputing frequently used data vtune: Executing actions 60 % Precomputing frequently used data vtune: Executing actions 63 % Precomputing frequently used data vtune: Executing actions 66 % Precomputing frequently used data vtune: Executing actions 69 % Precomputing frequently used data vtune: Executing actions 72 % Precomputing frequently used data vtune: Executing actions 72 % Updating precomputed scalar metrics 
vtune: Executing actions 75 % Updating precomputed scalar metrics
vtune: Executing actions 75 % Discarding redundant overtime data
vtune: Executing actions 78 % Discarding redundant overtime data
vtune: Executing actions 78 % Saving the result
vtune: Executing actions 82 % Saving the result
vtune: Executing actions 85 % Saving the result
vtune: Executing actions 99 % Saving the result
vtune: Executing actions 100 % Saving the result
vtune: Executing actions 100 % done
Finalization: Ok
--------------------------------------------------------------------------------
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -limit 5 -format csv -csv-delimiter comma -report hotspots -group-by function -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_tpss
Stdout:
Function,CPU Time,CPU Time:Effective Time,CPU Time:Spin Time,CPU Time:Overhead Time,Module,Function (Full),Source File,Start Address
multiply1,74.209340,74.209340,0.0,0.0,matrix,multiply1,multiply.c,0x401550
init_arr,0.010000,0.010000,0.0,0.0,matrix,init_arr,matrix.c,0x400d4f
pthread_create,0.010000,0.010000,0.0,0.0,libpthread.so.0,pthread_create,[Unknown],0x8120
Stderr:
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_tpss'
vtune: Executing actions 0 %
vtune: Executing actions 0 % Finalizing results
vtune: Executing actions 50 % Finalizing results
vtune: Executing actions 50 % Generating a report
vtune: Executing actions 50 % Setting data model parameters
vtune: Executing actions 75 % Setting data model parameters
vtune: Executing actions 75 % Generating a report
vtune: Executing actions 100 % Generating a report
vtune: Executing actions 100 % done
Report: Ok
================================================================================
HW event-based analysis check...
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -collect hotspots -knob sampling-mode=hw -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah -data-limit 0 -finalization-mode none -source-search-dir /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/src -- /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix
Stdout:
Addr of buf1 = 0x7f29b9733010
Offs of buf1 = 0x7f29b9733180
Addr of buf2 = 0x7f29b7732010
Offs of buf2 = 0x7f29b77321c0
Addr of buf3 = 0x7f29b5731010
Offs of buf3 = 0x7f29b5731100
Addr of buf4 = 0x7f29b3730010
Offs of buf4 = 0x7f29b3730140
Threads #: 16 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 4.972 seconds
Stderr:
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah -command stop.
vtune: Collection stopped.
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah'
vtune: Executing actions 0 %
vtune: Executing actions 100 %
vtune: Executing actions 100 % done
HW event-based analysis check (Perf)
Example of analysis types: Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
Collection: Ok
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
--------------------------------------------------------------------------------
Running finalization...
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -finalize -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah Stderr: vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah' vtune: Executing actions 0 % vtune: Executing actions 0 % Finalizing results vtune: Executing actions 0 % Finalizing the result vtune: Executing actions 0 % Clearing the database vtune: Executing actions 14 % Clearing the database vtune: Executing actions 14 % Loading raw data to the database vtune: Executing actions 14 % Loading 'systemcollector-27579-i63.sc' file vtune: Executing actions 25 % Loading 'systemcollector-27579-i63.sc' file vtune: Executing actions 25 % Loading 'system-wide.perf' file vtune: Executing actions 25 % Updating precomputed scalar metrics vtune: Executing actions 28 % Updating precomputed scalar metrics vtune: Executing actions 28 % Processing profile metrics and debug information vtune: Executing actions 39 % Processing profile metrics and debug information vtune: Executing actions 39 % Setting data model parameters vtune: Executing actions 39 % Resolving module symbols vtune: Executing actions 39 % Resolving information for `matrix' vtune: Executing actions 41 % Resolving information for `matrix' vtune: Executing actions 41 % Resolving information for `vmlinux' vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions. 
vtune: Executing actions 44 % Resolving information for `vmlinux' vtune: Executing actions 44 % Resolving bottom user stack information vtune: Executing actions 45 % Resolving bottom user stack information vtune: Executing actions 46 % Resolving bottom user stack information vtune: Executing actions 46 % Resolving thread name information vtune: Executing actions 47 % Resolving thread name information vtune: Executing actions 48 % Resolving thread name information vtune: Executing actions 48 % Resolving call target names for dynamic code vtune: Executing actions 49 % Resolving call target names for dynamic code vtune: Executing actions 49 % Resolving interrupt name information vtune: Executing actions 53 % Resolving interrupt name information vtune: Executing actions 53 % Processing profile metrics and debug information vtune: Executing actions 54 % Processing profile metrics and debug information vtune: Executing actions 56 % Processing profile metrics and debug information vtune: Executing actions 57 % Processing profile metrics and debug information vtune: Executing actions 58 % Processing profile metrics and debug information vtune: Executing actions 60 % Processing profile metrics and debug information vtune: Executing actions 60 % Setting data model parameters vtune: Executing actions 60 % Precomputing frequently used data vtune: Executing actions 60 % Precomputing frequently used data vtune: Executing actions 62 % Precomputing frequently used data vtune: Executing actions 63 % Precomputing frequently used data vtune: Executing actions 64 % Precomputing frequently used data vtune: Executing actions 65 % Precomputing frequently used data vtune: Executing actions 66 % Precomputing frequently used data vtune: Executing actions 67 % Precomputing frequently used data vtune: Executing actions 68 % Precomputing frequently used data vtune: Executing actions 69 % Precomputing frequently used data vtune: Executing actions 71 % Precomputing frequently used data vtune: 
Executing actions 72 % Precomputing frequently used data vtune: Executing actions 72 % Updating precomputed scalar metrics vtune: Executing actions 75 % Updating precomputed scalar metrics vtune: Executing actions 75 % Discarding redundant overtime data vtune: Executing actions 78 % Discarding redundant overtime data vtune: Executing actions 78 % Saving the result vtune: Executing actions 82 % Saving the result vtune: Executing actions 85 % Saving the result vtune: Executing actions 99 % Saving the result vtune: Executing actions 100 % Saving the result vtune: Executing actions 100 % done Finalization: Ok vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions. -------------------------------------------------------------------------------- Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -limit 5 -format csv -csv-delimiter comma -report hotspots -group-by function -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah Stdout: Function,CPU Time,CPU Time:Effective Time,CPU Time:Spin Time,CPU Time:Overhead Time,Instructions Retired,Microarchitecture Usage(%),Microarchitecture Usage:Microarchitecture Usage(%),Microarchitecture Usage:CPI Rate,Module,Function (Full),Source File,Start Address multiply1,59.008306,59.008306,0.0,0.0,59104500000,9.6,9.6,2.792325,matrix,multiply1,multiply.c,0x401550 apic_timer_interrupt,0.040094,0.040094,0.0,0.0,0,4.8,4.8,,vmlinux,apic_timer_interrupt,[Unknown],0xffffffff81790d90 retint_userspace_restore_args,0.025059,0.025059,0.0,0.0,0,0.0,0.0,,vmlinux,retint_userspace_restore_args,[Unknown],0xffffffff817863b3 ktime_get_update_offsets_now,0.020047,0.020047,0.0,0.0,0,0.0,0.0,,vmlinux,ktime_get_update_offsets_now,[Unknown],0xffffffff81108ad0 
_raw_spin_lock,0.015035,0.015035,0.0,0.0,42000000,95.2,95.2,0.250000,vmlinux,_raw_spin_lock,[Unknown],0xffffffff81784c20 Stderr: vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah' vtune: Executing actions 0 % vtune: Executing actions 0 % Finalizing results vtune: Executing actions 50 % Finalizing results vtune: Executing actions 50 % Generating a report vtune: Executing actions 50 % Setting data model parameters vtune: Executing actions 75 % Setting data model parameters vtune: Executing actions 75 % Generating a report vtune: Executing actions 100 % Generating a report vtune: Executing actions 100 % done Report: Ok ================================================================================ HW event-based analysis check... Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -collect uarch-exploration -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ge -data-limit 0 -finalization-mode none -source-search-dir /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/src -- /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix Stderr: vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location. vtune: Error: amplxe-perf: threads_spec: cpu Using CPUID GenuineIntel-6-55-7 path: /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ge/data.0/system-wide.perf dir-data-file: (null) dir-reuse: false Compression enabled, disabling build id collection at the end of the session. 
nr_threads: 40 thread_masks[0]: 0x562993280f60: maps mask[40]: 0 thread_masks[0]: 0x562993280f70: affinity mask[40]: 0 thread_masks[1]: 0x562993280f80: maps mask[40]: 1 thread_masks[1]: 0x562993280f90: affinity mask[40]: 1 thread_masks[2]: 0x562993280fa0: maps mask[40]: 2 thread_masks[2]: 0x562993280fb0: affinity mask[40]: 2 thread_masks[3]: 0x562993280fc0: maps mask[40]: 3 thread_masks[3]: 0x562993280fd0: affinity mask[40]: 3 thread_masks[4]: 0x562993280fe0: maps mask[40]: 4 thread_masks[4]: 0x562993280ff0: affinity mask[40]: 4 thread_masks[5]: 0x562993281000: maps mask[40]: 5 thread_masks[5]: 0x562993281010: affinity mask[40]: 5 thread_masks[6]: 0x562993281020: maps mask[40]: 6 thread_masks[6]: 0x562993281030: affinity mask[40]: 6 thread_masks[7]: 0x562993281040: maps mask[40]: 7 thread_masks[7]: 0x562993281050: affinity mask[40]: 7 thread_masks[8]: 0x562993281060: maps mask[40]: 8 thread_masks[8]: 0x562993281070: affinity mask[40]: 8 thread_masks[9]: 0x562993281080: maps mask[40]: 9 thread_masks[9]: 0x562993281090: affinity mask[40]: 9 thread_masks[10]: 0x5629932810a0: maps mask[40]: 10 thread_masks[10]: 0x5629932810b0: affinity mask[40]: 10 thread_masks[11]: 0x5629932810c0: maps mask[40]: 11 thread_masks[11]: 0x5629932810d0: affinity mask[40]: 11 thread_masks[12]: 0x5629932810e0: maps mask[40]: 12 thread_masks[12]: 0x5629932810f0: affinity mask[40]: 12 thread_masks[13]: 0x562993281100: maps mask[40]: 13 thread_masks[13]: 0x562993281110: affinity mask[40]: 13 thread_masks[14]: 0x562993281120: maps mask[40]: 14 thread_masks[14]: 0x562993281130: affinity mask[40]: 14 thread_masks[15]: 0x562993281140: maps mask[40]: 15 thread_masks[15]: 0x562993281150: affinity mask[40]: 15 thread_masks[16]: 0x562993281160: maps mask[40]: 16 thread_masks[16]: 0x562993281170: affinity mask[40]: 16 thread_masks[17]: 0x562993281180: maps mask[40]: 17 thread_masks[17]: 0x562993281190: affinity mask[40]: 17 thread_masks[18]: 0x5629932811a0: maps mask[40]: 18 thread_masks[18]: 
0x5629932811b0: affinity mask[40]: 18 thread_masks[19]: 0x5629932811c0: maps mask[40]: 19 thread_masks[19]: 0x5629932811d0: affinity mask[40]: 19 thread_masks[20]: 0x5629932811e0: maps mask[40]: 20 thread_masks[20]: 0x5629932811f0: affinity mask[40]: 20 thread_masks[21]: 0x562993281200: maps mask[40]: 21 thread_masks[21]: 0x562993281210: affinity mask[40]: 21 thread_masks[22]: 0x562993281220: maps mask[40]: 22 thread_masks[22]: 0x562993281230: affinity mask[40]: 22 thread_masks[23]: 0x562993281240: maps mask[40]: 23 thread_masks[23]: 0x562993281250: affinity mask[40]: 23 thread_masks[24]: 0x562993281260: maps mask[40]: 24 thread_masks[24]: 0x562993281270: affinity mask[40]: 24 thread_masks[25]: 0x562993281280: maps mask[40]: 25 thread_masks[25]: 0x562993281290: affinity mask[40]: 25 thread_masks[26]: 0x5629932812a0: maps mask[40]: 26 thread_masks[26]: 0x5629932812b0: affinity mask[40]: 26 thread_masks[27]: 0x5629932812c0: maps mask[40]: 27 thread_masks[27]: 0x5629932812d0: affinity mask[40]: 27 thread_masks[28]: 0x5629932812e0: maps mask[40]: 28 thread_masks[28]: 0x5629932812f0: affinity mask[40]: 28 thread_masks[29]: 0x562993281300: maps mask[40]: 29 thread_masks[29]: 0x562993281310: affinity mask[40]: 29 thread_masks[30]: 0x562993281320: maps mask[40]: 30 thread_masks[30]: 0x562993281330: affinity mask[40]: 30 thread_masks[31]: 0x562993281340: maps mask[40]: 31 thread_masks[31]: 0x562993281350: affinity mask[40]: 31 thread_masks[32]: 0x562993281360: maps mask[40]: 32 thread_masks[32]: 0x562993281370: affinity mask[40]: 32 thread_masks[33]: 0x562993281380: maps mask[40]: 33 thread_masks[33]: 0x562993281390: affinity mask[40]: 33 thread_masks[34]: 0x5629932813a0: maps mask[40]: 34 thread_masks[34]: 0x5629932813b0: affinity mask[40]: 34 thread_masks[35]: 0x5629932813c0: maps mask[40]: 35 thread_masks[35]: 0x5629932813d0: affinity mask[40]: 35 thread_masks[36]: 0x5629932813e0: maps mask[40]: 36 thread_masks[36]: 0x5629932813f0: affinity mask[40]: 36 thread_masks[37]: 
0x562993281400: maps mask[40]: 37 thread_masks[37]: 0x562993281410: affinity mask[40]: 37 thread_masks[38]: 0x562993281420: maps mask[40]: 38 thread_masks[38]: 0x562993281430: affinity mask[40]: 38 thread_masks[39]: 0x562993281440: maps mask[40]: 39 thread_masks[39]: 0x562993281450: affinity mask[40]: 39 nr_cblocks: 0 affinity: SYS mmap flush: 1 comp level: 1 mmap size 528384B Failed to create thread msg communication pipe: Too many open files vtune: Collection failed. vtune: Internal Error Cannot find 'runsa.options' by path: /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ge/config/runsa.options HW event-based analysis check Example of analysis types: Microarchitecture Exploration Collection: Fail
================================================================================ HW event-based analysis with uncore events... Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -collect memory-access -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ma -data-limit 0 -finalization-mode none -source-search-dir /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/src -- /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix Stdout: Addr of buf1 = 0x7fc410c8a010 Offs of buf1 = 0x7fc410c8a180 Addr of buf2 = 0x7fc40ec89010 Offs of buf2 = 0x7fc40ec891c0 Addr of buf3 = 0x7fc40cc88010 Offs of buf3 = 0x7fc40cc88100 Addr of buf4 = 0x7fc40ac87010 Offs of buf4 = 0x7fc40ac87140 Threads #: 16 Pthreads Matrix size: 2048 Using multiply kernel: multiply1 Execution time = 6.384 seconds Stderr: vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location. vtune: Peak bandwidth measurement started. vtune: Peak bandwidth measurement finished. vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ma -command stop. vtune: Collection stopped.
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ma' vtune: Executing actions 0 % vtune: Executing actions 100 % vtune: Executing actions 100 % done HW event-based analysis with uncore events (Perf) Example of analysis types: Memory Access Collection: Ok vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location. -------------------------------------------------------------------------------- Running finalization... Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -finalize -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ma Stderr: vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ma' vtune: Executing actions 0 % vtune: Executing actions 0 % Finalizing results vtune: Executing actions 0 % Finalizing the result vtune: Executing actions 0 % Clearing the database vtune: Executing actions 14 % Clearing the database vtune: Executing actions 14 % Loading raw data to the database vtune: Executing actions 14 % Loading 'systemcollector-27811-i63.sc' file vtune: Executing actions 25 % Loading 'systemcollector-27811-i63.sc' file vtune: Executing actions 25 % Loading 'system-wide.perf' file vtune: Executing actions 25 % Loading 'system-wide.stat.perf' file vtune: Executing actions 25 % Updating precomputed scalar metrics vtune: Executing actions 28 % Updating precomputed scalar metrics vtune: Executing actions 28 % Processing profile metrics and debug information vtune: Executing actions 39 % Processing profile metrics and debug information vtune: Executing actions 39 % Setting data model parameters vtune: Executing actions 39 % Resolving module symbols vtune: Executing actions 39 % Resolving information for dangling locations vtune: Executing actions 39 % Resolving information for `matrix' vtune: Executing actions 40 % Resolving information for `matrix' vtune: Executing actions 40 % 
Resolving information for `mmfs26.ko' vtune: Warning: Cannot locate debugging information for file `/lib/modules/3.10.0-1062.52.2.el7.x86_64/weak-updates/mmfs26.ko'. vtune: Executing actions 42 % Resolving information for `mmfs26.ko' vtune: Executing actions 43 % Resolving information for `mmfs26.ko' vtune: Executing actions 43 % Resolving information for `vmlinux' vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions. vtune: Executing actions 45 % Resolving information for `vmlinux' vtune: Executing actions 45 % Resolving bottom user stack information vtune: Executing actions 46 % Resolving bottom user stack information vtune: Executing actions 46 % Resolving thread name information vtune: Executing actions 47 % Resolving thread name information vtune: Executing actions 48 % Resolving thread name information vtune: Executing actions 48 % Resolving call target names for dynamic code vtune: Executing actions 50 % Resolving call target names for dynamic code vtune: Executing actions 50 % Resolving interrupt name information vtune: Executing actions 53 % Resolving interrupt name information vtune: Executing actions 53 % Processing profile metrics and debug information vtune: Executing actions 54 % Processing profile metrics and debug information vtune: Executing actions 56 % Processing profile metrics and debug information vtune: Executing actions 57 % Processing profile metrics and debug information vtune: Executing actions 58 % Processing profile metrics and debug information vtune: Executing actions 60 % Processing profile metrics and debug information vtune: Executing actions 62 % Processing profile metrics and debug information vtune: Executing actions 63 % Processing profile metrics and debug information vtune: Executing actions 63 % Preparing 
output tree vtune: Executing actions 63 % Parsing columns in input tree vtune: Executing actions 64 % Parsing columns in input tree vtune: Executing actions 64 % Creating top-level columns vtune: Executing actions 65 % Creating top-level columns vtune: Executing actions 65 % Creating top-level rows vtune: Executing actions 67 % Creating top-level rows vtune: Executing actions 67 % Preparing output tree vtune: Executing actions 67 % Parsing columns in input tree vtune: Executing actions 67 % Creating top-level columns vtune: Executing actions 69 % Creating top-level columns vtune: Executing actions 69 % Creating top-level rows vtune: Executing actions 70 % Creating top-level rows vtune: Executing actions 70 % Preparing output tree vtune: Executing actions 70 % Parsing columns in input tree vtune: Executing actions 71 % Parsing columns in input tree vtune: Executing actions 71 % Creating top-level columns vtune: Executing actions 72 % Creating top-level columns vtune: Executing actions 72 % Creating top-level rows vtune: Executing actions 74 % Creating top-level rows vtune: Executing actions 74 % Preparing output tree vtune: Executing actions 74 % Parsing columns in input tree vtune: Executing actions 74 % Creating top-level columns vtune: Executing actions 76 % Creating top-level columns vtune: Executing actions 76 % Creating top-level rows vtune: Executing actions 77 % Creating top-level rows vtune: Executing actions 78 % Creating top-level rows vtune: Executing actions 79 % Creating top-level rows vtune: Executing actions 81 % Creating top-level rows vtune: Executing actions 81 % Preparing output tree vtune: Executing actions 81 % Parsing columns in input tree vtune: Executing actions 81 % Creating top-level columns vtune: Executing actions 83 % Creating top-level columns vtune: Executing actions 83 % Creating top-level rows vtune: Executing actions 84 % Creating top-level rows vtune: Executing actions 85 % Creating top-level rows vtune: Executing actions 85 % 
Preparing output tree vtune: Executing actions 85 % Parsing columns in input tree vtune: Executing actions 85 % Creating top-level columns vtune: Executing actions 86 % Creating top-level columns vtune: Executing actions 86 % Creating top-level rows vtune: Executing actions 88 % Creating top-level rows vtune: Executing actions 88 % Preparing output tree vtune: Executing actions 88 % Parsing columns in input tree vtune: Executing actions 89 % Parsing columns in input tree vtune: Executing actions 89 % Creating top-level columns vtune: Executing actions 90 % Creating top-level columns vtune: Executing actions 90 % Creating top-level rows vtune: Executing actions 91 % Creating top-level rows vtune: Executing actions 92 % Creating top-level rows vtune: Executing actions 92 % Preparing output tree vtune: Executing actions 92 % Parsing columns in input tree vtune: Executing actions 92 % Creating top-level columns vtune: Executing actions 93 % Creating top-level columns vtune: Executing actions 93 % Creating top-level rows vtune: Executing actions 95 % Creating top-level rows vtune: Executing actions 95 % Preparing output tree vtune: Executing actions 95 % Parsing columns in input tree vtune: Executing actions 96 % Parsing columns in input tree vtune: Executing actions 96 % Creating top-level columns vtune: Executing actions 97 % Creating top-level columns vtune: Executing actions 97 % Creating top-level rows vtune: Executing actions 98 % Creating top-level rows vtune: Executing actions 99 % Creating top-level rows vtune: Executing actions 99 % Preparing output tree vtune: Executing actions 99 % Parsing columns in input tree vtune: Executing actions 99 % Creating top-level columns vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: 
Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing 
actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Preparing output tree vtune: Executing actions 100 % Parsing columns in input tree vtune: Executing actions 100 % Creating top-level columns vtune: Executing actions 100 % Creating top-level rows vtune: Executing actions 100 % Setting data model parameters vtune: Executing actions 100 % Precomputing frequently used data vtune: Executing actions 100 % Precomputing frequently used data vtune: Executing actions 100 % Updating precomputed scalar metrics vtune: Executing actions 100 % Discarding redundant overtime data vtune: Executing actions 100 % Saving the result vtune: Executing actions 100 % done Finalization: Ok vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions. 
-------------------------------------------------------------------------------- Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -limit 5 -format csv -csv-delimiter comma -report hotspots -group-by function -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ma Stdout: Function,CPU Time,Memory Bound(%),Memory Bound:L1 Bound(%),Memory Bound:L2 Bound(%),Memory Bound:L3 Bound(%),Memory Bound:DRAM Bound(%),Memory Bound:Store Bound(%),Loads,Stores,LLC Miss Count,LLC Miss Count:Local DRAM Access Count,LLC Miss Count:Remote DRAM Access Count,LLC Miss Count:Remote Cache Access Count,Average Latency (cycles),Module,Function (Full),Source File,Start Address multiply1,83.651061,83.6,0.0,0.0,37.8,42.7,0.0,16637266506,11709249396,1499369230,945242004,288397487,0,191.666010,matrix,multiply1,multiply.c,0x401550 apic_timer_interrupt,0.035082,78.3,87.5,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0.0,vmlinux,apic_timer_interrupt,[Unknown],0xffffffff81790d90 hrtimer_active,0.030070,0.0,0.0,100.0,100.0,0.0,0.0,0,0,0,0,0,0,6.000000,vmlinux,hrtimer_active,[Unknown],0xffffffff810c9be0 ktime_get_update_offsets_now,0.030070,0.0,0.0,0.0,100.0,0.0,0.0,0,0,0,0,0,0,0.0,vmlinux,ktime_get_update_offsets_now,[Unknown],0xffffffff81108ad0 clear_page_c_e,0.020047,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0.0,vmlinux,clear_page_c_e,[Unknown],0xffffffff813906d0 Stderr: vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ma' vtune: Executing actions 0 % vtune: Executing actions 0 % Finalizing results vtune: Executing actions 50 % Finalizing results vtune: Executing actions 50 % Generating a report vtune: Executing actions 50 % Setting data model parameters vtune: Executing actions 75 % Setting data model parameters vtune: Executing actions 75 % Generating a report vtune: Executing actions 100 % Generating a report vtune: Executing actions 100 % done Report: Ok ================================================================================ HW event-based 
analysis with stacks... Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah_with_stacks -data-limit 0 -finalization-mode none -source-search-dir /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/src -- /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix Stdout: Addr of buf1 = 0x7f8be0161010 Offs of buf1 = 0x7f8be0161180 Addr of buf2 = 0x7f8bde160010 Offs of buf2 = 0x7f8bde1601c0 Addr of buf3 = 0x7f8bdc15f010 Offs of buf3 = 0x7f8bdc15f100 Addr of buf4 = 0x7f8bda15e010 Offs of buf4 = 0x7f8bda15e140 Threads #: 16 Pthreads Matrix size: 2048 Using multiply kernel: multiply1 Execution time = 5.843 seconds Stderr: vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location. vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah_with_stacks -command stop. vtune: Collection stopped. vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah_with_stacks' vtune: Executing actions 0 % vtune: Executing actions 100 % vtune: Executing actions 100 % done HW event-based analysis with stacks (Perf) Example of analysis types: Hotspots with HW event-based sampling and call stacks Collection: Ok vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location. -------------------------------------------------------------------------------- Running finalization... 
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -finalize -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah_with_stacks Stderr: vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah_with_stacks' vtune: Executing actions 0 % vtune: Executing actions 0 % Finalizing results vtune: Executing actions 0 % Finalizing the result vtune: Executing actions 0 % Clearing the database vtune: Executing actions 14 % Clearing the database vtune: Executing actions 14 % Loading raw data to the database vtune: Executing actions 14 % Loading 'systemcollector-28073-i63.sc' file vtune: Executing actions 25 % Loading 'systemcollector-28073-i63.sc' file vtune: Executing actions 25 % Loading 'system-wide.perf' file vtune: Executing actions 25 % Updating precomputed scalar metrics vtune: Executing actions 28 % Updating precomputed scalar metrics vtune: Executing actions 28 % Processing profile metrics and debug information vtune: Executing actions 39 % Processing profile metrics and debug information vtune: Executing actions 39 % Setting data model parameters vtune: Executing actions 39 % Resolving module symbols vtune: Executing actions 39 % Resolving information for dangling locations vtune: Executing actions 39 % Resolving information for `libpthread-2.17.so' vtune: Warning: Cannot locate debugging information for file `/usr/lib64/libpthread-2.17.so'. vtune: Executing actions 39 % Resolving information for `matrix' vtune: Executing actions 39 % Resolving information for `libc-2.17.so' vtune: Executing actions 39 % Resolving information for `mlx5_core.ko' vtune: Executing actions 40 % Resolving information for `mlx5_core.ko' vtune: Warning: Cannot locate debugging information for file `/lib/modules/3.10.0-1062.52.2.el7.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko'. vtune: Warning: Cannot locate debugging information for file `/usr/lib64/libc-2.17.so'. 
vtune: Executing actions 41 % Resolving information for `mlx5_core.ko'
vtune: Executing actions 42 % Resolving information for `mlx5_core.ko'
vtune: Executing actions 43 % Resolving information for `mlx5_core.ko'
vtune: Warning: Cannot locate file `nvidia.ko'.
vtune: Executing actions 43 % Resolving information for `nvidia'
vtune: Executing actions 44 % Resolving information for `nvidia'
vtune: Executing actions 44 % Resolving information for `vmlinux'
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
vtune: Executing actions 45 % Resolving information for `vmlinux'
vtune: Executing actions 45 % Resolving bottom user stack information
vtune: Executing actions 46 % Resolving bottom user stack information
vtune: Executing actions 47 % Resolving bottom user stack information
vtune: Executing actions 47 % Resolving thread name information
vtune: Executing actions 48 % Resolving thread name information
vtune: Executing actions 48 % Resolving call target names for dynamic code
vtune: Executing actions 50 % Resolving call target names for dynamic code
vtune: Executing actions 50 % Resolving interrupt name information
vtune: Executing actions 53 % Resolving interrupt name information
vtune: Executing actions 53 % Processing profile metrics and debug information
vtune: Executing actions 54 % Processing profile metrics and debug information
vtune: Executing actions 56 % Processing profile metrics and debug information
vtune: Executing actions 57 % Processing profile metrics and debug information
vtune: Executing actions 58 % Processing profile metrics and debug information
vtune: Executing actions 60 % Processing profile metrics and debug information
vtune: Executing actions 60 % Setting data model parameters
vtune: Executing actions 60 % Precomputing frequently used data
vtune: Executing actions 60 % Precomputing frequently used data
vtune: Executing actions 62 % Precomputing frequently used data
vtune: Executing actions 63 % Precomputing frequently used data
vtune: Executing actions 64 % Precomputing frequently used data
vtune: Executing actions 65 % Precomputing frequently used data
vtune: Executing actions 66 % Precomputing frequently used data
vtune: Executing actions 67 % Precomputing frequently used data
vtune: Executing actions 68 % Precomputing frequently used data
vtune: Executing actions 69 % Precomputing frequently used data
vtune: Executing actions 71 % Precomputing frequently used data
vtune: Executing actions 72 % Precomputing frequently used data
vtune: Executing actions 72 % Updating precomputed scalar metrics
vtune: Executing actions 75 % Updating precomputed scalar metrics
vtune: Executing actions 75 % Discarding redundant overtime data
vtune: Executing actions 78 % Discarding redundant overtime data
vtune: Executing actions 78 % Saving the result
vtune: Executing actions 82 % Saving the result
vtune: Executing actions 85 % Saving the result
vtune: Executing actions 100 % Saving the result
vtune: Executing actions 100 % done
Finalization: Ok
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
--------------------------------------------------------------------------------
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -limit 5 -format csv -csv-delimiter comma -report hotspots -group-by function -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah_with_stacks
Stdout:
Function,CPU Time,CPU Time:Effective Time,CPU Time:Spin Time,CPU Time:Overhead Time,Instructions Retired,Microarchitecture Usage(%),Microarchitecture Usage:Microarchitecture Usage(%),Microarchitecture Usage:CPI Rate,Module,Function (Full),Source File,Start Address
multiply1,80.468605,80.468605,0.0,0.0,68806500000,8.2,8.2,3.271631,matrix,multiply1,multiply.c,0x401550
func@0xffffffff81790e91,0.220517,0.220517,0.0,0.0,325500000,18.2,18.2,1.774194,vmlinux,func@0xffffffff81790e91,[Unknown],0xffffffff81790e91
func@0xffffffff81792fa1,0.065153,0.065153,0.0,0.0,10500000,,,0.0,vmlinux,func@0xffffffff81792fa1,[Unknown],0xffffffff81792fa1
retint_userspace_restore_args,0.055129,0.055129,0.0,0.0,10500000,7.9,7.9,6.000000,vmlinux,retint_userspace_restore_args,[Unknown],0xffffffff817863b3
apic_timer_interrupt,0.040094,0.040094,0.0,0.0,0,8.7,8.7,,vmlinux,apic_timer_interrupt,[Unknown],0xffffffff81790d90
Stderr:
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_ah_with_stacks'
vtune: Executing actions 0 %
vtune: Executing actions 0 % Finalizing results
vtune: Executing actions 50 % Finalizing results
vtune: Executing actions 50 % Generating a report
vtune: Executing actions 50 % Setting data model parameters
vtune: Executing actions 75 % Setting data model parameters
vtune: Executing actions 75 % Generating a report
vtune: Executing actions 100 % Generating a report
vtune: Executing actions 100 % done
Report: Ok
================================================================================
HW event-based analysis with context switches...
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -collect threading -knob sampling-and-waits=hw -knob enable-stack-collection=false -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_th -data-limit 0 -finalization-mode none -source-search-dir /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/src -- /work/scitas-ge/orliac/intel/vtune/2022.3.0/samples/en/C++/matrix/matrix
Stdout:
Addr of buf1 = 0x7f7d17e14010
Offs of buf1 = 0x7f7d17e14180
Addr of buf2 = 0x7f7d15e13010
Offs of buf2 = 0x7f7d15e131c0
Addr of buf3 = 0x7f7d13e12010
Offs of buf3 = 0x7f7d13e12100
Addr of buf4 = 0x7f7d11e11010
Offs of buf4 = 0x7f7d11e11140
Threads #: 16 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 5.196 seconds
Stderr:
vtune: Warning: Context switch data cannot be collected using the Perf-based driverless collection if the kernel version is less than 4.3. Consider loading the VTune Profiler sampling driver using the root credentials.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_th -command stop.
vtune: Collection stopped.
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_th'
vtune: Executing actions 0 %
vtune: Executing actions 100 %
vtune: Executing actions 100 % done
HW event-based analysis with context switches (Perf)
Example of analysis types: Threading with HW event-based sampling
Collection: Ok
vtune: Warning: Context switch data cannot be collected using the Perf-based driverless collection if the kernel version is less than 4.3. Consider loading the VTune Profiler sampling driver using the root credentials.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
--------------------------------------------------------------------------------
Running finalization...
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -finalize -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_th
Stderr:
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_th'
vtune: Executing actions 0 %
vtune: Executing actions 0 % Finalizing results
vtune: Executing actions 0 % Finalizing the result
vtune: Executing actions 0 % Clearing the database
vtune: Executing actions 14 % Clearing the database
vtune: Executing actions 14 % Loading raw data to the database
vtune: Executing actions 14 % Loading 'systemcollector-28247-i63.sc' file
vtune: Executing actions 25 % Loading 'systemcollector-28247-i63.sc' file
vtune: Executing actions 25 % Loading 'system-wide.perf' file
vtune: Executing actions 25 % Updating precomputed scalar metrics
vtune: Executing actions 28 % Updating precomputed scalar metrics
vtune: Executing actions 28 % Processing profile metrics and debug information
vtune: Executing actions 39 % Processing profile metrics and debug information
vtune: Executing actions 39 % Setting data model parameters
vtune: Executing actions 39 % Resolving module symbols
vtune: Executing actions 39 % Resolving information for `matrix'
vtune: Executing actions 41 % Resolving information for `matrix'
vtune: Executing actions 41 % Resolving information for `vmlinux'
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
vtune: Executing actions 44 % Resolving information for `vmlinux'
vtune: Executing actions 44 % Resolving bottom user stack information
vtune: Executing actions 45 % Resolving bottom user stack information
vtune: Executing actions 46 % Resolving bottom user stack information
vtune: Executing actions 46 % Resolving thread name information
vtune: Executing actions 47 % Resolving thread name information
vtune: Executing actions 48 % Resolving thread name information
vtune: Executing actions 48 % Resolving call target names for dynamic code
vtune: Executing actions 49 % Resolving call target names for dynamic code
vtune: Executing actions 49 % Resolving interrupt name information
vtune: Executing actions 53 % Resolving interrupt name information
vtune: Executing actions 53 % Processing profile metrics and debug information
vtune: Executing actions 54 % Processing profile metrics and debug information
vtune: Executing actions 56 % Processing profile metrics and debug information
vtune: Executing actions 57 % Processing profile metrics and debug information
vtune: Executing actions 58 % Processing profile metrics and debug information
vtune: Executing actions 60 % Processing profile metrics and debug information
vtune: Executing actions 62 % Processing profile metrics and debug information
vtune: Executing actions 63 % Processing profile metrics and debug information
vtune: Executing actions 63 % Setting data model parameters
vtune: Executing actions 64 % Setting data model parameters
vtune: Executing actions 64 % Precomputing frequently used data
vtune: Executing actions 64 % Precomputing frequently used data
vtune: Executing actions 66 % Precomputing frequently used data
vtune: Executing actions 67 % Precomputing frequently used data
vtune: Executing actions 68 % Precomputing frequently used data
vtune: Executing actions 69 % Precomputing frequently used data
vtune: Executing actions 70 % Precomputing frequently used data
vtune: Executing actions 71 % Precomputing frequently used data
vtune: Executing actions 73 % Precomputing frequently used data
vtune: Executing actions 74 % Precomputing frequently used data
vtune: Executing actions 76 % Precomputing frequently used data
vtune: Executing actions 76 % Updating precomputed scalar metrics
vtune: Executing actions 78 % Updating precomputed scalar metrics
vtune: Executing actions 78 % Discarding redundant overtime data
vtune: Executing actions 82 % Discarding redundant overtime data
vtune: Executing actions 82 % Saving the result
vtune: Executing actions 85 % Saving the result
vtune: Executing actions 89 % Saving the result
vtune: Executing actions 99 % Saving the result
vtune: Executing actions 100 % Saving the result
vtune: Executing actions 100 % done
Finalization: Ok
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
--------------------------------------------------------------------------------
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/vtune -limit 5 -format csv -csv-delimiter comma -report hotspots -group-by function -r /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_th
Stdout:
Function,CPU Time,CPU Time:Effective Time,CPU Time:Spin Time,CPU Time:Overhead Time,Inactive Wait Time,Inactive Wait Time:Inactive Sync Wait Time,Inactive Wait Time:Inactive Sync Wait Time:Idle,Inactive Wait Time:Inactive Sync Wait Time:Poor,Inactive Wait Time:Inactive Sync Wait Time:Ok,Inactive Wait Time:Inactive Sync Wait Time:Ideal,Inactive Wait Time:Inactive Sync Wait Time:Over,Inactive Wait Time:Preemption Wait Time,Inactive Wait Time:Preemption Wait Time:Idle,Inactive Wait Time:Preemption Wait Time:Poor,Inactive Wait Time:Preemption Wait Time:Ok,Inactive Wait Time:Preemption Wait Time:Ideal,Inactive Wait Time:Preemption Wait Time:Over,Inactive Wait Count,Inactive Wait Count:Inactive Sync Wait Count,Inactive Wait Count:Inactive Sync Wait Count:Idle,Inactive Wait Count:Inactive Sync Wait Count:Poor,Inactive Wait Count:Inactive Sync Wait Count:Ok,Inactive Wait Count:Inactive Sync Wait Count:Ideal,Inactive Wait Count:Inactive Sync Wait Count:Over,Inactive Wait Count:Preemption Wait Count,Inactive Wait Count:Preemption Wait Count:Idle,Inactive Wait Count:Preemption Wait Count:Poor,Inactive Wait Count:Preemption Wait Count:Ok,Inactive Wait Count:Preemption Wait Count:Ideal,Inactive Wait Count:Preemption Wait Count:Over,Module,Function (Full),Source File,Start Address
multiply1,56.221772,56.221772,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,matrix,multiply1,multiply.c,0x401550
apic_timer_interrupt,0.030070,0.030070,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,vmlinux,apic_timer_interrupt,[Unknown],0xffffffff81790d90
ktime_get_update_offsets_now,0.025059,0.025059,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,vmlinux,ktime_get_update_offsets_now,[Unknown],0xffffffff81108ad0
ktime_get,0.020047,0.020047,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,vmlinux,ktime_get,[Unknown],0xffffffff81107710
retint_userspace_restore_args,0.020047,0.020047,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,vmlinux,retint_userspace_restore_args,[Unknown],0xffffffff817863b3
Stderr:
vtune: Using result path `/tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/result_th'
vtune: Executing actions 0 %
vtune: Executing actions 0 % Finalizing results
vtune: Executing actions 50 % Finalizing results
vtune: Executing actions 50 % Generating a report
vtune: Executing actions 50 % Setting data model parameters
vtune: Executing actions 75 % Setting data model parameters
vtune: Executing actions 75 % Generating a report
vtune: Executing actions 100 % Generating a report
vtune: Executing actions 100 % done
Report: Ok
Getting available devices:
Command line: sycl-ls
Exception: [Errno 2] No such file or directory: 'sycl-ls': 'sycl-ls'
Checking DPC++ application as prerequisite for GPU analyses...
Setting envirnoment variable: SYCL_DEVICE_FILTER=opencl:gpu
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/self_check_apps/matrix.dpcpp/matrix.dpcpp
Stderr:
/work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/self_check_apps/matrix.dpcpp/matrix.dpcpp: error while loading shared libraries: libsycl.so.5: cannot open shared object file: No such file or directory
Setting envirnoment variable: SYCL_DEVICE_FILTER=level_zero:gpu
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/self_check_apps/matrix.dpcpp/matrix.dpcpp
Stderr:
/work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/self_check_apps/matrix.dpcpp/matrix.dpcpp: error while loading shared libraries: libsycl.so.5: cannot open shared object file: No such file or directory
Setting envirnoment variable: ZES_ENABLE_SYSMAN=1
Command line: /work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/self_check_apps/matrix.dpcpp/matrix.dpcpp
Stderr:
/work/scitas-ge/orliac/intel/vtune/2022.3.0/bin64/self_check_apps/matrix.dpcpp/matrix.dpcpp: error while loading shared libraries: libsycl.so.5: cannot open shared object file: No such file or directory
Checking DPC++ application as prerequisite for GPU analyses: Fail
Unable to run DPC++ application on GPU connected to this system. If you are using an Intel GPU and want to verify profiling support for DPC++ applications, check these requirements:
* Install Intel(R) GPU driver.
* Install Intel(R) Level Zero GPU runtime.
* Install Intel(R) oneAPI DPC++ Runtime and set the environment.
The check observed a product failure on your system.
Review errors in the output above to fix a problem or contact Intel technical support.
The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling
The following analyses have failed on the system:
* Microarchitecture Exploration
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)
Log location: /tmp/vtune-tmp-orliac/self-checker-2022.09.12_15.27.16/log.txt