Analyzers
Support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

VTune not showing P-core utilisation in Microarchitecture Exploration

oliver_h
Beginner
1,822 Views

For some reason the P-core data is not showing on my 12900K with the matrix example.

I'm using VTune straight "out of the box", just running the example program as shown. However, the Threading analysis does show that all 16 (physical) cores are being targeted, and the example's console output says the same.

I have used VTune on my laptop, which is also Alder Lake, and cannot replicate the error there.

Could a security feature somewhere be stopping the drivers from seeing the CPU event counters?

 

oliver_h_0-1665078528591.png

JaideepK_Intel
Moderator
1,539 Views

Hi,

 

Thank you for posting in the Intel Communities.

 

We are sorry to say that we are unable to reproduce your issue from our end on the Alder Lake machine. For further investigation, we need the below details.

 

Could you please answer the following questions:

1. Are you trying to run your application inside of a VM?

2. Please share the operating system details (Windows 10, Ubuntu 18.04, CentOS 8, etc.) and processor details.

3. Try with a different sample and let us know if you're facing a similar issue. (Important step)

 

Self-checker (Windows):

1. Run Command Prompt as administrator.

2. To set the environment variables, run the command below:

<VTune_installation_directory>\2022.2.0\vtune-vars.bat

example: C:\Program Files (x86)\Intel\oneAPI\vtune\2022.2.0\vtune-vars.bat

 

3. To run vtune-self-checker, run the command below:

<VTune_installation_directory>\2022.2.0\bin64\vtune-self-checker.bat

example: C:\Program Files (x86)\Intel\oneAPI\vtune\2022.2.0\bin64\vtune-self-checker.bat

 

Self-checker (Linux):

1. Run the commands in the terminal as the superuser/root user.

2. Set the environment variables:

 source /opt/intel/oneapi/setvars.sh

3. To run vtune-self-checker, run the command below:

sh /opt/intel/oneapi/vtune/latest/bin64/vtune-self-checker.sh

 

Please attach screenshots and self-checker logs.

 

Thanks,

Jaideep

 

 

 

 

oliver_h
Beginner
1,508 Views

Hi Jaideep,

I can confirm that I am not running in a virtual machine.

I have tried another application that I wrote and I still face the same issue. Below is a screenshot from within VS2022; as you can see, the result is the same.

oliver_h_1-1665590897153.png

I have run the self-checker and everything seems to be okay. I have attached a screenshot of the results below.

oliver_h_0-1665590732294.png

As a note, I had this issue many months ago, but after reinstalling Windows it went away and VTune reported both P-cores and E-cores correctly. Now it has come back, and it has persisted even after reinstalling Windows again. None of my other Alder Lake machines experience this issue. Here is a screenshot of my system info as well:

oliver_h_2-1665591097070.png

 

Oliver

 

Dom324
Beginner
1,328 Views

I have exactly the same issue! It looks like this post describes the same issue as well.

Dom324_1-1665838241094.png

My system:
CPU: i5 12400
OS: RHEL8

When looking at the "Event Count" tab, it seems that most counters are present and incrementing correctly, but some counters, such as "Clockticks", are stuck at 0. I'm including an image of the Event Count tab and a CSV file with the event counts.

Dom324_0-1665837766209.png
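For anyone who wants to check an exported event-count CSV for the same symptom, here is a minimal sketch that lists the columns whose values are all zero. The layout is an assumption (a header row of event names plus one label column), and the sample rows and their numbers are made up for illustration; they are not taken from the attached file.

```python
import csv
import io


def zero_event_columns(csv_text):
    """Return the names of numeric columns in which every value is zero."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = {}
    for row in reader:
        for name, value in row.items():
            if name is None or value is None:
                continue  # ragged row: extra or missing fields
            try:
                number = float(value.replace(",", ""))
            except ValueError:
                continue  # non-numeric column such as a function name
            totals[name] = totals.get(name, 0.0) + abs(number)
    return sorted(name for name, total in totals.items() if total == 0.0)


# Hypothetical sample; a real export may use different columns.
sample = """Function,CPU_CLK_UNHALTED.THREAD,INST_RETIRED.ANY
multiply1,0,69000000000
main,0,798000000
"""

print(zero_event_columns(sample))
```

Pointing it at the real export only needs the file contents read into a string first.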



When running the demo matrix example from command line:

 

[kevin@localhost bin64]$ ./vtune -collect uarch-exploration -r '/home/kevin/Documents/vtune_test'  /home/kevin/intel/vtune/samples/matrix/matrix
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /home/kevin/Documents/vtune_test -command stop.
Addr of buf1 = 0x7fe3f4f61010
Offs of buf1 = 0x7fe3f4f61180
Addr of buf2 = 0x7fe3f2f60010
Offs of buf2 = 0x7fe3f2f601c0
Addr of buf3 = 0x7fe3f0f5f010
Offs of buf3 = 0x7fe3f0f5f100
Addr of buf4 = 0x7fe3eef5e010
Offs of buf4 = 0x7fe3eef5e140
Threads #: 8 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 15.122 seconds
vtune: Collection stopped.
vtune: Using result path `/home/kevin/Documents/vtune_test'
vtune: Executing actions 19 % Resolving information for `ld-2.28.so'           
vtune: Warning: Cannot locate file `nft_counter.ko'.
vtune: Executing actions 19 % Resolving information for `nft_counter'          
vtune: Warning: Cannot locate file `bridge.ko'.
vtune: Executing actions 19 % Resolving information for `matrix'               
vtune: Warning: Cannot locate file `r8169.ko'.
vtune: Executing actions 19 % Resolving information for `r8169'                
vtune: Warning: Cannot locate file `nf_tables.ko'.
vtune: Executing actions 19 % Resolving information for `nf_tables'            
vtune: Warning: Cannot locate file `nf_conntrack.ko'.
vtune: Executing actions 21 % Resolving information for `libc-2.28.so'         
vtune: Warning: Cannot locate file `nft_ct.ko'.
vtune: Executing actions 21 % Resolving information for `nft_ct'               
vtune: Warning: Cannot locate debugging information for file `/usr/lib64/libc-2.28.so'.
vtune: Warning: Cannot locate file `drm.ko'.
vtune: Executing actions 21 % Resolving information for `drm'                  
vtune: Warning: Cannot locate file `kvm.ko'.
vtune: Executing actions 22 % Resolving information for `kvm'                  
vtune: Warning: Cannot locate file `xfs.ko'.
vtune: Executing actions 22 % Resolving information for `xfs'                  
vtune: Warning: Cannot locate file `i915.ko'.
vtune: Executing actions 22 % Resolving information for `i915'                 
vtune: Warning: Cannot locate file `sep5.ko'.
vtune: Executing actions 22 % Resolving information for `vmlinux'              
vtune: Warning: Function and source-level analysis for the Linux kernel will not be possible since neither debug version of the kernel nor kernel symbol tables are found. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
vtune: Executing actions 75 % Generating a report                              Elapsed Time: 15.135s
    Clockticks: 0
        P-Core: 0
        E-Core: 0
    Instructions Retired: 69,798,000,000
        P-Core: 0
        E-Core: 0
    CPI Rate: 0.000
        P-Core: 0.000
        E-Core: 0.000
    MUX Reliability: 0.000
     | Precision of collected HW event data is not enough. Metrics data may be
     | unreliable. Consider increasing your application execution time, using
     | the multiple runs mode instead of event multiplexing, or creating a
     | custom analysis with a limited subset of HW events. If you are using a
     | driverless collection, consider reducing the value of
     | /sys/bus/event_source/devices/cpu/perf_event_mux_interval_ms file.
     |
    P-Core
        Retiring: 0.0% of Pipeline Slots
            Light Operations: 0.0% of Pipeline Slots
                FP Arithmetic: 0.0% of uOps
                    FP x87: 0.0% of uOps
                    FP Scalar: 0.0% of uOps
                    FP Vector: 0.0% of uOps
                Memory Operations: 0.0% of Pipeline Slots
                Fused Instructions: 0.0% of Pipeline Slots
                Non Fused Branches: 0.0% of Pipeline Slots
                Nop Instructions: 0.0% of Pipeline Slots
                Other: 0.0% of Pipeline Slots
            Heavy Operations: 0.0% of Pipeline Slots
                Microcode Sequencer: 0.0% of Pipeline Slots
                    Assists: 0.0% of Pipeline Slots
                    CISC: 0.0% of Pipeline Slots
        Front-End Bound: 0.0% of Pipeline Slots
            Front-End Latency: 0.0% of Pipeline Slots
                ICache Misses: 0.0% of Clockticks
                ITLB Overhead: 0.0% of Clockticks
                Branch Resteers: 0.0% of Clockticks
                    Mispredicts Resteers: 0.0% of Clockticks
                    Clears Resteers: 0.0% of Clockticks
                    Unknown Branches: 0.0% of Clockticks
                DSB Switches: 0.0% of Clockticks
                Length Changing Prefixes: 0.0% of Clockticks
                MS Switches: 0.0% of Clockticks
            Front-End Bandwidth: 0.0% of Pipeline Slots
                Front-End Bandwidth MITE: 0.0% of Pipeline Slots
                    Decoder-0 Alone: 0.0% of Clockticks
                Front-End Bandwidth DSB: 0.0% of Pipeline Slots
                Front-End Bandwidth LSD: 0.0% of Pipeline Slots
                (Info) DSB Coverage: 0.0%
                (Info) LSD Coverage: 0.0%
                (Info) DSB Misses Cost: 0.0% of Pipeline Slots
        Bad Speculation: 100.0% of Pipeline Slots
         | A significant proportion of pipeline slots containing useful work are
         | being cancelled. This can be caused by mispredicting branches or by
         | machine clears. Note that this metric value may be highlighted due to
         | Branch Resteers issue.
         |
            Branch Mispredict: 0.0% of Pipeline Slots
            Machine Clears: 100.0% of Pipeline Slots
             | Issue: A significant portion of execution time is spent handling
             | machine clears.
             | 
             | Tips: See the "Memory Disambiguation" section in the Intel 64 and
             | IA-32 Architectures Optimization Reference Manual.
             |
        Back-End Bound: 0.0% of Pipeline Slots
            Memory Bound: 0.0% of Pipeline Slots
                L1 Bound: 0.0% of Clockticks
                    DTLB Overhead: 0.0% of Clockticks
                        Load STLB Hit: 0.0% of Clockticks
                        Load STLB Miss: 0.0% of Clockticks
                    Loads Blocked by Store Forwarding: 0.0% of Clockticks
                    Lock Latency: 0.0% of Clockticks
                    Split Loads: 0.0% of Clockticks
                    FB Full: 0.0% of Clockticks
                L2 Bound: 0.0% of Clockticks
                L3 Bound: 0.0% of Clockticks
                    Contested Accesses: 0.0% of Clockticks
                    Data Sharing: 0.0% of Clockticks
                    L3 Latency: 0.0% of Clockticks
                    SQ Full: 0.0% of Clockticks
                DRAM Bound: 0.0% of Clockticks
                    Memory Bandwidth: 0.0% of Clockticks
                    Memory Latency: 0.0% of Clockticks
                Store Bound: 0.0% of Clockticks
                    Store Latency: 0.0% of Clockticks
                    False Sharing: 0.0% of Clockticks
                    Split Stores: 0.0% of Clockticks
                    Streaming Stores: 0.0% of Clockticks
                    DTLB Store Overhead: 0.0% of Clockticks
                        Store STLB Hit: 0.0% of Clockticks
                        Store STLB Hit: 0.0% of Clockticks
            Core Bound: 0.0% of Pipeline Slots
                Divider: 0.0% of Clockticks
                Port Utilization: 0.0% of Clockticks
                    Cycles of 0 Ports Utilized: 0.0% of Clockticks
                        Serializing Operations: 0.0% of Clockticks
                            Slow Pause: 0.0% of Clockticks
                            Memory Fence: 0.0% of Clockticks
                        Mixing Vectors: 0.0% of Clockticks
                    Cycles of 1 Port Utilized: 0.0% of Clockticks
                    Cycles of 2 Ports Utilized: 0.0% of Clockticks
                    Cycles of 3+ Ports Utilized: 0.0% of Clockticks
                        ALU Operation Utilization: 0.0% of Clockticks
                            Port 0: 0.0% of Clockticks
                            Port 1: 0.0% of Clockticks
                            Port 6: 0.0% of Clockticks
                        Load Operation Utilization: 0.0% of Clockticks
                        Store Operation Utilization: 0.0% of Clockticks
    E-Core
        Retiring: 0.0% of Pipeline Slots
            General Retirement: 0.0% of Pipeline Slots
                FP Arithmetic: 0.0% of Pipeline Slots
                Other: 0.0% of Pipeline Slots
            Microcode Sequencer: 0.0% of Pipeline Slots
        Front-End Bound: 0.0% of Pipeline Slots
            Front-End Latency: 0.0% of Pipeline Slots
                ICache Misses: 0.0% of Pipeline Slots
                ITLB Overhead: 0.0% of Pipeline Slots
                BACLEARS: 0.0% of Pipeline Slots
                Branch Resteers: 0.0% of Pipeline Slots
            Front-End Bandwidth: 0.0% of Pipeline Slots
                Cisc: 0.0% of Pipeline Slots
                Decode: 0.0% of Pipeline Slots
                Pre-Decode Wrong: 0.0% of Pipeline Slots
                Front-End Other: 0.0% of Pipeline Slots
        Bad Speculation: 0.0% of Pipeline Slots
            Branch Mispredict: 0.0% of Pipeline Slots
            Machine Clears: 0.0% of Pipeline Slots
                Machine Clear: 0.0% of Pipeline Slots
                MO Machine Clear Overhead: 0.0% of Pipeline Slots
        Back-End Bound: 0.0% of Pipeline Slots
            Resource Bound: 0.0% of Pipeline Slots
                Memory Scheduler: 0.0% of Pipeline Slots
                Non-memory Scheduler: 0.0% of Pipeline Slots
                Register: 0.0% of Pipeline Slots
                Full Re-order Buffer (ROB): 0.0% of Pipeline Slots
                Allocation Restriction: 0.0% of Pipeline Slots
                Serializing Operations: 0.0% of Pipeline Slots
        Alternative Back-End Bound: 0.0% of Pipeline Slots
            Core Bound: 0.0%
            Memory Bound: 0.0%
                L2 Bound: 0.0%
                L3 Bound: 0.0%
                DRAM Bound: 0.0%
    Average CPU Frequency: 0.000 MHz
    Total Thread Count: 9
    Paused Time: 0s
Effective Physical Core Utilization: 0.0% (0.000 out of 6)
 | The metric value is low, which may signal a poor physical CPU cores
 | utilization caused by:
 |     - load imbalance
 |     - threading runtime overhead
 |     - contended synchronization
 |     - thread/process underutilization
 |     - incorrect affinity that utilizes logical cores instead of physical
 |       cores
 | Explore sub-metrics to estimate the efficiency of MPI and OpenMP parallelism
 | or run the Locks and Waits analysis to identify parallel bottlenecks for
 | other parallel runtimes.
 |
    Effective Logical Core Utilization: 63.8% (7.653 out of 12)
     | The metric value is low, which may signal a poor logical CPU cores
     | utilization. Consider improving physical core utilization as the first
     | step and then look at opportunities to utilize logical cores, which in
     | some cases can improve processor throughput and overall performance of
     | multi-threaded applications.
     |
Collection and Platform Info
    Application Command Line: /home/kevin/intel/vtune/samples/matrix/matrix 
    User Name: kevin
    Operating System: 4.18.0-372.19.1.el8_6.x86_64 Red Hat Enterprise Linux release 8.6 (Ootpa)
    Computer Name: localhost.localdomain
    Result Size: 305.1 MB 
    Collection start time: 12:13:29 15/10/2022 UTC
    Collection stop time: 12:13:44 15/10/2022 UTC
    Collector Type: Event-based sampling driver
    CPU
        Name: Intel(R) microarchitecture code named Alderlake-S
        Frequency: 2.995 GHz
        Logical CPU Count: 12
        Cache Allocation Technology
            Level 2 capability: available
            Level 3 capability: not detected

If you want to skip descriptions of detected performance issues in the report,
enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
Alternatively, you may view the report in the csv format: vtune -report
<report_name> -format=csv.
vtune: Executing actions 100 % done
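A quick sanity check on the summary above: CPI is clockticks divided by instructions retired, so a Clockticks value of 0 forces a CPI of 0.000 no matter how many instructions retired. The sketch below reproduces that with the numbers from the report. The second function is only a rough estimate I am assuming for illustration (busy logical CPUs x elapsed time x the nominal frequency from the platform info), not how VTune computes anything.

```python
def cpi_rate(clockticks, instructions_retired):
    """Cycles per instruction: clockticks divided by instructions retired."""
    if instructions_retired == 0:
        return None  # undefined when nothing retired
    return clockticks / instructions_retired


def rough_expected_clockticks(busy_logical_cpus, elapsed_s, freq_hz):
    """Very rough estimate: busy CPUs x elapsed seconds x nominal frequency."""
    return busy_logical_cpus * elapsed_s * freq_hz


# Values copied from the summary report above.
print(cpi_rate(0, 69_798_000_000))

# 7.653 busy logical CPUs, 15.135 s elapsed, 2.995 GHz nominal frequency.
estimate = rough_expected_clockticks(7.653, 15.135, 2.995e9)
print(f"{estimate:.3e}")
```

Under those assumptions the clocktick count should be on the order of 10^11, which suggests the zero is a collection or reporting problem rather than a property of the workload.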

 



And this is the output of the self-checker:

 

[kevin@localhost bin64]$ sudo ./vtune-self-checker.sh
[sudo] password for kevin: 
Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 624050

HW event-based analysis (counting mode) (Intel driver)   
Example of analysis types: Performance Snapshot
    Collection: Ok
vtune: Warning: Cannot collect GPU hardware metrics because libmd.so was not found. Make sure you have installed Metrics Discovery API library from https://github.com/intel/metrics-discovery. See Error Message: Cannot Collect GPU Hardware Metrics help topic for more details.
    Finalization: Ok...
vtune: Warning: Cannot collect GPU hardware metrics because libmd.so was not found. Make sure you have installed Metrics Discovery API library from https://github.com/intel/metrics-discovery. See Error Message: Cannot Collect GPU Hardware Metrics help topic for more details.

    Report: Ok

Instrumentation based analysis check   
Example of analysis types: Hotspots and Threading with user-mode sampling
    Collection: Ok
    Finalization: Ok...
    Report: Ok

HW event-based analysis check (Intel driver)   
Example of analysis types: Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

HW event-based analysis check (Intel driver)   
Example of analysis types: Microarchitecture Exploration
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

HW event-based analysis with uncore events (Intel driver)   
Example of analysis types: Memory Access
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

HW event-based analysis with stacks (Intel driver)   
Example of analysis types: Hotspots with HW event-based sampling and call stacks
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

HW event-based analysis with context switches (Intel driver)   
Example of analysis types: Threading with HW event-based sampling
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

Checking DPC++ application as prerequisite for GPU analyses: Fail
Unable to run DPC++ application on GPU connected to this system. If you are using an Intel GPU and want to verify profiling support for DPC++ applications, check these requirements:
* Install Intel(R) GPU driver.
* Install Intel(R) Level Zero GPU runtime.
* Install Intel(R) oneAPI DPC++ Runtime and set the environment.

The system is ready to be used for performance analysis with Intel VTune Profiler.
Review warnings in the output above to find product limitations, if any.

The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Microarchitecture Exploration
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling

The following analyses have failed on the system:
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)

Log location: /tmp/vtune-tmp-root/self-checker-2022.10.15_14.27.41/log.txt
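Since the log is long, here is a small sketch that reduces self-checker output to a per-analysis status table. It assumes the layout seen in the log above (an "Example of analysis types:" line followed by indented Collection/Finalization/Report lines); the function names are mine, and other VTune versions may format the log differently.

```python
import re

STATUS_LINE = re.compile(r"^\s*(Collection|Finalization|Report):\s*(\w+)")
EXAMPLE_LINE = re.compile(r"^Example of analysis types:\s*(.+)$")


def summarize_self_check(log_text):
    """Map each 'Example of analysis types' entry to its step statuses."""
    summary = {}
    current = None
    for line in log_text.splitlines():
        example = EXAMPLE_LINE.match(line.strip())
        if example:
            current = example.group(1).strip()
            summary[current] = {}
            continue
        status = STATUS_LINE.match(line)
        if status and current is not None:
            summary[current][status.group(1)] = status.group(2)
    return summary


def checks_not_ok(summary):
    """Return the analysis entries in which any step is not 'Ok'."""
    return [name for name, steps in summary.items()
            if any(state != "Ok" for state in steps.values())]


# Excerpt copied from the log above.
sample = """HW event-based analysis check (Intel driver)
Example of analysis types: Microarchitecture Exploration
    Collection: Ok
    Finalization: Ok...
    Report: Ok
"""

summary = summarize_self_check(sample)
print(summary)
print(checks_not_ok(summary))
```

Note that Microarchitecture Exploration passes all three steps here even though its metrics come out as zero, so the self-checker appears to confirm only that each stage completes, not that the counter values are plausible.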

 


Maximillia_D_Intel
1,333 Views

Hi Jaideep,

 

  I'm seeing the same issue. Please find answers to the questions you asked below.

 

Any help or next steps are appreciated.

 

Thanks, 

Max

 

I'm not running in a VM.

I have tried three different applications and see the same behavior in each. Task Manager and Resource Monitor show all 16 cores being utilized, but VTune suggests only the E-cores are active.

One other tidbit - In the Collection and Platform info - the report claims I'm running Windows 10, but as you can see from the copy and paste from windows device info, I'm running Windows 11.

 

Maximillia_D_Intel_0-1666024287668.png

Maximillia_D_Intel_1-1666024362735.png

 

 

Device name     xxx

Processor           12th Gen Intel(R) Core(TM) i7-1270P   2.20 GHz

Installed RAM    16.0 GB (15.6 GB usable)

System type       64-bit operating system, x64-based processor

Pen and touch   No pen or touch input is available for this display

 

Edition  Windows 11 Pro

Version 21H2

Installed on        10/6/2022

OS build              22000.1098

Experience         Windows Feature Experience Pack 1000.22000.1098.0
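On the Windows 10 vs. Windows 11 discrepancy: Windows 11 still reports its internal version as 10.0 and is told apart from Windows 10 by a build number of 22000 or higher. A tool that looks only at the major/minor version would therefore label this machine Windows 10. Whether that is what VTune does is my assumption, not something confirmed in this thread. A minimal sketch of the distinction, for desktop editions only:

```python
def windows_marketing_name(major, minor, build):
    """Map a desktop Windows version triple to its marketing name."""
    if (major, minor) == (10, 0):
        # Windows 11 kept version 10.0; only the build number differs.
        return "Windows 11" if build >= 22000 else "Windows 10"
    return f"Windows NT {major}.{minor}"


def build_from_os_build(os_build):
    """Extract the build number from an 'OS build' string like '22000.1098'."""
    return int(os_build.split(".")[0])


# OS build copied from the device info above.
build = build_from_os_build("22000.1098")
print(windows_marketing_name(10, 0, build))
```

If that is the cause, the OS label in the report is cosmetic and probably unrelated to the missing P-core data.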

 

C:\Windows\System32>"C:\Program Files (x86)\Intel\oneAPI\vtune\2022.4.0\bin64\vtune-self-checker.bat"

Intel(R) VTune(TM) Profiler Self Check Utility

Copyright (C) 2009 Intel Corporation. All rights reserved.

Build Number: 624343

 

HW event-based analysis (counting mode) (Intel driver)

Example of analysis types: Performance Snapshot

    Collection: Ok

    Finalization: Ok...

    Report: Ok

 

Instrumentation based analysis check

Example of analysis types: Hotspots and Threading with user-mode sampling

    Collection: Ok

    Finalization: Ok...

    Report: Ok

 

HW event-based analysis check (Intel driver)

Example of analysis types: Hotspots with HW event-based sampling, HPC Performance Characterization, etc.

    Collection: Ok

    Finalization: Ok...

    Report: Ok

 

HW event-based analysis check (Intel driver)

Example of analysis types: Microarchitecture Exploration

    Collection: Ok

    Finalization: Ok...

    Report: Ok

 

HW event-based analysis with uncore events (Intel driver)

Example of analysis types: Memory Access

    Collection: Ok

    Finalization: Ok...

vtune: Warning: The result contains a lot of raw data. Finalization may take a long time to complete.

 

    Report: Ok

 

HW event-based analysis with stacks (Intel driver)

Example of analysis types: Hotspots with HW event-based sampling and call stacks

    Collection: Ok

    Finalization: Ok...

    Report: Ok

 

HW event-based analysis with context switches (Intel driver)

Example of analysis types: Threading with HW event-based sampling

    Collection: Ok

    Finalization: Ok...

    Report: Ok

 

Checking DPC++ application as prerequisite for GPU analyses: Fail

Unable to run DPC++ application on GPU connected to this system. If you are using an Intel GPU and want to verify profiling support for DPC++ applications, check these requirements:

* Install Intel(R) GPU driver.

* Install Intel(R) Level Zero GPU runtime.

* Install Intel(R) oneAPI DPC++ Runtime and set the environment.

 

The system is ready to be used for performance analysis with Intel VTune Profiler.

Review warnings in the output above to find product limitations, if any.

 

The system is ready for the following analyses:

* Performance Snapshot

* Hotspots and Threading with user-mode sampling

* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.

* Microarchitecture Exploration

* Memory Access

* Hotspots with HW event-based sampling and call stacks

* Threading with HW event-based sampling

 

The following analyses have failed on the system:

* GPU Compute/Media Hotspots (characterization mode)

* GPU Compute/Media Hotspots (source analysis mode)

 

Log location: C:\Users\optane\AppData\Local\Temp\vtune-tmp-optane\self-checker-2022.10.17_09.15.15\log.txt

 

C:\Program Files (x86)\Intel\oneAPI\vtune\2022.4.0\bin64>

Dom324
Beginner
1,328 Views

Not sure why my comment got deleted, but I've got exactly the same issue on i5 12400 and RHEL8.

Dom324_0-1665876874572.png

However, when I look at the Event Count tab, most counters appear to be working correctly, but the "Clockticks" column, for example, is always zero:

Dom324_1-1665876966242.png

I'm including a CSV file with exported counter values as attachment.
vtune: Executing actions 75 % Generating a report                              Elapsed Time: 15.135s
    Clockticks: 0
        P-Core: 0
        E-Core: 0
    Instructions Retired: 69,798,000,000
        P-Core: 0
        E-Core: 0
    CPI Rate: 0.000
        P-Core: 0.000
        E-Core: 0.000
    MUX Reliability: 0.000
     | Precision of collected HW event data is not enough. Metrics data may be
     | unreliable. Consider increasing your application execution time, using
     | the multiple runs mode instead of event multiplexing, or creating a
     | custom analysis with a limited subset of HW events. If you are using a
     | driverless collection, consider reducing the value of
     | /sys/bus/event_source/devices/cpu/perf_event_mux_interval_ms file.
     |
    P-Core
        Retiring: 0.0% of Pipeline Slots
            Light Operations: 0.0% of Pipeline Slots
                FP Arithmetic: 0.0% of uOps
                    FP x87: 0.0% of uOps
                    FP Scalar: 0.0% of uOps
                    FP Vector: 0.0% of uOps
                Memory Operations: 0.0% of Pipeline Slots
                Fused Instructions: 0.0% of Pipeline Slots
                Non Fused Branches: 0.0% of Pipeline Slots
                Nop Instructions: 0.0% of Pipeline Slots
                Other: 0.0% of Pipeline Slots
            Heavy Operations: 0.0% of Pipeline Slots
                Microcode Sequencer: 0.0% of Pipeline Slots
                    Assists: 0.0% of Pipeline Slots
                    CISC: 0.0% of Pipeline Slots
        Front-End Bound: 0.0% of Pipeline Slots
            Front-End Latency: 0.0% of Pipeline Slots
                ICache Misses: 0.0% of Clockticks
                ITLB Overhead: 0.0% of Clockticks
                Branch Resteers: 0.0% of Clockticks
                    Mispredicts Resteers: 0.0% of Clockticks
                    Clears Resteers: 0.0% of Clockticks
                    Unknown Branches: 0.0% of Clockticks
                DSB Switches: 0.0% of Clockticks
                Length Changing Prefixes: 0.0% of Clockticks
                MS Switches: 0.0% of Clockticks
            Front-End Bandwidth: 0.0% of Pipeline Slots
                Front-End Bandwidth MITE: 0.0% of Pipeline Slots
                    Decoder-0 Alone: 0.0% of Clockticks
                Front-End Bandwidth DSB: 0.0% of Pipeline Slots
                Front-End Bandwidth LSD: 0.0% of Pipeline Slots
                (Info) DSB Coverage: 0.0%
                (Info) LSD Coverage: 0.0%
                (Info) DSB Misses Cost: 0.0% of Pipeline Slots
        Bad Speculation: 100.0% of Pipeline Slots
         | A significant proportion of pipeline slots containing useful work are
         | being cancelled. This can be caused by mispredicting branches or by
         | machine clears. Note that this metric value may be highlighted due to
         | Branch Resteers issue.
         |
            Branch Mispredict: 0.0% of Pipeline Slots
            Machine Clears: 100.0% of Pipeline Slots
             | Issue: A significant portion of execution time is spent handling
             | machine clears.
             | 
             | Tips: See the "Memory Disambiguation" section in the Intel 64 and
             | IA-32 Architectures Optimization Reference Manual.
             |
        Back-End Bound: 0.0% of Pipeline Slots
            Memory Bound: 0.0% of Pipeline Slots
                L1 Bound: 0.0% of Clockticks
                    DTLB Overhead: 0.0% of Clockticks
                        Load STLB Hit: 0.0% of Clockticks
                        Load STLB Miss: 0.0% of Clockticks
                    Loads Blocked by Store Forwarding: 0.0% of Clockticks
                    Lock Latency: 0.0% of Clockticks
                    Split Loads: 0.0% of Clockticks
                    FB Full: 0.0% of Clockticks
                L2 Bound: 0.0% of Clockticks
                L3 Bound: 0.0% of Clockticks
                    Contested Accesses: 0.0% of Clockticks
                    Data Sharing: 0.0% of Clockticks
                    L3 Latency: 0.0% of Clockticks
                    SQ Full: 0.0% of Clockticks
                DRAM Bound: 0.0% of Clockticks
                    Memory Bandwidth: 0.0% of Clockticks
                    Memory Latency: 0.0% of Clockticks
                Store Bound: 0.0% of Clockticks
                    Store Latency: 0.0% of Clockticks
                    False Sharing: 0.0% of Clockticks
                    Split Stores: 0.0% of Clockticks
                    Streaming Stores: 0.0% of Clockticks
                    DTLB Store Overhead: 0.0% of Clockticks
                        Store STLB Hit: 0.0% of Clockticks
                        Store STLB Hit: 0.0% of Clockticks
            Core Bound: 0.0% of Pipeline Slots
                Divider: 0.0% of Clockticks
                Port Utilization: 0.0% of Clockticks
                    Cycles of 0 Ports Utilized: 0.0% of Clockticks
                        Serializing Operations: 0.0% of Clockticks
                            Slow Pause: 0.0% of Clockticks
                            Memory Fence: 0.0% of Clockticks
                        Mixing Vectors: 0.0% of Clockticks
                    Cycles of 1 Port Utilized: 0.0% of Clockticks
                    Cycles of 2 Ports Utilized: 0.0% of Clockticks
                    Cycles of 3+ Ports Utilized: 0.0% of Clockticks
                        ALU Operation Utilization: 0.0% of Clockticks
                            Port 0: 0.0% of Clockticks
                            Port 1: 0.0% of Clockticks
                            Port 6: 0.0% of Clockticks
                        Load Operation Utilization: 0.0% of Clockticks
                        Store Operation Utilization: 0.0% of Clockticks
    E-Core
        Retiring: 0.0% of Pipeline Slots
            General Retirement: 0.0% of Pipeline Slots
                FP Arithmetic: 0.0% of Pipeline Slots
                Other: 0.0% of Pipeline Slots
            Microcode Sequencer: 0.0% of Pipeline Slots
        Front-End Bound: 0.0% of Pipeline Slots
            Front-End Latency: 0.0% of Pipeline Slots
                ICache Misses: 0.0% of Pipeline Slots
                ITLB Overhead: 0.0% of Pipeline Slots
                BACLEARS: 0.0% of Pipeline Slots
                Branch Resteers: 0.0% of Pipeline Slots
            Front-End Bandwidth: 0.0% of Pipeline Slots
                Cisc: 0.0% of Pipeline Slots
                Decode: 0.0% of Pipeline Slots
                Pre-Decode Wrong: 0.0% of Pipeline Slots
                Front-End Other: 0.0% of Pipeline Slots
        Bad Speculation: 0.0% of Pipeline Slots
            Branch Mispredict: 0.0% of Pipeline Slots
            Machine Clears: 0.0% of Pipeline Slots
                Machine Clear: 0.0% of Pipeline Slots
                MO Machine Clear Overhead: 0.0% of Pipeline Slots
        Back-End Bound: 0.0% of Pipeline Slots
            Resource Bound: 0.0% of Pipeline Slots
                Memory Scheduler: 0.0% of Pipeline Slots
                Non-memory Scheduler: 0.0% of Pipeline Slots
                Register: 0.0% of Pipeline Slots
                Full Re-order Buffer (ROB): 0.0% of Pipeline Slots
                Allocation Restriction: 0.0% of Pipeline Slots
                Serializing Operations: 0.0% of Pipeline Slots
        Alternative Back-End Bound: 0.0% of Pipeline Slots
            Core Bound: 0.0%
            Memory Bound: 0.0%
                L2 Bound: 0.0%
                L3 Bound: 0.0%
                DRAM Bound: 0.0%
    Average CPU Frequency: 0.000 MHz
    Total Thread Count: 9
    Paused Time: 0s
Effective Physical Core Utilization: 0.0% (0.000 out of 6)
 | The metric value is low, which may signal a poor physical CPU cores
 | utilization caused by:
 |     - load imbalance
 |     - threading runtime overhead
 |     - contended synchronization
 |     - thread/process underutilization
 |     - incorrect affinity that utilizes logical cores instead of physical
 |       cores
 | Explore sub-metrics to estimate the efficiency of MPI and OpenMP parallelism
 | or run the Locks and Waits analysis to identify parallel bottlenecks for
 | other parallel runtimes.
 |
    Effective Logical Core Utilization: 63.8% (7.653 out of 12)
     | The metric value is low, which may signal a poor logical CPU cores
     | utilization. Consider improving physical core utilization as the first
     | step and then look at opportunities to utilize logical cores, which in
     | some cases can improve processor throughput and overall performance of
     | multi-threaded applications.
     |
Collection and Platform Info
    Application Command Line: /home/kevin/intel/vtune/samples/matrix/matrix 
    User Name: kevin
    Operating System: 4.18.0-372.19.1.el8_6.x86_64 Red Hat Enterprise Linux release 8.6 (Ootpa)
    Computer Name: localhost.localdomain
    Result Size: 305.1 MB 
    Collection start time: 12:13:29 15/10/2022 UTC
    Collection stop time: 12:13:44 15/10/2022 UTC
    Collector Type: Event-based sampling driver
    CPU
        Name: Intel(R) microarchitecture code named Alderlake-S
        Frequency: 2.995 GHz
        Logical CPU Count: 12
        Cache Allocation Technology
            Level 2 capability: available
            Level 3 capability: not detected

If you want to skip descriptions of detected performance issues in the report,
enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
Alternatively, you may view the report in the csv format: vtune -report
<report_name> -format=csv.
vtune: Executing actions 100 % done 
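
For anyone comparing their own runs against the summary above, the failure shows up as a top-level `Clockticks: 0` line even though the application ran for about 15 seconds. Below is a minimal sketch of a shell check for that symptom. The function name `check_clockticks` is hypothetical (it is not part of VTune), and it assumes the plain-text summary layout shown in this post.

```shell
# Hypothetical helper (not part of VTune): reads a VTune text summary on
# stdin and classifies the first top-level "Clockticks:" line.
# Prints "zero", "nonzero", or "missing" (if no such line is found).
check_clockticks() {
  awk '
    /^ *Clockticks:/ {
      gsub(/,/, "", $2)          # drop thousands separators, e.g. 69,798,000,000
      found = 1
      if ($2 + 0 == 0) print "zero"; else print "nonzero"
      exit                       # only the first match matters; END still runs
    }
    END { if (!found) print "missing" }
  '
}
```

One possible way to use it is to pipe in the summary report, using the same report command the tool prints at the end of its output: `vtune -report summary -r /home/kevin/Documents/vtune_test | check_clockticks`. A result of `zero` matches the symptom discussed in this thread; it does not identify the cause.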

 

SarasLogo
Beginner
1,369 Views

I am experiencing the same issue on an AlderLake system.

Below are the results of running self-check:

Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 624343

HW event-based analysis (counting mode) (Intel driver)
Example of analysis types: Performance Snapshot
Collection: Ok
Finalization: Ok...
Report: Ok

Instrumentation based analysis check
Example of analysis types: Hotspots and Threading with user-mode sampling
Collection: Ok
Finalization: Ok...
Report: Ok

HW event-based analysis check (Intel driver)
Example of analysis types: Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
Collection: Ok
Finalization: Ok...
Report: Ok

HW event-based analysis check (Intel driver)
Example of analysis types: Microarchitecture Exploration
Collection: Ok
Finalization: Ok...
Report: Ok

HW event-based analysis with uncore events (Intel driver)
Example of analysis types: Memory Access
Collection: Ok
Finalization: Ok...
Report: Ok

HW event-based analysis with stacks (Intel driver)
Example of analysis types: Hotspots with HW event-based sampling and call stacks
Collection: Ok
vtune: Warning: Stack flow analysis on this platform is limited to the hardware LBR-based stack type that has a depth limitation.
Finalization: Ok...
Report: Ok

HW event-based analysis with context switches (Intel driver)
Example of analysis types: Threading with HW event-based sampling
Collection: Ok
Finalization: Ok...
Report: Ok

Checking DPC++ application as prerequisite for GPU analyses: Fail
Unable to run DPC++ application on GPU connected to this system. If you are using an Intel GPU and want to verify profiling support for DPC++ applications, check these requirements:
* Install Intel(R) GPU driver.
* Install Intel(R) Level Zero GPU runtime.
* Install Intel(R) oneAPI DPC++ Runtime and set the environment.

The system is ready to be used for performance analysis with Intel VTune Profiler.
Review warnings in the output above to find product limitations, if any.

The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Microarchitecture Exploration
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling

The following analyses have failed on the system:
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)

JaideepK_Intel
Moderator
1,297 Views

Hi All,

 

Good day to you all.

Can you please follow the below steps?

1. Please let us know whether you can reproduce the same issue with the latest version of VTune, i.e., 2022.4.

2. Can you please share your result directory where you were getting that error as a zip file? (The result directory looks like: r000XX.)

 

Thanks,

Jaideep

 

Maximillia_D_Intel
1,214 Views

Thanks. Yes, I'm using VTune 2022.4.0 build 624343.

Please find the result directory below.

Please let me know next steps.

Max

oliver_h
Beginner
1,209 Views

Hi,

Please find the results attached.

I can confirm I am already using 2022.4 and I'm having the same issue.

Oliver

JaideepK_Intel
Moderator
1,095 Views

Hi All,


Thank you for sharing the log files. We are working on this internally and will get back to you with an update.


Regards,

Jaideep


JaideepK_Intel
Moderator
1,060 Views

Hi All,

 

This is a known issue in VTune when the system runs on Hyper-V, because Hyper-V blocks the non-converged PMU feature on hetero (hybrid) platforms.

Can you please disable Hyper-V in Windows features and let us know if the issue still persists? Please attach the results after disabling Hyper-V.

JaideepK_Intel_0-1667800601047.png

 

Thanks,

Jaideep

 

oliver_h
Beginner
1,047 Views

Hi Jaideep,

 

I had already suspected that could be the cause and checked: Hyper-V isn't enabled. I also turned off virtualisation, and that didn't make a difference either.

 

Thanks again,

Oliver

 

oliver_h_0-1667851004755.png

 

oliver_h
Beginner
1,012 Views

Hi Jaideep,

 

Following up on what you said, I had to go one step further: I turned Intel Virtualisation (VMX) off in the BIOS, and I can now see P-core performance in Microarchitecture Exploration (this resolved the issue).

Maybe there's some virtualisation option that needs to be reset after setting Hyper-V.

For anyone else following this post, I would suggest disabling virtualisation in the BIOS to see if that works for you.

 

Regards,

Oliver

oliver_h_0-1667936016421.png

 

Dom324
Beginner
967 Views

Hi Jaideep,
I followed the advice and disabled virtualization in the BIOS (as can be seen in the screenshot below), but I still have the same issue. I'm attaching the results of the microarchitecture analysis.

I'm not running Windows, I'm running Linux (RHEL8). Is there something I need to change in my OS settings?

Both "Intel Virtualization Tech" and "Intel VT-D Tech" are disabled:

MSI_SnapShot.jpg
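
For Linux readers, the Hyper-V explanation earlier in the thread suggests one quick thing to rule out: whether the kernel itself is running under a hypervisor. The sketch below checks for the `hypervisor` CPU flag in `/proc/cpuinfo`. The function name `has_hypervisor_flag` is hypothetical. This is only a sanity check, not a diagnosis: a missing flag does not rule out other causes of empty P-core counters.

```shell
# Hypothetical helper: reports whether a cpuinfo-style file lists the
# "hypervisor" CPU flag, which Linux exposes when it runs as a guest.
# Defaults to /proc/cpuinfo; a different file can be passed for testing.
has_hypervisor_flag() {
  cpuinfo_file="${1:-/proc/cpuinfo}"
  # Look only at the "flags" lines and match "hypervisor" as a whole word.
  if grep -E '^flags[[:space:]]*:' "$cpuinfo_file" | grep -qw hypervisor; then
    echo "hypervisor flag present"
  else
    echo "hypervisor flag absent"
  fi
}
```

Calling `has_hypervisor_flag` with no arguments reads the live `/proc/cpuinfo`. If it prints `hypervisor flag present`, the system is a guest, and the first question from the moderator (whether the application runs inside a VM) applies.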

Maximillia_D_Intel
1,039 Views

This does not fix the issue for me. I have Hyper-V off and still see the same issue. Please see attached.

Next steps?

Thanks,

Max

 

 

Maximillia_D_Intel
1,004 Views

If I turn off virtualization in the BIOS as Oliver did, I am able to see P-Core counts.

 

That is decent news. I can proceed with this limitation.

 

Will this issue be addressed? I would prefer to leave virtualization turned on by default and not have to turn it off every time I want to use VTune.

 

Thanks,

 

Max

JaideepK_Intel
Moderator
942 Views

Hi All,

 

Thank you for sharing your observation and the log files. We are working on this internally and will get back to you with an update.

 

Regards,

Jaideep


JaideepK_Intel
Moderator
911 Views

Hi All,

 

Turning off virtualization in the BIOS settings can make the P-core values appear (this issue is seen on some Alder Lake platforms). If this resolves your issue, please accept this as a solution. This will help others with a similar issue. Thank you!

 

Regards,

Jaideep

 

Dom324
Beginner
896 Views

Hello Jaideep, as I have previously stated, this did NOT solve my issue. As can be seen in the attached screenshot, virtualization is disabled in the BIOS, but the microarchitecture analysis still does not show values for P-cores.
OS: RHEL8
Is there anything I can try next?

MSI_SnapShot.jpg

Dom324_0-1668856340029.png

JaideepK_Intel
Moderator
843 Views

Hi,


Could you please attach screenshots of your system info output?


Thanks,

Jaideep

