Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

vtune error

kakuyojiki
Beginner

Pin: pin-3.22-98547-7a303a835
Copyright 2002-2020 Intel Corporation.
E: [tid:13968] SYSCALL_INSPECTOR: NTDLL module is rebased
A: C:\tmp_proj\pinjen\workspace\pypl-pin-nightly\GitPin\Source\pin\base_w\ipc_server_windows.cpp: LEVEL_BASE::StartServer: 1283: assertion failed: res == TRUE

 

System: Intel Core i5-12450H, Windows 11 Insider Preview.

Also, Microarchitecture Exploration shows a strange result: in Task Manager the application is using the P-cores, but every P-core metric in the VTune summary is zero.

 

vtune: Collection stopped.
vtune: Using result path `C:\Windows\System32\r001ue'
vtune: Executing actions 20 % Resolving information for `storqosflt.sys'
vtune: Warning: Finalization may slow down when loading files from the symbol server specified with the _NT_SYMBOL_PATH environment variable.
vtune: Executing actions 21 % Resolving information for `wdfilter.sys'
vtune: Warning: Cannot locate debugging information for file `c:\windows\system32\drivers\netwtw10.sys'.
vtune: Executing actions 22 % Resolving information for `wdfilter.sys'
vtune: Warning: Cannot locate debugging information for file `c:\windows\system32\drivers\wd\wdfilter.sys'.
vtune: Executing actions 75 % Generating a report
Elapsed Time: 205.423s
Clockticks: 179,187,840,000
P-Core: 0
E-Core: 179,187,840,000
Instructions Retired: 421,282,368,000
P-Core: 0
E-Core: 421,282,368,000
CPI Rate: 0.425
P-Core: 0.000
E-Core: 0.425
MUX Reliability: 0.998
P-Core
Retiring: 0.0% of Pipeline Slots
Light Operations: 0.0% of Pipeline Slots
FP Arithmetic: 0.0% of uOps
FP x87: 0.0% of uOps
FP Scalar: 0.0% of uOps
FP Vector: 0.0% of uOps
Memory Operations: 0.0% of Pipeline Slots
Fused Instructions: 0.0% of Pipeline Slots
Non Fused Branches: 0.0% of Pipeline Slots
Nop Instructions: 0.0% of Pipeline Slots
Other: 0.0% of Pipeline Slots
Heavy Operations: 0.0% of Pipeline Slots
Microcode Sequencer: 0.0% of Pipeline Slots
Assists: 0.0% of Pipeline Slots
CISC: 0.0% of Pipeline Slots
Front-End Bound: 0.0% of Pipeline Slots
Front-End Latency: 0.0% of Pipeline Slots
ICache Misses: 0.0% of Clockticks
ITLB Overhead: 0.0% of Clockticks
Branch Resteers: 0.0% of Clockticks
Mispredicts Resteers: 0.0% of Clockticks
Clears Resteers: 0.0% of Clockticks
Unknown Branches: 0.0% of Clockticks
DSB Switches: 0.0% of Clockticks
Length Changing Prefixes: 0.0% of Clockticks
MS Switches: 0.0% of Clockticks
Front-End Bandwidth: 0.0% of Pipeline Slots
Front-End Bandwidth MITE: 0.0% of Pipeline Slots
Decoder-0 Alone: 0.0% of Clockticks
Front-End Bandwidth DSB: 0.0% of Pipeline Slots
Front-End Bandwidth LSD: 0.0% of Pipeline Slots
(Info) DSB Coverage: 0.0%
(Info) LSD Coverage: 0.0%
(Info) DSB Misses Cost: 0.0% of Pipeline Slots
Bad Speculation: 100.0% of Pipeline Slots
| A significant proportion of pipeline slots containing useful work are
| being cancelled. This can be caused by mispredicting branches or by
| machine clears. Note that this metric value may be highlighted due to
| Branch Resteers issue.
|
Branch Mispredict: 0.0% of Pipeline Slots
Machine Clears: 100.0% of Pipeline Slots
| Issue: A significant portion of execution time is spent handling
| machine clears.
|
| Tips: See the "Memory Disambiguation" section in the Intel 64 and
| IA-32 Architectures Optimization Reference Manual.
|
Back-End Bound: 0.0% of Pipeline Slots
Memory Bound: 0.0% of Pipeline Slots
L1 Bound: 0.0% of Clockticks
DTLB Overhead: 0.0% of Clockticks
Load STLB Hit: 0.0% of Clockticks
Load STLB Miss: 0.0% of Clockticks
Loads Blocked by Store Forwarding: 0.0% of Clockticks
Lock Latency: 0.0% of Clockticks
Split Loads: 0.0% of Clockticks
FB Full: 0.0% of Clockticks
L2 Bound: 0.0% of Clockticks
L3 Bound: 0.0% of Clockticks
L3 Latency
SQ Full: 0.0% of Clockticks
DRAM Bound: 0.0% of Clockticks
Memory Bandwidth: 0.0% of Clockticks
Memory Latency: 0.0% of Clockticks
Store Bound: 0.0% of Clockticks
Store Latency: 0.0% of Clockticks
Split Stores: 0.0% of Clockticks
DTLB Store Overhead: 0.0% of Clockticks
Store STLB Hit: 0.0% of Clockticks
Store STLB Hit: 0.0% of Clockticks
Core Bound: 0.0% of Pipeline Slots
Divider: 0.0% of Clockticks
Port Utilization: 0.0% of Clockticks
Cycles of 0 Ports Utilized: 0.0% of Clockticks
Serializing Operations: 0.0% of Clockticks
Slow Pause: 0.0% of Clockticks
Memory Fence: 0.0% of Clockticks
Mixing Vectors: 0.0% of Clockticks
Cycles of 1 Port Utilized: 0.0% of Clockticks
Cycles of 2 Ports Utilized: 0.0% of Clockticks
Cycles of 3+ Ports Utilized: 0.0% of Clockticks
ALU Operation Utilization: 0.0% of Clockticks
Port 0: 0.0% of Clockticks
Port 1: 0.0% of Clockticks
Port 6: 0.0% of Clockticks
Load Operation Utilization: 0.0% of Clockticks
Store Operation Utilization: 0.0% of Clockticks
E-Core
Retiring: 53.0% of Pipeline Slots
General Retirement: 51.4% of Pipeline Slots
FP Arithmetic: 0.1% of Pipeline Slots
Other: 51.3% of Pipeline Slots
Microcode Sequencer: 1.7% of Pipeline Slots
Front-End Bound: 14.1% of Pipeline Slots
Front-End Latency: 100.0% of Pipeline Slots
ICache Misses: 0.6% of Pipeline Slots
ITLB Overhead: 0.1% of Pipeline Slots
BACLEARS: 0.4% of Pipeline Slots
Branch Resteers: 100.0% of Pipeline Slots
Front-End Bandwidth: 56.0% of Pipeline Slots
Cisc: 2.6% of Pipeline Slots
Decode: 8.4% of Pipeline Slots
Pre-Decode Wrong: 0.1% of Pipeline Slots
Front-End Other: 44.9% of Pipeline Slots
Bad Speculation: 73.9% of Pipeline Slots
| A significant proportion of pipeline slots containing useful work are
| being cancelled. This can be caused by mispredicting branches or by
| machine clears. Note that this metric value may be highlighted due to
| Branch Resteers issue.
|
Branch Mispredict: 4.7% of Pipeline Slots
Machine Clears: 0.6% of Pipeline Slots
Machine Clear: 0.0% of Pipeline Slots
MO Machine Clear Overhead: 0.6% of Pipeline Slots
Back-End Bound: 26.6% of Pipeline Slots
| A significant portion of pipeline slots are remaining empty. When
| operations take too long in the back-end, they introduce bubbles in
| the pipeline that ultimately cause fewer pipeline slots containing
| useful work to be retired per cycle than the machine is capable to
| support. This opportunity cost results in slower execution. Long-
| latency operations like divides and memory operations can cause this,
| as can too many operations being directed to a single execution port
| (for example, more multiply operations arriving in the back-end per
| cycle than the execution unit can support).
|
Resource Bound: 26.6% of Pipeline Slots
| Resource Bound
|
Memory Scheduler: 2.9% of Pipeline Slots
Non-memory Scheduler: 41.9% of Pipeline Slots
| A significant percentage of issue slots were not consumed by
| the backend due to IEC and FPC RAT stalls. This can be caused
| by the FIQ and IEC reservation station stall (integer, FP and
| SIMD scheduler not able to accept another uop).
|
Register: 1.5% of Pipeline Slots
Full Re-order Buffer (ROB): 12.7% of Pipeline Slots
| A significant percentage of issue slots were not consumed by
| the backend due to ROB full.
|
Allocation Restriction: 1.6% of Pipeline Slots
Serializing Operations: 2.4% of Pipeline Slots
Alternative Back-End Bound: 26.6% of Pipeline Slots
| A significant portion of pipeline slots are remaining empty. When
| operations take too long in the back-end, they introduce bubbles in
| the pipeline that ultimately cause fewer pipeline slots containing
| useful work to be retired per cycle than the machine is capable to
| support. This opportunity cost results in slower execution. Long-
| latency operations like divides and memory operations can cause this,
| as can too many operations being directed to a single execution port
| (for example, more multiply operations arriving in the back-end per
| cycle than the execution unit can support).
|
Core Bound: 0.0%
Memory Bound: 42.5%
| The metric value is high. This can indicate that the significant
| fraction of execution pipeline slots could be stalled due to
| demand memory load and stores. Use Memory Access analysis to have
| the metric breakdown by memory hierarchy, memory bandwidth
| information, correlation by memory objects.
|
L2 Bound: 2.3%
L3 Bound: 1.6%
DRAM Bound: 38.6%
| This metric shows how often CPU was stalled on the main
| memory (DRAM). Caching typically improves the latency and
| increases performance.
|
Average CPU Frequency: 3.249 GHz
Total Thread Count: 13
Paused Time: 0s
Effective Physical Core Utilization: 3.3% (0.266 out of
| The metric value is low, which may signal a poor physical CPU cores
| utilization caused by:
| - load imbalance
| - threading runtime overhead
| - contended synchronization
| - thread/process underutilization
| - incorrect affinity that utilizes logical cores instead of physical
| cores
| Explore sub-metrics to estimate the efficiency of MPI and OpenMP parallelism
| or run the Locks and Waits analysis to identify parallel bottlenecks for
| other parallel runtimes.
|
Effective Logical Core Utilization: 2.2% (0.266 out of 12)
| The metric value is low, which may signal a poor logical CPU cores
| utilization. Consider improving physical core utilization as the first
| step and then look at opportunities to utilize logical cores, which in
| some cases can improve processor throughput and overall performance of
| multi-threaded applications.
|
Collection and Platform Info
Application Command Line: C:\Github\vvenc\bin\relwithdebinfo-static\vvencapp.exe "--preset" "slower" "-i" "akiyo_cif.y4m" "-o" "2.266" "-t" "12"
Operating System: Microsoft Windows 10
Computer Name: 
Result Size: 346.2 MB
Collection start time: 03:24:11 23/09/2022 UTC
Collection stop time: 03:27:36 23/09/2022 UTC
Collector Type: Event-based sampling driver
CPU
Name: Intel(R) microarchitecture code named Alderlake-P
Frequency: 2.496 GHz
Logical CPU Count: 12
Max DRAM Single-Package Bandwidth: 31.000 GB/s
Cache Allocation Technology
Level 2 capability: not detected
Level 3 capability: not detected

If you want to skip descriptions of detected performance issues in the report,
enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
Alternatively, you may view the report in the csv format: vtune -report
<report_name> -format=csv.
vtune: Executing actions 100 % done
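
The per-core-type split in the summary above can be checked mechanically. The following is a minimal Python sketch, not part of VTune: the function names `parse_hybrid_metrics` and `p_core_share` are made up for illustration, and the assumption that each top-level metric is immediately followed by its `P-Core:` and `E-Core:` lines is based only on the text pasted in this post. It also recomputes CPI as clockticks divided by instructions retired.

```python
import re

# First lines of the summary pasted above (copied verbatim).
SUMMARY = """\
Clockticks: 179,187,840,000
P-Core: 0
E-Core: 179,187,840,000
Instructions Retired: 421,282,368,000
P-Core: 0
E-Core: 421,282,368,000
CPI Rate: 0.425
P-Core: 0.000
E-Core: 0.425
"""

# "<name>: <number>" where the number may contain thousands separators.
LINE = re.compile(r"^\s*([^:]+):\s*([\d,.]+)\s*$")


def parse_hybrid_metrics(text):
    """Return {metric: {"total": x, "P-Core": y, "E-Core": z}}.

    Assumption: every metric line is followed by its P-Core and E-Core
    sub-lines, as in the pasted report. Other layouts are not handled.
    """
    metrics = {}
    current = None
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        name = match.group(1).strip()
        value = float(match.group(2).replace(",", ""))
        if name in ("P-Core", "E-Core"):
            if current is not None:
                metrics[current][name] = value
        else:
            current = name
            metrics[current] = {"total": value}
    return metrics


def p_core_share(metric):
    """Fraction of a metric's total that was attributed to P-cores."""
    total = metric["total"]
    return metric.get("P-Core", 0.0) / total if total else 0.0


metrics = parse_hybrid_metrics(SUMMARY)
cpi = metrics["Clockticks"]["total"] / metrics["Instructions Retired"]["total"]
print(f"P-core share of clockticks: {p_core_share(metrics['Clockticks']):.1%}")
print(f"Recomputed CPI: {cpi:.3f}")
```

For the pasted numbers this reports a P-core share of 0.0% and a recomputed CPI of 0.425, which agrees with the CPI Rate line, so the totals are internally consistent and all samples were attributed to the E-cores.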

C:\Windows\System32>cd C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64

C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64>amplxe-sepreg.exe

Usage: amplxe-sepreg.exe [ option ]

where option is one of the following:

-c | --check-dependencies
-i | --install-driver
-s | --status
-u [pax]| --uninstall-driver [pax]

-v | --verbose may also be added to the above option for additional output


C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64>amplxe-sepreg.exe -c
Checking platform...
Platform is genuine Intel: OK
Platform has SSE2: OK
Platform architecture: INTEL64
User has admin rights: OK
Drivers will be installed to C:\WINDOWS\System32\Drivers\
Checking sepdrv5 driver path...OK
Checking sepdrv5 service...
Driver status: the sepdrv5 service is running
Checking sepdal driver path...OK
Checking sepdal service...
Driver status: the sepdal service is running
Checking socperf3 driver path...OK
Checking socperf3 service...
Driver status: the socperf3 service is running

C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64>amplxe-sepreg.exe -s
Checking sepdrv5 driver path...OK
Checking sepdrv5 service...
Driver status: the sepdrv5 service is running
Checking sepdal driver path...OK
Checking sepdal service...
Driver status: the sepdal service is running
Checking socperf3 driver path...OK
Checking socperf3 service...
Driver status: the socperf3 service is running

C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64>amplxe-sepreg.exe -i
Warning, socperf3 driver is already installed and will be re-used... skipping
Installing and starting sepdrv5...
OK
Installing and starting sepdal...
OK
Installing and starting VTSS++ driver...FAILED
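
A small Python sketch for picking the failing step out of installer output like the above. It is not an Intel tool: the function name `step_results` is hypothetical, and the assumption that each `Installing ...` step reports its status either on the same line after `...` or on the next line comes only from the text pasted here.

```python
# Output of "amplxe-sepreg.exe -i" as pasted above (copied verbatim).
INSTALL_LOG = """\
Warning, socperf3 driver is already installed and will be re-used... skipping
Installing and starting sepdrv5...
OK
Installing and starting sepdal...
OK
Installing and starting VTSS++ driver...FAILED
"""


def step_results(text):
    """Pair each 'Installing ...' step with the status reported for it."""
    results = []
    pending = None  # step whose status is expected on the next line
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Installing"):
            step, _, status = line.partition("...")
            if status:
                results.append((step, status))
            else:
                pending = step
        elif pending is not None and line:
            results.append((pending, line))
            pending = None
    return results


failed = [step for step, status in step_results(INSTALL_LOG) if status != "OK"]
print(failed)
```

On the pasted log the only step in `failed` is the VTSS++ driver installation; sepdrv5 and sepdal both report OK.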

C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64>vtune-self-checker.bat
Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 624050

HW event-based analysis (counting mode) (Intel driver)
Example of analysis types: Performance Snapshot
Collection: Ok
Finalization: Ok...
Report: Ok

ShyamS_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.


Please attach the complete self-checker logs.


Thanks

Shyam Sundar


kakuyojiki
Beginner

The self-checker gets stuck at the instrumentation-based analysis check:

Microsoft Windows [Version 10.0.25206.1000]
(c) Microsoft Corporation. All rights reserved.

C:\Windows\System32>cd "C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64"

C:\Program Files (x86)\Intel\oneAPI\vtune\2022.3.0\bin64>vtune-self-checker.bat
Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 624050

HW event-based analysis (counting mode) (Intel driver)
Example of analysis types: Performance Snapshot
Collection: Ok
Finalization: Ok...
Report: Ok

Instrumentation based analysis check...

ShyamS_Intel
Moderator

Hi,


Could you please provide a sample reproducer and the commands you tried, so that we can reproduce the issue on our end?


Thanks

Shyam Sundar


ShyamS_Intel
Moderator

Hi,


Could you please provide us with an update?


Thanks

Shyam Sundar


kakuyojiki
Beginner
ShyamS_Intel
Moderator

Hi, 

Sorry for the inconvenience. To help us assist you, could you try running the VTune matrix multiplication sample from the command line, using a command of the following form:

vtune -collect <analysis_type> -- <target>

Please refer to the documentation if needed: https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/command-line-interface.html
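
As an illustration of the command form above, here is a minimal Python sketch that assembles a collection command and the matching summary-report command as argument lists. The helper names `collect_command` and `report_command`, the result directory `r_matrix_hs`, and the target name `matrix.exe` are placeholders chosen for this example, not values taken from this thread; `hotspots` is used only as one possible analysis type.

```python
import subprocess


def collect_command(analysis_type, result_dir, target, target_args=()):
    """Build 'vtune -collect <type> -result-dir <dir> -- <target> [args]'."""
    return ["vtune", "-collect", analysis_type,
            "-result-dir", result_dir,
            "--", target, *target_args]


def report_command(result_dir):
    """Build 'vtune -report summary -r <dir>' for an existing result."""
    return ["vtune", "-report", "summary", "-r", result_dir]


# Placeholder names: adjust the result directory and target to your setup.
cmd = collect_command("hotspots", "r_matrix_hs", "matrix.exe")

# Show the command line without running it.
print(subprocess.list2cmdline(cmd))
print(subprocess.list2cmdline(report_command("r_matrix_hs")))

# To actually run the collection (requires vtune on PATH):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems when the target path contains spaces, as the install path in this thread does.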

Since you mentioned that you are no longer using Intel VTune Profiler, please let us know whether we can stop monitoring this thread.

 

Thanks,

Shyam Sundar

ShyamS_Intel
Moderator

Hi,


I assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel. 


Thanks

Shyam Sundar

