
VTune Profiler: Back-End Bound shows no detail

armorhu
Beginner
615 Views

Hi, I have a problem using VTune Profiler on Linux.

I ran the Microarchitecture Exploration analysis, but the Back-End Bound metric shows no detail below it.

This is what I expect to get (run on my PC):

armorhu_0-1610600632944.png

But this is what I get (run on the Linux server):

armorhu_1-1610600721688.png

Here is my VTune command line:

vtune -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob dram-bandwidth-limits=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=3000000,CPU_CLK_UNHALTED.REF_TSC:sample:sa=3000000,INST_RETIRED.ANY:sample:sa=3000000,CPU_CLK_UNHALTED.REF_XCLK:sa=100003,CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:sa=100003,CPU_CLK_UNHALTED.THREAD_P,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.1_PORTS_UTIL,EXE_ACTIVITY.2_PORTS_UTIL,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.EXE_BOUND_0_PORTS,IDQ_UOPS_NOT_DELIVERED.CORE,INT_MISC.RECOVERY_CYCLES,UOPS_ISSUED.ANY,UOPS_RETIRED.RETIRE_SLOTS,BACLEARS.ANY,BR_MISP_RETIRED.ALL_BRANCHES_PS,DSB2MITE_SWITCHES.PENALTY_CYCLES,ICACHE_16B.IFDATA_STALL,ICACHE_16B.IFDATA_STALL:cmask=1:e=yes,ICACHE_64B.IFTAG_STALL,IDQ.ALL_DSB_CYCLES_4_UOPS,IDQ.ALL_DSB_CYCLES_ANY_UOPS,IDQ.ALL_MITE_CYCLES_4_UOPS,IDQ.ALL_MITE_CYCLES_ANY_UOPS,IDQ.DSB_UOPS,IDQ.MITE_UOPS,IDQ.MS_SWITCHES,IDQ.MS_UOPS,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE,ILD_STALL.LCP,INT_MISC.CLEAR_RESTEER_CYCLES,LSD.CYCLES_4_UOPS,LSD.CYCLES_ACTIVE,LSD.UOPS,MACHINE_CLEARS.COUNT,CYCLE_ACTIVITY.STALLS_L1D_MISS,CYCLE_ACTIVITY.STALLS_L2_MISS,CYCLE_ACTIVITY.STALLS_L3_MISS,DTLB_LOAD_MISSES.STLB_HIT,DTLB_LOAD_MISSES.WALK_ACTIVE,DTLB_STORE_MISSES.STLB_HIT,DTLB_STORE_MISSES.WALK_ACTIVE,L1D_PEND_MISS.FB_FULL:cmask=1,L1D_PEND_MISS.PENDING,L2_RQSTS.RFO_HIT,LD_BLOCKS.NO_SR,LD_BLOCKS.STORE_FORWARD,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS,MEM_INST_RETIRED.ALL_STORES_PS,MEM_INST_RETIRED.LOCK_LOADS_PS,MEM_INST_RETIRED.SPLIT_LOADS_PS,MEM_INST_RETIRED.SPLIT_STORES_PS,MEM_INST_RETIRED.STLB_MISS_LOADS_PS,MEM_INST_RETIRED.STLB_MISS_STORES_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS_PS,MEM_LOAD_RETIRED.FB_HIT_PS,MEM_LOAD_RETIRED.L1_HIT_PS,MEM_LOAD_RETIRED.L1_MISS_PS,MEM_LOAD_RETIRED.L2_HIT_PS,MEM_LOAD_RETIRED.L3_HIT_PS,MEM_LOAD_RETIRED.L3_MISS_PS,OFFCORE_REQUESTS_BUFFER.SQ_FULL,OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=4,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO,ARITH.DIVIDER_ACTIVE,FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE,FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE,FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE,FP_ARITH_INST_RETIRED.SCALAR_DOUBLE,FP_ARITH_INST_RETIRED.SCALAR_SINGLE,INST_RETIRED.PREC_DIST,PARTIAL_RAT_STALLS.SCOREBOARD,ROB_MISC_EVENTS.PAUSE_INST,UOPS_DISPATCHED_PORT.PORT_0,UOPS_DISPATCHED_PORT.PORT_1,UOPS_DISPATCHED_PORT.PORT_2,UOPS_DISPATCHED_PORT.PORT_3,UOPS_DISPATCHED_PORT.PORT_4,UOPS_DISPATCHED_PORT.PORT_5,UOPS_DISPATCHED_PORT.PORT_6,UOPS_DISPATCHED_PORT.PORT_7,UOPS_EXECUTED.CORE_CYCLES_GE_1,UOPS_EXECUTED.CORE_CYCLES_GE_2,UOPS_EXECUTED.CORE_CYCLES_GE_3,UOPS_EXECUTED.CORE_CYCLES_NONE,UOPS_EXECUTED.THREAD,UOPS_EXECUTED.X87,FP_ASSIST.ANY,OTHER_ASSISTS.ANY,OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM --target-pid 61381

 

 

Is there anything wrong with this command line?

17 Replies
armorhu
Beginner
604 Views

Additional information:

The first time I ran the analysis on the Linux server, I got the error log shown below:

armorhu_1-1610608011242.png

Our Linux server's CPU is code-named Haswell:

armorhu_2-1610608129035.png

So I removed all the unsupported events:

FRONTEND_RETIRED.LATENCY_GE_8_PS
FRONTEND_RETIRED.DSB_MISS_PS
FRONTEND_RETIRED.L2_MISS_PS
FRONTEND_RETIRED.LATENCY_GE_1
FRONTEND_RETIRED.LATENCY_GE_16_PS
FRONTEND_RETIRED.LATENCY_GE_2
FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS
FRONTEND_RETIRED.STLB_MISS_PS

So, are these events necessary for the Back-End Bound detail?

If they are, which events can I use instead on this Linux server, whose CPU is code-named Haswell?
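
Trimming an event list of this size by hand is error-prone. A minimal sketch of a helper that drops named events from a comma-separated event-config value follows. `filter_events` is a hypothetical shell function, not part of VTune, and it only edits the list; it says nothing about whether the remaining events are enough for a given metric.

```shell
# Hypothetical helper (not a VTune feature): remove events from a
# comma-separated event list.
#   $1: comma-separated event list, modifiers such as :sa=... allowed
#   $2: space-separated event names to drop
filter_events() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r ev; do
    base=${ev%%:*}                 # event name without its modifiers
    case " $2 " in
      *" $base "*) ;;              # listed as unsupported: skip it
      *) printf '%s\n' "$ev" ;;    # keep the event, modifiers included
    esac
  done | paste -sd, -              # join the surviving events with commas
}
```

For example, `filter_events "$EVENTS" "FRONTEND_RETIRED.DSB_MISS_PS FRONTEND_RETIRED.L2_MISS_PS"` would print the list in a hypothetical `EVENTS` variable without those two entries.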

 

 

 

armorhu
Beginner
599 Views

Sorry, there was a mistake in my reply above.

This is the error log I get when running the default command line:

armorhu_1-1610611140684.png

So, how can I fix this error?

 

Kirill_U_Intel
Employee
594 Views

Hi.

This looks like an issue with the driverless collection configuration.

Could you try building and installing the VTune drivers and then run the collection again? Is the event issue still there?

https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-ta...

Kirill

armorhu
Beginner
571 Views
Kirill_U_Intel
Employee
561 Views

Could you provide '<vtune_install_dir>/bin64/sep -version' output?

Kirill

ArunJ_Intel
Moderator
548 Views

Hi armorhu,



When you said "yes,problem is same here", did you mean that you are facing the same issue even after loading the drivers? If so, could you provide the output of the commands below?


$ cd <install-dir>/sepdk/src

$ ./insmod-sep -q


Could you also let us know which operating system and version you are using, in addition to the sep version already requested by Kirill?



Thanks

Arun




ArunJ_Intel
Moderator
525 Views

Hi armorhu,


We are waiting for an update from your end.


Thanks

Arun Jose


armorhu
Beginner
514 Views

>cat /etc/lsb-releases
[root@TENCENT64site ~]# cat /etc/lsb-releases
cat: /etc/lsb-releases: No such file or directory

>lscpu
[root@TENCENT64site ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2301.000
BogoMIPS: 4590.46
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47

>uname -a
[root@TENCENT64site ~]# uname -a
Linux TENCENT64site 3.10.107-1-tlinux2-0052 #1 SMP Wed Jan 15 20:02:40 CST 2020 x86_64 x86_64 x86_64 GNU/Linux

>lsmod |grep sep
[root@TENCENT64site ~]# lsmod |grep sep
sep5 880621 0
socperf3 595104 2 sep5,socwatch2_12

>vtune --version
[root@TENCENT64site ~]# vtune --version
Intel(R) VTune(TM) Profiler 2020 Update 2 (build 610396) Command Line Tool
Copyright (C) 2009-2020 Intel Corporation. All rights reserved.

>./insmod-sep -q
[root@TENCENT64site /opt/intel/vtune_profiler_2020.2.0.610396/sepdk/src]# ./insmod-sep -q
pax driver is loaded and owned by group "root" with file permissions "660".
socperf3 driver is loaded and owned by group "root" with file permissions "660".
sep5 driver is loaded and owned by group "root" with file permissions "660".
socwatch driver is loaded.
vtsspp driver is loaded and owned by group "root" with file permissions "660".

Sorry for my late reply.

I also have some additional information: I successfully got the Back-End Bound detail data on another Linux server.

Its CPU is:

Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz

Kirill_U_Intel
Employee
504 Views

Could you provide '<vtune_install_dir>/bin64/sep -version' output?

Kirill

armorhu
Beginner
498 Views

[root@TENCENT64site /opt/intel/oneapi/vtune/2021.1.1/bin64]# ./sep -version
Sampling Enabling Product Version: 5.22 Beta built on Nov 10 2020 18:15:43
SEP Driver Version: 5.22 Beta (public)
PAX Driver Version: 1.0
Platform type: 99
CPU name: Intel(R) Xeon(R) E5/E7 v3 Processor code named Haswell
PMU: haswell_server
Driver configs: Non-Maskable Interrupt, REGISTER CHECK ON
Copyright(C) 2007-2020 Intel Corporation. All rights reserved.

Kirill_U_Intel
Employee
493 Views

Could you try collecting Microarchitecture Exploration (ue) with these events?

vtune -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob dram-bandwidth-limits=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=2000000,CPU_CLK_UNHALTED.REF_TSC:sample:sa=2000000,INST_RETIRED.ANY:sample:sa=2000000,CPU_CLK_UNHALTED.REF_XCLK:sa=100003,CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:sa=100003,CPU_CLK_UNHALTED.THREAD_P,CYCLE_ACTIVITY.CYCLES_NO_EXECUTE,CYCLE_ACTIVITY.STALLS_LDM_PENDING,IDQ_UOPS_NOT_DELIVERED.CORE,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE,INT_MISC.RECOVERY_CYCLES,RESOURCE_STALLS.SB,RS_EVENTS.EMPTY_CYCLES,RS_EVENTS.EMPTY_END,UOPS_EXECUTED.CORE:cmask=1,UOPS_EXECUTED.CORE:cmask=2,UOPS_EXECUTED.CORE:cmask=3,UOPS_ISSUED.ANY,UOPS_RETIRED.RETIRE_SLOTS,BACLEARS.ANY,BR_MISP_RETIRED.ALL_BRANCHES_PS,DSB2MITE_SWITCHES.PENALTY_CYCLES,ICACHE.IFDATA_STALL,IDQ.ALL_DSB_CYCLES_4_UOPS,IDQ.ALL_DSB_CYCLES_ANY_UOPS,IDQ.ALL_MITE_CYCLES_4_UOPS,IDQ.ALL_MITE_CYCLES_ANY_UOPS,IDQ.DSB_UOPS,IDQ.MITE_UOPS,IDQ.MS_SWITCHES,IDQ.MS_UOPS,ILD_STALL.LCP,ITLB_MISSES.STLB_HIT,ITLB_MISSES.WALK_COMPLETED,ITLB_MISSES.WALK_DURATION,LSD.UOPS,MACHINE_CLEARS.COUNT,CYCLE_ACTIVITY.STALLS_L1D_PENDING,CYCLE_ACTIVITY.STALLS_L2_PENDING,DTLB_LOAD_MISSES.STLB_HIT,DTLB_LOAD_MISSES.WALK_DURATION,DTLB_STORE_MISSES.STLB_HIT,DTLB_STORE_MISSES.WALK_DURATION,L1D_PEND_MISS.PENDING,L1D_PEND_MISS.REQUEST_FB_FULL:cmask=1,L2_RQSTS.RFO_HIT,LD_BLOCKS.NO_SR,LD_BLOCKS.STORE_FORWARD,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS,MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS,MEM_LOAD_UOPS_RETIRED.L1_HIT_PS,MEM_LOAD_UOPS_RETIRED.L1_MISS,MEM_LOAD_UOPS_RETIRED.L2_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS,MEM_UOPS_RETIRED.ALL_STORES_PS,MEM_UOPS_RETIRED.LOCK_LOADS_PS,MEM_UOPS_RETIRED.SPLIT_LOADS_PS,MEM_UOPS_RETIRED.SPLIT_STORES_PS,MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS,MEM_UOPS_RETIRED.STLB_MISS_STORES_PS,OFFCORE_REQUESTS_BUFFER.SQ_FULL,OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=6,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO,ARITH.DIVIDER_UOPS,INST_RETIRED.PREC_DIST,UOPS_DISPATCHED_PORT.PORT_0,UOPS_DISPATCHED_PORT.PORT_1,UOPS_DISPATCHED_PORT.PORT_2,UOPS_DISPATCHED_PORT.PORT_3,UOPS_DISPATCHED_PORT.PORT_4,UOPS_DISPATCHED_PORT.PORT_5,UOPS_DISPATCHED_PORT.PORT_6,UOPS_DISPATCHED_PORT.PORT_7,OTHER_ASSISTS.ANY_WB_ASSIST,OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.HITM_OTHER_CORE --target-pid 61381

ArunJ_Intel
Moderator
452 Views

Hi armorhu,


Have you tried collecting the hardware events as suggested by Kirill?



Thanks

Arun


armorhu
Beginner
439 Views

Sorry for my late reply.

In fact, we have solved this problem by switching to another Linux machine. Thank you for your help.

I also have another question.

Take a given CPU, for example my PC's CPU, an Intel Core i7-8759H.

How can I get detailed information about it, such as L1 cache access latency, L2 cache access latency, SMT support, and superscalar support (and the supported number)?

ArunJ_Intel
Moderator
408 Views

Hi 


The L1 and L2 cache latency of your CPU may depend on many factors, even on your application, so it would be difficult to give you exact numbers for your CPU. If you want to find the hardware performance limits for your application, Intel Advisor's Roofline analysis may be worth a look. You can find more information about Roofline analysis at the link below.



https://software.intel.com/content/www/us/en/develop/documentation/advisor-tutorial-roofline/top/run...



To find the specifications of your CPU, you can search for your CPU model on ark.intel.com, or simply google the CPU model name. As an example, here are the details of the i7-8750H:


https://ark.intel.com/content/www/us/en/ark/products/134906/intel-core-i7-8750h-processor-9m-cache-u...



The Core i7 should definitely support SMT (Hyper-Threading is Intel's brand name for this technology). You can check this on ark.intel.com for your CPU.
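
On a Linux machine, the `Thread(s) per core` line of `lscpu` output (as in the server listing earlier in this thread) gives a quick hint: a value of 2 means SMT is currently active. Here is a minimal sketch, where `smt_threads_per_core` is a hypothetical helper name. Note that a value of 1 only shows that SMT is not active right now, not that the CPU lacks it.

```shell
# Hypothetical helper: read lscpu-style output on stdin and print the
# value of the "Thread(s) per core" field.
smt_threads_per_core() {
  awk -F: '/^Thread\(s\) per core/ {
    gsub(/[ \t]/, "", $2)   # strip the padding around the number
    print $2
  }'
}
```

Typical use would be `lscpu | smt_threads_per_core`.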


We are not quite sure what you meant by superscalar support and support number, and would need more details to answer that. However, as your initial issue is resolved and this discussion is moving away from the original issue, we would appreciate it if you could raise these questions in a separate thread. They can then be handled separately and will attract the attention of subject matter experts from the community who may be able to comment on them.



Thanks

Arun



armorhu
Beginner
401 Views

We are not quite sure what you meant by superscalar support and support number. 

-- By superscalar number, I mean the number of uops that one CPU core can process simultaneously.

As far as I know, for some Intel CPUs that number is 4.

And of course, I will raise any further questions in a separate thread after this last one.

 

ArunJ_Intel
Moderator
382 Views

Thanks for clarifying. Could you please confirm whether we can close this thread, as you are planning to raise a separate thread for your further queries?


ArunJ_Intel
Moderator
368 Views


Hi


As your issue is resolved and you are raising a new thread for your other questions, we will no longer monitor this thread. Any further updates to this thread will be considered community only.


Thanks

Arun

