Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

VTune Profiler: Back-End Bound Shows No Details

armorhu
Beginner

Hi, I have a problem when using VTune Profiler on Linux.

I ran a Microarchitecture Exploration analysis, but the Back-End Bound metric has no detailed breakdown below it.

This is what I expect to get (run on my PC):

[Screenshot: result on my PC]

But this is what I get (run on the Linux server):

[Screenshot: result on the Linux server]

Here is my VTune command line:

vtune -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob dram-bandwidth-limits=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=3000000,CPU_CLK_UNHALTED.REF_TSC:sample:sa=3000000,INST_RETIRED.ANY:sample:sa=3000000,CPU_CLK_UNHALTED.REF_XCLK:sa=100003,CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:sa=100003,CPU_CLK_UNHALTED.THREAD_P,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.1_PORTS_UTIL,EXE_ACTIVITY.2_PORTS_UTIL,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.EXE_BOUND_0_PORTS,IDQ_UOPS_NOT_DELIVERED.CORE,INT_MISC.RECOVERY_CYCLES,UOPS_ISSUED.ANY,UOPS_RETIRED.RETIRE_SLOTS,BACLEARS.ANY,BR_MISP_RETIRED.ALL_BRANCHES_PS,DSB2MITE_SWITCHES.PENALTY_CYCLES,ICACHE_16B.IFDATA_STALL,ICACHE_16B.IFDATA_STALL:cmask=1:e=yes,ICACHE_64B.IFTAG_STALL,IDQ.ALL_DSB_CYCLES_4_UOPS,IDQ.ALL_DSB_CYCLES_ANY_UOPS,IDQ.ALL_MITE_CYCLES_4_UOPS,IDQ.ALL_MITE_CYCLES_ANY_UOPS,IDQ.DSB_UOPS,IDQ.MITE_UOPS,IDQ.MS_SWITCHES,IDQ.MS_UOPS,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE,ILD_STALL.LCP,INT_MISC.CLEAR_RESTEER_CYCLES,LSD.CYCLES_4_UOPS,LSD.CYCLES_ACTIVE,LSD.UOPS,MACHINE_CLEARS.COUNT,CYCLE_ACTIVITY.STALLS_L1D_MISS,CYCLE_ACTIVITY.STALLS_L2_MISS,CYCLE_ACTIVITY.STALLS_L3_MISS,DTLB_LOAD_MISSES.STLB_HIT,DTLB_LOAD_MISSES.WALK_ACTIVE,DTLB_STORE_MISSES.STLB_HIT,DTLB_STORE_MISSES.WALK_ACTIVE,L1D_PEND_MISS.FB_FULL:cmask=1,L1D_PEND_MISS.PENDING,L2_RQSTS.RFO_HIT,LD_BLOCKS.NO_SR,LD_BLOCKS.STORE_FORWARD,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS,MEM_INST_RETIRED.ALL_STORES_PS,MEM_INST_RETIRED.LOCK_LOADS_PS,MEM_INST_RETIRED.SPLIT_LOADS_PS,MEM_INST_RETIRED.SPLIT_STORES_PS,MEM_INST_RETIRED.STLB_MISS_LOADS_PS,MEM_INST_RETIRED.STLB_MISS_STORES_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS_PS,MEM_LOAD_RETIRED.FB_HIT_PS,MEM_LOAD_RETIRED.L1_HIT_PS,MEM_LOAD_RETIRED.L1_MISS_PS,MEM_LOAD_RETIRED.L2_HIT_PS,MEM_LOAD_RETIRED.L3_HIT_PS,MEM_LOAD_RETIRED.L3_MISS_PS,OFFCORE_REQUESTS_BUFFER.SQ_FULL,OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=4,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO,ARITH.DIVIDER_ACTIVE,FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE,FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE,FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE,FP_ARITH_INST_RETIRED.SCALAR_DOUBLE,FP_ARITH_INST_RETIRED.SCALAR_SINGLE,INST_RETIRED.PREC_DIST,PARTIAL_RAT_STALLS.SCOREBOARD,ROB_MISC_EVENTS.PAUSE_INST,UOPS_DISPATCHED_PORT.PORT_0,UOPS_DISPATCHED_PORT.PORT_1,UOPS_DISPATCHED_PORT.PORT_2,UOPS_DISPATCHED_PORT.PORT_3,UOPS_DISPATCHED_PORT.PORT_4,UOPS_DISPATCHED_PORT.PORT_5,UOPS_DISPATCHED_PORT.PORT_6,UOPS_DISPATCHED_PORT.PORT_7,UOPS_EXECUTED.CORE_CYCLES_GE_1,UOPS_EXECUTED.CORE_CYCLES_GE_2,UOPS_EXECUTED.CORE_CYCLES_GE_3,UOPS_EXECUTED.CORE_CYCLES_NONE,UOPS_EXECUTED.THREAD,UOPS_EXECUTED.X87,FP_ASSIST.ANY,OTHER_ASSISTS.ANY,OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM --target-pid 61381

 

 

Is there anything wrong with this command line?
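Checking a long event-config value like this one by eye is hard, so it can help to split it into individual events first. A minimal Python sketch, assuming only what the command line itself shows: events are comma-separated, and each event name may be followed by ":"-separated modifiers such as sa=3000000 or cmask=1:e=yes. parse_event_config is a hypothetical helper, not part of VTune.

```python
# Hypothetical helper (not part of VTune): split the value passed to
# "-knob event-config=..." into (event name, [modifiers]) pairs.
def parse_event_config(config: str) -> list[tuple[str, list[str]]]:
    events = []
    for item in config.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty items, e.g. from a trailing comma
        # The event name comes first; ":"-separated modifiers follow it.
        name, *modifiers = item.split(":")
        events.append((name, modifiers))
    return events


if __name__ == "__main__":
    sample = ("CPU_CLK_UNHALTED.THREAD:sa=3000000,"
              "ICACHE_16B.IFDATA_STALL:cmask=1:e=yes,"
              "UOPS_ISSUED.ANY")
    for name, mods in parse_event_config(sample):
        print(name, mods)
```

Once split, the event names can be compared against the list of events the target CPU's PMU supports.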

armorhu
Beginner

Additional information:

The first time I ran the analysis on the Linux server, I got the error log shown below:

[Screenshot: error log]

Our Linux server's CPU code name is Haswell:

[Screenshot: CPU code name of the Linux server]

So I removed all the unsupported events:

FRONTEND_RETIRED.LATENCY_GE_8_PS
FRONTEND_RETIRED.DSB_MISS_PS
FRONTEND_RETIRED.L2_MISS_PS
FRONTEND_RETIRED.LATENCY_GE_1
FRONTEND_RETIRED.LATENCY_GE_16_PS
FRONTEND_RETIRED.LATENCY_GE_2
FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS
FRONTEND_RETIRED.STLB_MISS_PS

Are these events necessary for the Back-End Bound details?

If so, which events can I use on this Linux server, whose CPU code name is Haswell?
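Removing events by hand from such a long list is error-prone. A minimal sketch of doing it programmatically, assuming that each comma-separated item starts with the event name before any ":" modifier; drop_events is a hypothetical helper. It only edits the string: it cannot tell which analysis metrics will be left without data once those events are gone.

```python
# Hypothetical helper: remove unsupported events from an event-config string.
def drop_events(config: str, unsupported: set[str]) -> str:
    kept = []
    for item in config.split(","):
        # Compare on the event name only (text before the first ":").
        name = item.split(":", 1)[0].strip()
        if name and name not in unsupported:
            kept.append(item)
    return ",".join(kept)


if __name__ == "__main__":
    unsupported = {
        "FRONTEND_RETIRED.DSB_MISS_PS",
        "FRONTEND_RETIRED.LATENCY_GE_2",
    }
    cfg = "INST_RETIRED.ANY:sample:sa=3000000,FRONTEND_RETIRED.DSB_MISS_PS,UOPS_ISSUED.ANY"
    print(drop_events(cfg, unsupported))
```

Matching on the bare name means every variant of an event (for example with different cmask values) is dropped together.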

 

 

 

armorhu
Beginner

Sorry, there is a mistake in my reply above.

The error log I get when running the default command line is this:

[Screenshot: error log from the default command line]

So, how can I fix this error?

 

Kirill_U_Intel
Employee

Hi.

This looks like an issue with the driverless collection configuration.

Could you build and install the VTune sampling drivers and check the collection again? Is the event issue still there?

https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/linux-targets/building-and-installing-the-sampling-drivers-for-linux-targets.html
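After building and loading the drivers, one quick sanity check is whether the kernel modules are actually present. A minimal sketch, assuming a Linux host where /proc/modules lists one loaded module per line with the module name first (the same data lsmod prints). The module names sep5 and socperf3 are taken from the lsmod output posted in this thread; the helper names are hypothetical, and the insmod-sep -q script in sepdk/src remains the documented way to check driver status.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    # Each non-empty line of /proc/modules begins with the module name,
    # e.g. "sep5 880621 0 - Live 0x...".
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_drivers(required: set[str], proc_modules_text: str) -> list[str]:
    # Names from `required` that are not among the loaded modules, sorted.
    return sorted(required - loaded_modules(proc_modules_text))


def check_this_host(required=frozenset({"sep5", "socperf3"})) -> list[str]:
    # Linux only: reads the live module list of the current machine.
    return missing_drivers(set(required), Path("/proc/modules").read_text())
```

An empty list from check_this_host() means both modules are loaded; it says nothing about their permissions or whether collection then succeeds.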

Kirill

armorhu
Beginner
Kirill_U_Intel
Employee

Could you provide the output of '<vtune_install_dir>/bin64/sep -version'?

Kirill

ArunJ_Intel
Moderator

Hi armorhu,



When you said "yes, problem is same here", did you mean you are still facing the same issue after loading the drivers? If so, could you provide the output of the commands below?


$ cd <install-dir>/sepdk/src

$ ./insmod-sep -q


Could you also let us know which operating system and version you are using, in addition to the SEP version Kirill already requested?
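For the operating system and version, most systemd-based Linux distributions provide /etc/os-release, a file of KEY=value lines. A minimal sketch that parses it, assuming the usual shell-style quoting of values; parse_os_release is a hypothetical helper. On Python 3.10 or later, the standard library's platform.freedesktop_os_release() does the same job.

```python
import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        # Values may be quoted shell-style, e.g. NAME="Example Linux".
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info
```

Typical use: parse_os_release(open("/etc/os-release").read()).get("PRETTY_NAME"), together with uname -r for the kernel version.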



Thanks

Arun




ArunJ_Intel
Moderator

Hi armorhu,


We are waiting for an update from your end.


Thanks

Arun Jose


armorhu
Beginner

>cat /etc/lsb-releases
[root@TENCENT64site ~]# cat /etc/lsb-releases
cat: /etc/lsb-releases: No such file or directory

>lscpu
[root@TENCENT64site ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2301.000
BogoMIPS: 4590.46
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47

>uname -a
[root@TENCENT64site ~]# uname -a
Linux TENCENT64site 3.10.107-1-tlinux2-0052 #1 SMP Wed Jan 15 20:02:40 CST 2020 x86_64 x86_64 x86_64 GNU/Linux

>lsmod |grep sep
[root@TENCENT64site ~]# lsmod |grep sep
sep5 880621 0
socperf3 595104 2 sep5,socwatch2_12

>vtune --version
[root@TENCENT64site ~]# vtune --version
Intel(R) VTune(TM) Profiler 2020 Update 2 (build 610396) Command Line Tool
Copyright (C) 2009-2020 Intel Corporation. All rights reserved.

>./insmod-sep -q
[root@TENCENT64site /opt/intel/vtune_profiler_2020.2.0.610396/sepdk/src]# ./insmod-sep -q
pax driver is loaded and owned by group "root" with file permissions "660".
socperf3 driver is loaded and owned by group "root" with file permissions "660".
sep5 driver is loaded and owned by group "root" with file permissions "660".
socwatch driver is loaded.
vtsspp driver is loaded and owned by group "root" with file permissions "660".

Sorry for my late reply.

Some additional information: I successfully got the Back-End Bound detail data on another Linux server,

whose CPU is:

Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz

Kirill_U_Intel
Employee

Could you provide the output of '<vtune_install_dir>/bin64/sep -version'?

Kirill

armorhu
Beginner

[root@TENCENT64site /opt/intel/oneapi/vtune/2021.1.1/bin64]# ./sep -version
Sampling Enabling Product Version: 5.22 Beta built on Nov 10 2020 18:15:43
SEP Driver Version: 5.22 Beta (public)
PAX Driver Version: 1.0
Platform type: 99
CPU name: Intel(R) Xeon(R) E5/E7 v3 Processor code named Haswell
PMU: haswell_server
Driver configs: Non-Maskable Interrupt, REGISTER CHECK ON
Copyright(C) 2007-2020 Intel Corporation. All rights reserved.

Kirill_U_Intel
Employee

Could you try these events for the Microarchitecture Exploration (ue) collection?

vtune -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob dram-bandwidth-limits=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=2000000,CPU_CLK_UNHALTED.REF_TSC:sample:sa=2000000,INST_RETIRED.ANY:sample:sa=2000000,CPU_CLK_UNHALTED.REF_XCLK:sa=100003,CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:sa=100003,CPU_CLK_UNHALTED.THREAD_P,CYCLE_ACTIVITY.CYCLES_NO_EXECUTE,CYCLE_ACTIVITY.STALLS_LDM_PENDING,IDQ_UOPS_NOT_DELIVERED.CORE,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE,INT_MISC.RECOVERY_CYCLES,RESOURCE_STALLS.SB,RS_EVENTS.EMPTY_CYCLES,RS_EVENTS.EMPTY_END,UOPS_EXECUTED.CORE:cmask=1,UOPS_EXECUTED.CORE:cmask=2,UOPS_EXECUTED.CORE:cmask=3,UOPS_ISSUED.ANY,UOPS_RETIRED.RETIRE_SLOTS,BACLEARS.ANY,BR_MISP_RETIRED.ALL_BRANCHES_PS,DSB2MITE_SWITCHES.PENALTY_CYCLES,ICACHE.IFDATA_STALL,IDQ.ALL_DSB_CYCLES_4_UOPS,IDQ.ALL_DSB_CYCLES_ANY_UOPS,IDQ.ALL_MITE_CYCLES_4_UOPS,IDQ.ALL_MITE_CYCLES_ANY_UOPS,IDQ.DSB_UOPS,IDQ.MITE_UOPS,IDQ.MS_SWITCHES,IDQ.MS_UOPS,ILD_STALL.LCP,ITLB_MISSES.STLB_HIT,ITLB_MISSES.WALK_COMPLETED,ITLB_MISSES.WALK_DURATION,LSD.UOPS,MACHINE_CLEARS.COUNT,CYCLE_ACTIVITY.STALLS_L1D_PENDING,CYCLE_ACTIVITY.STALLS_L2_PENDING,DTLB_LOAD_MISSES.STLB_HIT,DTLB_LOAD_MISSES.WALK_DURATION,DTLB_STORE_MISSES.STLB_HIT,DTLB_STORE_MISSES.WALK_DURATION,L1D_PEND_MISS.PENDING,L1D_PEND_MISS.REQUEST_FB_FULL:cmask=1,L2_RQSTS.RFO_HIT,LD_BLOCKS.NO_SR,LD_BLOCKS.STORE_FORWARD,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS,MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS,MEM_LOAD_UOPS_RETIRED.L1_HIT_PS,MEM_LOAD_UOPS_RETIRED.L1_MISS,MEM_LOAD_UOPS_RETIRED.L2_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS,MEM_UOPS_RETIRED.ALL_STORES_PS,MEM_UOPS_RETIRED.LOCK_LOADS_PS,MEM_UOPS_RETIRED.SPLIT_LOADS_PS,MEM_UOPS_RETIRED.SPLIT_STORES_PS,MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS,MEM_UOPS_RETIRED.STLB_MISS_STORES_PS,OFFCORE_REQUESTS_BUFFER.SQ_FULL,OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=6,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO,ARITH.DIVIDER_UOPS,INST_RETIRED.PREC_DIST,UOPS_DISPATCHED_PORT.PORT_0,UOPS_DISPATCHED_PORT.PORT_1,UOPS_DISPATCHED_PORT.PORT_2,UOPS_DISPATCHED_PORT.PORT_3,UOPS_DISPATCHED_PORT.PORT_4,UOPS_DISPATCHED_PORT.PORT_5,UOPS_DISPATCHED_PORT.PORT_6,UOPS_DISPATCHED_PORT.PORT_7,OTHER_ASSISTS.ANY_WB_ASSIST,OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.HITM_OTHER_CORE --target-pid 61381
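Several event names in this list differ from the original command line, for example CYCLE_ACTIVITY.STALLS_MEM_ANY in the first command versus CYCLE_ACTIVITY.STALLS_LDM_PENDING here, and ARITH.DIVIDER_ACTIVE versus ARITH.DIVIDER_UOPS. A minimal sketch that lists such differences between two event-config strings, assuming event names are the text before the first ":"; compare_configs is a hypothetical helper, and it does not establish which of the differences matter for the Back-End Bound breakdown.

```python
def event_names(config: str) -> set[str]:
    # Event name = text before the first ":" modifier; empty items are skipped.
    # Note: variants that differ only in modifiers (cmask=1 vs cmask=2) collapse
    # into one name.
    return {item.split(":", 1)[0].strip() for item in config.split(",") if item.strip()}


def compare_configs(old: str, new: str) -> tuple[set[str], set[str]]:
    """Return (events only in `old`, events only in `new`)."""
    a, b = event_names(old), event_names(new)
    return a - b, b - a


if __name__ == "__main__":
    old = "CYCLE_ACTIVITY.STALLS_MEM_ANY,UOPS_ISSUED.ANY,ARITH.DIVIDER_ACTIVE"
    new = "CYCLE_ACTIVITY.STALLS_LDM_PENDING,UOPS_ISSUED.ANY,ARITH.DIVIDER_UOPS"
    only_old, only_new = compare_configs(old, new)
    print(sorted(only_old))
    print(sorted(only_new))
```

Feeding in the two full event-config values from this thread gives the complete list of events that differ between the two command lines.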

ArunJ_Intel
Moderator

Hi armorhu,


Have you tried collecting the hardware events Kirill suggested?



Thanks

Arun


armorhu
Beginner

Sorry for my late reply.

In fact, we solved this problem by switching to another Linux machine. Thank you for your help.

I have another question.

Take a given CPU, for example my PC's CPU: Intel Core i7-8750H.

How can I get detailed information about it, such as L1 cache access latency, L2 cache access latency, SMT support, and superscalar support (and its width)?

ArunJ_Intel
Moderator

Hi 


The L1 and L2 cache latencies of your CPU may depend on many factors, even on your application, so it would be difficult to give you exact numbers for your CPU. If you want to find the hardware performance limits for your application, Intel Advisor's Roofline analysis is worth a look. You can find more information about Roofline analysis at the link below.



https://software.intel.com/content/www/us/en/develop/documentation/advisor-tutorial-roofline/top/run-a-roofline-analysis.html



To find the specifications of your CPU, search for its model at ark.intel.com, or simply search the web for the CPU model name. For example, here are the details of the i7-8750H:


https://ark.intel.com/content/www/us/en/ark/products/134906/intel-core-i7-8750h-processor-9m-cache-up-to-4-10-ghz.html



This Core i7 should support SMT (Hyper-Threading is Intel’s brand name for this technology). You can confirm this for your CPU on ark.intel.com.
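On Linux, cache sizes and SMT status (though not cache latency) can also be read locally from sysfs. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/cache/index*/{level,type,size} and topology/thread_siblings_list files; the helper names are hypothetical. Latency is not exposed there and still has to be measured or taken from vendor documentation.

```python
from pathlib import Path


def read_cache_info(cpu_dir: Path) -> list[dict[str, str]]:
    """Read level/type/size for each cache of one CPU from Linux sysfs."""
    caches = []
    for index in sorted((cpu_dir / "cache").glob("index*")):
        caches.append({attr: (index / attr).read_text().strip()
                       for attr in ("level", "type", "size")})
    return caches


def count_cpus(cpu_list: str) -> int:
    """Count CPUs in a sysfs list such as "0,24" or "0-1"."""
    total = 0
    for part in cpu_list.strip().split(","):
        first, _, last = part.partition("-")
        total += (int(last) - int(first) + 1) if last else 1
    return total


def smt_enabled(cpu_dir: Path) -> bool:
    """True if this CPU shares its physical core with at least one sibling thread."""
    siblings = (cpu_dir / "topology" / "thread_siblings_list").read_text()
    return count_cpus(siblings) > 1
```

Typical use: read_cache_info(Path("/sys/devices/system/cpu/cpu0")) and smt_enabled(Path("/sys/devices/system/cpu/cpu0")).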


We are not quite sure what you meant by superscalar support and support number; we would need more details to answer that. However, since your initial issue is resolved and this discussion is moving away from it, we would appreciate it if you could raise these questions in a separate thread. That way they can be handled separately and will attract the attention of subject matter experts in the community who may be able to comment on them.



Thanks

Arun



armorhu
Beginner

We are not quite sure what you meant by superscalar support and support number. 

-- By superscalar number, I mean the number of uops one CPU core can process simultaneously.

As far as I know, on some Intel CPUs this number is 4.
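That width of 4 is what Top-down Microarchitecture Analysis, the method behind VTune's Front-End Bound / Bad Speculation / Retiring / Back-End Bound split, counts as issue "slots" per cycle. A minimal sketch of the published level-1 formulas for 4-wide cores, using event names from the command lines in this thread. It is a simplified per-thread version that ignores the SMT adjustments VTune applies, VTune's exact per-CPU formulas may differ, and it only yields level-1 Back-End Bound, not the sub-breakdown discussed above.

```python
def top_down_level1(c: dict[str, float], width: int = 4) -> dict[str, float]:
    """Simplified Top-down level 1 for a `width`-wide core (per thread, no SMT correction)."""
    slots = width * c["CPU_CLK_UNHALTED.THREAD"]          # issue slots available
    frontend = c["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots     # slots the front end left empty
    bad_spec = (c["UOPS_ISSUED.ANY"] - c["UOPS_RETIRED.RETIRE_SLOTS"]
                + width * c["INT_MISC.RECOVERY_CYCLES"]) / slots  # issued but never retired, plus recovery
    retiring = c["UOPS_RETIRED.RETIRE_SLOTS"] / slots        # slots that did useful work
    backend = 1.0 - (frontend + bad_spec + retiring)         # remainder: stalled in the back end
    return {"Front-End Bound": frontend, "Bad Speculation": bad_spec,
            "Retiring": retiring, "Back-End Bound": backend}
```

With made-up counts of 1000 cycles, 800 undelivered uops, 1300 issued uops, 1200 retire slots and 25 recovery cycles, this gives roughly 0.20 / 0.05 / 0.30 / 0.45.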

And of course, I will raise further questions in a separate thread after this last one.

 

ArunJ_Intel
Moderator

Thanks for clarifying. Could you please confirm whether we can close this thread, since you are planning to raise a separate thread for your further queries?


ArunJ_Intel
Moderator


Hi


As your issue is resolved and you are raising a new thread for your other questions, we will no longer monitor this thread. Any further replies to this thread will be considered community only.


Thanks

Arun

