Hi, I have a problem using VTune Profiler on Linux.
I ran a Microarchitecture Exploration analysis and I do get a value for Back-End Bound, but there is no detailed breakdown below it.
This is what I expect to get (run on my PC):
But this is what I get (run on the Linux server):
Here is my VTune command line:
vtune -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob dram-bandwidth-limits=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=3000000,CPU_CLK_UNHALTED.REF_TSC:sample:sa=3000000,INST_RETIRED.ANY:sample:sa=3000000,CPU_CLK_UNHALTED.REF_XCLK:sa=100003,CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:sa=100003,CPU_CLK_UNHALTED.THREAD_P,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.1_PORTS_UTIL,EXE_ACTIVITY.2_PORTS_UTIL,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.EXE_BOUND_0_PORTS,IDQ_UOPS_NOT_DELIVERED.CORE,INT_MISC.RECOVERY_CYCLES,UOPS_ISSUED.ANY,UOPS_RETIRED.RETIRE_SLOTS,BACLEARS.ANY,BR_MISP_RETIRED.ALL_BRANCHES_PS,DSB2MITE_SWITCHES.PENALTY_CYCLES,ICACHE_16B.IFDATA_STALL,ICACHE_16B.IFDATA_STALL:cmask=1:e=yes,ICACHE_64B.IFTAG_STALL,IDQ.ALL_DSB_CYCLES_4_UOPS,IDQ.ALL_DSB_CYCLES_ANY_UOPS,IDQ.ALL_MITE_CYCLES_4_UOPS,IDQ.ALL_MITE_CYCLES_ANY_UOPS,IDQ.DSB_UOPS,IDQ.MITE_UOPS,IDQ.MS_SWITCHES,IDQ.MS_UOPS,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE,ILD_STALL.LCP,INT_MISC.CLEAR_RESTEER_CYCLES,LSD.CYCLES_4_UOPS,LSD.CYCLES_ACTIVE,LSD.UOPS,MACHINE_CLEARS.COUNT,CYCLE_ACTIVITY.STALLS_L1D_MISS,CYCLE_ACTIVITY.STALLS_L2_MISS,CYCLE_ACTIVITY.STALLS_L3_MISS,DTLB_LOAD_MISSES.STLB_HIT,DTLB_LOAD_MISSES.WALK_ACTIVE,DTLB_STORE_MISSES.STLB_HIT,DTLB_STORE_MISSES.WALK_ACTIVE,L1D_PEND_MISS.FB_FULL:cmask=1,L1D_PEND_MISS.PENDING,L2_RQSTS.RFO_HIT,LD_BLOCKS.NO_SR,LD_BLOCKS.STORE_FORWARD,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS,MEM_INST_RETIRED.ALL_STORES_PS,MEM_INST_RETIRED.LOCK_LOADS_PS,MEM_INST_RETIRED.SPLIT_LOADS_PS,MEM_INST_RETIRED.SPLIT_STORES_PS,MEM_INST_RETIRED.STLB_MISS_LOADS_PS,MEM_INST_RETIRED.STLB_MISS_STORES_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT_PS,MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS_PS,MEM_LOAD_RETIRED.FB_HIT_PS,MEM_LOAD_RETIRED.L1_HIT_PS,MEM_LOAD_RETIRED.L1_MISS_PS,MEM_LOAD_RETIRED.L2_HIT_PS,MEM_LOAD_RETIRED.L3_HIT_PS,MEM_LOAD_RETIRED.L3_MISS_PS,OFFCORE_REQUESTS_BUFFER.SQ_FULL,OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=4,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO,ARITH.DIVIDER_ACTIVE,FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE,FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE,FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE,FP_ARITH_INST_RETIRED.SCALAR_DOUBLE,FP_ARITH_INST_RETIRED.SCALAR_SINGLE,INST_RETIRED.PREC_DIST,PARTIAL_RAT_STALLS.SCOREBOARD,ROB_MISC_EVENTS.PAUSE_INST,UOPS_DISPATCHED_PORT.PORT_0,UOPS_DISPATCHED_PORT.PORT_1,UOPS_DISPATCHED_PORT.PORT_2,UOPS_DISPATCHED_PORT.PORT_3,UOPS_DISPATCHED_PORT.PORT_4,UOPS_DISPATCHED_PORT.PORT_5,UOPS_DISPATCHED_PORT.PORT_6,UOPS_DISPATCHED_PORT.PORT_7,UOPS_EXECUTED.CORE_CYCLES_GE_1,UOPS_EXECUTED.CORE_CYCLES_GE_2,UOPS_EXECUTED.CORE_CYCLES_GE_3,UOPS_EXECUTED.CORE_CYCLES_NONE,UOPS_EXECUTED.THREAD,UOPS_EXECUTED.X87,FP_ASSIST.ANY,OTHER_ASSISTS.ANY,OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM --target-pid 61381
Is there anything wrong with this command line?
Additional information:
The first time I ran the analysis on the Linux server, I got the error log shown below.
Our Linux server's CPU code name is Haswell,
so I removed all the unsupported events:
FRONTEND_RETIRED.LATENCY_GE_8_PS
FRONTEND_RETIRED.DSB_MISS_PS
FRONTEND_RETIRED.L2_MISS_PS
FRONTEND_RETIRED.LATENCY_GE_1
FRONTEND_RETIRED.LATENCY_GE_16_PS
FRONTEND_RETIRED.LATENCY_GE_2
FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS
FRONTEND_RETIRED.STLB_MISS_PS
So, are these events necessary for the Back-End Bound details?
If they are necessary, which events can I use instead on this Linux server, whose CPU code name is Haswell?
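For readers who want to repeat this step, here is a minimal Python sketch of stripping a set of unsupported events from an `event-config` value. The helper name `drop_events` and the short sample string are made up for illustration; only the event names come from this thread. It assumes events are separated by commas and that modifiers such as `:sa=3000000` follow the first colon.

```python
from typing import Iterable


def drop_events(event_config: str, unsupported: Iterable[str]) -> str:
    """Return event_config without the events whose base name is unsupported."""
    blocked = set(unsupported)
    kept = []
    for item in event_config.split(","):
        item = item.strip()
        # The base name is everything before the first ':' modifier.
        base = item.split(":", 1)[0]
        if base and base not in blocked:
            kept.append(item)
    return ",".join(kept)


# Events reported as unsupported on the Haswell server in this thread.
UNSUPPORTED_ON_SERVER = [
    "FRONTEND_RETIRED.LATENCY_GE_8_PS",
    "FRONTEND_RETIRED.DSB_MISS_PS",
    "FRONTEND_RETIRED.L2_MISS_PS",
    "FRONTEND_RETIRED.LATENCY_GE_1",
    "FRONTEND_RETIRED.LATENCY_GE_16_PS",
    "FRONTEND_RETIRED.LATENCY_GE_2",
    "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS",
    "FRONTEND_RETIRED.STLB_MISS_PS",
]

# Hypothetical short sample, not the full command line.
sample = "CPU_CLK_UNHALTED.THREAD:sa=3000000,FRONTEND_RETIRED.DSB_MISS_PS,UOPS_ISSUED.ANY"
print(drop_events(sample, UNSUPPORTED_ON_SERVER))
```

Note that removing events only avoids the collection error; it does not by itself supply the events a metric needs on a different microarchitecture.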
Sorry, there was a mistake in my reply above.
The error log I get when running the default command line is this:
So, how can I fix this error?
Hi.
This looks like an issue with the driverless collection configuration.
Could you try building and installing the VTune drivers and then run the collection again? Is the event issue still there?
Kirill
Could you provide '<vtune_install_dir>/bin64/sep -version' output?
Kirill
Hi armorhu,
When you said "yes,problem is same here", did you mean that you are facing the same issue even after loading the drivers? If so, could you provide the output of the commands below?
$ cd <install-dir>/sepdk/src
$ ./insmod-sep -q
Could you also let us know the operating system and version you are using, in addition to the sep version already requested by Kirill?
Thanks
Arun
Hi armorhu,
We are waiting for an update from your end.
Thanks
Arun Jose
>cat /etc/lsb-releases
[root@TENCENT64site ~]# cat /etc/lsb-releases
cat: /etc/lsb-releases: No such file or directory
>lscpu
[root@TENCENT64site ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2301.000
BogoMIPS: 4590.46
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
>uname -a
[root@TENCENT64site ~]# uname -a
Linux TENCENT64site 3.10.107-1-tlinux2-0052 #1 SMP Wed Jan 15 20:02:40 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
>lsmod |grep sep
[root@TENCENT64site ~]# lsmod |grep sep
sep5 880621 0
socperf3 595104 2 sep5,socwatch2_12
>vtune --version
[root@TENCENT64site ~]# vtune --version
Intel(R) VTune(TM) Profiler 2020 Update 2 (build 610396) Command Line Tool
Copyright (C) 2009-2020 Intel Corporation. All rights reserved.
>./insmod-sep -q
[root@TENCENT64site /opt/intel/vtune_profiler_2020.2.0.610396/sepdk/src]# ./insmod-sep -q
pax driver is loaded and owned by group "root" with file permissions "660".
socperf3 driver is loaded and owned by group "root" with file permissions "660".
sep5 driver is loaded and owned by group "root" with file permissions "660".
socwatch driver is loaded.
vtsspp driver is loaded and owned by group "root" with file permissions "660".
Sorry for my late reply.
I also have some additional information: I successfully got the Back-End Bound detail data on another Linux server,
whose CPU is:
Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
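As a side note for anyone comparing machines this way, the relevant fields of the `lscpu` output above can be pulled out with a short Python sketch. The function names and the one-entry lookup table are hypothetical; the table only encodes what this thread shows (family 6, model 63 on the E5-2670 v3 server) and would need more entries for other CPUs.

```python
# Excerpt of the lscpu output posted above.
LSCPU_EXCERPT = """\
CPU(s): 48
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
"""

# Assumption: a hand-written lookup with a single entry taken from this thread.
KNOWN_MODELS = {(6, 63): "Haswell server"}


def parse_lscpu(text):
    """Turn 'Key: value' lines into a dictionary of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def summarize(text):
    """Report the code name (if known), SMT state and logical CPU count."""
    info = parse_lscpu(text)
    threads = int(info["Thread(s) per core"])
    cores = int(info["Core(s) per socket"])
    sockets = int(info["Socket(s)"])
    family_model = (int(info["CPU family"]), int(info["Model"]))
    return {
        "model_name": info["Model name"],
        "code_name": KNOWN_MODELS.get(family_model, "unknown"),
        "smt_enabled": threads > 1,
        "logical_cpus": threads * cores * sockets,
    }


print(summarize(LSCPU_EXCERPT))
```

The product of threads, cores and sockets (2 x 12 x 2) matches the `CPU(s): 48` line, which is a quick sanity check that the fields were read correctly.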
Could you provide '<vtune_install_dir>/bin64/sep -version' output?
Kirill
[root@TENCENT64site /opt/intel/oneapi/vtune/2021.1.1/bin64]# ./sep -version
Sampling Enabling Product Version: 5.22 Beta built on Nov 10 2020 18:15:43
SEP Driver Version: 5.22 Beta (public)
PAX Driver Version: 1.0
Platform type: 99
CPU name: Intel(R) Xeon(R) E5/E7 v3 Processor code named Haswell
PMU: haswell_server
Driver configs: Non-Maskable Interrupt, REGISTER CHECK ON
Copyright(C) 2007-2020 Intel Corporation. All rights reserved.
Could you try collecting the Microarchitecture Exploration data with these events?
vtune -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob dram-bandwidth-limits=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=2000000,CPU_CLK_UNHALTED.REF_TSC:sample:sa=2000000,INST_RETIRED.ANY:sample:sa=2000000,CPU_CLK_UNHALTED.REF_XCLK:sa=100003,CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:sa=100003,CPU_CLK_UNHALTED.THREAD_P,CYCLE_ACTIVITY.CYCLES_NO_EXECUTE,CYCLE_ACTIVITY.STALLS_LDM_PENDING,IDQ_UOPS_NOT_DELIVERED.CORE,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE,INT_MISC.RECOVERY_CYCLES,RESOURCE_STALLS.SB,RS_EVENTS.EMPTY_CYCLES,RS_EVENTS.EMPTY_END,UOPS_EXECUTED.CORE:cmask=1,UOPS_EXECUTED.CORE:cmask=2,UOPS_EXECUTED.CORE:cmask=3,UOPS_ISSUED.ANY,UOPS_RETIRED.RETIRE_SLOTS,BACLEARS.ANY,BR_MISP_RETIRED.ALL_BRANCHES_PS,DSB2MITE_SWITCHES.PENALTY_CYCLES,ICACHE.IFDATA_STALL,IDQ.ALL_DSB_CYCLES_4_UOPS,IDQ.ALL_DSB_CYCLES_ANY_UOPS,IDQ.ALL_MITE_CYCLES_4_UOPS,IDQ.ALL_MITE_CYCLES_ANY_UOPS,IDQ.DSB_UOPS,IDQ.MITE_UOPS,IDQ.MS_SWITCHES,IDQ.MS_UOPS,ILD_STALL.LCP,ITLB_MISSES.STLB_HIT,ITLB_MISSES.WALK_COMPLETED,ITLB_MISSES.WALK_DURATION,LSD.UOPS,MACHINE_CLEARS.COUNT,CYCLE_ACTIVITY.STALLS_L1D_PENDING,CYCLE_ACTIVITY.STALLS_L2_PENDING,DTLB_LOAD_MISSES.STLB_HIT,DTLB_LOAD_MISSES.WALK_DURATION,DTLB_STORE_MISSES.STLB_HIT,DTLB_STORE_MISSES.WALK_DURATION,L1D_PEND_MISS.PENDING,L1D_PEND_MISS.REQUEST_FB_FULL:cmask=1,L2_RQSTS.RFO_HIT,LD_BLOCKS.NO_SR,LD_BLOCKS.STORE_FORWARD,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS,MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS,MEM_LOAD_UOPS_RETIRED.L1_HIT_PS,MEM_LOAD_UOPS_RETIRED.L1_MISS,MEM_LOAD_UOPS_RETIRED.L2_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS,MEM_UOPS_RETIRED.ALL_STORES_PS,MEM_UOPS_RETIRED.LOCK_LOADS_PS,MEM_UOPS_RETIRED.SPLIT_LOADS_PS,MEM_UOPS_RETIRED.SPLIT_STORES_PS,MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS,MEM_UOPS_RETIRED.STLB_MISS_STORES_PS,OFFCORE_REQUESTS_BUFFER.SQ_FULL,OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=6,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO,ARITH.DIVIDER_UOPS,INST_RETIRED.PREC_DIST,UOPS_DISPATCHED_PORT.PORT_0,UOPS_DISPATCHED_PORT.PORT_1,UOPS_DISPATCHED_PORT.PORT_2,UOPS_DISPATCHED_PORT.PORT_3,UOPS_DISPATCHED_PORT.PORT_4,UOPS_DISPATCHED_PORT.PORT_5,UOPS_DISPATCHED_PORT.PORT_6,UOPS_DISPATCHED_PORT.PORT_7,OTHER_ASSISTS.ANY_WB_ASSIST,OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.HITM_OTHER_CORE --target-pid 61381
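This event list uses different names from the one in the original question (for example `MEM_LOAD_UOPS_RETIRED.*` instead of `MEM_LOAD_RETIRED.*`). A rough way to see the differences is to compare the two `event-config` values as sets, as in the Python sketch below. The helper `event_names` and the two shortened strings are illustrative only; each holds a handful of events copied from the two command lines in this thread.

```python
def event_names(event_config):
    """Return the set of base event names, ignoring ':' modifiers."""
    names = set()
    for item in event_config.split(","):
        base = item.split(":", 1)[0].strip()
        if base:
            names.add(base)
    return names


# Shortened excerpts, not the full command lines.
original_cmd = (
    "UOPS_ISSUED.ANY,MEM_LOAD_RETIRED.L1_HIT_PS,"
    "CYCLE_ACTIVITY.STALLS_L2_MISS,ARITH.DIVIDER_ACTIVE"
)
suggested_cmd = (
    "UOPS_ISSUED.ANY,MEM_LOAD_UOPS_RETIRED.L1_HIT_PS,"
    "CYCLE_ACTIVITY.STALLS_L2_PENDING,ARITH.DIVIDER_UOPS"
)

only_in_original = sorted(event_names(original_cmd) - event_names(suggested_cmd))
only_in_suggested = sorted(event_names(suggested_cmd) - event_names(original_cmd))
common = sorted(event_names(original_cmd) & event_names(suggested_cmd))

print("only in original :", only_in_original)
print("only in suggested:", only_in_suggested)
print("in both          :", common)
```

Events that appear on only one side are the ones to check against the event list of the target CPU.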
Hi armorhu,
Have you tried collecting the hardware events as suggested by Kirill?
Thanks
Arun
Sorry for my late reply.
In fact, we have solved this problem by switching to another Linux machine. Thank you for your help.
I have another question.
Given a CPU, for example my PC's CPU, an Intel Core i7-8759H:
How can I get detailed information about it, such as the L1 cache access latency, the L2 cache access latency, whether the CPU supports SMT, and whether it is superscalar (and how wide)?
Hi
The L1 and L2 cache latency of your CPU can depend on many factors, possibly even on your application, so it is difficult to give exact numbers for your CPU. If you are looking for the hardware performance limits of your application, the Roofline analysis in Intel Advisor is something you could look at. You can find more information about Roofline analysis at the link below.
To find the specifications of your CPU, you can look up the model on ark.intel.com, or simply search for the CPU model name on Google. As an example, please find the details of the i7-8750H below.
The Core i7 should definitely support SMT (Hyper-Threading is Intel's brand name for this technology). You can check this detail for your CPU on ark.intel.com.
We are not quite sure what you meant by superscalar support and support number; we would need more details to answer that. However, since your initial issue is resolved and this discussion is moving away from the original topic, we would appreciate it if you could raise these questions in a separate thread. They can then be handled separately and will attract the attention of subject matter experts from the community who may be able to comment on them.
Thanks
Arun
"We are not quite sure what you meant by superscalar support and support number."
By "superscalar number" I mean the number of uops that one CPU core can process simultaneously.
As far as I know, on some Intel CPUs this number is 4.
And of course, I will raise further questions in a separate thread after this last one.
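To show where a width of 4 enters the analysis discussed in this thread, here is a minimal Python sketch of the level-1 Top-Down breakdown as it is commonly published for 4-wide Intel cores. It uses event names that appear in the command lines above. The function name, the sample counts and the `width=4` default are assumptions for illustration, and VTune's own formulas (in particular how clocks are counted with SMT enabled) may differ.

```python
def top_down_level1(clk_unhalted_thread, idq_uops_not_delivered_core,
                    uops_issued_any, uops_retired_retire_slots,
                    int_misc_recovery_cycles, width=4):
    """Split pipeline slots into the four level-1 Top-Down categories."""
    if clk_unhalted_thread <= 0:
        raise ValueError("cycle count must be positive")
    # Each cycle offers `width` issue slots (assumed 4 here).
    slots = width * clk_unhalted_thread
    frontend_bound = idq_uops_not_delivered_core / slots
    bad_speculation = (uops_issued_any - uops_retired_retire_slots
                       + width * int_misc_recovery_cycles) / slots
    retiring = uops_retired_retire_slots / slots
    # Back-End Bound is whatever is left of the slots.
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
breakdown = top_down_level1(
    clk_unhalted_thread=1_000_000,
    idq_uops_not_delivered_core=400_000,
    uops_issued_any=1_800_000,
    uops_retired_retire_slots=1_600_000,
    int_misc_recovery_cycles=25_000,
)
for name, fraction in breakdown.items():
    print(f"{name}: {fraction:.1%}")
```

In this simplified view, Back-End Bound itself needs only a few generic events; the detailed breakdown below it relies on additional events whose names vary between microarchitectures.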
Thanks for clarifying. Could you please confirm whether we can close this thread, since you are planning to raise a separate thread for your further queries?
Hi
As your issue is resolved and you are raising a new thread for your other questions, we will not be monitoring this thread any further. Any further updates to this thread will be considered community only.
Thanks
Arun