- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to use perf to obtain the value of PEBS data linear address and latency value using perf for the MEM_LOAD_UOPS_RETIRED.L3_MISS event (which I think is the LLC miss event). I have used the following command:
perf record -d -e cpu/event=0xd1,umask=0x20/ppu -F 1000 --intr-regs=SI,DI,BP,SP,R8 [application]
This generates the perf.data file. How can I read this file to obtain the data linear address and load latency value? I have tried using this command to dump the generated report in raw format and extract RECORD_SAMPLE from it:
perf report -D -i perf.data | grep RECORD_SAMPLE > sample.txt
I don't completely understand the columns it generates as there are no headers. I have attached few lines from the sample.txt file:
7248128953093548 0x16fa0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0 7248128962085643 0x16ff8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 16 addr: 0 7248128972080714 0x17050 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707db4 period: 16 addr: 0 7248128980080286 0x170a8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 16 addr: 0 7248128989072422 0x17100 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d6c8 period: 16 addr: 0 7248128998067923 0x17158 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b73a584 period: 16 addr: 0 7248129006067467 0x171b0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 16 addr: 0 7248129016059056 0x17208 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d574 period: 15 addr: 0 7248129024055215 0x17260 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707d30 period: 15 addr: 0 7248129033051029 0x172b8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c4f period: 15 addr: 0 7248129041156459 0x17310 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0 7248129050042391 0x17368 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5af period: 16 addr: 0 7248129058038570 0x173c0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d582 period: 16 addr: 0 7248129070032626 0x17418 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5a4 period: 16 addr: 0 7248129079028316 0x17470 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707ebf period: 15 addr: 0 7248129088023771 0x174c8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707e89 period: 15 addr: 0 7248129098019481 0x17520 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0 7248129105015588 0x17578 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d6d6 period: 15 addr: 0 7248129118009214 0x175d0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b739482 period: 15 addr: 0 7248129126005649 0x17628 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0 7248129133001958 0x17680 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0 7248129143996535 0x176d8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0 7248129151996795 0x17730 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0 7248129163986760 0x17788 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0 7248129173166957 0x177e0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0x7fd38b7455ed period: 15 addr: 0 7248129180978406 0x17838 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d589 period: 16 addr: 0 7248129188974446 0x17890 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c2f period: 16 addr: 0 7248129199968956 0x178e8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c93 period: 16 addr: 0 7248129208964678 0x17940 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c23 period: 16 addr: 0 7248129217960457 0x17998 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b75c9dc period: 16 addr: 0 7248129227955422 0x179f0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5b2 period: 16 addr: 0 7248129236950918 0x17a48 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5b2 period: 16 addr: 0 7248129247945633 0x17aa0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 16 addr: 0
Also, the addr value is 0 which should not be the case.
My processor is E5-2650 v3. Please let me know if any additional information is needed.
I had made a mistake while running the command because of which I was getting addr as 0 and I am adding the updated output of sample.txt file:
7271806242314836 0x31ba8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e7 period: 12136 addr: 0x7ffc54f2e7d0 7271806243314179 0x31c10 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e3 period: 12158 addr: 0x7ffc54f2e7c0 7271806244321965 0x31c78 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d6c6 period: 12152 addr: 0x7ffc54f2e808 7271806245308373 0x31ce0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12162 addr: 0x7ffc54f2e80c 7271806246292832 0x31d48 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12202 addr: 0x7ffc54f2e805 7271806247289941 0x31db0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e7 period: 12233 addr: 0x7ffc54f2e7d0 7271806248306006 0x31e18 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12148467 period: 12210 addr: 0x7ffc54f2e7d8 7271806249308119 0x31e80 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116d6c period: 12186 addr: 0x7fcd1249a0c8 7271806250308990 0x31ee8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c08 period: 12190 addr: 0x7fcd1249bcd4 7271806251307427 0x31f50 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1216b9c5 period: 12191 addr: 0x7fcd137519b4 7271806252297279 0x31fb8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c2f period: 12202 addr: 0x7ffc54f2e7f1 7271806253299296 0x32020 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d7c8 period: 12214 addr: 0x7ffc54f2f110 7271806254304888 0x32088 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121494fc period: 12205 addr: 0x229c4f8 7271806255306048 0x320f0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1216b99e period: 12200 addr: 0x7fcd13751941 7271806256289251 0x32158 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12217 addr: 0x7ffc54f2e806 7271806257293080 0x321c0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116ce0 period: 12234 addr: 0x7fcd12498dc0 7271806258301450 0x32228 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116d77 period: 12216 addr: 0x7fcd12246b23 7271806259297345 0x32290 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c93 period: 12217 addr: 0x7ffc54f2e80d 7271806260296250 0x322f8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e5 period: 12229 addr: 0x7ffc54f2e7c8 7271806261297185 0x32360 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12225 addr: 0x7ffc54f2e808 7271806262316489 0x323c8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165ca9 period: 12211 addr: 0x7ffc54f2e70c 7271806263329132 0x32430 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c8e period: 12158 addr: 0x7ffc54f2e709 7271806264337550 0x32498 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165ca9 period: 12131 addr: 0x7ffc54f2e711 7271806265349338 0x32500 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12110 addr: 0x7ffc54f2e808 7271806266359123 0x32568 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d7c8 period: 12076 addr: 0x7ffc54f2f110 7271806267338135 0x325d0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1214838b period: 12090 addr: 0x229c4f0 7271806268335486 0x32638 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c8c period: 12124 addr: 0x7ffc54f2e804 7271806269327045 0x326a0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c9b period: 12145 addr: 0x7ffc54f2e809 7271806270315896 0x32708 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12162 addr: 0x7ffc54f2e807 7271806271319673 0x32770 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165cd5 period: 12176 addr: 0x7ffc54f2e7d8 7271806272325312 0x327d8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12170 addr: 0x7ffc54f2e801
I still don't know how to get the data linear address and load latency from the perf.data file.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I used the "perf mem" command to try to get the same information. This uses the PEBS load latency facility to get the information about where the data was located -- as described in the presentation at http://www.paradyn.org/petascale2013/slides/eranian13.pdf
I ran the data collection with something like:
perf mem -t load -C 0-23 -U record -U --ldlat 30 -- ./CacheableReadTest_1GB
The "perf report" can be "raw", which mostly makes sense. In the output below, everything makes sense except for the DSRC field (for which I can find no documentation):
# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
[...]
256545 256548 0x404ca7 0x2aaac0120fb0 34 0x68100242 CacheableReadTest_1GB:main
256545 256565 0x404c8b 0x2aaac06ba050 211 0x68100242 /CacheableReadTest_1GB:main
256545 256553 0x405071 0x2aaac21a8f40 2018 0x68201042 CacheableReadTest_1GB:ssum
256545 256565 0x40506a 0x2aaac25f7800 1900 0x68201042 CacheableReadTest_1GB:ssum
[...]
The formatted output from "perf report" also mostly makes sense
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 44K of event 'cpu/mem-loads,ldlat=30/P'
# Total weight : 2542544
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked
# ........ ............ ............ ........................ ........................................................................................ ..................... ...................... ..................... ............ ...................... ......
#
[...]
0.00% 1 34 LFB hit [.] main CacheableReadTest_1GB [.] 0x00002aaac0120fb0 perf-256545.map None L1 or L2 hit No
0.01% 1 211 LFB hit [.] main CacheableReadTest_1GB [.] 0x00002aaac06ba050 perf-256545.map None L1 or L2 hit No
0.08% 1 2018 Local RAM hit [.] ssum CacheableReadTest_1GB [.] 0x00002aaac21a8f40 perf-256545.map Miss L1 or L2 hit No
0.07% 1 1900 Local RAM hit [.] ssum CacheableReadTest_1GB [.] 0x00002aaac25f7800 perf-256545.map Miss L1 or L2 hit No
[...]
I can't figure out how to bridge the formatted report results for "Memory access", "Snoop", "TLB Access", "Locked", with the DSRC encodings from the raw output.
I also don't understand why the output is in completely different order for the two report formats -- but at least the "Local Weight" (latency in processor core cycles) is the same for each address in the two versions of the report....
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you John. I think local_weight is the load access latency and addr is the data linear address. I have use of only these two fields. I searched for the DSRC and I found this link giving some explanation regarding DSRC. I didn't understand it completely, may be you will have better understanding.
https://stackoverflow.com/questions/33295288/perf-mem-d-report/34689469#34689469 ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have one more question.
When I was using perf record I could use multiple events like in my example I was specifically using it for MEM_LOAD_UOPS_RETIRED.L3_MISS event. I could also specifiy other events like MEM_LOAD_UOPS_RETIRED.L2_MISS or MEM_LOAD_UOPS_RETIRED.L1_MISS. I can't use these events with perf mem record.
The man page of perf mem says I need to search for events using this command:
perf mem record -e list
This gives me only two events:
ldlat-loads : available
ldlat-stores : available
Am I using it wrong or we can't use the perf record events with perf mem?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The DSRC field is a 64-bit value that represents and the precise load and store information. The PEBS record includes this information but in a different format (see PEBS-Load Latency and PEBS-Precise Store in Chapter 18 of the Intel manual Volume 3). From the Linux source code, the DSRC field is declared as follows:
union perf_mem_data_src { __u64 val; struct { __u64 mem_op:5, /* type of opcode */ mem_lvl:14, /* memory hierarchy level */ mem_snoop:5, /* snoop mode */ mem_lock:2, /* lock instr */ mem_dtlb:7, /* tlb access */ mem_lvl_num:4, /* memory hierarchy level number */ mem_remote:1, /* remote */ mem_snoopx:2, /* snoop mode, ext */ mem_rsvd:24; }; };
The meaning of each of these bit fields is defined later in the same source code file. For example, consider a DSRC value of 0x68100242 (or in binary 0110 1000 0001 0000 0000 0010 0100 0010) from John's trace. The values of the bit fields are:
mem_op = 0x2 = PERF_MEM_OP_LOAD (memory load operation)
mem_lvl = 0x12 = PERF_MEM_LVL_HIT | PERF_MEM_LVL_LFB (LFB hit)
mem_snoop =0x02 = PERF_MEM_SNOOP_NONE (no snoop)
mem_lock = 0x0 (unlocked)
mem_dtlb = 0x1A = PERF_MEM_TLB_HIT | PERF_MEM_TLB_L1 | PERF_MEM_TLB_L2 (DTLB L1 or L2 hit)
mem_lvl_num = 0x0 (any cache)
mem_remote = 0x0 (local)
mem_snoopx = 0x0 (no forwarding)
This is consistent with the "perf report" output.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page