Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

How to obtain PEBS data linear address and latency value from the perf report?

kundnani__harsh
Beginner
1,265 Views

I am trying to use perf to obtain the value of PEBS data linear address and latency value using perf for the MEM_LOAD_UOPS_RETIRED.L3_MISS event (which I think is the LLC miss event). I have used the following command:

perf record -d -e cpu/event=0xd1,umask=0x20/ppu -F 1000 --intr-regs=SI,DI,BP,SP,R8 [application]

This generates the perf.data file. How can I read this file to obtain the data linear address and load latency value? I have tried using this command to dump the generated report in raw format and extract RECORD_SAMPLE from it:

perf report -D -i perf.data | grep RECORD_SAMPLE > sample.txt

I don't completely understand the columns it generates as there are no headers. I have attached few lines from the sample.txt file:

7248128953093548 0x16fa0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0
7248128962085643 0x16ff8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 16 addr: 0
7248128972080714 0x17050 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707db4 period: 16 addr: 0
7248128980080286 0x170a8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 16 addr: 0
7248128989072422 0x17100 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d6c8 period: 16 addr: 0
7248128998067923 0x17158 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b73a584 period: 16 addr: 0
7248129006067467 0x171b0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 16 addr: 0
7248129016059056 0x17208 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d574 period: 15 addr: 0
7248129024055215 0x17260 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707d30 period: 15 addr: 0
7248129033051029 0x172b8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c4f period: 15 addr: 0
7248129041156459 0x17310 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0
7248129050042391 0x17368 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5af period: 16 addr: 0
7248129058038570 0x173c0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d582 period: 16 addr: 0
7248129070032626 0x17418 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5a4 period: 16 addr: 0
7248129079028316 0x17470 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707ebf period: 15 addr: 0
7248129088023771 0x174c8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707e89 period: 15 addr: 0
7248129098019481 0x17520 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129105015588 0x17578 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d6d6 period: 15 addr: 0
7248129118009214 0x175d0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b739482 period: 15 addr: 0
7248129126005649 0x17628 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129133001958 0x17680 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129143996535 0x176d8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129151996795 0x17730 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0
7248129163986760 0x17788 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129173166957 0x177e0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0x7fd38b7455ed period: 15 addr: 0
7248129180978406 0x17838 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d589 period: 16 addr: 0
7248129188974446 0x17890 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c2f period: 16 addr: 0
7248129199968956 0x178e8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c93 period: 16 addr: 0
7248129208964678 0x17940 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c23 period: 16 addr: 0
7248129217960457 0x17998 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b75c9dc period: 16 addr: 0
7248129227955422 0x179f0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5b2 period: 16 addr: 0
7248129236950918 0x17a48 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5b2 period: 16 addr: 0
7248129247945633 0x17aa0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 16 addr: 0

Also, the addr value is 0 which should not be the case.

My processor is E5-2650 v3. Please let me know if any additional information is needed.

I had made a mistake while running the command because of which I was getting addr as 0 and I am adding the updated output of sample.txt file:

7271806242314836 0x31ba8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e7 period: 12136 addr: 0x7ffc54f2e7d0
7271806243314179 0x31c10 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e3 period: 12158 addr: 0x7ffc54f2e7c0
7271806244321965 0x31c78 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d6c6 period: 12152 addr: 0x7ffc54f2e808
7271806245308373 0x31ce0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12162 addr: 0x7ffc54f2e80c
7271806246292832 0x31d48 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12202 addr: 0x7ffc54f2e805
7271806247289941 0x31db0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e7 period: 12233 addr: 0x7ffc54f2e7d0
7271806248306006 0x31e18 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12148467 period: 12210 addr: 0x7ffc54f2e7d8
7271806249308119 0x31e80 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116d6c period: 12186 addr: 0x7fcd1249a0c8
7271806250308990 0x31ee8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c08 period: 12190 addr: 0x7fcd1249bcd4
7271806251307427 0x31f50 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1216b9c5 period: 12191 addr: 0x7fcd137519b4
7271806252297279 0x31fb8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c2f period: 12202 addr: 0x7ffc54f2e7f1
7271806253299296 0x32020 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d7c8 period: 12214 addr: 0x7ffc54f2f110
7271806254304888 0x32088 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121494fc period: 12205 addr: 0x229c4f8
7271806255306048 0x320f0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1216b99e period: 12200 addr: 0x7fcd13751941
7271806256289251 0x32158 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12217 addr: 0x7ffc54f2e806
7271806257293080 0x321c0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116ce0 period: 12234 addr: 0x7fcd12498dc0
7271806258301450 0x32228 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116d77 period: 12216 addr: 0x7fcd12246b23
7271806259297345 0x32290 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c93 period: 12217 addr: 0x7ffc54f2e80d
7271806260296250 0x322f8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e5 period: 12229 addr: 0x7ffc54f2e7c8
7271806261297185 0x32360 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12225 addr: 0x7ffc54f2e808
7271806262316489 0x323c8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165ca9 period: 12211 addr: 0x7ffc54f2e70c
7271806263329132 0x32430 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c8e period: 12158 addr: 0x7ffc54f2e709
7271806264337550 0x32498 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165ca9 period: 12131 addr: 0x7ffc54f2e711
7271806265349338 0x32500 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12110 addr: 0x7ffc54f2e808
7271806266359123 0x32568 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d7c8 period: 12076 addr: 0x7ffc54f2f110
7271806267338135 0x325d0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1214838b period: 12090 addr: 0x229c4f0
7271806268335486 0x32638 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c8c period: 12124 addr: 0x7ffc54f2e804
7271806269327045 0x326a0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c9b period: 12145 addr: 0x7ffc54f2e809
7271806270315896 0x32708 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12162 addr: 0x7ffc54f2e807
7271806271319673 0x32770 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165cd5 period: 12176 addr: 0x7ffc54f2e7d8
7271806272325312 0x327d8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12170 addr: 0x7ffc54f2e801

I still don't know how to get the data linear address and load latency from the perf.data file.

0 Kudos
4 Replies
McCalpinJohn
Honored Contributor III
1,265 Views

I used the "perf mem" command to try to get the same information.  This uses the PEBS load latency facility to get the information about where the data was located -- as described in the presentation at http://www.paradyn.org/petascale2013/slides/eranian13.pdf

I ran the data collection with something like:

perf mem -t load -C 0-23 -U record -U --ldlat 30 -- ./CacheableReadTest_1GB

The "perf report" can be "raw", which mostly makes sense.  In the output below, everything makes sense except for the DSRC field (for which I can find no documentation):

# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
[...]
256545 256548 0x404ca7 0x2aaac0120fb0 34 0x68100242 CacheableReadTest_1GB:main
256545 256565 0x404c8b 0x2aaac06ba050 211 0x68100242 /CacheableReadTest_1GB:main
256545 256553 0x405071 0x2aaac21a8f40 2018 0x68201042 CacheableReadTest_1GB:ssum
256545 256565 0x40506a 0x2aaac25f7800 1900 0x68201042 CacheableReadTest_1GB:ssum
[...]

The formatted output from "perf report" also mostly makes sense

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 44K of event 'cpu/mem-loads,ldlat=30/P'
# Total weight : 2542544
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                                                                                    Shared Object          Data Symbol             Data Object            Snoop         TLB access              Locked
# ........  ............  ............  ........................  ........................................................................................  .....................  ......................  .....................  ............  ......................  ......
#
[...]
     0.00%             1  34            LFB hit                   [.] main                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac0120fb0  perf-256545.map        None          L1 or L2 hit            No
     0.01%             1  211           LFB hit                   [.] main                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac06ba050  perf-256545.map        None          L1 or L2 hit            No
     0.08%             1  2018          Local RAM hit             [.] ssum                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac21a8f40  perf-256545.map        Miss          L1 or L2 hit            No
     0.07%             1  1900          Local RAM hit             [.] ssum                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac25f7800  perf-256545.map        Miss          L1 or L2 hit            No
[...]

I can't figure out how to bridge the formatted report results for "Memory access", "Snoop", "TLB Access", "Locked", with the DSRC encodings from the raw output.
I also don't understand why the output is in completely different order for the two report formats -- but at least the "Local Weight" (latency in processor core cycles) is the same for each address in the two versions of the report....

0 Kudos
kundnani__harsh
Beginner
1,265 Views

Thank you John. I think local_weight is the load access latency and addr is the data linear address. I have use of only these two fields. I searched for the DSRC and I found this link giving some explanation regarding DSRC. I didn't understand it completely, may be you will have better understanding.

https://stackoverflow.com/questions/33295288/perf-mem-d-report/34689469#34689469 ;

0 Kudos
kundnani__harsh
Beginner
1,265 Views

I have one more question.

When I was using perf record I could use multiple events like in my example I was specifically using it for MEM_LOAD_UOPS_RETIRED.L3_MISS event. I could also specifiy other events like MEM_LOAD_UOPS_RETIRED.L2_MISS or MEM_LOAD_UOPS_RETIRED.L1_MISS. I can't use these events with perf mem record.

The man page of perf mem says I need to search for events using this command:

perf mem record -e list

This gives me only two events:

ldlat-loads  : available
ldlat-stores : available 

Am I using it wrong or we can't use the perf record events with perf mem?

0 Kudos
HadiBrais
New Contributor III
1,265 Views

The DSRC field is a 64-bit value that represents and the precise load and store information. The PEBS record includes this information but in a different format (see PEBS-Load Latency and PEBS-Precise Store in Chapter 18 of the Intel manual Volume 3). From the Linux source code, the DSRC field is declared as follows:

union perf_mem_data_src {
	__u64 val;
	struct {
		__u64   mem_op:5,	/* type of opcode */
			mem_lvl:14,	    /* memory hierarchy level */
			mem_snoop:5,	/* snoop mode */
			mem_lock:2,	    /* lock instr */
			mem_dtlb:7,	    /* tlb access */
			mem_lvl_num:4,	/* memory hierarchy level number */
			mem_remote:1,   /* remote */
			mem_snoopx:2,	/* snoop mode, ext */
			mem_rsvd:24;
	};
};

The meaning of each of these bit fields is defined later in the same source code file. For example, consider a DSRC value of 0x68100242 (or in binary 0110 1000 0001 0000 0000 0010 0100 0010) from John's trace. The values of the bit fields are:

mem_op = 0x2 = PERF_MEM_OP_LOAD (memory load operation)

mem_lvl = 0x12 = PERF_MEM_LVL_HIT | PERF_MEM_LVL_LFB (LFB hit)

mem_snoop =0x02 = PERF_MEM_SNOOP_NONE (no snoop)

mem_lock = 0x0 (unlocked)

mem_dtlb = 0x1A = PERF_MEM_TLB_HIT | PERF_MEM_TLB_L1 | PERF_MEM_TLB_L2 (DTLB L1 or L2 hit)

mem_lvl_num = 0x0 (any cache)

mem_remote = 0x0 (local)

mem_snoopx = 0x0 (no forwarding)

This is consistent with the "perf report" output.

0 Kudos
Reply