Community
cancel
Showing results for 
Search instead for 
Did you mean: 
kundnani__harsh
Beginner
170 Views

How to obtain PEBS data linear address and latency value from the perf report?

I am trying to use perf to obtain the value of PEBS data linear address and latency value using perf for the MEM_LOAD_UOPS_RETIRED.L3_MISS event (which I think is the LLC miss event). I have used the following command:

perf record -d -e cpu/event=0xd1,umask=0x20/ppu -F 1000 --intr-regs=SI,DI,BP,SP,R8 [application]

This generates the perf.data file. How can I read this file to obtain the data linear address and load latency value? I have tried using this command to dump the generated report in raw format and extract RECORD_SAMPLE from it:

perf report -D -i perf.data | grep RECORD_SAMPLE > sample.txt

I don't completely understand the columns it generates as there are no headers. I have attached few lines from the sample.txt file:

7248128953093548 0x16fa0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0
7248128962085643 0x16ff8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 16 addr: 0
7248128972080714 0x17050 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707db4 period: 16 addr: 0
7248128980080286 0x170a8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 16 addr: 0
7248128989072422 0x17100 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d6c8 period: 16 addr: 0
7248128998067923 0x17158 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b73a584 period: 16 addr: 0
7248129006067467 0x171b0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 16 addr: 0
7248129016059056 0x17208 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d574 period: 15 addr: 0
7248129024055215 0x17260 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707d30 period: 15 addr: 0
7248129033051029 0x172b8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c4f period: 15 addr: 0
7248129041156459 0x17310 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0
7248129050042391 0x17368 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5af period: 16 addr: 0
7248129058038570 0x173c0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d582 period: 16 addr: 0
7248129070032626 0x17418 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5a4 period: 16 addr: 0
7248129079028316 0x17470 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707ebf period: 15 addr: 0
7248129088023771 0x174c8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b707e89 period: 15 addr: 0
7248129098019481 0x17520 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129105015588 0x17578 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d6d6 period: 15 addr: 0
7248129118009214 0x175d0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b739482 period: 15 addr: 0
7248129126005649 0x17628 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129133001958 0x17680 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129143996535 0x176d8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129151996795 0x17730 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0xffffffff8163d9ac period: 15 addr: 0
7248129163986760 0x17788 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 15 addr: 0
7248129173166957 0x177e0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4002): 129174/129174: 0x7fd38b7455ed period: 15 addr: 0
7248129180978406 0x17838 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d589 period: 16 addr: 0
7248129188974446 0x17890 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c2f period: 16 addr: 0
7248129199968956 0x178e8 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c93 period: 16 addr: 0
7248129208964678 0x17940 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c23 period: 16 addr: 0
7248129217960457 0x17998 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b75c9dc period: 16 addr: 0
7248129227955422 0x179f0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5b2 period: 16 addr: 0
7248129236950918 0x17a48 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x42d5b2 period: 16 addr: 0
7248129247945633 0x17aa0 [0x58]: PERF_RECORD_SAMPLE(IP, 0x4001): 129174/129174: 0x7fd38b756c05 period: 16 addr: 0

Also, the addr value is 0 which should not be the case.

My processor is E5-2650 v3. Please let me know if any additional information is needed.

I had made a mistake while running the command because of which I was getting addr as 0 and I am adding the updated output of sample.txt file:

7271806242314836 0x31ba8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e7 period: 12136 addr: 0x7ffc54f2e7d0
7271806243314179 0x31c10 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e3 period: 12158 addr: 0x7ffc54f2e7c0
7271806244321965 0x31c78 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d6c6 period: 12152 addr: 0x7ffc54f2e808
7271806245308373 0x31ce0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12162 addr: 0x7ffc54f2e80c
7271806246292832 0x31d48 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12202 addr: 0x7ffc54f2e805
7271806247289941 0x31db0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e7 period: 12233 addr: 0x7ffc54f2e7d0
7271806248306006 0x31e18 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12148467 period: 12210 addr: 0x7ffc54f2e7d8
7271806249308119 0x31e80 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116d6c period: 12186 addr: 0x7fcd1249a0c8
7271806250308990 0x31ee8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c08 period: 12190 addr: 0x7fcd1249bcd4
7271806251307427 0x31f50 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1216b9c5 period: 12191 addr: 0x7fcd137519b4
7271806252297279 0x31fb8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c2f period: 12202 addr: 0x7ffc54f2e7f1
7271806253299296 0x32020 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d7c8 period: 12214 addr: 0x7ffc54f2f110
7271806254304888 0x32088 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121494fc period: 12205 addr: 0x229c4f8
7271806255306048 0x320f0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1216b99e period: 12200 addr: 0x7fcd13751941
7271806256289251 0x32158 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12217 addr: 0x7ffc54f2e806
7271806257293080 0x321c0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116ce0 period: 12234 addr: 0x7fcd12498dc0
7271806258301450 0x32228 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116d77 period: 12216 addr: 0x7fcd12246b23
7271806259297345 0x32290 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c93 period: 12217 addr: 0x7ffc54f2e80d
7271806260296250 0x322f8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd121170e5 period: 12229 addr: 0x7ffc54f2e7c8
7271806261297185 0x32360 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12225 addr: 0x7ffc54f2e808
7271806262316489 0x323c8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165ca9 period: 12211 addr: 0x7ffc54f2e70c
7271806263329132 0x32430 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c8e period: 12158 addr: 0x7ffc54f2e709
7271806264337550 0x32498 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165ca9 period: 12131 addr: 0x7ffc54f2e711
7271806265349338 0x32500 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d572 period: 12110 addr: 0x7ffc54f2e808
7271806266359123 0x32568 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x42d7c8 period: 12076 addr: 0x7ffc54f2f110
7271806267338135 0x325d0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd1214838b period: 12090 addr: 0x229c4f0
7271806268335486 0x32638 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c8c period: 12124 addr: 0x7ffc54f2e804
7271806269327045 0x326a0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165c9b period: 12145 addr: 0x7ffc54f2e809
7271806270315896 0x32708 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12162 addr: 0x7ffc54f2e807
7271806271319673 0x32770 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12165cd5 period: 12176 addr: 0x7ffc54f2e7d8
7271806272325312 0x327d8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x4002): 130944/130944: 0x7fcd12116eb4 period: 12170 addr: 0x7ffc54f2e801

I still don't know how to get the data linear address and load latency from the perf.data file.

0 Kudos
4 Replies
McCalpinJohn
Black Belt
170 Views

I used the "perf mem" command to try to get the same information.  This uses the PEBS load latency facility to get the information about where the data was located -- as described in the presentation at http://www.paradyn.org/petascale2013/slides/eranian13.pdf

I ran the data collection with something like:

perf mem -t load -C 0-23 -U record -U --ldlat 30 -- ./CacheableReadTest_1GB

The "perf report" can be "raw", which mostly makes sense.  In the output below, everything makes sense except for the DSRC field (for which I can find no documentation):

# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
[...]
256545 256548 0x404ca7 0x2aaac0120fb0 34 0x68100242 CacheableReadTest_1GB:main
256545 256565 0x404c8b 0x2aaac06ba050 211 0x68100242 /CacheableReadTest_1GB:main
256545 256553 0x405071 0x2aaac21a8f40 2018 0x68201042 CacheableReadTest_1GB:ssum
256545 256565 0x40506a 0x2aaac25f7800 1900 0x68201042 CacheableReadTest_1GB:ssum
[...]

The formatted output from "perf report" also mostly makes sense

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 44K of event 'cpu/mem-loads,ldlat=30/P'
# Total weight : 2542544
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead       Samples  Local Weight  Memory access             Symbol                                                                                    Shared Object          Data Symbol             Data Object            Snoop         TLB access              Locked
# ........  ............  ............  ........................  ........................................................................................  .....................  ......................  .....................  ............  ......................  ......
#
[...]
     0.00%             1  34            LFB hit                   [.] main                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac0120fb0  perf-256545.map        None          L1 or L2 hit            No
     0.01%             1  211           LFB hit                   [.] main                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac06ba050  perf-256545.map        None          L1 or L2 hit            No
     0.08%             1  2018          Local RAM hit             [.] ssum                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac21a8f40  perf-256545.map        Miss          L1 or L2 hit            No
     0.07%             1  1900          Local RAM hit             [.] ssum                                                                                  CacheableReadTest_1GB  [.] 0x00002aaac25f7800  perf-256545.map        Miss          L1 or L2 hit            No
[...]

I can't figure out how to bridge the formatted report results for "Memory access", "Snoop", "TLB Access", "Locked", with the DSRC encodings from the raw output.
I also don't understand why the output is in completely different order for the two report formats -- but at least the "Local Weight" (latency in processor core cycles) is the same for each address in the two versions of the report....

kundnani__harsh
Beginner
170 Views

Thank you John. I think local_weight is the load access latency and addr is the data linear address. I have use of only these two fields. I searched for the DSRC and I found this link giving some explanation regarding DSRC. I didn't understand it completely, may be you will have better understanding.

https://stackoverflow.com/questions/33295288/perf-mem-d-report/34689469#34689469 ;

kundnani__harsh
Beginner
170 Views

I have one more question.

When I was using perf record I could use multiple events like in my example I was specifically using it for MEM_LOAD_UOPS_RETIRED.L3_MISS event. I could also specifiy other events like MEM_LOAD_UOPS_RETIRED.L2_MISS or MEM_LOAD_UOPS_RETIRED.L1_MISS. I can't use these events with perf mem record.

The man page of perf mem says I need to search for events using this command:

perf mem record -e list

This gives me only two events:

ldlat-loads  : available
ldlat-stores : available 

Am I using it wrong or we can't use the perf record events with perf mem?

HadiBrais
New Contributor III
170 Views

The DSRC field is a 64-bit value that represents and the precise load and store information. The PEBS record includes this information but in a different format (see PEBS-Load Latency and PEBS-Precise Store in Chapter 18 of the Intel manual Volume 3). From the Linux source code, the DSRC field is declared as follows:

union perf_mem_data_src {
	__u64 val;
	struct {
		__u64   mem_op:5,	/* type of opcode */
			mem_lvl:14,	    /* memory hierarchy level */
			mem_snoop:5,	/* snoop mode */
			mem_lock:2,	    /* lock instr */
			mem_dtlb:7,	    /* tlb access */
			mem_lvl_num:4,	/* memory hierarchy level number */
			mem_remote:1,   /* remote */
			mem_snoopx:2,	/* snoop mode, ext */
			mem_rsvd:24;
	};
};

The meaning of each of these bit fields is defined later in the same source code file. For example, consider a DSRC value of 0x68100242 (or in binary 0110 1000 0001 0000 0000 0010 0100 0010) from John's trace. The values of the bit fields are:

mem_op = 0x2 = PERF_MEM_OP_LOAD (memory load operation)

mem_lvl = 0x12 = PERF_MEM_LVL_HIT | PERF_MEM_LVL_LFB (LFB hit)

mem_snoop =0x02 = PERF_MEM_SNOOP_NONE (no snoop)

mem_lock = 0x0 (unlocked)

mem_dtlb = 0x1A = PERF_MEM_TLB_HIT | PERF_MEM_TLB_L1 | PERF_MEM_TLB_L2 (DTLB L1 or L2 hit)

mem_lvl_num = 0x0 (any cache)

mem_remote = 0x0 (local)

mem_snoopx = 0x0 (no forwarding)

This is consistent with the "perf report" output.

Reply