Hello,
Can someone explain the purpose of having two separate cache leaves (leaves 2 and 4) for the cpuid instruction? I ask because on my Intel Xeon 5650 system, the data from leaf 2 does not include any info for the L1 data cache. Is it standard to put this in the info from leaf 4? Please advise.
Thank you for your responses. Allow me to clarify my question:
@Sergey Kostrov: Unfortunately, I need to be able to read such information (cache sizes, etc.) at install time for my application so that the code can tune itself appropriately (I'm making something similar to the ATLAS linear algebra project), so I need to understand how to read cache information.
@iliyapolak: I have read the relevant manual sections, and here is where I am confused:
1. How is the cache data divided up between the two leaves? All of the caches I see when I use CPUID(eax=2) appear to be TLBs according to the table in the manual. Additionally, one of the cache descriptor bytes is 0xff, which according to the manual indicates:
"CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters"
But what does this mean? Does it mean that leaf 2 should not report ANY cache parameters (and therefore all of the codes are junk), or do I only need to call CPUID(eax=4) for a certain defined subset of caches and keep the values from CPUID(eax=2) for the others? Please advise.
For reference, these are all of the cache codes I get reading from eax,ebx,ecx,edx:
Code: 01
Code: 5a
Code: 03
Code: 55
Code: ff
Code: b2
Code: f0
Code: 00
Code: 00
Code: 00
Code: 00
Code: 00
Code: 00
Code: 00
Code: ca
Code: 00
3. Why is leaf 4 called "Deterministic"? What is the significance of this word?
@Samuel
Tomorrow I will read the section relevant to your question and try to give you an answer.
Regarding the meaning of "deterministic", I suppose it means that the cache parameters are read directly from the returned registers.
>>>For reference, these are all of the cache codes I get reading from eax,ebx,ecx,edx:
Code: 01 Code: 5a Code: 03 Code: 55 Code: ff Code: b2 Code: f0 Code: 00 Code: 00 Code: 00 Code: 00 Code: 00 Code: 00 Code: 00 Code: ca Code: 00>>>
I assume that you called cpuid with eax == 2 and that the order of the "code" values corresponds to the bytes returned in registers eax, ebx, ecx, and edx. Null values can be ignored because they do not encode any information.
Starting from eax LSB to MSB order:
eax == 0x01 - this indicates that cpuid must be executed once with an input value 2 in order to obtain full info about the cache and TLB
eax == 0x5a - Data TLB0: 2-MByte or 4 MByte pages, 4-way set associative, 32 entries
eax == 0x03 - Data TLB: 4 KByte pages, 4-way set associative, 64 entries
eax == 0x55 - Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries
Starting from ebx LSB to MSB order
ebx == 0xff - CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters
ebx == 0xb2 - Instruction TLB: 4KByte pages, 4-way set associative, 64 entries
ebx == 0xf0 - 64-Byte prefetching
The remaining byte of ebx, all of ecx, and the low bytes of edx are null, so skip ahead to edx == 0xca
edx == 0xca - Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries
As you pointed out, the 0xff byte returned in ebx has a rather cryptic meaning. Please refer to Table 3-17 on page 3-149, then try executing cpuid with eax == 4 and ecx == 0 and post the results.
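The byte-by-byte walk above can be sketched in Python. This is only a minimal sketch under stated assumptions: the register values are rebuilt from the posted byte list on the assumption that the bytes were listed least-significant first, the `DESCRIPTORS` table holds only the entries quoted in this thread (the manual's table is much longer), and the name `leaf2_descriptors` is made up for illustration. It also applies the manual's rule that a register with bit 31 set carries no valid descriptors.

```python
# Leaf 2 register values rebuilt from the bytes posted in this thread
# (assumption: bytes were listed from least to most significant).
EAX, EBX, ECX, EDX = 0x55035A01, 0x00F0B2FF, 0x00000000, 0x00CA0000

# Only the descriptors that appear in this thread, not the full table.
DESCRIPTORS = {
    0x5A: "Data TLB0: 2-MByte or 4-MByte pages, 4-way set associative, 32 entries",
    0x03: "Data TLB: 4-KByte pages, 4-way set associative, 64 entries",
    0x55: "Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries",
    0xFF: "No cache descriptors in leaf 2; query leaf 4 for cache parameters",
    0xB2: "Instruction TLB: 4-KByte pages, 4-way set associative, 64 entries",
    0xF0: "64-byte prefetching",
    0xCA: "Shared 2nd-level TLB: 4-KByte pages, 4-way set associative, 512 entries",
}


def leaf2_descriptors(eax, ebx, ecx, edx):
    """Return the non-null descriptor bytes in register order, LSB first."""
    found = []
    for index, reg in enumerate((eax, ebx, ecx, edx)):
        if reg & 0x80000000:
            continue  # bit 31 set: this register holds no valid descriptors
        for shift in (0, 8, 16, 24):
            if index == 0 and shift == 0:
                continue  # low byte of EAX is the call count, not a descriptor
            byte = (reg >> shift) & 0xFF
            if byte:  # 0x00 is the null descriptor
                found.append(byte)
    return found


codes = leaf2_descriptors(EAX, EBX, ECX, EDX)
for code in codes:
    print(f"0x{code:02x}: {DESCRIPTORS.get(code, 'not in this sketch table')}")
```

On the values from this thread, every descriptor other than 0xff and 0xf0 is a TLB entry, which matches the observation that leaf 2 shows no cache sizes on this CPU.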
Here is the raw data that I get from cpuid(EAX=4,ECX=0,1,2,3...) in bytes. Registers are in order eax,ebx,ecx,edx, separated by a blank line, and the registers returned for different ECX values are separated by '---------------------'
I give my interpretation of some of the values. They show an L1 Data cache, an L1 Instruction cache, and an L2 Data Cache. I assume that the last cache is the L3 data cache. All of the values that I analyzed match the output of the MacCPUID program, so this must be where it gets its information.
If anyone out there knows, it would be nice to know if the proper procedure is
1. Read all caches from leaf 2
2. If the 0xff byte is present, read all caches from leaf 4
3. Combine the lists to have all of the caches.
Right now, I don't know the answer to the key questions:
1. When 0xff is set, are the leaf 2 caches valid?
2. Is leaf 4 guaranteed not to duplicate leaf 2?
----------------------------------------------------------------------------
BEGIN REGISTER DATA
21 - 00100001 - 001 = Level 1, 00001 = Data cache
41 - 01000001 - 01 = 2 threads, 0000 = Reserved, 0 = Not fully associative, 1 = Self-initializing cache
00
3c

3f - 00111111 = 64 B system coherency line size
00 - 00000000
c0 - 11000000, 0000000000 = 1 physical line partition
01 - 00000001, 0000000111 = 8-way associative

3f
00
00
00

00
00
00
00
---------------------
22 - 00100010 - 001 = Level 1, 00010 = Instruction cache
41 - 01000001 - 01 = 2 threads, 0000 = Reserved, 0 = Not fully associative, 1 = Self-initializing cache
00
3c

3f - 00111111 = 64 B system coherency line size
00 - 00000000
c0 - 11000000, 0000000000 = 1 physical line partition
00 - 00000000, 0000000011 = 4-way associative

7f
00
00
00

00
00
00
00
---------------------
43 - 01000011 - 010 = Level 2, 00011 = Unified cache
41 - 01000001 - 01 = 2 threads, 0000 = Reserved, 0 = Not fully associative, 1 = Self-initializing cache
00
3c

3f - 00111111 = 64 B system coherency line size
00 - 00000000
c0 - 11000000, 0000000000 = 1 physical line partition
01 - 00000001, 0000000111 = 8-way associative

ff
01
00
00

00
00
00
00
---------------------
63
c1
07
3c

3f
00
c0
03

ff
2f
00
00

02
00
00
00
---------------------
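The manual decoding above can be cross-checked with a short Python sketch. It is an illustration, not a definitive implementation: the register tuples are rebuilt from the posted bytes on the assumption that each register's bytes were listed least-significant first, and the names `SUBLEAVES` and `decode_leaf4` are made up here. The field positions and the "add one" rule follow the leaf 4 description in the manual, and the size is computed as ways × partitions × line size × sets.

```python
# (EAX, EBX, ECX, EDX) for ECX = 0, 1, 2, 3, rebuilt from the bytes above
# (assumption: each register's bytes were listed least-significant first).
SUBLEAVES = [
    (0x3C004121, 0x01C0003F, 0x0000003F, 0x00000000),
    (0x3C004122, 0x00C0003F, 0x0000007F, 0x00000000),
    (0x3C004143, 0x01C0003F, 0x000001FF, 0x00000000),
    (0x3C07C163, 0x03C0003F, 0x00002FFF, 0x00000002),
]

CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}


def decode_leaf4(eax, ebx, ecx, edx):
    """Decode one leaf 4 subleaf; return None when the cache type field is 0."""
    cache_type = eax & 0x1F              # EAX[4:0]
    if cache_type == 0:
        return None                      # type 0 means there are no more caches
    ways = ((ebx >> 22) & 0x3FF) + 1     # EBX[31:22], stored minus one
    partitions = ((ebx >> 12) & 0x3FF) + 1  # EBX[21:12], stored minus one
    line_size = (ebx & 0xFFF) + 1        # EBX[11:0], stored minus one
    sets = ecx + 1                       # ECX[31:0], stored minus one
    return {
        "level": (eax >> 5) & 0x7,       # EAX[7:5]
        "type": CACHE_TYPES.get(cache_type, "reserved"),
        "self_initializing": bool(eax & (1 << 8)),
        "fully_associative": bool(eax & (1 << 9)),
        "max_sharing_threads": ((eax >> 14) & 0xFFF) + 1,  # EAX[25:14]
        "max_core_ids": ((eax >> 26) & 0x3F) + 1,          # EAX[31:26]
        "ways": ways,
        "partitions": partitions,
        "line_size": line_size,
        "sets": sets,
        "size_bytes": ways * partitions * line_size * sets,
    }


caches = [decode_leaf4(*regs) for regs in SUBLEAVES]
for c in caches:
    print(f"L{c['level']} {c['type']}: {c['size_bytes'] // 1024} KiB, "
          f"{c['ways']}-way, {c['sets']} sets, {c['line_size']} B lines")
```

In a real program the loop would keep incrementing ECX until the cache type field reads 0, rather than iterating over a fixed list.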
>>>I give my interpretation of some of the values. They show an L1 Data cache, an L1 Instruction cache, and an L2 Data Cache. I assume that the last cache is the L3 data cache. All of the values that I analyzed match the output of the MacCPUID program, so this must be where it gets its information>>>
Thanks for posting the eax == 4 data. I suppose that the procedure you used is the right one. Have you decoded all of the values? How many indices did you use?
Here is my description of the eax, ebx, ecx, and edx values that you have not decoded.
eax[31:26], from the top byte 0x3c (00111100): 001111 = 15, so the maximum number of addressable IDs for processor cores in the physical package is 15 + 1 = 16.
ecx[31:0] == 0x3f (111111): number of sets. The manual says to add one to the returned value, so the result should be 0x3f + 1 = 64 sets.
Null values skipped.
>>>1. When 0xff is set, are the leaf 2 caches valid?>>>
Can you execute cpuid with eax == 2, then with eax == 4, then with eax == 2 again, and compare the results of the two eax == 2 calls?
Regarding the eax=4 data, I decoded them all; I just didn't want to type them all out in the above format; they all make sense and match MacCPUID. I think that your interpretation of the bytes I didn't do is correct.
>>> Can you execute eax ==2 next eax == 4 and eax == 2 and compare the results of cpuid.eax==2 calls?
I'm sorry, but I am confused by what you mean here. Do you want me compare the output of eax=4 and eax=2? I think that we have all of the relevant data for that. But regardless, I don't think that just executing the call in one case on my one system can answer the question of whether they are guaranteed to be valid on all systems, even though they appear to be on mine, as well as whether eax=4 always returns information that is non-redundant with eax=2. I think we just have to wait for someone who knows the instruction implementation.
>>>I'm sorry, but I am confused by what you mean here. Do you want me compare the output of eax=4 and eax=2?>>>
No. I simply wanted to compare the results of the two cpuid calls with eax == 2, because you expressed concern about the validity of the leaf 2 data once leaf 4 has been executed.
Oh, I am sorry, that is not what I meant. I do not believe that leaf 4 would corrupt the leaf 2 data; I was simply wondering about the correct interpretation of the 0xff byte returned by the leaf 2 call. It says "CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters", and I did not know if that meant that when you see 0xff, that means that you should not call leaf 2 for this processor. Since leaf 4 returns only four caches, it seems that 0xff means that you are supposed to call both and aggregate the results. It would be good to have that clarified though.
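The "call both and aggregate" idea can be sketched as follows. This is one reading of the 0xff descriptor, not something the thread or the manual text quoted here confirms: when 0xff is present, cache entries are taken from leaf 4 only, while TLB and prefetch descriptors from leaf 2 are kept. The table `LEAF2_TABLE`, its "kind" labels, and the function `combine` are hypothetical names for illustration, and the table lists only a handful of descriptors.

```python
# Hypothetical mini-table: descriptor byte -> (kind, description).
# Only a few entries are listed; a real table would cover the whole manual table.
LEAF2_TABLE = {
    0x03: ("tlb", "Data TLB: 4-KByte pages, 4-way set associative, 64 entries"),
    0x2C: ("cache", "1st-level data cache: 32 KBytes, 8-way set associative, 64-byte line size"),
    0x55: ("tlb", "Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries"),
    0x5A: ("tlb", "Data TLB0: 2-MByte or 4-MByte pages, 4-way set associative, 32 entries"),
    0xB2: ("tlb", "Instruction TLB: 4-KByte pages, 4-way set associative, 64 entries"),
    0xCA: ("tlb", "Shared 2nd-level TLB: 4-KByte pages, 4-way set associative, 512 entries"),
    0xF0: ("prefetch", "64-byte prefetching"),
}
NO_CACHE_INFO = 0xFF  # "leaf 2 does not report cache descriptor information"


def combine(leaf2_codes, leaf4_caches):
    """Merge leaf 2 descriptors with leaf 4 cache entries (assumed policy)."""
    use_leaf4 = NO_CACHE_INFO in leaf2_codes
    caches, other = [], []
    for code in leaf2_codes:
        if code == NO_CACHE_INFO:
            continue
        kind, text = LEAF2_TABLE.get(code, ("unknown", f"descriptor 0x{code:02x}"))
        if kind == "cache":
            if not use_leaf4:
                caches.append(text)  # leaf 2 is the only source of cache info
        else:
            other.append(text)       # TLB and prefetch entries are always kept
    if use_leaf4:
        caches = list(leaf4_caches)  # assumption: leaf 4 replaces leaf 2 caches
    return {"caches": caches, "other": other}


# Descriptor bytes from this thread, with placeholder strings for leaf 4 caches.
thread_codes = [0x5A, 0x03, 0x55, 0xFF, 0xB2, 0xF0, 0xCA]
result = combine(thread_codes, ["L1 data", "L1 instruction", "L2 unified", "L3 unified"])
print(result["caches"])
print(len(result["other"]), "TLB/prefetch entries kept from leaf 2")
```

Whether leaf 4 can duplicate leaf 2 cache descriptors on CPUs that do not return 0xff is still the open question here, so the branch without 0xff should be treated as a guess.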
It is OK. I think that your explanation of the 0xff byte makes sense. Unfortunately, Intel's official documentation does not explain it in much detail.
FWIW, my understanding is that leaf 2 was used in early CPU models and then was deprecated in favor of leaf 4, which reports information in a more flexible and complete form. All SB and IB CPUs I tried reported cache information in leaf 4 rather than 2.
@andysem
Sorry for going off topic, but what does FWIW stand for?
Your explanation of leaf 4 makes sense. I suppose that the leaf 2 information was kept for compatibility reasons.
I do not think it is exclusively a compatibility reason. Leaf 4 does not return all of the information present in leaf 2; for example, no TLBs are given by leaf 4. So it seems always necessary to call both for full information.
Thanks @andysem
