Hi everyone!
I'd like to find out which cores share a particular cache. With the 'cpuid' command I found lots of useful information, but I still need some sort of unique cache identifier to really determine which cores use which cache.
Does anyone know how to get this information? Or is there another way to get the information?
Thanks in advance!
Robert
15 Replies
Robert,
did you try the cache topology enumeration algorithm/utility described in the article "Intel 64 Architecture Processor Topology Enumeration"?
Roman
Hey Roman
Yes, I quickly looked at it, but I stumbled over these 'affinity masks'. As far as I understand, this information comes from the operating system, right? (Or can it be obtained from the hardware?)
The problem is that I cannot rely on an operating system to do the work, since the code is for an operating system :)
The affinity masks are used to just get the cpuid information from each cpu.
We have to read the cpuid info from all the cpus.
What are you trying to accomplish?
An OS independent way of figuring out cache sharing?
You can't really get a method that doesn't use some aspect of an OS.
Or do you want a way that works on multiple OSes?
Pat
Yes, it should be an OS-independent way of figuring out cache sharing, because this code is the part of the operating system that gathers this information...
So there is no way to get this information from the hardware directly? ... Then I guess I have to come up with another technique to determine which caches belong to which core ...
I would say it like this:
I don't see how you can get the cpuid information from all the cpus without using some facility of the OS to switch your software from one cpu to another cpu.
Certainly you can get the cpuid info from the cpu you are currently running on without the OS, but you need the cpuid info from ALL the cpus.
Most of the enumeration library is OS independent (it is shared between Windows and Linux), and the OS-specific code is in util_os.c.
The library code is not "part of the operating system", but util_os.c does call OS routines to move the thread from one cpu to the next.
Hope this helps,
Pat
Also, on Windows, system routines like GetLogicalProcessorInformationEx() will detail which cpus share a cache. See http://msdn.microsoft.com/en-us/library/windows/desktop/dd405488%28v=vs.85%29.aspx
Yes the switching is necessary and this is already done!
Edit:
Thanks, then I'll have a closer look at the enumeration algorithm since it can be executed independently!
Hi again
I looked at the topology enumeration algorithm provided by Intel. I think I understood the basic concept, but there are still some things that work incorrectly.
For example I wrote some lines to gather the information about a cache at a specific level (see below).
The log_roundToNearestPof2 performs the same operation as described in the documentation (and cpuid just calls CPUID and stores the values of all registers in the parameters).
This piece of code is then executed on all levels (subLevelIndex) and on all processors.
[cpp]
uint32_t eax, ebx, ecx, edx;

// CPUID leaf 1: EBX[31:24] holds the initial APIC ID of this logical processor.
eax = 1;
ecx = 0;
cpuid(&eax, &ebx, &ecx, &edx);
const uint8_t initialAPICID = 0xff & (ebx >> 24);

// CPUID leaf 4: cache parameters for the sub-leaf selected in ECX.
eax = 4;
ecx = subLevelIndex;
cpuid(&eax, &ebx, &ecx, &edx);
const uint8_t levelType = 0xf & eax;
const char* levelName[] = {"Invalid", "Data Cache ", "Instruction Cache", "Unified Cache"};
// EAX[25:14] + 1 is the number of logical processor IDs sharing this cache.
const uint16_t cacheMaskWidth = log_roundToNearestPof2(((eax >> 14) & 0xfff) + 1);
const uint32_t mask = ~((-1) << cacheMaskWidth);
const uint8_t threadsSharingCache = ((eax >> 14) & 0xfff) + 1;
const uint32_t cacheID = mask & initialAPICID;
printf("Level: %d (%s),\t %d threads/cache, \tCache ID = %d\n", levelType, levelName[levelType], threadsSharingCache, cacheID);
[/cpp]
Does anyone see where the problem lies in this code?
Thanks in advance!
Robert
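For readers who don't have the Intel document at hand, a helper like log_roundToNearestPof2 could look roughly like the sketch below. This is only an assumption about its behavior (the smallest width w such that 2^w is at least the given count), not Robert's or Intel's actual implementation, and the name mask_width is hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for log_roundToNearestPof2 (an assumption, not the
 * original helper): returns the smallest w such that (1 << w) >= count,
 * i.e. the number of APIC ID bits needed to number `count` logical
 * processors. A count of 0 or 1 needs no bits. */
static unsigned mask_width(uint32_t count)
{
    unsigned width = 0;
    /* 64-bit arithmetic keeps the shift well defined for large counts. */
    while (width < 32 && ((uint64_t)1 << width) < count) {
        width++;
    }
    return width;
}
```

With the values from this thread, a count of 2 gives a width of 1 and a count of 16 gives a width of 4.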
Hello Robert,
Can you give us a clue?
Perhaps include the output?
Thanks,
Pat
Sorry, I forgot all about that.
I executed the code on all 8 logical cores (with taskset) on my Linux system. On each core the code was executed for subLevelIndex values 0 .. 3.
This is the output:
[bash]
Running on core 0
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 0
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 1
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 1
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 2
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 0
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 3
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 1
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 3
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 4
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 0
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 5
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 1
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 6
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 0
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 0
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 7
Level: 1 (Data Cache ), 2 threads/cache, Cache ID = 1
Level: 2 (Instruction Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 2 threads/cache, Cache ID = 1
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 7
[/bash]
When I execute cpu_topology (the enumeration utility provided by Intel), I get the following output:
[bash]
Software visible enumeration in the system:
Number of logical processors visible to the OS: 8
Number of logical processors visible to this process: 8
Number of processor cores visible to this process: 4
Number of physical packages visible to this process: 1

Hierarchical counts by levels of processor topology:
 # of cores in package 0 visible to this process: 4 .
 # of logical processors in Core 0 visible to this process: 2 .
 # of logical processors in Core 1 visible to this process: 2 .
 # of logical processors in Core 2 visible to this process: 2 .
 # of logical processors in Core 3 visible to this process: 2 .

Affinity masks per SMT thread, per core, per package:
Individual:
 P:0, C:0, T:0 --> 1
 P:0, C:0, T:1 --> 2
Core-aggregated:
 P:0, C:0 --> 3
Individual:
 P:0, C:1, T:0 --> 4
 P:0, C:1, T:1 --> 8
Core-aggregated:
 P:0, C:1 --> c
Individual:
 P:0, C:2, T:0 --> 10
 P:0, C:2, T:1 --> 20
Core-aggregated:
 P:0, C:2 --> 30
Individual:
 P:0, C:3, T:0 --> 40
 P:0, C:3, T:1 --> 80
Core-aggregated:
 P:0, C:3 --> c0
Pkg-aggregated:
 P:0 --> ff

APIC ID listings from affinity masks
Affinity mask 00000001 - apic id 0
Affinity mask 00000002 - apic id 1
Affinity mask 00000004 - apic id 2
Affinity mask 00000008 - apic id 3
Affinity mask 00000010 - apic id 4
Affinity mask 00000020 - apic id 5
Affinity mask 00000040 - apic id 6
Affinity mask 00000080 - apic id 7

Package 0 Cache and Thread details

Box Description:
Cache is cache level designator
Size is cache size
OScpu# is cpu # as seen by OS
Core is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
 CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
 where # is number of zeroes (so '8z5' is '0x800000')
L1D is Level 1 Data cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 4
L1I is Level 1 Instruction cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 4
L2 is Level 2 Unified cache, size(KBytes)= 256, Cores/cache= 2, Caches/package= 4
L3 is Level 3 Unified cache, size(KBytes)= 6144, Cores/cache= 8, Caches/package= 1

      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    0     1|    2     3|    4     5|    6     7|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|    1     2|    4     8|   10    20|   40    80|
CmbMsk|    3      |    c      |   30      |   c0      |
      +-----------+-----------+-----------+-----------+

Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+

Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+

Cache |   L3                                          |
Size  |   6M                                          |
CmbMsk|   ff                                          |
      +-----------------------------------------------+
[/bash]
I hope this helps!
Thanks Robchip,
Your output looks reasonable.
But as to whether the code will work on all Intel chips, I would have to go through the cpu_topology code, extract the relevant lines, and compare them to what you've done.
I don't have the time to go through the code like this right now.
If you've extracted out the relevant code from the library correctly then it should work.
Sorry to not be more helpful,
Pat
Hey Pat
Thanks for the answer!
You said that the output looks reasonable, but I have trouble understanding it.
How should I interpret the last line printed for each core?
[bash]
Running on core 0
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 1
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 2
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 3
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 3
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 4
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 5
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 6
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Running on core 7
...
Level: 3 (Unified Cache), 16 threads/cache, Cache ID = 7
[/bash]
Shouldn't this cache ID always be the same, since there is only one L3 cache shared by all threads?
Your cache ID probably should be the same.
Are you doing the same code method as in the cpu_topology library?
If not, why not just use the library?
Pat
Hmm, I looked at the code, but I'm only querying the hardware, not interpreting the information.
Yeah - using the library is a good idea, of course - but at the moment I'd just like to find the bug in the code :)
Thanks a lot for your help!
Hey everyone
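Just for completeness, I'd like to post the solution to the problem.
The bug was in the line that computes cacheID (line 18 of the original code). The cache ID has to be calculated differently:
[cpp]const uint32_t cacheID = initialAPICID & (-1 ^ mask);[/cpp]
This clears the low bits that distinguish the threads sharing a cache, so all threads that share a cache get the same cacheID and every cache gets a unique one.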
Thanks everyone for the help!
Robert
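The corrected calculation can be checked against the numbers posted earlier in the thread. The following is a minimal sketch in plain C, not the cpu_topology library code; the function names low_bits_mask, sub_id and cache_id are hypothetical, and `width` is assumed to be the cache mask width (1 for the caches shared by 2 threads, 4 for the L3 reporting 16 sharing IDs).

```c
#include <stdint.h>

/* Mask covering the low `width` bits of an APIC ID. */
static uint32_t low_bits_mask(unsigned width)
{
    return (width >= 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
}

/* What the original snippet computed: the low bits only, i.e. the position
 * of the thread inside the group that shares the cache. */
static uint32_t sub_id(uint32_t apic_id, unsigned width)
{
    return apic_id & low_bits_mask(width);
}

/* Corrected version: clear the low bits and keep the upper ones, so every
 * thread sharing a cache ends up with the same identifier. */
static uint32_t cache_id(uint32_t apic_id, unsigned width)
{
    return apic_id & ~low_bits_mask(width);
}
```

With a width of 4, APIC IDs 0 to 7 all give cache_id 0, which matches the single shared L3, while sub_id returns 0 to 7 as in the confusing output. With a width of 1, APIC IDs 2 and 3 both give cache_id 2.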