Q & A: Detecting Multi-Core Processor Topology in an IA-32 Platform

Intel_Software_Netw1 · ‎04-17-2007

These questions were received by Intel Software Network Support, followed by the responses received from our engineering contacts and the authors of the original article at http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration:

Q. Has anyone done testing on the cpucount.exe on a multi-node platform (for example, an IBM x460/x3950-MXE460/x3950) running Microsoft* Windows* 2003 32-bit with SP1?

A. The paper makes clear that cluster installations are not covered; naturally, the reference code does not consider multi-node clusters in its scope. The bottom line is that the 3-level topology can be extended for cluster installations, but the cluster vendor needs to architect a 4-level topology scheme that extends the 3-level scheme and observes the 8-bit size limit of the initial APIC ID infrastructure. In such cases, our reference code will report the lower 3 levels of topology; it is up to the user to extend the reference code to report the cluster node ID (the 4th level). The cluster vendor may choose other schemes, and our reference code cannot possibly know how to deal with vendor-specific schemes.

Q. I would like to know more about determining cluster IDs. All documents, including the SDM, mention cluster but stop short of describing how to identify cluster IDs separate from package IDs. This makes sense since cluster is a system-level issue and not something that the processor u-arch determines. Aside from knowing that a specific chipset, system or node has a clustered topology, can system software discover cluster ID masks in a portable manner, possibly involving a BIOS or ACPI standard?

A. Cluster ID is really a cluster vendor issue. The cluster software stack can choose whatever protocol or schema it uses to identify the entities within its topology. Identity in a cluster obviously need not be constrained by the 8-bit width of the initial APIC ID that CPU microcode negotiates and assigns to logical processors within a node. The overall identification scheme in a cluster is really vendor specific. Intel's BIOS writer's guide provides a recommendation that allows cluster vendors to assign IDs within a cluster in a manner compatible with the way package_ID, core_ID, and SMT_ID are laid out.

For example, if a particular 4-socket SMP node uses 6 bits for the package ID, a cluster vendor can carve out the upper bits to use as the cluster ID (or even extend the cluster ID beyond the 8-bit constraint of the initial APIC ID). Leaving clusters aside, the initial APIC ID is the identifier that hardware uses to uniquely identify each logical processor in a system. The OS uses its own constructs (an affinity mask bit in Windows*; 0-based contiguous integers in Linux). These OS constructs have their lineage in the initial APIC ID on Intel platforms, but they allow the OS to fulfill its hardware-independence objectives.
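To make the layered layout concrete, here is a minimal sketch in C of splitting an ID into SMT_ID, core_ID, package_ID, and a vendor-defined cluster ID that occupies whatever bits remain above the package field. The type and function names (id_layout, topo_id, split_id) and the idea of passing the field widths in as parameters are assumptions made for illustration; they are not taken from the reference code, and a real cluster vendor may use an entirely different scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description: bit widths of each sub-field,
   counted from bit 0 of the ID upward. */
typedef struct {
    unsigned smt_bits;   /* width of the SMT_ID field     */
    unsigned core_bits;  /* width of the core_ID field    */
    unsigned pkg_bits;   /* width of the package_ID field */
} id_layout;

typedef struct {
    uint32_t smt_id, core_id, pkg_id, cluster_id;
} topo_id;

/* Extract `width` bits of `value` starting at bit `shift` (both < 32). */
static uint32_t bit_field(uint32_t value, unsigned shift, unsigned width)
{
    uint32_t mask = (1u << width) - 1u;   /* width 0 yields mask 0 */
    return (value >> shift) & mask;
}

/* Split an ID into its three standard levels plus an assumed 4th level:
   every bit above the package field is treated as the cluster ID. */
static topo_id split_id(uint32_t id, id_layout l)
{
    unsigned core_shift    = l.smt_bits;
    unsigned pkg_shift     = core_shift + l.core_bits;
    unsigned cluster_shift = pkg_shift + l.pkg_bits;
    topo_id t;

    assert(cluster_shift < 32);           /* keep all shifts well defined */

    t.smt_id     = bit_field(id, 0, l.smt_bits);
    t.core_id    = bit_field(id, core_shift, l.core_bits);
    t.pkg_id     = bit_field(id, pkg_shift, l.pkg_bits);
    t.cluster_id = id >> cluster_shift;
    return t;
}
```

With an assumed layout of 1 SMT bit, 1 core bit, and 2 package bits, the ID 0x1B splits into SMT 1, core 1, package 2, and cluster 1. Passing the ID as a 32-bit value leaves room for a vendor to extend the cluster field beyond 8 bits.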

Q. Can the CPUID.4 field Logical Processors Sharing a Cache be used to infer the actual identities of the logical processors that share a cache?

A. "Logical Processors Sharing a Cache" is used to derive the cache-sharing topology of a system, in much the same way that processor topology is enumerated. See more information about this in the Software Optimization Manuals at http://www.intel.com/products/processor/manuals/.
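As a rough sketch of how that derivation can look, the C code below takes the raw EAX value of one CPUID.4 sub-leaf, reads the sharing field (EAX[25:14] + 1), rounds it up to a power-of-two bit width, and clears that many low bits of each initial APIC ID. Two logical processors whose results match are taken to share that cache. The function names are invented for this example, and the register value is passed in as a parameter so that the sketch does not depend on any particular way of executing CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest w such that (1 << w) >= count; count must be at least 1. */
static unsigned mask_width(uint32_t count)
{
    unsigned w = 0;
    assert(count >= 1);
    while (w < 32 && (1ull << w) < count)
        w++;
    return w;
}

/* Number of addressable logical-processor IDs sharing the cache
   described by one CPUID.4 sub-leaf: EAX[25:14] + 1. */
static uint32_t sharing_count(uint32_t cpuid4_eax)
{
    return ((cpuid4_eax >> 14) & 0xFFFu) + 1u;
}

/* Cache ID: the initial APIC ID with the low "sharing" bits cleared. */
static uint32_t cache_id(uint32_t apic_id, uint32_t cpuid4_eax)
{
    unsigned w = mask_width(sharing_count(cpuid4_eax));   /* at most 12 */
    return apic_id & ~((1u << w) - 1u);
}

/* Nonzero when both logical processors map to the same cache ID. */
static int share_cache(uint32_t apic_a, uint32_t apic_b, uint32_t cpuid4_eax)
{
    return cache_id(apic_a, cpuid4_eax) == cache_id(apic_b, cpuid4_eax);
}
```

For example, if EAX[25:14] is 1 (two IDs share the cache), APIC IDs 4 and 5 yield the same cache ID, while 5 and 6 do not. The field alone gives only a count; the identities come from combining it with the APIC ID of each logical processor.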

Q. I would like some supplementary information about the error message "assertion failed: PhysicalNum * MaxLogicalProcPerPhysicalProc() >= ToAvailLogical, file cpucountcode.c, line 171" when trying to use this with certain older processors.

A. As the topology of MP platforms evolves, the CPUID instruction is extended with new fields that help software enumerate processor topology. The CPUID features needed to detect the topology of platforms from 2001, 2003, and 2005 may not be present in older processors such as the Intel Pentium III Xeon processors. Despite our best effort to make the processor enumeration algorithm robust and backward compatible, it cannot reach back far enough to cover the Intel Pentium III Xeon processors. In the white paper, we explained the roles of several key CPUID features: i) CPUID.1:EBX[31:24]: initial APIC ID, ii) CPUID.1:EBX[23:16]: maximum number of logical processors per physical package, iii) CPUID.4:EAX[31:26] + 1: maximum number of cores per physical package. If any of these three pieces of data is not present or valid, the algorithm cannot work, so we put in the assert statement to guard against problems such as this. When you run the code on very old processors, the assertion failure is the expected behavior, because not all of the basic input data needed to enumerate processor topology are present and valid.
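The three inputs listed above can be decoded from raw register values as in the following C sketch. It is an illustration only: the names (topo_inputs, decode_inputs, inputs_plausible, require_plausible) are hypothetical, the register values are passed in as parameters, and treating a processor without CPUID leaf 4 as having one core per package is an assumption of this sketch. The plausibility check uses the same inequality as the assertion quoted in the question, but returns a result so that a caller can report unsupported hardware instead of aborting.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t initial_apic_id;   /* CPUID.1:EBX[31:24]     */
    uint32_t max_logical;       /* CPUID.1:EBX[23:16]     */
    uint32_t max_cores;         /* CPUID.4:EAX[31:26] + 1 */
} topo_inputs;

/* max_leaf is the highest basic leaf reported by CPUID.0:EAX.
   Assumption of this sketch: without leaf 4, report one core. */
static topo_inputs decode_inputs(uint32_t max_leaf,
                                 uint32_t cpuid1_ebx,
                                 uint32_t cpuid4_eax)
{
    topo_inputs in;
    in.initial_apic_id = (cpuid1_ebx >> 24) & 0xFFu;
    in.max_logical     = (cpuid1_ebx >> 16) & 0xFFu;
    in.max_cores       = (max_leaf >= 4)
                       ? ((cpuid4_eax >> 26) & 0x3Fu) + 1u
                       : 1u;
    return in;
}

/* Returns 1 when the inputs are consistent enough to continue,
   0 otherwise (for example on processors that predate the fields). */
static int inputs_plausible(topo_inputs in,
                            uint32_t packages,
                            uint32_t os_logical)
{
    if (in.max_logical == 0)
        return 0;
    if (in.max_cores > in.max_logical)
        return 0;
    return (uint64_t)packages * in.max_logical >= os_logical;
}

/* Aborting variant, in the spirit of the assert in the reference code. */
static void require_plausible(topo_inputs in,
                              uint32_t packages,
                              uint32_t os_logical)
{
    assert(inputs_plausible(in, packages, os_logical));
}
```

A caller could test inputs_plausible first and print a message such as "topology enumeration not supported on this processor" rather than letting the assertion fire.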

Q. I have downloaded the cpucount.cpp and successfully built it on Linux and Windows operating systems. Could you please let me know what precautions to take if I want to convert it to the C programming language?

A. The code uses C++-specific features minimally, primarily in local variable declarations. Therefore, converting to the C programming language should be straightforward for a developer with a good knowledge of C programming.

Q. I'm using CPUCount to detect the CPU on an Intel Core 2 machine. I'm developing an ActiveX control that runs in the context of another application, which opens other threads besides the one that CPUCount is running in. My problem is that in this scenario, GetAPIC_ID returns the same ID. When I'm running a standalone executable, or my ActiveX control is using a process that doesn't open additional threads, GetAPIC_ID works fine and returns different IDs. I'm encountering this problem only on the Core 2 platform; on Duo or Pentium D everything is fine.

The process affinity mask is the same for the process in which the problem is reproduced and the one in which it isn't (the affinity is 3 for both of them). The only difference is that the 'problematic' process has additional active threads when the CPUCount code is called, while the other process has only the main thread. In addition, the behavior is inconsistent on other Intel Core platforms.

Do you have any workaround or recommended code change for that?

A. It sounds like the issue lies in the interaction between the OS and process attributes.

CPUCount needs to bind its execution context to each logical processor in turn so that it can query the APIC ID of each one; in other words, the context must be free to migrate.

Whether you run the CPUCount code as a standalone EXE or as an ActiveX control, each process inherits certain attributes from its parent process. One of these attributes is the affinity mask on which the process is allowed to run; similarly, the affinity mask on which a child process is allowed to run can be set.

In a normal situation, the OS allows user processes to run on any logical processor in the system. When a child process is created, it usually inherits the parent's affinity mask by default. This is the underlying assumption that CPUCount relies on.

If the other application that creates additional child threads decides that the main thread should be bound to a single logical processor (not allowing migration), your ActiveX control, injected into the main thread, probably inherited the no-migration restriction; the free-to-migrate premise is then not valid.

You will need to figure out how to ensure that the ActiveX control injected into the other application can satisfy the free-to-migrate requirement, so that the CPUCount code can bind to any logical processor.

The key is that the execution context of the ActiveX control (running cpucount-like code) needs to be freely migratable across all logical processors in the system. There is probably more than one API in the OS that can change process attributes, including those that create a new thread. The symptom you see is a thread execution context that got stuck; the cause lies in the interaction of the application and the OS, not the hardware. To our knowledge of OS affinity masks and task scheduling, the OS doesn't treat the dual-core processors you referred to differently.

The other point is that software can call GetProcessAffinityMask to retrieve the affinity mask for the process dynamically. The dynamic aspect means the information is guaranteed valid only at the point you made the inquiry; there are OS APIs and services that can probably change it dynamically as well.
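One way to picture the free-to-migrate requirement is the loop that walks the affinity mask one bit at a time. The C sketch below covers only the portable bookkeeping: it turns a process affinity mask into a list of single-bit masks and checks whether the process mask is narrower than the system mask. The function names are made up for this example. On Windows, the two masks would typically come from GetProcessAffinityMask and each single-bit mask would be passed to SetThreadAffinityMask before querying the APIC ID; those calls are left out here so that the sketch stays platform independent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes one single-bit mask per set bit of process_mask into out
   (lowest bit first, at most cap entries) and returns how many
   entries were written. */
static size_t single_cpu_masks(uint64_t process_mask,
                               uint64_t *out, size_t cap)
{
    size_t n = 0;
    unsigned bit;

    assert(out != NULL || cap == 0);

    for (bit = 0; bit < 64; bit++) {
        uint64_t m = (uint64_t)1 << bit;
        if ((process_mask & m) != 0 && n < cap)
            out[n++] = m;
    }
    return n;
}

/* Nonzero when the process may not run on every logical processor
   that the system mask reports. */
static int is_restricted(uint64_t process_mask, uint64_t system_mask)
{
    return (process_mask & system_mask) != system_mask;
}
```

If binding to each single-bit mask in turn keeps returning the same APIC ID, checking whether each bind call actually succeeded, and whether the thread was really rescheduled before the query, is a reasonable first diagnostic step.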

==

Gina B.
Intel Software Network Support
http://www.intel.com/software
email: ISN.support@intel.com

Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Intel_Software_Netw1 · ‎05-08-2007

Q. Are affinity masks in the output of cpucount.exe somehow related to CPU numbers I see in Windows Server 2003? If yes, how?

A. We assume that by CPU number you mean the numbering of the check boxes that Windows* Task Manager displays when a user selects the Processes tab, right-clicks a user process, and chooses the affinity item from the resulting list. Each OS-enabled logical processor shows up under affinity as a check box labeled cpu 0, cpu 1, and so on, up to the number of affinity mask bits supported by the given OS.

For a 32-bit OS, the affinity mask is limited to 32 bits; for a 64-bit OS, it has 64 bits. A grey check box in Task Manager indicates that there is either no CPU hardware, or that the CPU hardware is not enabled, for that particular affinity mask bit position. CPU hardware may have been left in a state not visible to the OS by either a BIOS menu choice or optional Windows* boot parameters.

Each number in the label cpu n corresponds to an affinity mask bit position.

The affinity mask is the interface that the Windows* API provides for software to manage process affinity to hardware CPUs.

The reference code in cpucount provides the mapping between affinity mask bits and initial APIC IDs (which are assigned by hardware during platform initialization). Processor topology enumeration is derived from initial APIC IDs.

Q. Thank you for the swift answer and it is indeed very comprehensive. What I'm really interested in, though, is whether the following relationships hold:

Windows CPU #0 -> AffinityMask = 1; Initial APIC = 0 ...
Windows CPU #1 -> AffinityMask = 2; Initial APIC = 4 ...
Windows CPU #2 -> AffinityMask = 4; Initial APIC = 12 ...
Windows CPU #3 -> AffinityMask = 8; Initial APIC = 8 ...
Windows CPU #4 -> AffinityMask = 16; Initial APIC = 2 ...
Windows CPU #5 -> AffinityMask = 32; Initial APIC = 6 ...
Windows CPU #6 -> AffinityMask = 64; Initial APIC = 14 ...
Windows CPU #7 -> AffinityMask = 128; Initial APIC = 10 ...
Windows CPU #8 -> AffinityMask = 256; Initial APIC = 1 ...
Windows CPU #9 -> AffinityMask = 512; Initial APIC = 5 ...
Windows CPU #10 -> AffinityMask = 1024; Initial APIC = 13 ...
Windows CPU #11 -> AffinityMask = 2048; Initial APIC = 9 ...
Windows CPU #12 -> AffinityMask = 4096; Initial APIC = 3 ...
Windows CPU #13 -> AffinityMask = 8192; Initial APIC = 7 ...
Windows CPU #14 -> AffinityMask = 16384; Initial APIC = 15 ...
Windows CPU #15 -> AffinityMask = 32768; Initial APIC = 11 ...

A. We already explained that there is a 1:1 relationship among:
i) cpu n,
ii) affinity mask bit k,
iii) each unique initial APIC ID.

So we assume you are asking about the numerical mapping of AffinityMask = 1; Initial APIC = 0, AffinityMask = 2; Initial APIC = 4, etc.

The affinity mask is a construct of the OS. Within the Microsoft Windows* XP code base, each OS release may have an internal implementation that tries to optimize thread scheduling for the hardware configurations that the OS understood when that particular release was designed. Put simply, the numerical mapping can vary even within an OS family. There is no guarantee that, for example, AffinityMask = 1 must correspond to Initial APIC = 0.
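Because the numerical mapping is not guaranteed, software that needs it is probably best served by recording it at run time rather than assuming it. The C sketch below shows one possible bookkeeping structure: a table indexed by affinity mask bit position that stores the APIC ID observed while bound to that bit. The names (cpu_map, map_record, bit_for_apic_id) and the 64-entry limit are assumptions chosen for this example and do not come from the reference code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 64

/* Run-time record of which initial APIC ID was observed at each
   affinity mask bit position. Initialize with {0}. */
typedef struct {
    uint64_t valid;               /* bit k set: apic_id[k] was recorded */
    uint8_t  apic_id[MAX_CPUS];   /* indexed by bit position k          */
} cpu_map;

/* Store the APIC ID read while the thread was bound to bit `bit`. */
static void map_record(cpu_map *m, unsigned bit, uint8_t id)
{
    assert(m != 0 && bit < MAX_CPUS);
    m->apic_id[bit] = id;
    m->valid |= (uint64_t)1 << bit;
}

/* Returns the bit position (the n in "cpu n") whose recorded APIC ID
   equals id, or -1 when no such entry was recorded. */
static int bit_for_apic_id(const cpu_map *m, uint8_t id)
{
    unsigned bit;
    for (bit = 0; bit < MAX_CPUS; bit++) {
        if (((m->valid >> bit) & 1u) != 0 && m->apic_id[bit] == id)
            return (int)bit;
    }
    return -1;
}

/* Affinity mask value that selects only the given bit position. */
static uint64_t mask_for_bit(unsigned bit)
{
    assert(bit < MAX_CPUS);
    return (uint64_t)1 << bit;
}
```

Filled with the first rows of the listing in the question (bit 1 observed APIC ID 4, bit 2 observed APIC ID 12), a lookup of APIC ID 12 returns bit 2, whose mask value is 4. On a different OS release the recorded table may differ, which is exactly why it is looked up rather than hard-coded.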

==

Lexi S.

Intel Software Network Support

http://www.intel.com/software

Contact us

Intel_Software_Netw1 · ‎05-16-2007

Q. I have a question about line 274 in the code, which is:

return(0); unsigned int MaxInputValue =0;

The code after the return (0); seems like unreachable code. Can it be deleted safely?

A. Yes, this can be removed safely. The line "unsigned int MaxInputValue =0;" is a stowaway that got pasted in inadvertently. It has no use.

______

Gina B.

Intel_Software_Netw1 · ‎06-06-2007

Q. Could you please let me know whether it is possible to use (and if not which modifications would be required) CPUCOUNT.EXE to collect information from a REMOTE Windows server?

A. The answer is yes and no. Launching any application on a remote server requires certain privileges, which your question presupposes you have. The question is then by which means you can deploy the target app or service remotely. A simple example might be using Windows Remote Desktop or a VNC client. Once you find the necessary means to install the CPUCount binary on the target machine, you can use Remote Desktop or VNC to access the remote target machine and launch CPUCount locally.

If the intent is to do all of that in the same process context, that implies you know the necessary network API or toolbox that provides the remote log-in and process control services. What remains is to add the needed networking and remote control API calls to the reference code to create a vertical application that meets your needs. The additional topics of network topology, networking APIs, remote services, and so on are beyond the scope of what we provide in the CPUCount source code (processor topology).

______

Gina B.

jon_kinsey · ‎07-02-2007

Can I use this code in an open-source project?

Intel_Software_Netw1 · ‎07-13-2007

Thanks for your patience -- the authors had to research the answer. They told us they are not aware of any restrictions on using the reference code in any open-source or non-open-source projects.

==

Lexi S.

james_keeler · ‎08-06-2007

I have edited the CPP file, but do not have a C++ compiler. Do you have a recommendation for a free compiler?

Intel_Software_Netw1 · ‎08-06-2007

The Intel Compilers are available to evaluate for free for 30 days: http://www.intel.com/cd/software/products/asmo-na/eng/219690.htm

==

Lexi S.

skyewire · ‎08-21-2007

I have a few questions about CPUCount:

1) After compiling CPUCount for Linux with standard optimizations turned on (using "-O" flag for GCC) it crashes with a segmentation fault. CPUCount runs fine if I do not use the "-O" flag. I am using RHEL 5 (kernel 2.6.18) and GCC 4.1.1. The segmentation fault seems to happen just before returning from the find_maskwidth() function. Is this a bug in CPUCount, or is it a RHEL/GCC problem?

2) An earlier Q&A said it should be straightforward to port CPUCount from C++ to C. However, the CpuIDSupported() and GenuineIntel() functions use try/catch constructs. How would one write these functions in C?

3) Will CPUCount be updated to support IA64 on Windows and Linux?

Thanks in advance!

Intel_Software_Netw1 · ‎03-14-2008

Skyewire,

One of the authors responded:

1) We've received a handful of reports that point to the sensitivity of the compiler/kernel environment to the inline-assembly style of the code; I believe almost all of them came under Red Hat Linux*. My personal experience with Red Hat is limited -- several attempts to install Red Hat on various machines have proven to be a lot more eventful than installing other distros -- so I am unable to provide further insight on specific issues of compiling under Red Hat. In the upcoming revision of the reference code (later in 2008), we plan not to rely on inline assembly.

2) The try and except technique is deprecated in the upcoming revision. The CPUID instruction has been available since the early 1990s, and multiprocessor support started with the Pentium Pro in 1996, so I don't expect there is much need to run this topology enumeration algorithm on processors that don't support CPUID.

3) There is no IA-64 version of cpucount planned, but on the Itanium architecture, topology information is provided as part of its Processor Abstraction Layer (PAL) services. The Intel Itanium Architecture Software Developer's Manual - Volume 2: System Architecture provides information on the services you can call in PAL.
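On the question of writing a GenuineIntel() style check in C without try/catch, here is a minimal sketch. It is not the reference code: the function names are hypothetical, and the three register values of CPUID leaf 0 are passed in as parameters. How they are obtained is left to the caller; with GCC or Clang, the __get_cpuid helper in <cpuid.h> is one option and reports failure through its return value rather than an exception, and with Microsoft compilers the __cpuid intrinsic in <intrin.h> is another.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble the 12-character vendor string from the CPUID leaf 0
   registers, in the order EBX, EDX, ECX. Bytes are extracted by
   shifting, so the result does not depend on host byte order. */
static void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                          char out[13])
{
    uint32_t regs[3];
    int r, b;

    assert(out != NULL);

    regs[0] = ebx;
    regs[1] = edx;
    regs[2] = ecx;
    for (r = 0; r < 3; r++)
        for (b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    out[12] = '\0';
}

/* Returns 1 when the leaf 0 registers spell "GenuineIntel". */
static int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char v[13];
    vendor_string(ebx, edx, ecx, v);
    return strcmp(v, "GenuineIntel") == 0;
}
```

For example, the register values 0x756e6547, 0x49656e69, and 0x6c65746e decode to "GenuineIntel". Keeping the comparison separate from the CPUID call also makes the check easy to exercise on any machine.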

==

Lexi S.

LexiS_Intel · ‎01-03-2013

This content has been superseded by:

Intel® 64 Architecture Processor Topology Enumeration

SergeyKostrov · ‎01-24-2013

These posts are a really good consolidation! I think the same consolidation of Q & A should be done for: 'Application of long double floating-point data type with different C/C++ compilers'.