- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi!
I'm working on a memory allocator that should be optimized for use on NUMA architectures. It tries to take memory from the node where the current thread resides, and if it can't find there, it "steals" from other nodes.
Here comes the problem: I want to take memory from the nearest node (lowest hop value) to the farthest (so the access is as low as possible), but I couldn't find an API (at least in Windows) to get this type of information (only which CPUs are in a node, not relationships betweennodes).
Is there any way to get the distance between nodes ? It must not benecessarily an API, CPUID/APIC would be very good if there is no other way.
Thanks in advance!
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi!
I'm working on a memory allocator that should be optimized for use on NUMA architectures. It tries to take memory from the node where the current thread resides, and if it can't find there, it "steals" from other nodes.
Here comes the problem: I want to take memory from the nearest node (lowest hop value) to the farthest (so the access is as low as possible), but I couldn't find an API (at least in Windows) to get this type of information (only which CPUs are in a node, not relationships betweennodes).
Is there any way to get the distance between nodes ? It must not benecessarily an API, CPUID/APIC would be very good if there is no other way.
Thanks in advance!
Such a NUMA aware memory allocator is available in QuickThread (http://www.quickthreadprogramming.com). This allocator can be used independent of the QuickThread threading environment (OpenMP, Windows thread, TBB, etc...) as well as within the QuickThread threading environment. QuickThread programming environment permits taskscheduling to within NUMA nodes.
// distribute one instance of task to each NUMA node
parallel_distribute( OneEach_M0$, fooNode, A, B, C);
...
void fooNode(intptr_t teamMember, intptr_t teamSize, A_t A, B_t B, C_t C)
{
// within this NUMA node run slice of array
parallel_for( M0$, foo, 0, countFoo, teamMember, teamSize, A, B, C);
}
void foo(intptr_t teamMember, intptr_t teamSize, intptr_t iBegin, intptr_t iEnd A_t A, B_t B, C_t C)
{
... // run slice of arrays (iBegin, iEnd]
}
Such a NUMA aware allocator is more effective when the problem has a majority of random memory access as opposed to a majority of serial memory access. On serial access problems most processors can be configured for pre-fetch and many serial-like access programs will not out pace the pre-fetch (regardless of inter-node access). On this subset of problems the aware NUMA allocator is not as beneficial for reducing time for memory access. This noted, when the application has a high flux in allocation/de-allocation the NUMA aware allocator will "dilute" the frequency of CAS/Interlocked/Locked operations by 1/number of NUMA nodes (assuming allocations/de-allocations evenly distributed). Most such scalable allocators allocate blocks of similarly sized items and the frequency of CAS/Interlocked/Locked operations are already reduced to when the allocations consume/fill a block.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
On Linux you can use numa_distance() function from libnuma:
http://linux.die.net/man/3/numa
On Windows I am not aware of any such API. Sad. You may try to investigate sources of libnuma, maybe it will give you the solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You will need to write your own numa_distance() for Windows using CPUID
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
But how to get distance from CPUID?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
http://msdn.microsoft.com/en-us/library/aa366533%28VS.85%29.aspx
mentions the various facilities provided by Windows for NUMA aware allocation, although it's unusually cryptic.
http://developer.amd.com/documentation/articles/pages/4172007166.aspx
gives some recommendations, in case you want assurance of multi-vendor platform support. Also, there are references on the differences between application level usage on Windows and other OS.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the answers.
I have found a solution for Windows (though not tested yet). The distances can be found in the SLIT - System Locality Information Table - an extension provided by OEMs and available through ACPI. It contains a N_CPU x N_CPU matrix that describes the distances ([1][3] is the distance between nodes 1 and 3;
Windows Vista+ has the method GetSystemFirmwareTable that can be used to retrieve this table (it seems that under XP it can be retrieved from the Registry, but it's more difficult).
The structure of this table can be found in the ACPICA package (http://www.acpica.org).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for posting the info here! That's not every day I'm able to get to know something new :)
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page