Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

How to get distance (hops) between NUMA nodes?

Lup_Gratian
Beginner
2,744 Views

Hi!

I'm working on a memory allocator that should be optimized for use on NUMA architectures. It tries to take memory from the node where the current thread resides, and if it can't find there, it "steals" from other nodes.

Here comes the problem: I want to take memory from the nearest node (lowest hop value) to the farthest (so the access is as low as possible), but I couldn't find an API (at least in Windows) to get this type of information (only which CPUs are in a node, not relationships betweennodes).

Is there any way to get the distance between nodes ? It must not benecessarily an API, CPUID/APIC would be very good if there is no other way.

Thanks in advance!

0 Kudos
8 Replies
TimP
Honored Contributor III
2,744 Views
This goes way beyond what even an expert could discuss here, or what is disclosed about Windows. It might be educational to study what is done in glibc.
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,744 Views
Quoting lgratian

Hi!

I'm working on a memory allocator that should be optimized for use on NUMA architectures. It tries to take memory from the node where the current thread resides, and if it can't find there, it "steals" from other nodes.

Here comes the problem: I want to take memory from the nearest node (lowest hop value) to the farthest (so the access is as low as possible), but I couldn't find an API (at least in Windows) to get this type of information (only which CPUs are in a node, not relationships betweennodes).

Is there any way to get the distance between nodes ? It must not benecessarily an API, CPUID/APIC would be very good if there is no other way.

Thanks in advance!

Such a NUMA aware memory allocator is available in QuickThread (http://www.quickthreadprogramming.com). This allocator can be used independent of the QuickThread threading environment (OpenMP, Windows thread, TBB, etc...) as well as within the QuickThread threading environment. QuickThread programming environment permits taskscheduling to within NUMA nodes.

// distribute one instance of task to each NUMA node
parallel_distribute( OneEach_M0$, fooNode, A, B, C);
...

void fooNode(intptr_t teamMember, intptr_t teamSize, A_t A, B_t B, C_t C)
{
// within this NUMA node run slice of array
parallel_for( M0$, foo, 0, countFoo, teamMember, teamSize, A, B, C);
}

void foo(intptr_t teamMember, intptr_t teamSize, intptr_t iBegin, intptr_t iEnd A_t A, B_t B, C_t C)
{
... // run slice of arrays (iBegin, iEnd]
}

Such a NUMA aware allocator is more effective when the problem has a majority of random memory access as opposed to a majority of serial memory access. On serial access problems most processors can be configured for pre-fetch and many serial-like access programs will not out pace the pre-fetch (regardless of inter-node access). On this subset of problems the aware NUMA allocator is not as beneficial for reducing time for memory access. This noted, when the application has a high flux in allocation/de-allocation the NUMA aware allocator will "dilute" the frequency of CAS/Interlocked/Locked operations by 1/number of NUMA nodes (assuming allocations/de-allocations evenly distributed). Most such scalable allocators allocate blocks of similarly sized items and the frequency of CAS/Interlocked/Locked operations are already reduced to when the allocations consume/fill a block.

Jim Dempsey

0 Kudos
Dmitry_Vyukov
Valued Contributor I
2,744 Views

Hi,

On Linux you can use numa_distance() function from libnuma:

http://linux.die.net/man/3/numa

On Windows I am not aware of any such API. Sad. You may try to investigate sources of libnuma, maybe it will give you the solution.

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,744 Views

You will need to write your own numa_distance() for Windows using CPUID

Jim

0 Kudos
Dmitry_Vyukov
Valued Contributor I
2,744 Views

But how to get distance from CPUID?

0 Kudos
TimP
Honored Contributor III
2,744 Views

http://msdn.microsoft.com/en-us/library/aa366533%28VS.85%29.aspx

mentions the various facilities provided by Windows for NUMA aware allocation, although it's unusually cryptic.

http://developer.amd.com/documentation/articles/pages/4172007166.aspx

gives some recommendations, in case you want assurance of multi-vendor platform support. Also, there are references on the differences between application level usage on Windows and other OS.

0 Kudos
Lup_Gratian
Beginner
2,744 Views

Thanks for the answers.

I have found a solution for Windows (though not tested yet). The distances can be found in the SLIT - System Locality Information Table - an extension provided by OEMs and available through ACPI. It contains a N_CPU x N_CPU matrix that describes the distances ([1][3] is the distance between nodes 1 and 3; is always 10; below 10 is invalid).

Windows Vista+ has the method GetSystemFirmwareTable that can be used to retrieve this table (it seems that under XP it can be retrieved from the Registry, but it's more difficult).
The structure of this table can be found in the ACPICA package (http://www.acpica.org).

0 Kudos
Dmitry_Vyukov
Valued Contributor I
2,744 Views

Thanks for posting the info here! That's not every day I'm able to get to know something new :)

0 Kudos
Reply