Hi All,
In SNC2 mode we have 2 MCDRAM nodes (nodes 2 and 3), while in SNC4 mode we have 4 MCDRAM nodes (nodes 4, 5, 6, 7). I want to bind the application's memory to the MCDRAM nodes, not to the DDR nodes, in both SNC2 and SNC4.
For SNC2 I use: numactl -m 2,3 <application>
For SNC4 I use: numactl -m 4,5,6,7 <application>
Application: Intel Caffe
Number of threads: 16/32/64/128
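For reference, the same binding can be requested from inside the process via libnuma. Below is a minimal sketch (assuming libnuma and its headers are installed; the node string matches the SNC4 command above) of the programmatic equivalent of numactl -m 4,5,6,7:

/* Programmatic equivalent of "numactl -m 4,5,6,7 <application>":
 * bind all further allocations of this process to the MCDRAM nodes.
 * Compile with: icc bind-mcdram.c -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <numa.h>

int main()
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }
    /* Nodes 4,5,6,7 are the MCDRAM nodes in flat SNC4; use "2,3" for SNC2 */
    struct bitmask *mcdram = numa_parse_nodestring("4,5,6,7");
    numa_set_membind(mcdram);          /* same policy that numactl -m establishes */
    numa_bitmask_free(mcdram);

    /* Allocations touched from here on should come from the bound nodes */
    size_t n = 1L << 28;               /* 256M ints = 1 GiB */
    int *A = malloc(n * sizeof(int));
    for (size_t i = 0; i < n; i++)
        A[i] = (int)i;                 /* first touch commits the pages */
    printf("%d\n", A[1]);
    free(A);
    return 0;
}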
In SNC2, node 2 is used first and then memory allocation spills over to node 0 (DDR). I expect the allocation to use nodes 2 and 3, not nodes 2 and 0. Similarly, in SNC4 I observe that nodes 4 and 0 are used for memory, not nodes 4, 5, 6, 7 as requested by the numactl binding above.
Because of this, I see a performance difference, since the other MCDRAM (HBM) nodes are not being used for memory allocation. This does not make sense to me. Can anyone suggest why this may be happening?
Thanks.
The command that you are running is correct. You can test your approach with this code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *A = malloc(sizeof(int) * (1L << 30));   /* 4 GiB buffer */
    while (0 == 0) {                             /* loop forever so the process can be inspected */
        int i;
#pragma omp parallel for
        for (i = 0; i < (1 << 30); i++)
            A[i] = i;                            /* touch every element */
    }
    printf("%d", A[1]);
}
As you can see below, it loads NUMA nodes 4,5,6,7 (the MCDRAM nodes):
[u7474@c006-n004 ~]$ icc -qopenmp test-numa.c
[u7474@c006-n004 ~]$ numactl -m 4,5,6,7 ./a.out &
[1] 22624
[u7474@c006-n004 ~]$ numastat -p 22624

Per-node process memory usage (in MBs) for PID 22624 (a.out)
                           Node 0          Node 1          Node 2
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.00            0.00
Stack                        0.00            0.00            0.00
Private                      0.64            0.09            0.00
----------------  --------------- --------------- ---------------
Total                        0.64            0.09            0.00

                           Node 3          Node 4          Node 5
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.15            0.00
Stack                        0.00           24.52            0.25
Private                      0.50         1081.85         1080.68
----------------  --------------- --------------- ---------------
Total                        0.50         1106.52         1080.93

                           Node 6          Node 7           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.00            0.16
Stack                        0.22            2.22           27.21
Private                    960.00          976.35         4100.11
----------------  --------------- --------------- ---------------
Total                      960.22          978.57         4127.48
[u7474@c006-n004 ~]$
I think the problem you are seeing is rooted in the fact that you are using Caffe from the Intel Distribution for Python. That build of Caffe uses BLAS functions from MKL, and those functions have a mind of their own when it comes to memory allocation: they ignore your numactl settings and apply their own allocation policy.
The good news is that MKL's functions are aware of MCDRAM and generally do the right thing when they allocate working datasets and scratch spaces.
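For background, explicit MCDRAM allocation on KNL typically goes through the memkind library's hbwmalloc interface, which draws from the high-bandwidth nodes independently of the numactl policy. Here is a minimal sketch of that mechanism (assuming memkind is installed; this illustrates MCDRAM-aware allocation in general, not MKL's exact internals):

/* Allocate a buffer from MCDRAM explicitly via hbwmalloc.
 * Compile with: icc test-hbw.c -lmemkind */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

int main()
{
    if (hbw_check_available() != 0) {       /* 0 means high-bandwidth memory is present */
        fprintf(stderr, "No high-bandwidth memory available\n");
        return 1;
    }
    size_t n = 1L << 28;                    /* 256M ints = 1 GiB */
    int *A = hbw_malloc(n * sizeof(int));   /* comes from MCDRAM, not DDR */
    if (A == NULL) {
        fprintf(stderr, "hbw_malloc failed\n");
        return 1;
    }
    for (size_t i = 0; i < n; i++)
        A[i] = (int)i;                      /* touch pages to commit them */
    printf("%d\n", A[1]);
    hbw_free(A);
    return 0;
}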
Hi Andrey,
I tested your code, and the correct MCDRAM nodes are being used.
Intel Caffe uses Intel MKL, but not the Intel Distribution for Python. I have browsed the Caffe code, and it does not allocate memory to specific nodes.
So for Intel Caffe, is there no way to utilize all MCDRAM nodes in flat SNC2/SNC4 mode? Any suggestions, please?
Thanks.
