NUMA CPU affinity issues

Rama27
Beginner

I have a server running Ubuntu 22.04, and I have run into an issue with NUMA CPU affinity binding.

1.) The server has two NUMA nodes:

server-1:# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
node 0 size: 1031782 MB
node 0 free: 989481 MB
node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
node 1 size: 1021920 MB
node 1 free: 1018313 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

 

2.) When I set the CPU affinity of a new process to CPU 1, which is part of NUMA node 0, the reported affinity spans multiple CPUs across both NUMA nodes.

server-1:# taskset -c 1 cat /dev/random >& /dev/null &
[1] 691247

server-1:# taskset -cp 691247
pid 691247's current affinity list: 0,1,30,48-51,96,97,144-147

I expected the process to stay on CPU 1 only rather than on multiple CPUs. I am not sure why taskset always shows "current affinity list: 0,1,30,48-51,96,97,144-147" even though we changed the core. Can you please let me know how this happened?
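
To make the mismatch concrete, here is a minimal Python sketch that maps the affinity list reported by taskset onto the node layout from the `numactl -H` output above. The helper names (`parse_cpu_list`, `nodes_for_affinity`) and the `NODE_CPUS` table are illustrative assumptions, not part of any tool; the parser handles only plain numbers and `a-b` ranges, not the stride syntax that taskset also accepts.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as '0,1,30,48-51' into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Node-to-CPU layout transcribed by hand from the `numactl -H` output in the post.
NODE_CPUS = {
    0: parse_cpu_list("0-47,96-143"),
    1: parse_cpu_list("48-95,144-191"),
}


def nodes_for_affinity(affinity, node_cpus=NODE_CPUS):
    """Return {node: sorted CPUs of `affinity` that belong to that node}."""
    cpus = parse_cpu_list(affinity)
    return {
        node: sorted(cpus & members)
        for node, members in node_cpus.items()
        if cpus & members
    }


if __name__ == "__main__":
    reported = "0,1,30,48-51,96,97,144-147"
    for node, cpus in nodes_for_affinity(reported).items():
        print(f"node {node}: {cpus}")
```

Run against the reported list, this shows CPUs from both node 0 and node 1, whereas a mask of just `1` would map to node 0 only.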

Vipin_Singh1
Moderator

Hi Rama, we would like to inform you that we are routing your query to the dedicated team for further assistance.


Pintu
Employee

Hello Rama27,


Greetings for the day! 


Regarding these NUMA CPU affinity issues, please help us by answering the questions below:


1. What specific performance problems or challenges are you encountering with your application or workload?

2. Are you experiencing any slowdowns, latency issues, or unexpected behavior?

3. Could you provide details about your system's hardware configuration, including the number of CPUs and memory setup?

4. Kindly confirm if you are aware of any NUMA architecture in your system design.

5. Have you observed any patterns or behaviors suggesting that CPU affinity might be impacting performance?

6. Please confirm if certain processes or threads are consistently running slower or experiencing higher latency.


Please answer the above questions so that we can proceed further.


Thank you for choosing Intel products and services.


Best Regards,

Manoranjan Das.


Rama27
Beginner

Hi, thank you for looking into it. My answers are as follows:

1. What specific performance problems or challenges are you encountering with your application or workload?
--- The server runs Ubuntu 22.04 and has 4 NUMA nodes. Each time we deploy, the memory allocations are random, i.e. MemUsed is different each time. We don't have explicit NUMA rules configured.

2. Are you experiencing any slowdowns, latency issues, or unexpected behavior?
--- Since the allocations are random, there is obviously overhead from accessing memory on remote nodes.

3. Could you provide details about your system's hardware configuration, including the number of CPUs and memory setup?
--- The server has a Lenovo motherboard with "Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz" processors: 4 sockets (NUMA nodes) with 192 CPUs. RAM is 748 GB.

4. Kindly confirm if you are aware of any NUMA architecture in your system design.
--- I'm aware of NUMA.

5. Have you observed any patterns or behaviors suggesting that CPU affinity might be impacting performance?
--- Though we don't have explicit NUMA configuration, memory allocations are random, which causes some performance issues, i.e. processes run on one NUMA node while their memory is allocated on different nodes.

6. Please confirm if certain processes or threads are consistently running slower or experiencing higher latency.
a.) Queried the current affinity of the cron process.
root@server-1:~# taskset -cp $(pidof cron)
pid 8225's current affinity list: 0,24,25,48,49,72,73,96,120,121,144,145,168,169

b.) Deliberately changed the CPU to 2, which reports success.
root@server-1:~# taskset -cp 2 $(pidof cron)
pid 8225's current affinity list: 0,24,25,48,49,72,73,96,120,121,144,145,168,169
pid 8225's new affinity list: 2

c.) Querying again shows the previous state.
root@server-1:~# taskset -cp $(pidof cron)
pid 8225's current affinity list: 0,24,25,48,49,72,73,96,120,121,144,145,168,169
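
The a/b/c sequence above could be automated to find out how long the new mask survives. The following is a minimal sketch assuming Linux and Python 3; `watch_affinity` is a hypothetical helper name, and `os.sched_setaffinity` is used here as a rough stand-in for `taskset -cp`. It only detects a revert; it does not identify what caused it.

```python
import os
import sys
import time


def watch_affinity(pid, expected, samples=10, interval=1.0, get_affinity=None):
    """Poll the affinity mask of `pid`.

    Returns the first observed mask that differs from `expected`, or None
    if the mask stayed equal to `expected` for every sample.
    `get_affinity` is injectable for testing; the default is
    os.sched_getaffinity, which exists on Linux only.
    """
    if get_affinity is None:
        get_affinity = os.sched_getaffinity
    expected = set(expected)
    for i in range(samples):
        current = set(get_affinity(pid))
        if current != expected:
            return current
        if i < samples - 1:
            time.sleep(interval)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])          # e.g. the PID printed by `pidof cron`
    target = {2}                    # CPU 2, as in step b.) above
    os.sched_setaffinity(pid, target)
    start = time.monotonic()
    changed = watch_affinity(pid, target, samples=60, interval=1.0)
    if changed is None:
        print("affinity stayed at", sorted(target))
    else:
        elapsed = time.monotonic() - start
        print(f"affinity reverted after ~{elapsed:.0f}s to {sorted(changed)}")
```

Knowing the time to revert can help when correlating the change with other activity on the host.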

Pintu
Employee

Hello Rama27,


Greetings for the day!


Regarding this issue, please confirm whether you are getting any errors on screen. If so, please share a screenshot so that we can proceed further. Please also confirm whether you are facing any additional issues.


Thank you for choosing Intel products and services.


Regards,

Manoranjan Das.


Rama27
Beginner

Hi Manoranjan,

We are starting tuning work to enforce CPU and memory policies so that processes are confined to a particular node and avoid remote memory access. Before that, we are doing some benchmarking.
As per my previous update, when I confine a process to a specific CPU, it succeeds for a while and then reverts by itself.

My server's memory policy is 'default' (prefer the local node for memory allocation). Yet the memory of one process (cron) spans two nodes (N0 and N2) under the 'default' policy:

 

558204b35000 default file=/usr/sbin/cron dirty=3 N2=3 kernelpagesize_kB=4
7f94216cd000 default file=/usr/lib/x86_64-linux-gnu/libcap-ng.so.0.0.0 dirty=2 mapmax=23 N0=2 kernelpagesize_kB=4
7f94216d3000 default file=/usr/lib/x86_64-linux-gnu/libcap-ng.so.0.0.0 anon=1 dirty=1 active=0 N2=1 kernelpagesize_kB=4
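
A minimal Python sketch for totalling the `N<node>=<pages>` fields of such numa_maps lines follows. The function names are hypothetical, and the sample input is just the three lines quoted above; on a live system the lines would come from /proc/<pid>/numa_maps.

```python
import re

# Matches per-node page counts such as "N2=3".
NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def parse_numa_maps_line(line):
    """Split one numa_maps line into (address, policy, {node: pages})."""
    fields = line.split()
    address, policy = fields[0], fields[1]
    pages = {}
    for field in fields[2:]:
        match = NODE_FIELD.match(field)
        if match:
            pages[int(match.group(1))] = int(match.group(2))
    return address, policy, pages


def pages_per_node(lines):
    """Sum the page counts per node over several numa_maps lines."""
    totals = {}
    for line in lines:
        if not line.strip():
            continue
        _, _, pages = parse_numa_maps_line(line)
        for node, count in pages.items():
            totals[node] = totals.get(node, 0) + count
    return totals


SAMPLE = [
    "558204b35000 default file=/usr/sbin/cron dirty=3 N2=3 kernelpagesize_kB=4",
    "7f94216cd000 default file=/usr/lib/x86_64-linux-gnu/libcap-ng.so.0.0.0 dirty=2 mapmax=23 N0=2 kernelpagesize_kB=4",
    "7f94216d3000 default file=/usr/lib/x86_64-linux-gnu/libcap-ng.so.0.0.0 anon=1 dirty=1 active=0 N2=1 kernelpagesize_kB=4",
]

if __name__ == "__main__":
    for node, count in sorted(pages_per_node(SAMPLE).items()):
        print(f"node {node}: {count} pages")
```

For the three quoted mappings this gives 4 pages on node 2 and 2 pages on node 0, all under the 'default' policy.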



 

The numastat statistics show that memory was also allocated with 'interleave'. I've checked numa_maps for all running processes on the server, and they all show 'default'. If 'default' is the policy in effect, how could the system have allocated with 'interleave'? How does the OS decide on allocations by itself when there are no explicit rules?

 

root@server-1:~# numastat
                           node0           node1           node2           node3
numa_hit                23888413       107767334        88675695        36266619
numa_miss                      0               0               0               0
numa_foreign                   0               0               0               0
interleave_hit             28652           28388           28641           28375
local_node              23382768       107605787        88631499        36191765
other_node                505645          161547           44196           74854
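
To put these counters in proportion, the short Python sketch below computes, per node, the share of `numa_hit` that is `interleave_hit` and the share that is `other_node`. The numbers are copied from the numastat output above; the function name and the dictionary layout are assumptions made for illustration. According to the numastat man page, `interleave_hit` counts allocations where interleaving wanted this node and succeeded.

```python
# Counters copied from the numastat output above (index = node number).
NUMASTAT = {
    "numa_hit": [23888413, 107767334, 88675695, 36266619],
    "interleave_hit": [28652, 28388, 28641, 28375],
    "local_node": [23382768, 107605787, 88631499, 36191765],
    "other_node": [505645, 161547, 44196, 74854],
}


def per_node_summary(stats):
    """Return one dict per node with percentage shares of numa_hit."""
    rows = []
    for node, hits in enumerate(stats["numa_hit"]):
        rows.append({
            "node": node,
            "interleave_pct": 100.0 * stats["interleave_hit"][node] / hits,
            "other_node_pct": 100.0 * stats["other_node"][node] / hits,
            # In this output, local_node + other_node adds up to numa_hit.
            "consistent": stats["local_node"][node] + stats["other_node"][node] == hits,
        })
    return rows


if __name__ == "__main__":
    for row in per_node_summary(NUMASTAT):
        print(
            f"node{row['node']}: interleave {row['interleave_pct']:.3f}% "
            f"other_node {row['other_node_pct']:.3f}% "
            f"consistent={row['consistent']}"
        )
```

With these figures, `interleave_hit` is well under 1% of `numa_hit` on every node, so interleaved allocations are a small part of the total.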

 

 

Pintu
Employee

Hello Rama27,


Greetings for the day!


We appreciate your patience. Please allow some more time to examine this matter.


Thank you for choosing Intel products and services.


Regards,

Manoranjan Das.


Pintu
Employee

Hello Rama27,


Greetings for the day!


Thank you for your response. We are currently checking with the internal team regarding this particular matter, and we will update you on the status soon.


Thank you for choosing Intel products and services.


Regards,

Manoranjan Das.


Rama27
Beginner

Hi Manoranjan, 

Sorry for the late reply; I was unable to see the 'Reply' button in Safari. I am not sure why my laptop behaved that way. In any case, my problem statement is below. When I confine the process to specific CPU(s), it works and then reverts quickly. Thank you. Please let me know if there are any details I can share.

 

a.) Queried the current affinity of the cron process.
root@server-1:~# taskset -cp $(pidof cron)
pid 8225's current affinity list: 0,24,25,48,49,72,73,96,120,121,144,145,168,169

b.) Deliberately changed the CPU to 2, which reports success.
root@server-1:~# taskset -cp 2 $(pidof cron)
pid 8225's current affinity list: 0,24,25,48,49,72,73,96,120,121,144,145,168,169
pid 8225's new affinity list: 2

c.) Querying again shows the previous state.
root@server-1:~# taskset -cp $(pidof cron)
pid 8225's current affinity list: 0,24,25,48,49,72,73,96,120,121,144,145,168,169
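
As a cross-check that does not depend on taskset, the kernel's own view of the mask can be read from the `Cpus_allowed_list` field of /proc/<pid>/status, per thread under /proc/<pid>/task. The Python sketch below is only an illustration under that assumption: the helper names are hypothetical, the sample text is made up for the example, and the thread does not establish what resets the mask.

```python
import os


def cpus_allowed_list(status_text):
    """Return the Cpus_allowed_list value from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Cpus_allowed_list":
            return value.strip()
    return None


def per_thread_masks(pid):
    """Return {tid: Cpus_allowed_list} for every thread of `pid` (Linux only)."""
    masks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/status") as handle:
            masks[int(tid)] = cpus_allowed_list(handle.read())
    return masks


# Made-up excerpt in the format of /proc/<pid>/status, for illustration only.
SAMPLE_STATUS = "Name:\tcron\nCpus_allowed:\t3000001\nCpus_allowed_list:\t0,24-25\n"

if __name__ == "__main__":
    print(cpus_allowed_list(SAMPLE_STATUS))
```

Calling `per_thread_masks(8225)` on the affected server would list the mask of each cron thread, which can then be compared with the taskset output.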

 

Pintu
Employee

Hello Rama27,


Greetings!


Based on this output, please follow the steps below for NUMA CPU affinity issues.


Identify Affected Processes:

Determine which processes are experiencing NUMA CPU affinity issues.


Set CPU Affinity:

Explicitly set CPU affinity for affected processes using tools like 'taskset' or 'numactl' to ensure they remain on specific CPUs or NUMA nodes.


Monitor Performance:

Continuously monitor system performance and NUMA behavior using tools like numastat to assess the effectiveness of the CPU affinity settings.


Adjust Kernel Parameters:

Review and adjust kernel parameters related to NUMA and CPU affinity as needed for optimal performance.


By following these steps, you can address NUMA CPU affinity issues and optimize system performance effectively.
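
As an illustration of the "Set CPU Affinity" step, a process can be started with both its CPUs and its memory bound to one node using numactl's `--cpunodebind` and `--membind` options. The Python sketch below only builds and optionally runs such a command line; `numactl_argv`, the node number, and the `sleep 60` command are placeholders chosen for the example, and it assumes numactl is installed.

```python
import shlex
import subprocess
import sys


def numactl_argv(node, command):
    """Build an argv that runs `command` with CPUs and memory bound to `node`."""
    return [
        "numactl",
        f"--cpunodebind={node}",   # run only on CPUs of this node
        f"--membind={node}",       # allocate memory only from this node
        "--",
        *command,
    ]


if __name__ == "__main__":
    argv = numactl_argv(0, ["sleep", "60"])
    print(shlex.join(argv))
    # Pass "run" as the first argument to actually execute the command.
    if len(sys.argv) > 1 and sys.argv[1] == "run":
        subprocess.run(argv, check=True)
```

Binding memory strictly with `--membind` means allocations fail rather than spill to another node when the node is full, so it is worth testing under realistic load before relying on it.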


Thank you for using Intel products and services.


Best regards,

Manoranjan.


Pintu
Employee

Hello Rama27,


Greetings!


We are currently awaiting your response regarding the case. If you have any queries or require further assistance, please feel free to respond on the community post. We are more than happy to assist you.


Thank you for using Intel products and services.


Best regards,

Manoranjan. 


Pintu
Employee

Hello Rama27,

 

Greetings for the day!

 

I hope this message finds you well.

 

We are following up to find out if you were able to find the information we provided. Please reply to confirm, so we can continue helping on a resolution. Looking forward to receiving your reply.

 

Regards,

Manoranjan.


Pintu
Employee

Hello Rama27,

 

Greetings for the day! 

 

We would like to inform you that we are closing this request as no response has been received from our previous follow-ups.

 

Please don't hesitate to ask any further questions in the future. Feel free to start a new conversation, as this thread will no longer be monitored.


Thank you for using Intel products and services.


Regards,

Manoranjan


Pintu
Employee

Hello Rama27,


Greetings for the day!


Sorry for the delay in response.


Regarding this case, please confirm whether the CPU is compatible with Non-Uniform Memory Access (NUMA), and let us know how NUMA is configured both in the BIOS and within the operating system.


Thank you for choosing Intel products and services.


Regards,

Manoranjan.


Rama27
Beginner

Hi Manoranjan, yes, we enabled NUMA in the BIOS and made no particular changes at the OS level. Is any turning required in the OS as well?

Pintu
Employee

Hello Rama27,


Greetings for the day!


Regarding this case, we are checking with the internal team; once we hear back, we will share the status with you.


Thank you for choosing Intel products and services.


Regards,

Manoranjan.


Pintu
Employee

Hello Rama27,


Greetings for the day!


Regarding your question about whether anything is also required in the OS: please confirm whether you meant tuning or turning.


Thank you for choosing Intel products and services.


Regards,

Manoranjan.


Rama27
Beginner

Hi, I meant tuning. The primary problem is that I am unable to confine a process to a specific CPU, as described in the problem statement.

Pintu
Employee

Hi Rama27,


Greetings for the day!


Sorry for the delay in response.


Please ensure that the NUMA configuration is correctly set up both at the OS level and in the BIOS. Since we do not provide OS configurations and the BIOS is from Lenovo, we recommend checking these settings with your OS provider and the BIOS vendor.


Please refer to the below article: Operating System Compatibility and Intel® Xeon® Scalable Processors.

https://www.intel.com/content/www/us/en/support/articles/000055440/processors/intel-xeon-processors.html

 

Thank you for choosing Intel products and services.


Best regards,

Manoranjan


Pintu
Employee

Hello Rama27,

 

Greetings for the day!

  

We are following up to find out if you were able to find the information we provided. Please reply to confirm, so we can continue helping on a resolution. Looking forward to receiving your reply.

 

Regards,

Manoranjan.


Rama27
Beginner

Hi, the board is from Lenovo, with an Intel Sapphire Rapids processor.
