
Why can't RDT reduce interference?

Tyree
Beginner

On my server, I have a latency-critical (LC) task running on NUMA node 0. To increase utilization, I also deployed some best-effort (BE) tasks on the same NUMA node. I enabled some cgroup isolation features to protect the LC task, but its performance still decreased. Using perf stat, I found that the IPC of the LC task dropped from 1.3 to 1.2 after deploying the BE tasks. I then used "perf stat -e" with raw event counters to look at the micro-architectural metrics of the LC task. The L1 and L2 miss ratios stayed about the same after deploying the BE tasks, while the L3 miss ratio increased from 6% to 9%.

I also tried using Intel RDT to limit the LLC occupancy of the BE tasks. My LLC has 12 ways, so I limited the BE tasks to 1 of the 12 ways. However, the result was the same (the L3 miss ratio stayed around 9%). Limiting the memory bandwidth of the BE tasks to 10% did not help either.
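For reference, the way split described above can be written down as capacity bitmasks. The sketch below is only an illustration: it assumes the Linux resctrl interface (the post does not say whether resctrl or another tool was used), assumes L3 cache IDs 0 and 1 for the two sockets, and uses the hypothetical helper names `contiguous_mask`, `split_ways`, and `schemata_lines`. It builds one contiguous mask for the BE group and a non-overlapping mask for the LC group, then formats them as `L3:` and `MB:` schemata lines.

```python
def contiguous_mask(num_ways: int, first_way: int) -> int:
    """Bitmask with num_ways consecutive bits set, starting at bit first_way."""
    return ((1 << num_ways) - 1) << first_way


def split_ways(total_ways: int, be_ways: int) -> tuple[int, int]:
    """Return (lc_mask, be_mask) with no shared ways.

    BE gets the lowest be_ways ways and LC gets all remaining ways.
    """
    if not 0 < be_ways < total_ways:
        raise ValueError("be_ways must be between 1 and total_ways - 1")
    be_mask = contiguous_mask(be_ways, 0)
    lc_mask = contiguous_mask(total_ways - be_ways, be_ways)
    return lc_mask, be_mask


def schemata_lines(l3_mask: int, mb_percent: int, cache_ids=(0, 1)) -> list[str]:
    """Format resctrl-style schemata lines for one resource group.

    cache_ids=(0, 1) is an assumption for a two-socket machine.
    """
    l3 = ";".join(f"{cid}={l3_mask:x}" for cid in cache_ids)
    mb = ";".join(f"{cid}={mb_percent}" for cid in cache_ids)
    return [f"L3:{l3}", f"MB:{mb}"]


# 12 ways in total, 1 way for BE, BE memory bandwidth capped at 10%.
lc_mask, be_mask = split_ways(total_ways=12, be_ways=1)
print("BE group:", schemata_lines(be_mask, 10))
print("LC group:", schemata_lines(lc_mask, 100))
```

The smallest mask and the bandwidth granularity a platform accepts can differ, so the values under resctrl's `info/L3` and `info/MB` directories are worth checking before writing lines like these.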

I am not sure how to analyze this further to find out why only the L3 miss ratio increased and what the root cause is.

I hope to get some help here. Thank you very much!

Here is my CPU info and perf reports before and after deploying BE tasks.

I have run perf several times and the results are almost the same.

cpu info

```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz
Stepping: 6
CPU MHz: 2799.999
CPU max MHz: 3400.0000
CPU min MHz: 800.0000
BogoMIPS: 4400.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127
```

perf stat of the LC task **before** deploying BE tasks

```
2,507,581,247,482 instructions # 1.31 insn per cycle (57.70%)
1,919,207,125,419 cycles (64.09%)
6,577,501,022 L2_RQSTS.DEMAND_DATA_RD_MISS (64.05%)
20,740,750,037 L2_RQSTS.ALL_DEMAND_DATA_RD (64.02%)
6,935,557,793 L2_RQSTS.ALL_RFO (64.03%)
1,534,425,251 L2_RQSTS.RFO_MISS (64.07%)
574,045,503 OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD (64.01%)
342,337,011 MEM_LOAD_RETIRED.L3_MISS (64.10%)
5,867,009,991 MEM_LOAD_RETIRED.L2_MISS (64.01%)
16,980,212,379 MEM_LOAD_RETIRED.L1_MISS (63.95%)
308,951,137,867 MEM_LOAD_RETIRED.L1_HIT (51.37%)
10,758,173,508 MEM_LOAD_RETIRED.L2_HIT (51.37%)
5,975,591,272 MEM_LOAD_RETIRED.L3_HIT (51.42%)
306,291,575,807 MEM_LOAD_RETIRED.ALL_LOADS (51.50%)
188,045,958,463 MEM_LOAD_RETIRED.ALL_STORES (51.36%)

120.099298650 seconds time elapsed

The L2 and L3 cache miss ratios are calculated as follows:
L2 miss% = MEM_LOAD_RETIRED.L2_MISS / MEM_LOAD_RETIRED.L1_MISS = 5,867,009,991 / 16,980,212,379 = 34.5%
L3 miss% = MEM_LOAD_RETIRED.L3_MISS / MEM_LOAD_RETIRED.L2_MISS = 342,337,011 / 5,867,009,991 = 5.8%
```

perf stat of the LC task **after** deploying BE tasks

```
2,608,552,205,051 instructions # 1.19 insn per cycle (57.53%)
2,183,570,198,313 cycles (63.94%)
7,745,052,206 L2_RQSTS.DEMAND_DATA_RD_MISS (64.01%)
23,073,900,674 L2_RQSTS.ALL_DEMAND_DATA_RD (64.01%)
7,164,922,408 L2_RQSTS.ALL_RFO (64.03%)
1,661,683,201 L2_RQSTS.RFO_MISS (64.05%)
825,912,577 OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD (64.02%)
583,675,847 MEM_LOAD_RETIRED.L3_MISS (64.05%)
6,438,858,223 MEM_LOAD_RETIRED.L2_MISS (63.99%)
18,199,655,787 MEM_LOAD_RETIRED.L1_MISS (63.84%)
320,077,134,491 MEM_LOAD_RETIRED.L1_HIT (51.25%)
11,351,727,437 MEM_LOAD_RETIRED.L2_HIT (51.22%)
6,275,779,430 MEM_LOAD_RETIRED.L3_HIT (51.16%)
318,136,944,465 MEM_LOAD_RETIRED.ALL_LOADS (51.27%)
195,803,999,110 MEM_LOAD_RETIRED.ALL_STORES (51.34%)

120.116917905 seconds time elapsed

L2 miss% = MEM_LOAD_RETIRED.L2_MISS / MEM_LOAD_RETIRED.L1_MISS = 6,438,858,223 / 18,199,655,787 = 35.3%
L3 miss% = MEM_LOAD_RETIRED.L3_MISS / MEM_LOAD_RETIRED.L2_MISS = 583,675,847 / 6,438,858,223 = 9.1%
```
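One rough way to sanity-check whether the extra L3 misses could account for the IPC drop is to compare the extra cycles per 1000 instructions with the extra retired-load L3 misses per 1000 instructions. The sketch below uses only the counters from the "before" and "after" reports above. It is a back-of-the-envelope estimate under strong assumptions: every extra cycle is attributed to retired-load L3 misses, overlap between misses is ignored, and prefetch and store traffic is not counted. The counters were also multiplexed (the percentages in parentheses), so the inputs are scaled estimates.

```python
# Counter values copied from the two perf stat reports above.
before = {
    "instructions": 2_507_581_247_482,
    "cycles": 1_919_207_125_419,
    "l3_miss": 342_337_011,  # MEM_LOAD_RETIRED.L3_MISS
}
after = {
    "instructions": 2_608_552_205_051,
    "cycles": 2_183_570_198_313,
    "l3_miss": 583_675_847,  # MEM_LOAD_RETIRED.L3_MISS
}


def cycles_per_kilo_instr(c: dict) -> float:
    """Cycles spent per 1000 retired instructions."""
    return 1000 * c["cycles"] / c["instructions"]


def l3_mpki(c: dict) -> float:
    """Retired-load L3 misses per 1000 retired instructions."""
    return 1000 * c["l3_miss"] / c["instructions"]


extra_cycles = cycles_per_kilo_instr(after) - cycles_per_kilo_instr(before)
extra_misses = l3_mpki(after) - l3_mpki(before)

# Cycles each additional miss would have to cost if it were the only cause.
implied_cycles_per_miss = extra_cycles / extra_misses

print(f"extra cycles per 1000 instructions:    {extra_cycles:.1f}")
print(f"extra L3 misses per 1000 instructions: {extra_misses:.3f}")
print(f"implied cycles per extra miss:         {implied_cycles_per_miss:.0f}")
```

If the implied cost per extra miss comes out much larger than a DRAM access could plausibly take on this machine, retired-load L3 misses alone may not explain the slowdown, and other events would be worth measuring.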

perf stat of the LC task **after** deploying BE tasks and limiting their LLC occupancy

```
2,637,374,229,407 instructions # 1.18 insn per cycle (50.91%)
2,232,720,790,691 cycles (56.61%)
7,766,059,047 L2_RQSTS.DEMAND_DATA_RD_MISS (56.68%)
22,716,753,053 L2_RQSTS.ALL_DEMAND_DATA_RD (56.70%)
6,872,784,090 L2_RQSTS.ALL_RFO (56.70%)
1,679,610,667 L2_RQSTS.RFO_MISS (56.67%)
885,846,658 OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD (56.67%)
646,751,270 MEM_LOAD_RETIRED.L3_MISS (56.59%)
6,877,904,780 MEM_LOAD_RETIRED.L2_MISS (56.42%)
18,698,108,046 MEM_LOAD_RETIRED.L1_MISS (56.33%)
320,617,341,214 MEM_LOAD_RETIRED.L1_HIT (45.13%)
12,006,256,563 MEM_LOAD_RETIRED.L2_HIT (45.12%)
6,499,185,771 MEM_LOAD_RETIRED.L3_HIT (45.15%)
321,649,950,616 MEM_LOAD_RETIRED.ALL_LOADS (45.17%)
196,989,430,826 MEM_LOAD_RETIRED.ALL_STORES (45.21%)
9,709,983 MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM (45.29%)
52,088,603 MEM_LOAD_L3_MISS_RETIRED.REMOTE_FWD (45.40%)

120.087210363 seconds time elapsed

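L2 miss% = MEM_LOAD_RETIRED.L2_MISS / MEM_LOAD_RETIRED.L1_MISS = 6,877,904,780 / 18,698,108,046 = 36.7%
L3 miss% = MEM_LOAD_RETIRED.L3_MISS / MEM_LOAD_RETIRED.L2_MISS = 646,751,270 / 6,877,904,780 = 9.4%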

```
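To compare the three runs side by side, the same ratios can be recomputed from the raw counters in one place. The sketch below copies the counter values from the three reports above and derives IPC, the L2 and L3 miss ratios as defined in the post, and L3 misses per 1000 instructions (MPKI). MPKI is an addition here, not something the post computed. The run labels and the names `RUNS`, `summarize`, and `SUMMARY` are made up for illustration. The results should match the percentages in the post to within rounding.

```python
# Counter values copied from the three perf stat reports above.
RUNS = {
    "LC alone": {
        "instructions": 2_507_581_247_482, "cycles": 1_919_207_125_419,
        "l1_miss": 16_980_212_379, "l2_miss": 5_867_009_991,
        "l3_miss": 342_337_011,
    },
    "LC + BE": {
        "instructions": 2_608_552_205_051, "cycles": 2_183_570_198_313,
        "l1_miss": 18_199_655_787, "l2_miss": 6_438_858_223,
        "l3_miss": 583_675_847,
    },
    "LC + BE, LLC limited": {
        "instructions": 2_637_374_229_407, "cycles": 2_232_720_790_691,
        "l1_miss": 18_698_108_046, "l2_miss": 6_877_904_780,
        "l3_miss": 646_751_270,
    },
}


def summarize(c: dict) -> dict:
    """Derive per-run metrics from MEM_LOAD_RETIRED.* and core counters."""
    return {
        "ipc": c["instructions"] / c["cycles"],
        "l2_miss_pct": 100 * c["l2_miss"] / c["l1_miss"],
        "l3_miss_pct": 100 * c["l3_miss"] / c["l2_miss"],
        "l3_mpki": 1000 * c["l3_miss"] / c["instructions"],
    }


SUMMARY = {name: summarize(c) for name, c in RUNS.items()}

print(f"{'run':<22}{'IPC':>6}{'L2 miss%':>10}{'L3 miss%':>10}{'L3 MPKI':>9}")
for name, m in SUMMARY.items():
    print(f"{name:<22}{m['ipc']:>6.2f}{m['l2_miss_pct']:>10.1f}"
          f"{m['l3_miss_pct']:>10.1f}{m['l3_mpki']:>9.3f}")
```

Normalizing misses by instruction count can make runs easier to compare, because the three reports retired slightly different numbers of instructions in the same 120 seconds.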

3 Replies
Sazirah
Employee

Hi Tyree,


Thank you for posting in Intel Community Forum.


Regarding this issue, could you let us know whether you have tried disabling the BE tasks and monitoring only the LC task? Does its performance still decrease?


Regards,

Sazzy_Intel


Tyree
Beginner

Hi Sazirah,

Yes, I have tried disabling the BE tasks and monitoring only the LC task.

I ran perf on the LC task several times, and the IPC was the same each time (around 1.3).

Azeem_Intel
Employee

Hello Tyree.


Greetings!


After looking into Intel® Resource Director Technology, we found that this topic is outside our scope of support. We recommend that you refer to the Software Development Technologies forums and post your query here:


 https://community.intel.com/t5/Software-Development/ct-p/software-dev-technologies



Best Regards,

Azeem_Intel

