Why does a store stream generate double bandwidth across the UPI link?

NickChiu · 02-04-2021

Hi, all~

I'm testing my 2-socket Xeon 8280M system, which has 2 UPI links enabled, using LMbench's bw_mem.

First, a local memory access:

numactl -C 0-27 -m 0 ./bw_mem -P 28 1024M wr

The result is 50578.27 MB/s.

The output from the pcm-memory monitor:

read: 50736.01 MB/s, write: 50530.79 MB/s on socket 0. This is as expected.

Then a remote memory access:

numactl -C 0-27 -m 1 ./bw_mem -P 28 1024M wr

The result is 30226.59 MB/s.

The output from the pcm-memory monitor:

read: 30666.36 MB/s, write: 60385.72 MB/s on socket 1.

This is what confuses me: the write bandwidth on socket 1 is doubled. I tested another platform with Xeon 6148 processors and got the same result. Where does this extra write traffic come from?

IntelSupport · 02-05-2021

Hello NickChiu,

Thank you for posting your question in the Intel® Community.

To better assist you, please provide us with the following information about your environment:

1. System model:
2. Are you currently developing an application on this system? If possible, please provide more details about it.
3. Have you tried other benchmarks or tools to compare this behavior? If possible, please provide screenshots.

Wanner G.

Intel Customer Support Technician

NickChiu · 02-06-2021

Hello Wanner,

Thank you for replying!

1. My system model is a Huawei 2288H V5 with dual Intel Xeon Platinum 8280 processors.

2. No, it's a clean environment. I'm pretty sure there is no unusual application that would cause this behavior, if that is your concern.

3. Yes, I tried STREAM too; it's another memory bandwidth benchmark.

The code is quite simple:

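/* STREAM "Copy" kernel, repeated forever so pcm-memory can sample it.
   stream_array_size is large enough that a[] and c[] do not fit in cache. */
while (1) {
    for (j = 0; j < stream_array_size; j++)
        c[j] = a[j];   /* one load from a[], one store to c[] */
}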

Again,

When run with "numactl -C 0 -m 0 ./stream":

The output from pcm-memory is: read 4795.40 MB/s, write 2375.41 MB/s (exactly what is expected).

When run with "numactl -C 0 -m 1 ./stream":

The output from pcm-memory is: read 4663.65 MB/s, write 4544.18 MB/s (the write bandwidth is doubled here too).

For some reason, I'm not able to provide screenshots; sorry about that. I would appreciate it if you could reproduce this result in your environment ^_^. All the benchmarks I used are open source on GitHub. Thank you for your help!

IntelSupport · 02-08-2021

Hello NickChiu,

I appreciate your response.

In your initial post, you stated that you were using an Intel® Xeon® Platinum 8280M Processor.

Please let us know if you are using Intel® Xeon® Platinum 8280 Processor or Intel® Xeon® Platinum 8280M Processor.

Wanner G.

Intel Customer Support Technician

NickChiu · 02-09-2021

Hi Wanner,

It's the 8280M; sorry for leaving out the suffix.

IntelSupport · 02-09-2021

Hello NickChiu,

I appreciate your response.

I will update this thread soon.

Wanner G.

Intel Customer Support Technician

IntelSupport · 02-10-2021

Hello NickChiu,

We are still looking into your request.

By any chance, are you able to confirm what the system is used for and your company name?

Wanner G.

Intel Customer Support Technician

NickChiu · 02-10-2021

Hi, Wanner
Sure, what do I need to do?

IntelSupport · 02-10-2021

Hello NickChiu,

I have sent you a message to the email address associated with your profile.

Please reply to this message at your earliest convenience.

Wanner G.

Intel Customer Support Technician

IntelSupport · 02-18-2021

Hello NickChiu,

I am still investigating your inquiry and will provide an update by mid-next week.

Are you able to provide your output data from the Intel® Xeon® Gold 6148 Processor system?

I am asking for this additional data because the Intel® Xeon® Platinum 8280M Processor is a confidential SKU, and Intel Customer Support cannot provide any information about it.

Wanner G.

Intel Customer Support Technician

NickChiu · 02-19-2021

Hi Wanner,

Of course, here are the results from the Xeon 6148:

Local memory access:

numactl -C 0-19 -m 0 ./bw_mem -P 20 1024M wr

The result is 48912.44 MB/s.

The output from the pcm-memory monitor:

read: 49026.43 MB/s, write: 48929.00 MB/s on socket 0.

Remote memory access:

numactl -C 0-19 -m 1 ./bw_mem -P 20 1024M wr

The result is 23409.51 MB/s.

The output from the pcm-memory monitor:

read: 23396.72 MB/s, write: 46859.81 MB/s on socket 1.

IntelSupport · 02-19-2021

Hello NickChiu,

I appreciate your response. I will update this thread as soon as possible.

Wanner G.

Intel Customer Support Technician

IntelSupport · 02-24-2021

Hello NickChiu,

I am still looking into your inquiry. I will provide an update by mid-next week.

Thank you for your understanding.

Wanner G.

Intel Customer Support Technician

IntelSupport · 03-05-2021

Hello NickChiu,

I would like to provide an update on your inquiry.

Our Engineering team has looked at this and confirmed that this is expected behavior.

We see this with the 1st, 2nd, and 3rd Generation Intel® Xeon® Scalable Processors based on 14nm lithography. However, we do not see this behavior with upcoming Intel Xeon processors using newer lithography.

Thank you for your feedback, and for bringing this to our attention.

Wanner G.

Intel Customer Support Technician

IntelSupport · 03-10-2021

Hello NickChiu,

I hope you found the information provided helpful.

If you have any further questions, do not hesitate to update this thread.

Wanner G.

Intel Customer Support Technician

IntelSupport · 03-15-2021

Hello NickChiu,

Since I have not heard back from you, I will proceed to close this thread.

Thank you for your understanding.

Wanner G.

Intel Customer Support Technician

NickChiu · 04-01-2021

Hi, Wanner

Thanks for your explanation; it's helpful.

Sorry for taking so long to reply; I've been held up by something else.

Thank you again!