NickChiu
Beginner
298 Views

Why does a store stream generate double bandwidth across the UPI link?

Hi, all~

I'm testing my 2-socket Xeon 8280M system, which has 2 UPI links enabled, using lmbench bw_mem.

First a local memory access:

      numactl -C 0-27 -m 0 ./bw_mem -P 28 1024M wr

      the result is 50578.27 MB/s

the output from PCM-memory monitor:

      read: 50736.01 MB/s  write: 50530.79 MB/s  on socket 0. This is just as expected.

Then a remote memory access:

      numactl -C 0-27 -m 1 ./bw_mem -P 28 1024M wr

      the result is 30226.59 MB/s

the output from PCM-memory monitor:

      read: 30666.36 MB/s  write: 60385.72 MB/s  on socket 1.

This is what confuses me: the write bandwidth on socket 1 is double what the benchmark reports. I tested another platform with Xeon 6148 processors and the result was the same. Where does this extra write traffic come from?

13 Replies
IntelSupport
Community Manager
288 Views

Hello NickChiu,


Thank you for posting your question on this Intel® Community.


To better assist you, please provide us with the following information about your environment:

  • System model:
  • Are you currently developing an application on this system? If possible, please provide more details about this.
  • Have you tried using other benchmarks or tools to compare this behavior? If possible, please provide screenshots.


Wanner G.

Intel Customer Support Technician


NickChiu
Beginner
275 Views

Hello Wanner,

Thank you for replying!

1. My system model is Huawei 2288H V5 with dual Intel Xeon Platinum 8280 processors.

2. No, it's a clean environment. I'm pretty sure no unusual application is causing this behavior, if that is what you are concerned about.

3. Yes, I tried STREAM too, which is another memory bandwidth benchmark.

    The code is quite simple:

            while (1) {
                 /* stream_array_size is large enough that the arrays do not fit in cache */
                 for (j = 0; j < stream_array_size; j++)
                     c[j] = a[j];
            }

    Again, 

    when run with  "numactl -C 0 -m 0 ./stream":

    output from pcm-memory is:      read 4795.40 MB/s   write 2375.41 MB/s (this is exactly what is expected)

    when run with  "numactl -C 0 -m 1 ./stream":

    output from pcm-memory is:      read 4663.65 MB/s   write 4544.18 MB/s (write bandwidth doubled too)

    Unfortunately I'm not able to provide screenshots, sorry about that. I would appreciate it if you could reproduce this result in your environment ^_^. All the benchmarks I used are open source on GitHub. Thank you for your help!

IntelSupport
Community Manager
266 Views

Hello NickChiu,


I appreciate your response.


On your initial post, you stated that you were using an Intel® Xeon® Platinum 8280M Processor.


Please let us know if you are using Intel® Xeon® Platinum 8280 Processor or Intel® Xeon® Platinum 8280M Processor.


Wanner G.

Intel Customer Support Technician


NickChiu
Beginner
262 Views

Hi Wanner,

It's the 8280M, sorry for omitting the suffix.

IntelSupport
Community Manager
258 Views

Hello NickChiu,


I appreciate your response.


I will update this thread soon.


Wanner G.

Intel Customer Support Technician


IntelSupport
Community Manager
235 Views

Hello NickChiu,


We are still looking into your request.


By any chance, are you able to confirm what the system is used for and your company name?


Wanner G.

Intel Customer Support Technician


NickChiu
Beginner
228 Views

Hi, Wanner
Sure, what do I need to do?
IntelSupport
Community Manager
226 Views

Hello NickChiu,


I have sent you a message to the email address associated with your profile.


Please reply to this message at your earliest convenience.


Wanner G.

Intel Customer Support Technician


IntelSupport
Community Manager
152 Views

Hello NickChiu,


I am still investigating your inquiry and will provide an update by mid-next week.


  • Are you able to provide your output results data using the Intel® Xeon® Gold 6148 Processor? 


I am trying to obtain additional output results data because Intel® Xeon® Platinum 8280M Processor is a confidential SKU and Intel Customer Support cannot give any information about it.


Wanner G.

Intel Customer Support Technician


NickChiu
Beginner
140 Views

Hi Wanner,

  Of course, here's the result of Xeon 6148:

  Local memory access:

       numactl -C 0-19 -m 0 ./bw_mem -P 20 1024M wr

       the result is 48912.44 MB/s

       the output from PCM-memory monitor:

              read: 49026.43 MB/s  write: 48929.00 MB/s  on socket 0. 

  Remote memory access:

       numactl -C 0-19 -m 1 ./bw_mem -P 20 1024M wr

       the result is 23409.51 MB/s

       the output from PCM-memory monitor:

              read: 23396.72 MB/s  write: 46859.81 MB/s  on socket 1. 

IntelSupport
Community Manager
138 Views

Hello NickChiu,


I appreciate your response. I will update this thread as soon as possible.


Wanner G.

Intel Customer Support Technician


IntelSupport
Community Manager
99 Views

Hello NickChiu,


I am still looking into your inquiry. I will provide an update by mid-next week.


Thank you for your understanding.


Wanner G.

Intel Customer Support Technician


IntelSupport
Community Manager
30 Views

Hello NickChiu,


I would like to provide an update to your inquiry.


Our Engineering team has looked at this and confirmed that this is expected behavior.


We see this with the 1st, 2nd, and 3rd Generation Intel® Xeon® Scalable Processors based on 14nm lithography. However, we do not see this behavior with upcoming Intel Xeon processors using newer lithography.


Thank you for your feedback, and for bringing this to our attention.


Wanner G.

Intel Customer Support Technician