
Are there any ways to control the prefetchers flexibly?

oleotiger

I'm working with Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz.

There are some knobs in the BIOS setup that control prefetcher behavior. The prefetchers can also be controlled by writing a register.

However, both of these only switch the prefetchers on or off.
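For reference, the register in question is, to my understanding, MSR 0x1A4 (MISC_FEATURE_CONTROL), which Intel has documented for several core generations: each of its low four bits disables one prefetcher when set. The sketch below assumes that layout also applies to the Xeon Gold 6248R (verify against Intel's documentation for your part), assumes Linux with the `msr` driver loaded and root access, and uses hypothetical helper names. It also shows why this interface is only on/off: there is one bit per prefetcher and no field for distance or degree.

```python
import os
import struct

# Assumption: MSR 0x1A4 layout as described in Intel's hardware prefetcher
# control disclosure. A SET bit DISABLES the corresponding prefetcher.
MSR_MISC_FEATURE_CONTROL = 0x1A4
PREFETCHER_BITS = {
    "l2_streamer": 0,       # L2 hardware (streamer) prefetcher
    "l2_adjacent_line": 1,  # L2 adjacent cache line prefetcher
    "dcu_next_line": 2,     # L1D (DCU) next-line prefetcher
    "dcu_ip_stride": 3,     # L1D (DCU) IP-based stride prefetcher
}


def decode(value):
    """Map a raw MSR value to {prefetcher name: enabled?}."""
    return {name: ((value >> bit) & 1) == 0 for name, bit in PREFETCHER_BITS.items()}


def encode(value, enabled):
    """Return `value` with the named prefetchers switched on (True) or off (False).

    All other bits of the register are preserved."""
    for name, on in enabled.items():
        mask = 1 << PREFETCHER_BITS[name]
        value = (value & ~mask) if on else (value | mask)
    return value


def read_msr(cpu, reg=MSR_MISC_FEATURE_CONTROL):
    """Read a 64-bit MSR on one logical CPU (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def write_msr(cpu, value, reg=MSR_MISC_FEATURE_CONTROL):
    """Write a 64-bit MSR on one logical CPU (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)
```

For example, `write_msr(0, encode(read_msr(0), {"l2_adjacent_line": False}))` would turn off only the adjacent-line prefetcher on CPU 0. The setting would need to be applied on every logical CPU under /dev/cpu to affect the whole machine.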

A prefetching strategy can be characterized by two variables: prefetching distance and degree. The prefetching algorithms in Intel CPUs may be more complicated, but I think there are still some parameters that determine whether they prefetch aggressively or conservatively.
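To make "distance" and "degree" concrete, here is a toy stride prefetcher. It is only an illustration of the two parameters, not Intel's algorithm (which is not publicly documented beyond the on/off controls), and the function name is hypothetical: after seeing the same stride twice, it issues `degree` prefetches, the first one `distance` strides ahead of the current access.

```python
def stride_prefetch(addresses, distance, degree, line=64):
    """Toy stride prefetcher (illustrative only, not Intel's algorithm).

    Once the same non-zero stride is seen twice in a row, issue `degree`
    prefetches, the first one `distance` strides ahead of the current
    access. Returns the cache-line addresses prefetched, in issue order,
    skipping lines that were already requested.
    """
    issued, seen = [], set()
    last_addr = last_stride = None
    for addr in addresses:
        if last_addr is not None:
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                for k in range(degree):
                    target = addr + stride * (distance + k)
                    line_addr = target - target % line  # align to a cache line
                    if line_addr >= 0 and line_addr not in seen:
                        seen.add(line_addr)
                        issued.append(line_addr)
            last_stride = stride
        last_addr = addr
    return issued
```

On a sequential stream `[0, 64, 128, 192]`, `distance=1, degree=1` prefetches only the next line each time, while `distance=2, degree=2` runs further ahead and requests more lines per trigger. A larger distance hides more latency but risks evicting lines before they are used; a larger degree issues more useless requests when a stream ends.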

 

Can someone suggest a way, such as a register write, to control prefetching behavior on the Intel 6248R CPU more flexibly than just switching it on/off?

DeividA_Intel

Hello oleotiger,

Thank you for posting on the Intel® communities.

We have a dedicated forum for these specific issues and products, so we are moving your question there, where you can get better support.

Regards,

Deivid A.

Intel Customer Support Technician


IntelSupport

Hello oleotiger,


Thank you for posting in the Intel Community.


We understand that you are looking for ways to control the prefetchers of the Intel® Xeon® processor more flexibly. To better understand your environment and project, please let us know the following:


  • What is the model of the server/workstation that you are using with the Intel® Xeon®?
  • Are you having any issues with the system?
  • Do you have any particular purpose or project that requires you to modify the prefetcher flexibly?
  • Are you working on system development?


Regards,

Leonardo C.


Intel Customer Support Technician


oleotiger
  1. We have a cluster of 8 servers with these Intel® Xeon® processors running HPC applications such as WRF, using both Intel MPI and Open MPI.
  2. There is no issue with the system.
  3. Yes. With VTune, I found a high L3 miss rate (about 70% to 80%). The application is typically memory bound, and I think memory bandwidth is the bottleneck. Since inefficient prefetching can add significant memory-bandwidth overhead and reduce performance, I want to see whether adjusting the prefetching policy can improve performance. I would like to try both a range of policies, from aggressive to passive, and different prefetching algorithms: HPC applications have many indirect memory access patterns (e.g. a[b[c]]), so their prefetching behavior is very different from that of common applications. In short, I want to find a suitable prefetching algorithm and an appropriate policy (neither too passive nor too aggressive) to improve the performance of HPC applications, but I haven't found a way to control prefetching flexibly. BTW: I tried turning the various SoC prefetchers (e.g. the hardware prefetcher and the streamer prefetcher) on and off; keeping the prefetchers on gives better results.
  4. No, I'm working on performance tuning.
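As a rough illustration of why the a[b[c]] pattern in point 3 is hard to prefetch, here is a sketch that assumes the hardware prefetchers are mainly stride-based (Intel does not publish their exact algorithms; the layout and function names are hypothetical). It generates the addresses of a simple indirect loop and measures how often consecutive strides repeat: the index array b is perfectly regular, while the gathered a[] stream almost never is.

```python
import random


def indirect_addresses(n, elem=8, b_base=0x0, a_base=0x100000, seed=0):
    """Addresses touched by `for i in range(n): s += a[b[i]]`.

    Hypothetical layout: b holds a random permutation of 0..n-1 and both
    arrays store `elem`-byte elements starting at the given base addresses.
    """
    rng = random.Random(seed)
    b = list(range(n))
    rng.shuffle(b)
    b_addrs = [b_base + i * elem for i in range(n)]     # sequential stream
    a_addrs = [a_base + b[i] * elem for i in range(n)]  # data-dependent stream
    return b_addrs, a_addrs


def predictable_fraction(addrs):
    """Fraction of strides equal to the previous stride, i.e. roughly the
    share of accesses a simple stride detector could have predicted."""
    strides = [y - x for x, y in zip(addrs, addrs[1:])]
    pairs = list(zip(strides, strides[1:]))
    return sum(s == t for s, t in pairs) / len(pairs) if pairs else 0.0
```

With `b, a = indirect_addresses(10000)`, `predictable_fraction(b)` is 1.0 while `predictable_fraction(a)` is close to zero, so under this assumption only the index stream would benefit from stride prefetching.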
IntelSupport

Hello oleotiger,


Thank you for the details about your interest in this matter. Let us look into this question; we will get back to you as soon as possible.


Regards,

Leonardo C.


Intel Customer Support Technician


IntelSupport

Hello oleotiger,


Thank you for waiting.


As you have found, the prefetchers can only be switched on or off; there is no set of controls to increase or decrease prefetching distance or aggressiveness.


Thank you for your feedback about the prefetching setting/performance.


Regards,

Leonardo C.


Intel Customer Support Technician


IntelSupport

Hello oleotiger,

 

I am checking on your thread to know if you need further assistance with this request.

 

Regards,

Leonardo C.


Intel Customer Support Technician


oleotiger

It's really a pity that I cannot control prefetchers flexibly.

 

If a large share of prefetches is useless (although some fraction will always be useful), the useless ones waste memory bandwidth. I don't think either simply turning the prefetchers on or turning them off can achieve the best performance.

I believe there must be a tradeoff point that balances memory-bandwidth overhead against prefetching effectiveness.
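This tradeoff is commonly described with two metrics: accuracy (the share of issued prefetches that are actually used) and coverage (the share of would-be misses that prefetching hides). The minimal sketch below computes both, plus the resulting DRAM traffic, from hypothetical event counts; obtaining real counts would require suitable hardware performance events, which are not specified here, and the function name is made up for illustration.

```python
LINE_BYTES = 64  # cache line size on current Intel Xeon processors


def prefetch_tradeoff(useful, useless, uncovered_misses, line_bytes=LINE_BYTES):
    """Summarize prefetch effectiveness from (hypothetical) event counts.

    useful:           prefetched lines later consumed by a demand access
    useless:          prefetched lines evicted without ever being used
    uncovered_misses: demand misses that prefetching did not hide
    """
    issued = useful + useless
    # Misses that would occur without prefetching, assuming prefetches
    # themselves cause no extra (pollution) misses.
    would_miss = useful + uncovered_misses
    lines_moved = useful + useless + uncovered_misses
    return {
        "accuracy": useful / issued if issued else 0.0,
        "coverage": useful / would_miss if would_miss else 0.0,
        "dram_bytes": lines_moved * line_bytes,
        "wasted_fraction": useless / lines_moved if lines_moved else 0.0,
    }
```

With made-up counts, a moderate setting `prefetch_tradeoff(600, 200, 200)` gives accuracy and coverage of 0.75, while a more aggressive `prefetch_tradeoff(700, 600, 100)` raises coverage to 0.875 but drops accuracy to about 0.54 and moves 40% more data. In a bandwidth-bound code the second can be slower despite hiding more misses, which is the balance point described above.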

 

Does Intel do this automatically? How can users influence the prefetching policy?

 

 

IntelSupport

Hello oleotiger,


Thank you for your response. Let us look into your request; we will post back as soon as possible.


Regards,

Leonardo C.


Intel Customer Support Technician


IntelSupport

Hello oleotiger, 


We would like to let you know that we are still working on your request. We will update this thread as soon as we have any news.


Regards,

Leonardo C.


Intel Customer Support Technician


IntelSupport

Hello oleotiger, 


Thank you for waiting.


As mentioned earlier, there are no options to control the prefetchers more flexibly. Regarding memory bandwidth performance, we have the following questions:


  • Are you using a 2-socket or 4-socket platform?
  • Did you see better memory bandwidth performance with older Xeon processors such as Haswell or Broadwell?
  • Does your application use non-temporal writes (NTWs)? If so, depending on the version of the software being used, there are some updates to glibc in version 2.17 & 2.18 or later.
  • The server's BIOS may have exposed some BIOS options that can influence cache traffic and latencies. We suggest checking your BIOS for the following options:
    • Snoop Mode
    • LLC (last level cache) Prefetch
    • Dead Line LLC allocation (or LLC deadline alloc)
    • Directory AtoS (or Stale AtoS)
    • DBP-for-F



Regards,

Leonardo C.


Intel Customer Support Technician


oleotiger
  • Are you using a 2-socket or 4-socket platform?
    • 2-Socket Platform
  • Did you see better memory bandwidth performance with older Xeon processors such as Haswell or Broadwell?
    • I don't have Haswell or Broadwell processors, so I can't make that comparison.
  • Does your application use non-temporal writes (NTWs)? If so, depending on the version of the software being used, there are some updates to glibc in version 2.17 & 2.18 or later.
    • No, so I don't think a newer glibc would improve performance.
  • The server's BIOS may have exposed some BIOS options that can influence cache traffic and latencies. We suggest checking your BIOS for the following options:
    • Snoop Mode
    • LLC (last level cache) Prefetch
    • Dead Line LLC allocation (or LLC deadline alloc)
    • Directory AtoS (or Stale AtoS)
    • DBP-for-F

These may help. I will try these BIOS options.

IntelSupport

Hello oleotiger,


Thank you for your response. Please let us know the outcome of testing these BIOS settings.


Regards,

Leonardo C.


Intel Customer Support Technician


IntelSupport

Hello oleotiger,

 

I am checking on your thread to know if you need further assistance with this request.

 

Regards,

Leonardo C.


Intel Customer Support Technician


IntelSupport

Hello oleotiger,


We have not heard back from you, so we will close this inquiry. If you need further assistance, please post a new question.


Regards,

Leonardo C.


Intel Customer Support Technician

