Beginner

Limiting access on MIC memory size and bandwidth.

Hi all,

I am running workloads on a Xeon Phi 7120P, and had some questions regarding how to disable address interleaving on memory controllers. 

Each memory controller on the Phi has two memory channels, and each channel is connected to 1 GB of memory (please correct me if I'm wrong). I basically want my program to access only a total of 2 GB of memory out of the Phi's total 16 GB. This reduces both the memory size and the bandwidth my program can exploit, which is part of my experiment.

One way I was thinking of doing this is to disable address interleaving and keep my workload's working set within 2 GB, which would confine it largely to a single memory controller. Another way could be to insert dummy data/mappings between the interleaved regions so that all off-chip accesses go to a single memory controller.

Can anyone help me with this?

Thanks!
Masab

5 Replies
Black Belt

The memory interleaving on the first generation Xeon Phi is extraordinarily complex when ECC is enabled, due to the need to "skip over" 2 out of every 64 cache lines (which are then used to hold the error correcting data).

I have never checked to see what the interleaving looks like with ECC disabled, but I am not aware of any documentation that would provide any help in changing the configuration (if that is even possible).

Reducing bandwidth is much easier in a system with DIMM-based memory, since you can just pull DIMMs to eliminate those sources of bandwidth.
 

Valued Contributor II

>>...I basically want my program to be able to access only a total of 2GB of memory space from the Phis total space of 16GB...

Could you clarify: are you going to use MCDRAM or RAM ( DDR4 ) in your experiments?
Beginner

@John
Thank you very much for your answer. I guess I'll have to play around with compiler-based prefetching to approximately control bandwidth.
I was trying to recreate results from this paper: http://delaat.net/awards/2014-03-26-paper.pdf
They claim to disable hardware prefetching for the results shown in Fig. 6. They apparently explain how they enable/disable prefetching in Section 3.3, but it is unclear how they actually disabled the HW prefetchers.

@Sergey
I am using a Xeon Phi 7120P, which is the older Knights Corner architecture. There is no MCDRAM; the only memory available is the 16 GB of GDDR5.


On the machines I use, hardware prefetch is turned off through one of the advanced BIOS options at boot time, assuming your BIOS vendor decided to make that option available. On Linux, the Intel version of the syscfg tool can also read and change BIOS settings (with root privileges): https://downloadcenter.intel.com/download/26365/Save-and-Restore-System-Configuration-Utility-syscfg-

Charles

Valued Contributor II

>>...I was trying to recreate results from this paper: http://delaat.net/awards/2014-03-26-paper.pdf

On page 6 the authors claim:

>>...Moreover, both the read and write memory bandwidth increases over the number of threads - which happens because when using more threads, we can generate more requests to memory controllers, thus making the interconnect and memory channels busier...

Some of my tests on older Intel architectures showed that for 100% memory-bound processing, peak performance is reached when the number of OpenMP threads does not exceed the number of hardware threads of the CPU, or equals the number of memory channels. Heavy oversubscription of OpenMP threads only degrades processing. For example, on the Ivy Bridge architecture ( http://ark.intel.com/Product.aspx?id=70846 ) performance was almost the same with 2 OpenMP threads ( equal to the number of memory channels ) and with 4 OpenMP threads ( equal to the number of hardware threads ).