Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Temporary PCIe bandwidth drops on Haswell-v3

Friedhelm_S_

Hi All,

We have been developing HD video capture PCIe cards (Gen2 x8), which are installed in HPC servers with Intel's dual-Xeon NUMA architecture. With the SandyBridge-v1/IvyBridge-v2 architecture everything worked fine. Now, with the new Haswell-v3 servers, we have the following problem:

The video streams (PCIe slot -> root complex) start stuttering every few seconds or minutes. When this happens, all Tx posted data (PD) credits have been consumed. We had already observed this situation (all PD credits consumed) with the IvyBridge architecture; however, the system recovered quickly, and the temporary bandwidth drop was easily absorbed by the FIFOs in the Tx signal path (no visible degradation in the video streams). This is not the case with the Haswell architecture: sometimes the PD credits are returned quite slowly, even at times when no new Tx packets are being issued. Typically in this case we observe PD credits being freed up in small steps only: 0, 4, 8, 12, … It then takes tens of microseconds until the system has recovered. When everything is working as expected, the PD credits are freed up in much larger chunks. The described behavior is noticeable even at low Tx bandwidths (>= 2.2 GBit/s).
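
For a rough sense of scale, the figures in this post can be turned into numbers. The sketch below is only back-of-the-envelope arithmetic: it assumes 8b/10b encoding for the Gen2 link, ignores TLP/DLLP overhead, and uses a 50 µs stall as a hypothetical stand-in for "tens of microseconds". The function names are illustrative. The only outside fact used is that one PCIe flow-control data credit covers 4 DW (16 bytes).

```python
# Back-of-the-envelope numbers for the credit stall described above.
# Assumptions (not from the post): 8b/10b encoding, no protocol overhead,
# and a 50 us stall as an example of "tens of microseconds".

FC_DATA_CREDIT_BYTES = 16   # one PCIe data credit = 4 DW = 16 bytes
GEN2_MT_PER_LANE = 5000     # PCIe Gen2 signalling rate per lane, in MT/s


def gen2_payload_mbit(lanes: int) -> int:
    """Gen2 link rate in Mbit/s after 8b/10b encoding, before packet overhead."""
    return GEN2_MT_PER_LANE * lanes * 8 // 10


def fifo_bytes_for_stall(stream_mbit: int, stall_us: int) -> int:
    """Bytes a Tx FIFO must absorb while no credits come back.

    Mbit/s multiplied by microseconds gives bits, so no scaling is needed.
    """
    return stream_mbit * stall_us // 8


def credits_to_bytes(credits: int) -> int:
    """Payload bytes covered by a number of posted data credits."""
    return credits * FC_DATA_CREDIT_BYTES


link_mbit = gen2_payload_mbit(8)
print("Gen2 x8 link rate (Mbit/s):", link_mbit)
print("FIFO bytes for a 50 us stall at 2.2 Gbit/s:", fifo_bytes_for_stall(2200, 50))
print("Bytes per step of 4 PD credits:", credits_to_bytes(4))
```

Under these assumptions, 2.2 Gbit/s is a small fraction of what the link can carry, so the stutter is a matter of stall duration versus FIFO depth rather than raw bandwidth. A step of 4 PD credits corresponds to 64 bytes, which is also the size of one cache line; whether the two are related is speculation.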

We stripped our software down to a minimum to ensure that the captured data is not processed at all, just transferred to memory via DMA. We double-checked the driver software and also ran tests with different memory allocation methods and DMA transfer setups.
We are using Linux and ran the tests with kernel 3.7 (openSUSE 12.1) and 3.10 (CentOS 7.1). We also tried servers from ASUS and Supermicro.
None of these test scenarios helped us get rid of the problem or find a hint as to what is going on.

Does anyone have an idea what could cause such problems?
Is there a difference between IvyBridge-v2 and Haswell-v3 regarding PCIe credit handling (buffering, flow control)?
Are there tools from Intel that could help us find out what is going on?

Thanks and kind regards
Friedhelm Schanz
 

Friedhelm_S_

Hello again,

It finally turned out that "allocating more LLC to IIO" mitigates the issue dramatically for our application. The amount of LLC available to the IIO can be adjusted through an MSR: IIO_LLC_WAYS (0xC8B).

Default is 2 ways. We used 6 ways for our application. The allowed register values are different for Haswell and Broadwell. If someone needs the values, let me know.
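
To check what a given system currently has in this register, the MSR can be read from user space on Linux. The following is a minimal sketch, assuming the `msr` kernel module is loaded (`modprobe msr`) and the script runs as root. The function names and the `path_template` parameter are illustrative. The sketch only reads the register: the allowed values are not listed in this thread, so nothing is written. Counting set bits is meaningful only if the register holds one bit per way, which is an assumption suggested by the "2 ways" / "6 ways" wording.

```python
import os
import struct

IIO_LLC_WAYS = 0xC8B  # MSR address quoted in the post above


def read_msr(address: int, cpu: int = 0,
             path_template: str = "/dev/cpu/{cpu}/msr") -> int:
    """Read one 64-bit MSR through the Linux msr driver.

    The driver exposes each MSR at the file offset equal to its address.
    """
    fd = os.open(path_template.format(cpu=cpu), os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from MSR {address:#x} on cpu {cpu}")
    return struct.unpack("<Q", raw)[0]


def count_set_bits(value: int) -> int:
    """Number of set bits; equals the way count only if the MSR is a way mask."""
    return bin(value).count("1")
```

A typical use would be `value = read_msr(IIO_LLC_WAYS)` followed by `print(hex(value), count_set_bits(value))`. On a dual-socket system it is worth reading the register on one core of each socket, since the setting may differ per package.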

What also helped for our application was setting the QPI mode to "home snoop" (the default on our server had been "early snoop").

best regards

Friedhelm

 

Jake_P_

So, I am an enthusiast with basic to intermediate knowledge of CPU and computer architecture. I don't understand how PCIe systems work on a signal or transactional level, but I do know that my computer hits a latency wall when accessing PCIe devices, including my two PCIe NVMe devices and my GeForce 1060 3GB.

I know that my Haswell-E 5930K 6-core CPU is built on the same architecture you are talking about in this thread. The last errata update for Haswell-E was AUGUST 2014!!! When the chip came out! The Xeons of the same architecture have much more recently updated errata sheets.

I am having a HELL of a time with Win 10 GUI latency issues. I've tried 16 GB of RAM, 8 GB of different RAM, and every other part of the PC except for changing the architecture from Haswell-E. I have had two different 5930Ks (one old and one BRAND new from Intel RMA) with the exact same problem.

I believe it all started when the motherboard manufacturers released the Broadwell-E compatible BIOSes. Could they have screwed up and changed the IIO_LLC_WAYS (0xC8B) register for both CPU architectures, now causing major issues on my older-gen CPU?

I had an X99 Deluxe motherboard and am now on an ASRock Extreme 4 X99. It just feels like all I/O functions are messed up and delayed, and it gets worse the longer the PC is on.

Am I just at the point where I need to get a Kaby Lake desktop chip and see if the same problems show up?

 

PS John McCaplin...I have seen your posts on the Xeon Haswell V3's (X99 consumer) and you are awesome at finding the problems that Intel cannot or will not divulge. I try to buy top of the line for the best support and X99 support has left me wanting AMD.

gmei2

Hello Friedhelm, 

Thank you for sharing on this topic.

I am facing a similar problem on an Intel Xeon D-1541 CPU (8 cores), which is Broadwell.

My card is a PCIe Gen3/Gen2 card with an FPGA on board, which performs signal acquisition (similar to your video capture) and streams the data to host memory through DMA. This card runs perfectly well on an ASUS P9X79 motherboard and an ASUS X270 WS motherboard.

However, on a Xeon D-1541 motherboard, where I believe the chipset is part of the SoC, only a small amount of data is acquired correctly; after that some data is lost, and after a while the data is good again. The data loss is intermittent.

I would like to try your approach of setting IIO_LLC_WAYS; could you please give me some more details on how to set it? I checked the Xeon D-1541 datasheet (registers): there is a register called LLC_WAY_EN, but it is read-only. I used lspci -s ff:1e.3 -vvv -xxx to view this register and can indeed see that LLC_WAY_EN is set to 1.5M (this CPU has 8 cores and 12M of L3 cache, so 1.5M per core).

An online search suggests that changing that register may not have an effect. Please look at item BDE91 on page 34 of this document - https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-d-1500-specification-update.pdf

Does the Xeon D-1541 have the same IIO_LLC_WAYS register? Should I use MSR Tools to set it? Which value should I set it to? If you could post a command for doing this, I would greatly appreciate it.

Thank you. 

mostinski__roman

Hi, 

It seems something similar has been observed with a proprietary FPGA-based PCIe video capture card (ASUS Z10PA-U8 motherboard, E5-2640 v4 CPU, Linux 3.16.36). When installed in the "south bridge" slot, the card works fine and the data transfer rate is almost flat.

However, the same card behaves strangely in the slots connected directly to the CPU. The data transfer rate is "spiky", frame drops occur a few times a second, and multiple NAC are registered.
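
When comparing slots like this, it can help to record where the card ended up and what link it negotiated in each slot. The sketch below reads a few standard Linux sysfs attributes for a PCI function. The function name and the example address are hypothetical, and the link attributes may be missing on older kernels such as the 3.x series mentioned in this thread, in which case the sketch reports None.

```python
from pathlib import Path

# Standard sysfs attribute names; the link_* files are absent on older kernels.
ATTRIBUTES = (
    "numa_node",
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def pci_placement(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Return NUMA node and link parameters for one PCI function.

    bdf is the full address as shown by `lspci -D`, e.g. "0000:03:00.0"
    (a made-up example). Missing attributes are reported as None.
    """
    device_dir = Path(sysfs_root) / bdf
    info = {}
    for name in ATTRIBUTES:
        attribute_file = device_dir / name
        info[name] = (attribute_file.read_text().strip()
                      if attribute_file.is_file() else None)
    return info
```

Calling `pci_placement("0000:03:00.0")` once per slot and comparing the results shows whether the negotiated width or speed differs between the slots. A `numa_node` of -1 means the kernel has no node information for the device.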

Currently all BIOS/system settings are at their defaults, and we are looking into how to debug and fix the issue.

Any good ideas to try?

 

Thanks and kind regards!

Roman 

Oblaukhov__Konstanti

Hello there,

I faced the same (or a very similar) issue on Haswell/Broadwell CPUs (i7-5960X, i7-6950X, Xeon E5-1650v4, Xeon E5-2603v4).

Disabling the "Snoop Response Hold Off" feature in the BIOS fixed the issue in my case: https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/791126

Chahal__Karandeep

Hello Mr. Friedhelm,

What values did you program into the IIO_LLC_WAYS MSR for Haswell/Broadwell? Did you have to make any changes for Skylake?

Thank you,
-K. Chahal
