Embedded Intel® Core™ Processors
Communicate Intel® Core™ Hardware, Software, Firmware, Graphics Concerns
1270 Discussions

6th Gen Skylake USB Host Controller (PCH) Bulk Transfer Issue

JWhit30
Novice
9,531 Views

Summary: USB endpoints based on the USB 2.0 high-speed Cypress FX2LP (EZ-USB FX2LP™, http://www.cypress.com/products/ez-usb-fx2lp) that use bulk transfers from the endpoint to the Intel PCH suffer FX2LP buffer overflows because the PCH does not poll often enough. Previous generations of Intel processors, and also "less powerful" solutions (such as Broadcom SoC-based host controllers), did not show this issue. The product is already on the market.

More Detail: The USB endpoint is a data streaming device. The expectation is that data will not be lost, since even "small" 16-bit microcontroller-based USB hosts have been shown to consume the data fast enough to prevent data loss. The theory is that some power, thermal, or other management process is preventing the PCH from polling fast enough to keep up with the required data rate (approx. 20 MB/s). I have LeCroy USB protocol analyzer traces available which show a request with lost data, and I can provide more traces if needed.
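
For a rough sense of the timing involved, the sketch below estimates how long device-side buffering can absorb the stream if the host stops issuing IN requests. The 20 MB/s rate is from the post; the 4 KB buffer size is an assumption (it depends on the FX2LP endpoint configuration in the firmware), and the function name is only illustrative.

```python
def fifo_fill_time_us(fifo_bytes: int, rate_bytes_per_s: float) -> float:
    """Microseconds until a FIFO of fifo_bytes fills if the host stops draining it."""
    return fifo_bytes / rate_bytes_per_s * 1e6


DATA_RATE = 20e6    # approx. 20 MB/s, from the post
FIFO_BYTES = 4096   # ASSUMPTION: total device-side endpoint buffering

budget_us = fifo_fill_time_us(FIFO_BYTES, DATA_RATE)
# A USB 2.0 high-speed microframe lasts 125 us.
microframes = budget_us / 125.0
print(f"Buffer fills in about {budget_us:.0f} us ({microframes:.1f} microframes)")
```

Under these assumptions, a host-side gap of only a couple of microframes would already be enough to overflow the device, which is consistent with the suspicion that a power-state transition is involved.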

Our product is a "bundled" solution: a Microsoft Surface plus our endpoint. The issue was first found when the (Skylake-based) Surface Pro 4 was tested (the Surface Pro 3, which is based on Gen5 Intel, is fine). A second Skylake system, the Lenovo P70 (Xeon), was tested and found to have the same issue. Otherwise, the solution has been tested on everything from SoC/ARM hosts to previous-generation Intel, and similar failures did not occur. Therefore, the issue has been identified as specific to Skylake.

Over time, reproducing the issue has gone from trivial (failures "left and right") with earlier firmware to more difficult. Some changes in the SP4 firmware/BIOS/OS/etc. have likely improved performance. The Lenovo is not updated as often, so some regression testing may be called for to see whether the failure is still easier to reproduce on it.

The request is for some guided debug that allows shutting off the throttling, state changes, etc. that are unique to Skylake and may be affecting the USB polling frequency. Since my product is the endpoint and not the host controller, I do not have XDP access on the host side to force register changes before the BIOS takes control.

Note that "bonehead" issues have been eliminated: we have been working on this issue for several months, including with Cypress. The channel is good (this is not an SI issue), and the problem has been proven to be data not being extracted fast enough by the host controller, which results in data loss, rather than any other cause. Using known test patterns sent by our endpoint, we can identify which data is lost.
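
As an illustration of how a known test pattern lets the receiver pinpoint lost data, here is a minimal sketch. It assumes the pattern is a little-endian incrementing counter, which is a hypothetical choice and not necessarily the pattern our endpoint sends; `find_gaps` is an illustrative name.

```python
import struct


def find_gaps(data: bytes, word_size: int = 4):
    """Return (byte_offset, expected, found) for each break in an incrementing counter."""
    fmt = {2: "<H", 4: "<I"}[word_size]   # little-endian 16- or 32-bit words
    modulus = 1 << (8 * word_size)        # counter wraps at this value
    gaps = []
    expected = None
    usable = len(data) - len(data) % word_size  # ignore a trailing partial word
    for offset in range(0, usable, word_size):
        (value,) = struct.unpack_from(fmt, data, offset)
        if expected is not None and value != expected:
            gaps.append((offset, expected, value))
        expected = (value + 1) % modulus
    return gaps


# Simulated capture in which the words 3 and 4 never arrived.
stream = b"".join(struct.pack("<I", n) for n in (0, 1, 2, 5, 6))
print(find_gaps(stream))
```

Each reported tuple gives the position in the received stream where the sequence breaks, so the number of missing words is `(found - expected)` modulo the counter width.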

0 Kudos
34 Replies
JWhit30
Novice
1,734 Views

Intel,

As an update, we were able to reproduce the failure on the Intel NUC (NUC6I5SYH) running Windows 7. Initially, the failure did not occur. After installing the latest chipset drivers for NUC/Win7, the failure still did not occur. I had updated a few other drivers at the same time as the chipset drivers, but not video (GFX* driver) or ethernet (LAN* driver). After installing the video and other drivers (ethernet, possibly audio, maybe others), the failure did occur.

It is possible that CamHW tested with a "clean" install of Windows 7 which is why no issue was found - or perhaps the DUT is different enough that the results will vary.

Also, for Linux - I would say that the results are inconclusive. Installing Ubuntu without trying to update all drivers did not result in a failure. However, in testing it is not advisable to spend too much time on what does not fail in the first pass. We'll learn more with Win7/Win8.1 and circle back to Linux.

The plan is to isolate updates starting from a clean install to see if we can pinpoint which driver install started to make the failures easier to produce (more like the frequency in Win10).
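
The isolation procedure amounts to a linear search over install steps. A minimal sketch follows; the `install` and `reproduces` callbacks stand in for manual actions (installing a driver, running the DUT test), and the driver names and the simulated outcome are placeholders, not results.

```python
from typing import Callable, Optional, Sequence


def first_failing_step(steps: Sequence[str],
                       install: Callable[[str], None],
                       reproduces: Callable[[], bool]) -> Optional[str]:
    """Install steps one at a time; return the first step after which the failure reproduces."""
    for step in steps:
        install(step)
        if reproduces():
            return step
    return None  # failure never reproduced with these steps


# Simulated run: placeholder names, and a made-up rule for when the failure appears.
installed = []
culprit = first_failing_step(
    ["driver_a", "driver_b", "driver_c", "driver_d"],
    install=installed.append,
    reproduces=lambda: "driver_c" in installed,
)
print(culprit, installed)
```

Testing after every single install costs one test run per driver, but it avoids the ambiguity of the earlier run, where several drivers were updated together before the failure appeared.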

At the least, we have established a failure signature using an Intel hardware platform (MLB = NUC) taking Microsoft and Lenovo out of the loop in terms of BIOS support. We have also demonstrated more versions of Microsoft Windows show the failure.

Here is a full list of post-windows7-install drivers I have attempted to apply (I believe KB947821 is not complete and did not install - also, I do not think I installed SerialIO*):

06/03/2016 08:09 PM   2,097,152 Windows6.1-KB947821-v34-x86.msu
06/03/2016 08:11 PM   2,831,700 Chipset_Win7_8.1_10_10.1.1.18.zip
06/03/2016 08:11 PM 214,271,604 GFX_Win7_8.1_10_64_15.40.24.4454.zip
06/03/2016 08:15 PM 106,360,936 Wireless_18.40.0_PROSet64_Win7.exe
06/03/2016 08:13 PM  41,272,992 LAN_Win7_64_20.7.1.exe
06/03/2016 08:13 PM  37,438,152 BT_18.16.1_64_Win7.exe
06/03/2016 08:15 PM 258,989,031 AUD_Win7_8.1_10_6.0.1.7730.zip
06/03/2016 08:15 PM  20,295,656 iRMT_Win7_8.1_10_64_1.1.70.520.zip
06/03/2016 08:15 PM   3,154,233 SerialIO_Win8.1_10_64_30.63.1603.05.zip

0 Kudos
JWhit30
Novice
1,734 Views

As an update, our test engineer has reported the following:

"I did a clean install of Windows 7 and installed each system driver one by one. After each driver was installed, I tested [our DUT]. As soon as I loaded Intel's Iris 540 graphics drivers, Dbmon instantly displayed packet loss, along with huge lag spikes when displaying the image.

I further investigated this theory by removing the GFX drivers on the Surface Pro 4 . This proved successful in eliminating the packet loss."

It seems that the display driver in combination with the FX2LP causes problems. We can try Win 8.1, but it seems we have a reasonable set of book-ends at the moment (Win 7 and Win 10 both behave the same way: the Intel video drivers plus the FX2LP cause the issue, and removing the Intel video drivers fixes it).

Since we have taken the Surface Pro 4 out of the hardware loop by using the NUC, this should "fast track" debug on Intel's side, since the suspect drivers and the hardware platform are all Intel-manufactured.

0 Kudos
Adolfo_S_Intel
Moderator
1,734 Views

Hello JasonHWDesign

Thanks for your updates.

I have sent you an e-mail with the link to the Intel Premier Support channel. It would be better to post the case there, as it would get higher priority.

Also could you please indicate the NUC model that you are using?

Best Regards,

Adolfo

0 Kudos
Adolfo_S_Intel
Moderator
1,734 Views

Hello JasonHWDesign

Does the issue occur when testing in Safe Mode?

Best Regards,

Adolfo Sanchez

0 Kudos
JWhit30
Novice
1,734 Views

Let me know if there is a list of tests Intel would like completed. As the problem reproduces in Win10 and Win7, is Win 8.1 testing still necessary?

We can certainly test the safe mode request under Win7/Win10.

If there are any next steps, I would like to queue up any other tests as well.

0 Kudos
JWhit30
Novice
1,734 Views

With the NUC running in Windows 7 Safe Mode, the issue does not reproduce. This is likely because Safe Mode uses legacy video.

0 Kudos
JWhit30
Novice
1,734 Views

Intel / AdolfoS ,

Any progress in getting the issue worked through the NUC team? I am still trying to clear the IT issue at Intel so that I can file a Premier Support ticket, but this is stalled on Intel's end (going on at least a week as of this message).

0 Kudos
ALeva
Beginner
1,734 Views

Hello!

We have the same problems on Intel host controllers under Windows 8.1 and 10. We have a digital camera which transfers data over a bulk endpoint.

Our device also includes a USB-COM device (CP2102, USB 2.0, full-speed device) connected via a hub. When we start transferring data from the camera, everything works well, but as soon as we open the USB-COM device we can no longer get good throughput from the camera.

We have no problem under Windows 7 or when using another host controller (Renesas).

Is there any solution?

0 Kudos
Adolfo_S_Intel
Moderator
1,734 Views

Hello leva

At the moment the issue seems to be related to the Intel graphics driver. Can you confirm whether your issue disappears when you uninstall the Intel graphics drivers?

Best Regards,

Adolfo Sanchez.

0 Kudos
ALeva
Beginner
1,734 Views

Hello!

Unfortunately, this doesn't help us. I found out that disabling the processor idle power states (see "Trying to disable Processor idle states (C states) on Windows PC" on Stack Overflow: http://stackoverflow.com/questions/9721218/trying-to-disable-processor-idle-states-c-states-on-windows-pc) helped to increase the stability of getting frames over USB. But it doesn't completely solve the problem.

When we connect the USB-COM port we get the following:

Is there any way to increase the priority of a USB device?

Best regards,

Alex

0 Kudos
JWhit30
Novice
1,734 Views

The issue behavior seems to be affected by the graphics driver, but the driver may not be the cause. Lack of the driver may increase CPU core utilization, and increased CPU utilization has also been found to make the issue occur less often. The root cause may be C-state related, or something else entirely. In order to drill down to the root cause, we would need Intel to "pop the hood" and take a look at the system using an XDP (which we as a customer do not have), a USB protocol analyzer (which we do have), and BIOS developer and/or other Intel internal tools (which we do not have).

Generating the conditions to cause a failure is very simple, and the failures occur readily (within seconds). The hardware required to reproduce the problem, with the exception of the USB peripheral, is all Intel, including the motherboard/BIOS. Therefore, debug is straightforward using (Intel) native tools and in-house resources. The USB endpoint can be purchased off the shelf, as the released product demonstrates the issue.

Due to a number of Intel-internal IT issues, I am not able to submit a support ticket to move forward. Hopefully this will be remedied in time for the affected customers (both companies such as ours producing hardware with Skylake-related issues and end customers using affected hardware).

====

Commenting on the third case (Leva's): it may be the same failure, a related one, or a different one entirely. The focus is on a USB-to-serial bridge, but it's not clear which data stream shows the errant behavior (serial data, frame data, or both).

I assume the image data is served by a different USB device (not serial) in order to provide something faster than the serial UART's max throughput. It would be interesting to learn if this part serving the frame data is related (the Cypress part) or a different part.

0 Kudos
Adolfo_S_Intel
Moderator
1,734 Views

Hello JasonHWDesign

After consulting internally: since this case affects several Skylake platforms, it is unlikely that the NUC team can provide additional input.

The information related to this sighting has now been passed to the engineers in charge of the product, so they are aware of the issue.

Please keep working with the TAC team to get access to the Intel Premier Support Channel.

Best Regards,

Adolfo Sanchez.

0 Kudos
Adolfo_S_Intel
Moderator
1,734 Views

Hello leva

I'm not sure if the issue you are reporting is related to the issue exposed here.

On what hardware platform are you testing your application?

Is it a Skylake processor?

Could you please test on a different platform (Haswell, Broadwell, Ivy Bridge, etc.)?

Best Regards,

Adolfo Sanchez

0 Kudos
ALeva
Beginner
1,734 Views

Hello!

We get similar results on platforms based on the following chipsets: Intel Z77, H87, Z97, Z170 and the Intel® NUC D54250WYK, under Windows 8 and Windows 10 (I don't remember exactly, but I suppose it was on Haswell and Broadwell CPUs).

We also have problems under Windows 7 on Intel Z170 with a Skylake CPU.

We are going to buy a USB protocol analyzer, so I will send logs once I have it.

Best regards,

Aliaksandr Levanovich

0 Kudos