Our software reads the CPU Time Stamp Counter (TSC), the CPU Local APIC timer counter, and the timestamp counter of the Intel I210 NIC. Occasionally there is, all of a sudden, a large time difference between the first two counters (which stay in sync with each other) and the I210 counter. This leads us to believe that something stops the first two counters for a while. Can anyone give us a hint about what could cause the first two counters to stop counting?
The time difference we see is about 150 µs. It does not grow slowly like a drift; it appears all at once.
We see this on an E3845 system, but we suspect the behaviour is general.
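The distinction between a sudden step and a slow drift can be made mechanical. The following is a minimal Python sketch, assuming the offset between the NIC time and the TSC-derived time has already been computed once per cycle, in microseconds. The function name `classify_offset` and the 50 µs threshold are hypothetical, chosen only to sit above normal jitter and below the roughly 150 µs jump described above.

```python
# Sketch only: classify the offset between two clocks as a step or a drift.
# Assumption: offsets_us[i] = (NIC time - TSC-derived time) at cycle i, in microseconds.
# The 50 us default threshold is a hypothetical value, not from the original software.

def classify_offset(offsets_us, step_threshold_us=50.0):
    """Return ("step", index, jump_us) for the first cycle-to-cycle jump whose
    magnitude exceeds step_threshold_us; otherwise ("drift", None, total_us),
    where total_us is the net change over the whole series."""
    for i in range(1, len(offsets_us)):
        jump = offsets_us[i] - offsets_us[i - 1]
        if abs(jump) > step_threshold_us:
            return ("step", i, jump)
    total = offsets_us[-1] - offsets_us[0] if offsets_us else 0.0
    return ("drift", None, total)


if __name__ == "__main__":
    # Hypothetical data: a 150 us jump at cycle 10 versus a slow 0.01 us/cycle drift.
    print(classify_offset([0.0] * 10 + [150.0] * 10))  # ('step', 10, 150.0)
    print(classify_offset([0.01 * i for i in range(1000)]))  # classified as drift
```

A drift shows up as many tiny per-cycle changes that add up over time, while a step shows up as one large change between two consecutive cycles, which is why the check looks at individual cycle-to-cycle differences rather than only the total.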
Thanks for any help!
Hello, HP:
Thank you for contacting Intel Embedded Community.
To make sure we are on the same page, we would like to ask the following questions:
Could you please tell us whether the NIC, the system, and the software involved in this situation were developed by you or by a third-party company? If they are third-party products, could you please give us all the information related to them? If they were developed by you, how many units show this behavior? Could you please tell us where the guidelines used to develop them are documented?
Could you please attach screenshots related to this condition?
Please send us any information that answers the questions above.
We really appreciate your cooperation.
Thank you for your reply.
The NIC is from Intel (I210).
The system is an embedded industrial PC from Neousys Technology, the POC-200 Bay Trail E3845 fanless box PC:
https://www.neousys-tech.com/en/product/processor/intel-atom/poc-200
The software is developed by us, Power Automation (PC-based CNC systems for machine tools):
https://www.powerautomation.de/home/
At least two units show this behavior; these are the two units on which we have a comparable test setup (5 axes with EtherCAT drives).
Please see the data below for more information:
Subsequent cycles continue with regular timing.
The NIC timer counter and the CPU TSC are read back to back in this order: first the NIC timer, then, on the next line, the CPU TSC.
The programmed cycle time is 1 ms. The usual jitter in the NIC counter is below 20 µs.
The NIC time difference between the 5th and the 4th cycle is 1.57 ms instead of 1 ms, whereas the CPU timestamp difference is close to the expected 1 ms.
Whenever the problem occurs, we see a NIC timer delta of 1.5 to 1.75 ms, while the CPU TSC and APIC timer deltas are always close to the expected 1 ms.
So it seems to us that the CPU (and with it the TSC) is somehow stopped for about 0.5 to 0.7 ms.
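The per-cycle comparison described above can be sketched as follows. This is a minimal Python illustration, not the original software: it assumes I210 timestamps in nanoseconds, raw TSC readings taken right after each NIC read, and a known constant TSC frequency. The name `find_tsc_gaps`, the 20 µs tolerance (taken from the usual NIC jitter), and the 2 GHz example frequency are hypothetical choices for the example. The sample data reproduces the reported pattern of a 1.57 ms NIC delta against a 1 ms TSC delta.

```python
# Sketch only: flag cycles where the NIC clock advanced noticeably more than the TSC.
# Assumptions (hypothetical, not from the original software): nic_ns holds I210
# timestamps in nanoseconds, tsc holds raw TSC readings taken right after each NIC
# read, and tsc_hz is a known, constant TSC frequency.

def find_tsc_gaps(nic_ns, tsc, tsc_hz, jitter_ns=20_000):
    """Return a list of (cycle_index, nic_delta_ms, tsc_delta_ms, gap_ms) for every
    cycle in which the NIC delta exceeds the TSC-derived delta by more than jitter_ns."""
    gaps = []
    for i in range(1, min(len(nic_ns), len(tsc))):
        nic_delta_ns = nic_ns[i] - nic_ns[i - 1]
        tsc_delta_ns = (tsc[i] - tsc[i - 1]) * 1e9 / tsc_hz  # ticks -> ns
        gap_ns = nic_delta_ns - tsc_delta_ns
        if gap_ns > jitter_ns:
            gaps.append((i, nic_delta_ns / 1e6, tsc_delta_ns / 1e6, gap_ns / 1e6))
    return gaps


if __name__ == "__main__":
    tsc_hz = 2_000_000_000  # illustrative 2 GHz, not the E3845's actual TSC rate
    # 1 ms cycles; between the 4th and 5th cycle (indices 3 and 4) the NIC advances 1.57 ms.
    nic_ns = [0, 1_000_000, 2_000_000, 3_000_000, 4_570_000, 5_570_000]
    tsc = [0, 2_000_000, 4_000_000, 6_000_000, 8_000_000, 10_000_000]
    print(find_tsc_gaps(nic_ns, tsc, tsc_hz))  # [(4, 1.57, 1.0, 0.57)]
```

Note that such a check only shows that the two clocks disagree for one cycle; on its own it cannot tell which of the two clocks misbehaved.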
Please feel free to ask if anything is unclear.
Thanks for your clarification.
Based on your previous message, we would like to help with your question about the cited design, but we suggest that you address it to the system manufacturer by filling out the Neousys Technology Technical Support form: https://www.neousys-tech.com/en/support/technical-support
We hope that this information may help you.
That's not what I expected. I can try Neousys, but I doubt they will be able to help me. We're fairly sure this is a topic for Intel, because the CPU's power-saving features (speed stepping) have grown in recent years. We have been using Intel CPUs for our software for 20 years, and only in recent years, with the arrival of multi-core CPUs, have we run into more trouble. Now that we use EtherCAT, which tolerates very little timing jitter, we face this new problem.
Also, this problem happens only once or at most twice a day. Since we changed some BIOS settings, the run time between failures has become even longer. That's why we asked you for a hint as to which CPU feature could cause the TSC / Local APIC counter to stop.
Do you know the answer to this?
Hello, HP:
Thanks for your update.
We would like more information about this third-party project, but it would have to be provided by its manufacturer.
However, we will ask the following questions, which relate to generic suggestions; any conclusions should be validated and confirmed by the developer of the affected design:
Could you please tell us the BIOS version and the BIOS manufacturer of the third-party Bay Trail module? If the version is outdated, please update it to the latest version, try to reproduce the issue, and let us know the results.
Could you please tell us which Operating System (OS) runs on the affected module? If it is not listed on page 3 of the Platform Brief Intel(R) Atom(TM) Processor E3800 Product Family (https://www.intel.com/content/dam/www/public/us/en/documents/platform-briefs/atom-processor-e3800-pl...), please use one of the listed operating systems to check whether the problem persists and let us know the results.
Could you please tell us how many units are affected by this condition? Could you please check whether the affected processors have the same top-side markings? Please send us pictures of the affected processors, specifically of their top-side markings.
Please send us the information that answers these questions.
Waiting for your reply.
Concerning the BIOS version:
INSYDE20 Version POC2A004, Build141014
We don't have a newer version of the BIOS file. But if the issue is BIOS-related, it is more likely related to the BIOS settings than to the BIOS version. So which BIOS settings could affect the reported behaviour?
We use Windows 7 Embedded Standard, which is listed in the PDF you linked. We cannot use any of the other listed operating systems, because our software runs only on Windows 7 Embedded Standard.
We currently have two setups where we see this behavior. It is not easy for us to take pictures of the CPUs, as we are not supposed to open the systems we buy. Does the behaviour differ between production batches? If so, do you have a list of all product changes for the E3845 CPU?
Hello, HP:
Thanks for your update.
Please keep in mind that we can only give generic suggestions, since the information that can help you can only be provided by the developer of the affected design. Therefore, please contact them to obtain the proper information and/or to verify our generic suggestions, as we stated in our previous messages.
We recommend updating the BIOS because the latest versions generally include workarounds for known issues. Therefore, please contact the BIOS developer to verify whether the latest versions include workarounds for the problems in the affected third-party design.
We requested the markings to verify whether the affected processors have the same stepping and/or come from the same lot. In addition, we suggest checking with the designer of the affected product whether the workarounds for errata VLI51, VLI55, VLI66, and VLI85 have been implemented in the design. These errata and further details are described on pages 28, 29, 31, and 36 of the Intel Atom(R) Processor E3800 Specification Update, document #329901 (https://www.intel.ca/content/dam/www/public/us/en/documents/specification-updates/atom-e3800-family-...).
Other errata could be related to this issue, but it is difficult for us to identify them, because we do not have information about how this design has been implemented.
We are doing our best to help you, but it is difficult when the relevant information is held by a third-party company.