Re: P4 stalls for >240 uSec

stevek999 · ‎07-11-2008

Background: We have an in-house realtime OS that has been developed from the 386 through the 486 and Pentium upwards as the available PICMG boards have changed.

Problem: We are evaluating a P4 board because our previous Celeron board is going EOL. When we try to use the COM2 port at 115K baud it misses characters. I've tracked it down to the input (IN) instruction sometimes taking over 240 microseconds. Interrupts are disabled, since we use this feature to download new software versions serially.

Diagnostics: I have ended up with a small piece of ASM86 test code in a floppy Master Boot Record which times the COM2 status register read using the RDTSC instruction. Interrupts are disabled. This seems to prove that the problem exists. I've then tried booting from this floppy on various desktop-type machines (single cores, dual cores, whatever) and most of them seem to exhibit the same symptoms, although the stall time differs. The one machine where the problem doesn't occur uses an AMD processor.

I can't believe that this is a problem with Intel processors, as other people using realtime OSs would surely have spotted it, since it would affect interrupt latency.

Has anyone heard of a problem like this?

I've found many forums/threads where people talk about cache latency and such but 240+ MICROSECONDS seems a bit much

Thanks

SHIH_K_Intel · ‎07-14-2008

I'm not sure I agree with some of the analysis you described. I can only offer some observations that may be useful in drilling down to the cause of your symptoms.

I/O instructions do have substantially longer latency. The exact number varies with a number of factors: how the measurement code is written, the environment in which it runs, and the characteristics of the device the code is talking to. A lot of commercial software uses the COM port to communicate with external peripherals and PCs, and modem-based Internet access with 56K baud modems is still in active use today. Your claim that an input/output instruction takes 240 microseconds would imply that 56K baud modems would also be dropping bits left and right at 30K baud. That does not seem very credible.

Some other factors you might consider:
If you execute a test program from the MBR, the execution environment probably has timing that differs dramatically from that of normal applications. The P4, with its microarchitectural transition from previous generations, might have exposed subtle timing assumptions in your test code and/or OS that do not hold in the partially initialized execution environment your MBR-based program runs in. In a common mainstream environment, the application's execution environment is initialized by the BIOS and various OS components, so the common case is that caches are enabled, program code is prefetched from cacheable memory into the instruction cache, and the front end can feed the instruction stream to the decoders without starving the back end; at the same time, most data fetches for a small working set will most likely be found in the cache hierarchy. If caches were not enabled, for example, your tests may be further affected by the different characteristics of the P4 bus relative to previous generations.

My hunch is that the execution environment your test code executes in may be very different from the common case described above, so subtle timing assumptions may result in surprises. The variability of your results suggests that timing assumptions may be the root of your problem.

stevek999 · ‎07-15-2008

sjkuo,

I neglected to mention that our system is used for real-time process control, where a random delay anywhere in the system can be critical. This is why I spent 2 weeks drilling down to find and confirm this problem before I posted.

Your point about COM ports is valid, but I assumed that most people wouldn't see a problem as it would be masked by the UART's FIFO. Admittedly our legacy code doesn't enable the FIFO, but at 83 microseconds per byte it shouldn't need to.

For your second point, remember that the MBR runs after the BIOS has completed its initialisations, and since I set the BIOS to optimised defaults that includes enabling the processor cache.
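To put rough numbers on that margin, here is a small C sketch. It is not part of our OS; the function names are made up for illustration, and it assumes 8N1 framing (10 bits per character), which gives about 86.8 microseconds per byte at 115200 baud, close to the figure quoted above. It also assumes that a receiver with the FIFO off can go unserviced for roughly one character time before an overrun, and that a 16550A-style FIFO holds 16 bytes.

```c
#include <assert.h>

/* Time to receive one character, in microseconds.
   bits_per_char counts start + data + parity + stop bits
   (10 for 8N1, which is an assumption about the link). */
double char_time_us(unsigned baud, unsigned bits_per_char)
{
    assert(baud > 0);
    return 1e6 * (double)bits_per_char / (double)baud;
}

/* Approximate time the receiver can go unserviced before an overrun,
   in microseconds: about one character time per byte of buffering
   (1 with the FIFO off, 16 for a 16550A-style FIFO). */
double overrun_margin_us(unsigned baud, unsigned bits_per_char,
                         unsigned buffered_bytes)
{
    return char_time_us(baud, bits_per_char) * (double)buffered_bytes;
}
```

With these assumptions, `overrun_margin_us(115200, 10, 1)` is well under 240 microseconds, while `overrun_margin_us(115200, 10, 16)` is well over it, which would fit the idea that an enabled FIFO masks the stall.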

Here's the loop code from my MBR

; TEST CODE
;$$$$$$$$$$$$$$$$$$

CLI ; DISABLE INT'S

MOV EAX,0
DEC EAX
PUSH EAX ; LOOP COUNTER

;LOOP
;----

L001:

;CHECK IF TIME TO OUTPUT A '-'
;-----------------------------

POP EAX
INC EAX
PUSH EAX
AND EAX,0FFFFFFH
JNZ L002

MOV AL,'-'
MOV AH,0EH
MOV BX,00007H
INT 10H

L002:

;EXECUTE 'CPUID' INSTRUCTION TO ENSURE PREVIOUS
; INSTRUCTIONS ARE ALL COMPLETE
;----------------------------------------------

MOV EAX,0
;CPUID
DB 00FH,0A2H

;GET THE CPU CLOCK COUNTER VALUE IN EDX:EAX
;------------------------------------------

;RDTSC
DB 00FH,031H

;SAVE ON STACK
;-------------

PUSH EAX
PUSH EDX

;GET COM2 STATUS (TEST INSTRUCTION)
;----------------------------------

MOV EDX,2FDH
IN AL,DX

;EXECUTE 'CPUID' INSTRUCTION TO ENSURE PREVIOUS
; INSTRUCTIONS ARE ALL COMPLETE
;----------------------------------------------

MOV EAX,0
;CPUID
DB 00FH,0A2H

;GET THE CPU CLOCK COUNTER VALUE IN EDX:EAX
;------------------------------------------

;RDTSC
DB 00FH,031H

;RESTORE START CPU CLOCK COUNTER VALUE IN ECX:EBX
;------------------------------------------------

POP ECX
POP EBX

;CALCULATE NETT CPU CLOCK COUNT
;------------------------------

SUB EAX,EBX
SBB EDX,ECX

;CHECK IF COUNT IS EXCESSIVE OR FIRST TIME
;20000 = 7 MICROSECONDS FOR 3.0 GHZ PROCESSOR
; OR 13 MICROSECONDS FOR 1.5 GHZ PROCESSOR
; OR 20 MICROSECONDS FOR 1.0 GHZ PROCESSOR
;---------------------------------------------

POP ECX
OR ECX,ECX
PUSH ECX
JZ L010

CMP EAX,20000
JB L001

L010:

;OUTPUT EAX AS DECIMAL
;---------------------

.....

This code was written to prove a point; it is not meant to be real code. It was only written after the problem had manifested itself in other areas.

Finally, your comment that "subtle timing assumption may result in surprises": a delay of a couple of microseconds on a 2.8 GHz processor is a surprise. An apparently random delay of 100s of microseconds in a realtime environment is not a surprise, it's a shock.
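For anyone checking the 20000-tick threshold in the loop comments, the conversion between TSC ticks and microseconds can be sketched in C as below. This is only an illustration: the function names are hypothetical, and it assumes the TSC ticks at a constant rate equal to the nominal core clock, which is not guaranteed on every processor.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC tick count to microseconds, assuming the TSC runs at
   a constant tsc_hz (an assumption; not true on every CPU). */
double ticks_to_us(uint64_t ticks, double tsc_hz)
{
    assert(tsc_hz > 0.0);
    return (double)ticks * 1e6 / tsc_hz;
}

/* Convert a duration in microseconds to TSC ticks (truncated). */
uint64_t us_to_ticks(double us, double tsc_hz)
{
    assert(us >= 0.0 && tsc_hz > 0.0);
    return (uint64_t)(us * tsc_hz / 1e6);
}
```

Under that assumption, `ticks_to_us(20000, 3.0e9)` comes out at just under 7 microseconds, matching the comment, and a 240 microsecond stall at 2.8 GHz is about 672000 ticks, so it is far above the threshold and still fits in the low 32 bits that the loop compares.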

stevek999 · ‎07-23-2008

After another week's digging....

1) The problem is not restricted to an input instruction. I think I originally believed it was because it first showed up on a COM port, and because in any loop an I/O instruction takes the longest time and is therefore exposed to the problem for a larger proportion of the loop time

2) The problem is demonstrable on machines from an old Dell Dimension 4400 from 2002 up to an Intel Core 2 Duo (E6550) from 2007

3) The stall time varies from PC model to PC model, but 2 examples of one model I tried had the same value

4) The stall occurs at multiples of some interval. I'm still working on timing that interval; at the moment I use a loop counter, so obviously the count is different for each PC

5) I have found a way of removing the problem on 4 different PCs. Many PC BIOSes allow you to disable the USB and/or the USB legacy mode. I've found that disabling the USB legacy mode works. On those PCs without that option, disabling USB completely works. Unfortunately, the only board where this doesn't fix the problem completely is the PICMG board that I am evaluating. Disabling the USB there only reduces the problem

Does anyone know of any published errata that may cover this 'feature' of the P4 and its chipsets? I could code a workaround into our realtime O.S if I knew what it was
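On point 4, one way to time the interval without depending on a loop counter would be to record the TSC value at each detected stall and look at the gaps between them. The C sketch below is only an outline of that idea, not what my test code currently does: the function names are hypothetical, the timestamps are assumed to be in ascending order, and the smallest observed gap is simply taken as the candidate base interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given TSC timestamps of successive stalls (ascending), return the
   smallest gap between neighbours, in ticks. Returns 0 if fewer than
   two events were recorded. */
uint64_t smallest_gap(const uint64_t *stamps, size_t n)
{
    uint64_t best = 0;
    for (size_t i = 1; i < n; i++) {
        assert(stamps[i] >= stamps[i - 1]);
        uint64_t gap = stamps[i] - stamps[i - 1];
        if (best == 0 || gap < best)
            best = gap;
    }
    return best;
}

/* Check whether every gap is within tol ticks of a whole, non-zero
   multiple of base. Returns 1 if so, 0 otherwise. */
int gaps_are_multiples(const uint64_t *stamps, size_t n,
                       uint64_t base, uint64_t tol)
{
    if (base == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = stamps[i] - stamps[i - 1];
        uint64_t rem = gap % base;
        uint64_t err = rem < base - rem ? rem : base - rem;
        if (err > tol || gap + tol < base)
            return 0;
    }
    return 1;
}
```

Dividing the base interval in ticks by the TSC frequency would then give a figure in seconds that can be compared directly between PCs, assuming the TSC rate is constant on each machine.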

BTW our realtime O.S doesn't use USB so disabling it is acceptable to us

jimdempseyatthecove · ‎07-24-2008

Steve,

Let me preface this by saying I am not a motherboard design engineer. That said, if you discovered that disabling the USB cures or improves the situation, and since your code is running with interrupts off, I would venture to guess that the USB support is implemented with a non-maskable interrupt technique. You could confirm this by hooking the NMI vector and seeing whether one occurs, before passing control on to the previous vector's code. If this confirms that the USB is somehow using an NMI, then maybe you can excise the USB interrupt from the board by cutting a trace (install a jumper to restore it if you wish).

Jim Dempsey

jimdempseyatthecove · ‎07-24-2008

Also,

Would an incorrect DRAM setting be messing you up? e.g. refresh taking inordinately long?

Jim

levicki · ‎08-05-2008

Good points Jim.