Matthieu_W_ · ‎04-28-2015

I've started to implement an 8086/8088 with the goal of being cycle-exact. I can understand the reasoning behind the number of clock cycles for most instructions, but I must say I'm quite puzzled by the Effective Address (EA) calculation time.

More specifically, why does computing BP + DI or BX + SI take 7 cycles, but computing BP + SI or BX + DI take 8 cycles?

I could just wait for a given number of cycles, but I'm really interested in knowing why there's this 1-cycle difference. More generally, why does any EA calculation take so many cycles? EA uses the ALU for computing addresses, and an ADD between registers takes just 3 cycles.

The designers of the chip are probably retired by now, but hopefully there is somebody at Intel who has the knowledge, or can point me to the people who have it :-)

zalia64 · ‎05-05-2015

I suggest you check out the encoding of the instructions. Each additional byte means an additional cycle.

Matthieu_W_ · ‎05-05-2015

I have. The instruction has the same number of bytes whether the operand is, for example, BX (5 cycles), BX + SI (7 cycles) or BX + DI (8 cycles). All of these are encoded using the "mod" + "r/m" fields.
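To make the point concrete, here is a minimal sketch of how an emulator might look up the EA cost from the "mod" and "r/m" fields alone. The 5, 7 and 8 cycle figures are the ones quoted in this thread; the 6-cycle direct-address case, the 4 extra cycles for a displacement and the 2 extra cycles for a segment override are my reading of Intel's published timing table, so treat them as assumptions to verify against the manual. The names `EA_BASE_CYCLES` and `ea_cycles` are hypothetical, not from any real emulator.

```python
# EA cycle counts for mod = 00 (no displacement), indexed by the 3-bit r/m field.
# Note that every entry occupies the same number of instruction bytes,
# except r/m = 110, which is followed by a 16-bit direct address.
EA_BASE_CYCLES = {
    0b000: 7,  # BX + SI
    0b001: 8,  # BX + DI
    0b010: 8,  # BP + SI
    0b011: 7,  # BP + DI
    0b100: 5,  # SI
    0b101: 5,  # DI
    0b110: 6,  # direct address (assumed value from the timing table)
    0b111: 5,  # BX
}

# Assumed penalties, derived from the published table rather than measured.
DISPLACEMENT_PENALTY = 4
SEGMENT_OVERRIDE_PENALTY = 2


def ea_cycles(mod, rm, segment_override=False):
    """Return the EA calculation time in clock cycles for a mod/r/m pair."""
    if mod == 0b11:
        return 0  # register operand, no effective address to compute
    if mod == 0b00:
        cycles = EA_BASE_CYCLES[rm]
    else:
        # With mod = 01 or 10, r/m = 110 means BP + displacement,
        # so its base cost is that of a single base register.
        base = 5 if rm == 0b110 else EA_BASE_CYCLES[rm]
        cycles = base + DISPLACEMENT_PENALTY
    if segment_override:
        cycles += SEGMENT_OVERRIDE_PENALTY
    return cycles


if __name__ == "__main__":
    for rm, name in [(0b111, "BX"), (0b000, "BX + SI"), (0b001, "BX + DI")]:
        print(f"{name}: {ea_cycles(0b00, rm)} cycles")
```

A table like this reproduces the documented numbers, but it does not explain the 1-cycle difference, which is the part I'm actually curious about.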

SergeyKostrov · ‎02-11-2016

>>...More specifically, why does computing BP + DI or BX + SI take 7 cycles, but computing BP + SI or BX + DI take 8 cycles?

Q1. Are you absolutely confident that your measurements are correct?

Q2. How did you get these numbers (I mean 7 and 8 cycles)?

Matthieu_W_ · ‎02-12-2016

Q1 + Q2. These are not measurements; the numbers come from Intel's own reference manual :-)

I have since found the answer: the difference has to do with how effective address calculation was implemented in the microcode. In effect, some modes (such as BP + DI, 7 cycles) were more optimized than others (such as BP + SI, 8 cycles).

Effective Address calculation time on 8086/8088 [old school]