I would like to ask for detailed information on instruction decoding the way the processor does it. I know all about the several fields of an instruction and how they interact with each other. However, I would like to know how the processor does the job by itself (I'm sure there must be an efficient algorithm that somehow depends on certain bits of the instruction). For example: it's easy to get the prefixes and the opcode itself, but it's not as clear with the ModR/M byte. In your manual we can read that the opcode defines the existence of the ModR/M byte. I know that something similar holds between ModR/M and SIB (if we have the ModR/M byte we can calculate whether a SIB byte exists or not). That's the point of my question - is there a way to tell whether a ModR/M byte follows the opcode without using a pre-defined table of which opcodes use it? The same goes for the displacement and immediate - I know when those fields are used, but how can it be determined most efficiently (as I guess the decoding unit does it itself)? Please provide me with this information, along with a detailed description of the algorithm used by the decoding unit (if possible). Thank you.
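To make the part I do understand concrete, this is how I currently derive the SIB byte and displacement size from the ModR/M byte (just a sketch, assuming 32-bit addressing with no address-size prefix; the function name is my own):

```python
def modrm_tail(modrm, sib=None):
    """Return (sib_present, disp_bytes) for a ModR/M byte, 32-bit addressing.

    sib is only consulted when a SIB byte is present; pass the byte
    that follows ModR/M in that case.
    """
    mod = modrm >> 6
    rm = modrm & 7
    if mod == 3:
        return False, 0                   # register-direct, no memory tail
    sib_present = (rm == 4)               # rm=100 redirects to a SIB byte
    if mod == 1:
        disp = 1                          # disp8
    elif mod == 2:
        disp = 4                          # disp32
    elif rm == 5:                         # mod=00, rm=101: disp32-only form
        disp = 4
    elif sib_present and sib is not None and (sib & 7) == 5:
        disp = 4                          # mod=00, SIB base=101: disp32
    else:
        disp = 0
    return sib_present, disp
```

So the SIB and displacement part is pure bit logic; it's only the opcode-to-ModR/M step that seems to need a table.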
I've done something like that several years ago: x86 Decoder. I literally took a piece of squared paper and, in a 16x16 matrix representing the opcode map, colored each square for the opcodes that require a specific decoding. This revealed patterns that make it relatively easy to use binary logic to determine the decoding format of each instruction.
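To give an idea of what those patterns look like, here's a rough sketch of such a ModR/M predicate for the one-byte opcode map (my own simplification, not the exact logic from my decoder; it assumes prefixes and the 0x0F two-byte escape have already been consumed, and ignores 64-bit-mode changes):

```python
def has_modrm(opcode):
    """Does this one-byte opcode carry a ModR/M byte? (simplified sketch)"""
    if opcode < 0x40:
        # ALU block: each group of 8 starts with four r/m forms (ModR/M),
        # then AL,imm / eAX,imm forms and push/pop-seg etc. (no ModR/M)
        return (opcode & 7) < 4
    if opcode < 0x60:
        return False                      # INC/DEC/PUSH/POP reg
    if opcode < 0x70:
        return opcode in (0x62, 0x63, 0x69, 0x6B)  # BOUND, ARPL, IMUL
    if opcode < 0x80:
        return False                      # Jcc rel8
    if opcode < 0x90:
        return True                       # imm groups, TEST/XCHG/MOV/LEA
    if opcode < 0xC0:
        return False                      # NOP..XLAT area, MOV reg,imm
    if opcode < 0xD0:
        return opcode in (0xC0, 0xC1, 0xC4, 0xC5, 0xC6, 0xC7)
    if opcode < 0xE0:
        return opcode <= 0xD3 or opcode >= 0xD8    # shifts, x87 escapes
    if opcode < 0xF0:
        return False                      # LOOP/Jcc/IN/OUT/CALL/JMP
    return opcode in (0xF6, 0xF7, 0xFE, 0xFF)      # unary/INC-DEC groups
```

Note how few of the tests need more than a handful of bits: the big decisions fall on the top two or four bits of the opcode, which is exactly what made the colored matrix so regular.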
I believe my code is smaller than using tables, though not entirely optimal yet. Also, when translated to hardware, each of the tests really just looks at a few bits, so it could be quite compact.
Here's a document that reveals some details: An Asynchronous Instruction Length Decoder. There's also tons of information in patents if you really want to get to the bottom of it...
But since you mentioned working on a disassembler you might also be interested in this: Efficient Software Decoder Design.