Re: question on avx instruction encoding

tthsqe · 01-01-2010

So any instruction that can be encoded with a two-byte VEX prefix can also be encoded with a three-byte prefix. How about the other way around?
Is it true that the two-byte prefix may be used if and only if
vex.W = 0
vex.X = 1
vex.B = 1
vex.mmmmm = 00001
?
I.e., VFMADD132PS may use the two-byte form but VFMADD132PD needs the three-byte form?

MarkC_Intel · 01-01-2010

Hi,
The 3-byte VEX sequence (starting with C4) must be used when one needs to set VEX.W=1, VEX.X=0, or VEX.B=0, or when the opcode is in map 0F3A or 0F38. The X and B bits are stored logically inverted relative to their meaning in the REX prefix. The 2-byte VEX sequence (starting with C5) can be used when the opcode is in the 0F map and none of these other bit settings is required.

For your 2nd question, all the VFMADD* instructions are in map 0F38, so they all must use the 3-byte (C4) VEX sequence.
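
The rule above can be sketched as a small Python check. This is only an illustration: the function name and the map constants are made up here, and the X and B arguments are assumed to be the bits as stored in the prefix (already inverted, so 1 means "no extension needed").

```python
# VEX.mmmmm values for the three opcode maps mentioned above.
# The 2-byte (C5) form has no mmmmm field; it always implies map 0F.
MAP_0F, MAP_0F38, MAP_0F3A = 0b00001, 0b00010, 0b00011


def can_use_two_byte_vex(w: int, x: int, b: int, mmmmm: int) -> bool:
    """Return True if the C5 form is sufficient.

    x and b are the *stored* (inverted) prefix bits, w is VEX.W.
    """
    return w == 0 and x == 1 and b == 1 and mmmmm == MAP_0F


# VADDPS with only registers 0..7 in ModRM: map 0F, nothing to extend.
print(can_use_two_byte_vex(w=0, x=1, b=1, mmmmm=MAP_0F))    # True
# Same opcode, but ModRM.rm names a register 8..15: stored B must be 0.
print(can_use_two_byte_vex(w=0, x=1, b=0, mmmmm=MAP_0F))    # False
# Any VFMADD* form lives in map 0F38.
print(can_use_two_byte_vex(w=0, x=1, b=1, mmmmm=MAP_0F38))  # False
```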

An example of something that could use C4 or C5 is VADDPS. It is in map 0F, but which prefix sequence is required depends on the registers used. Here are two examples using XED from the Intel SDE kit. In the first example, the 3rd operand is YMM13 and is encoded in the MODRM.RM field, so it needs VEX.B=0 to encode the upper bit of its register identifier. (REX.B would be 1, so the inverted VEX.B is 0.) In the second example, the 3rd operand is YMM3, so we don't need VEX.B=0 and the shorter 2-byte C5 sequence can be used.

% kit/xed -64 -e vaddps ymm0 ymm1 ymm13
Request: VADDPS MODE:2, REG0:YMM0, REG1:YMM1, REG2:YMM13, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Encodable! C4C17458C5
.byte 0xc4,0xc1,0x74,0x58,0xc5

% kit/xed -64 -e vaddps ymm0 ymm1 ymm3
Request: VADDPS MODE:2, REG0:YMM0, REG1:YMM1, REG2:YMM3, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Encodable! C5F458C3
.byte 0xc5,0xf4,0x58,0xc3
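
For anyone who wants to reproduce these two byte strings by hand, here is a minimal Python sketch of an encoder for just this one case (register-register VADDPS on YMM registers). It is not how XED works internally. The function name is hypothetical, and it always picks the C5 form when that is legal, although the C4 form would be valid too.

```python
def encode_vaddps_ymm(dst: int, src1: int, src2: int) -> bytes:
    """Encode VADDPS ymm<dst>, ymm<src1>, ymm<src2> (register-register form)."""
    for reg in (dst, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("register number must be 0..15")

    r_inv = ((dst >> 3) & 1) ^ 1    # stored VEX.R: inverted 4th bit of ModRM.reg
    x_inv = 1                        # no SIB index register, so stored VEX.X = 1
    b_inv = ((src2 >> 3) & 1) ^ 1   # stored VEX.B: inverted 4th bit of ModRM.rm
    vvvv_inv = (~src1) & 0xF         # vvvv holds src1 in ones' complement
    w, l, pp = 0, 1, 0b00            # W=0, L=1 (256-bit), pp=00 (no SIMD prefix)
    mmmmm = 0b00001                  # opcode map 0F
    opcode = 0x58                    # ADDPS/VADDPS opcode byte
    modrm = 0b11000000 | ((dst & 7) << 3) | (src2 & 7)

    if w == 0 and x_inv == 1 and b_inv == 1 and mmmmm == 0b00001:
        # 2-byte form: C5, then R vvvv L pp
        prefix = bytes([0xC5, (r_inv << 7) | (vvvv_inv << 3) | (l << 2) | pp])
    else:
        # 3-byte form: C4, then R X B mmmmm, then W vvvv L pp
        prefix = bytes([
            0xC4,
            (r_inv << 7) | (x_inv << 6) | (b_inv << 5) | mmmmm,
            (w << 7) | (vvvv_inv << 3) | (l << 2) | pp,
        ])
    return prefix + bytes([opcode, modrm])


print(encode_vaddps_ymm(0, 1, 13).hex().upper())  # C4C17458C5
print(encode_vaddps_ymm(0, 1, 3).hex().upper())   # C5F458C3
```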

tthsqe · 01-01-2010

Oh, sorry. I overlooked that fmadd isn't in the 0F opcode map. I might try that SDE. Thanks.

minipli41 · 07-06-2011

Quoting Mark Charney (Intel)

% kit/xed -64 -e vaddps ymm0 ymm1 ymm13
Request: VADDPS MODE:2, REG0:YMM0, REG1:YMM1, REG2:YMM13, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Encodable! C4C17458C5
.byte 0xc4,0xc1,0x74,0x58,0xc5

I'm a little confused. Why is the second byte 0xc1? This would imply VEX.X = 1 but shouldn't it be 0 because REX.X for YMM13 would be 1?

And even more confusing:

% printf '\xc4\xc1\x74\x58\xc5' > vaddps.bin
% printf '\xc4\x81\x74\x58\xc5' >> vaddps.bin
% ./xed -64 -ir vaddps.bin
In raw...
XDIS 0: AVX       AVX   C4C17458C5               vaddps ymm0, ymm1, ymm13
XDIS 5: AVX       AVX   C4817458C5               vaddps ymm0, ymm1, ymm13

...

How can those two byte sequences decode to the same opcode? Shouldn't the former have REG2:YMM2 and only the latter REG2:YMM15?

Regards,

Mathias

MarkC_Intel · 07-06-2011

C4 C1 74 58 C5 and C4 81 74 58 C5 differ in the VEX.X bit. The VEX.X bit is not used in the encoding of these forms of these instructions. VEX.X is typically used to extend the index-register operand if there is one.
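
To see this bit by bit, here is a small Python sketch (the function name is made up for illustration) that splits the byte following C4 into its fields and un-inverts R, X and B so they read like the REX bits:

```python
def decode_vex3_byte1(byte1: int) -> dict:
    """Split the byte after C4 into logical (un-inverted) R, X, B and the map field."""
    return {
        "R": ((byte1 >> 7) & 1) ^ 1,   # bit 7, stored inverted
        "X": ((byte1 >> 6) & 1) ^ 1,   # bit 6, stored inverted
        "B": ((byte1 >> 5) & 1) ^ 1,   # bit 5, stored inverted
        "mmmmm": byte1 & 0x1F,         # bits 4..0, opcode map selector
    }


first = decode_vex3_byte1(0xC1)
second = decode_vex3_byte1(0x81)
print(first)   # {'R': 0, 'X': 0, 'B': 1, 'mmmmm': 1}
print(second)  # {'R': 0, 'X': 1, 'B': 1, 'mmmmm': 1}
print([name for name in first if first[name] != second[name]])  # ['X']
```

Both bytes carry logical B=1 and map 0F. Only X differs, and a register-register form has no SIB index for X to extend.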

MarkC_Intel · 07-06-2011

Oh yeah, and in answer to the first part of your original question: VEX.X is stored inverted. (As are VEX.R and VEX.B). The reason for that has to do with how we re-used the LDS/LES instructions in 32b mode.
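
A rough Python sketch of why the inversion helps, assuming the documented 32-bit rule that C4/C5 are treated as a VEX prefix only when the top two bits of the following byte are 11 (LES/LDS require a memory operand, so ModRM.mod = 11 is otherwise unused). The function name is hypothetical.

```python
def is_vex_in_32bit_mode(first: int, second: int) -> bool:
    """Return True if the bytes start a VEX prefix rather than LES/LDS in 32-bit mode.

    For C4 the top two bits of the next byte are the stored R and X; for C5
    they are the stored R and the top bit of vvvv. Stored inverted, they are
    both 1 whenever only registers 0..7 are named, which is all 32-bit mode has.
    """
    return first in (0xC4, 0xC5) and (second & 0xC0) == 0xC0


print(is_vex_in_32bit_mode(0xC5, 0xF4))  # True
print(is_vex_in_32bit_mode(0xC4, 0xC1))  # True
print(is_vex_in_32bit_mode(0xC4, 0x81))  # False: mod = 10, so this would be LES
```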

minipli41 · 07-07-2011

Thanks. I think I got it. Since the VEX prefix can encode a full YMM register in the vvvv field without any further extension bits, the VEX.{R,B,X} bits are used for possible registers 2 to 4, right?

MarkC_Intel · 07-07-2011

Hi. Not sure what you mean by "registers 2 to 4". Each instruction description now has a box on the instruction page that specifies where the operands are encoded. Different instructions take their operands from the available fields in slightly different orders.

Given that there are 16 xmm/ymm registers on 64b, we need to have 4 register specifier bits per register operand. You are correct that the VEX.VVVV field is self-sufficient, being 4b wide. In AVX, the other places that registers can be specified (MODRM.REG, MODRM.RM, SIB.BASE, SIB.INDEX) are 3b wide and thus all require another bit. The 4th register specifier bit comes from the VEX.{R,X,B} fields, inverted. In SSE, the 4th bit came from the REX prefix fields.
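
As a sketch of how the 4-bit register numbers are put together, the Python below decodes only the C4 form with a register-direct ModRM. The function and key names are made up for illustration, and which operand each field holds still depends on the instruction's operand-encoding box.

```python
def decode_vex3_regs(code: bytes) -> dict:
    """Recover register numbers from: C4, 2 payload bytes, 1 opcode byte, ModRM."""
    if len(code) < 5 or code[0] != 0xC4:
        raise ValueError("expected a 3-byte VEX prefix, opcode and ModRM")
    byte1, byte2, modrm = code[1], code[2], code[4]
    if modrm >> 6 != 0b11:
        raise ValueError("this sketch handles register-direct ModRM only")

    r = ((byte1 >> 7) & 1) ^ 1   # logical R, stored inverted
    b = ((byte1 >> 5) & 1) ^ 1   # logical B, stored inverted
    return {
        # 3 bits from ModRM.reg plus R as the 4th bit
        "modrm_reg": (r << 3) | ((modrm >> 3) & 7),
        # vvvv is already 4 bits wide, stored in ones' complement
        "vvvv": (~(byte2 >> 3)) & 0xF,
        # 3 bits from ModRM.rm plus B as the 4th bit
        "modrm_rm": (b << 3) | (modrm & 7),
    }


print(decode_vex3_regs(bytes.fromhex("C4C17458C5")))
# {'modrm_reg': 0, 'vvvv': 1, 'modrm_rm': 13}
```

Feeding it C4 81 74 58 C5 gives the same three numbers, since X is never consulted for this form.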

minipli41 · 07-07-2011

Quoting Mark Charney (Intel)

Hi. Not sure what you mean by "registers 2 to 4". Each instruction description now has a box on the instruction page that specifies where the operands are encoded. Different instructions take their operands from the available fields in slightly different orders.

Oh, somehow I used to ignore that second box in the manual. I only used to look at the first one, describing the different encodings for one instruction. Thanks for making me look a little closer. Now it's clear to me how to encode the different instructions. :)

Quoting Mark Charney (Intel)

Given that there are 16 xmm/ymm registers on 64b, we need to have 4 register specifier bits per register operand. You are correct that the VEX.VVVV field is self-sufficient, being 4b wide. In AVX, the other places that registers can be specified (MODRM.REG, MODRM.RM, SIB.BASE, SIB.INDEX) are 3b wide and thus all require another bit. The 4th register specifier bit comes from the VEX.{R,X,B} fields, inverted. In SSE, the 4th bit came from the REX prefix fields.

Yeah, that's what I meant by only needing 3 bits (VEX.{B,R,X}) to encode the missing bits for a maximum of four register arguments.

Thanks again! You helped me a lot!
