So any instruction that can be encoded with a two-byte VEX prefix can also be encoded with a three-byte prefix. How about the other way around?
Is it true that the two-byte prefix may be used if and only if
vex.W = 0
vex.X = 1
vex.B = 1
vex.mmmmm = 00001
?
I.e., VFMADD132PS may use the two-byte form but VFMADD132PD needs the three-byte form?
9 Replies
Quoting - tthsqe
So any instruction that can be encoded with a two-byte VEX prefix can also be encoded with a three-byte prefix. How about the other way around?
Is it true that the two-byte prefix may be used if and only if
vex.W = 0
vex.X = 1
vex.B = 1
vex.mmmmm = 00001
?
I.e., VFMADD132PS may use the two-byte form but VFMADD132PD needs the three-byte form?
Hi,
The 3-byte VEX sequence (starting with C4) must be used when one needs to set VEX.W=1, VEX.X=0, or VEX.B=0, or when the opcode is in map 0F3A or 0F38. The X and B bits are logically inverted relative to their meaning in the REX prefix. The 2-byte VEX sequence (starting with C5) can be used when the opcode is in the 0F map and does not require any of these other bit settings.
For your 2nd question, all the VFMADD* instructions are in map 0F38, so they all must use the 3-byte (C4) VEX sequence.
An example of something that could use C4 or C5 is VADDPS. It is in map 0F, but which prefix sequence is required depends on the registers used. Here are two examples using XED from the Intel SDE kit. In the first example, because the 3rd operand is YMM13 and that operand is encoded in the MODRM.RM field, it uses VEX.B'=0 to encode the upper bit of its register identifier. (REX.B would be 1, so VEX.B'=0.) In the second example, since the 3rd operand is YMM3, we don't need to specify VEX.B'=0, and the shorter 2-byte C5 sequence can be used.
% kit/xed -64 -e vaddps ymm0 ymm1 ymm13
Request: VADDPS MODE:2, REG0:YMM0, REG1:YMM1, REG2:YMM13, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Encodable! C4C17458C5
.byte 0xc4,0xc1,0x74,0x58,0xc5
% kit/xed -64 -e vaddps ymm0 ymm1 ymm3
Request: VADDPS MODE:2, REG0:YMM0, REG1:YMM1, REG2:YMM3, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Encodable! C5F458C3
.byte 0xc5,0xf4,0x58,0xc3
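For anyone who wants to play with the rule without installing SDE, here is a minimal Python sketch of the prefix selection described above. It is not how XED does it internally; the helper names `vex_prefix` and `vaddps_ymm` are made up for illustration, and only the register-to-register `VADDPS ymm, ymm, ymm` form is handled. The `r`, `x`, `b` arguments are given in the REX sense (1 means the extension bit is needed) and are inverted when packed into the prefix.

```python
def vex_prefix(r, x, b, w, mmmmm, vvvv, l, pp):
    """Build a VEX prefix, preferring the 2-byte (C5) form when it is legal.

    r, x, b: REX-style extension bits (stored inverted in the prefix).
    vvvv:    plain register number 0..15 (stored inverted in the prefix).
    """
    nr, nx, nb = r ^ 1, x ^ 1, b ^ 1
    low = (((~vvvv) & 0xF) << 3) | (l << 2) | pp
    # C5 has no room for W, X, B or the map field, so it is only usable
    # when W=0, X and B are not needed, and the opcode is in map 0F.
    if w == 0 and x == 0 and b == 0 and mmmmm == 0b00001:
        return bytes([0xC5, (nr << 7) | low])
    return bytes([0xC4,
                  (nr << 7) | (nx << 6) | (nb << 5) | mmmmm,
                  (w << 7) | low])


def vaddps_ymm(dst, src1, src2):
    """Encode VADDPS ymm(dst), ymm(src1), ymm(src2), register form only."""
    prefix = vex_prefix(r=dst >> 3, x=0, b=src2 >> 3, w=0,
                        mmmmm=0b00001, vvvv=src1, l=1, pp=0)
    modrm = 0xC0 | ((dst & 7) << 3) | (src2 & 7)  # mod=11: register direct
    return prefix + bytes([0x58, modrm])


print(vaddps_ymm(0, 1, 13).hex())  # c4c17458c5
print(vaddps_ymm(0, 1, 3).hex())   # c5f458c3
```

Both results match the XED output above. Passing `mmmmm=0b00010` (map 0F38, as for the VFMADD* instructions) always falls through to the C4 branch.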
Oh, sorry. I overlooked that FMADD isn't in the 0F opcode map. I might try that SDE. Thanks.
Quoting Mark Charney (Intel)
% kit/xed -64 -e vaddps ymm0 ymm1 ymm13
Request: VADDPS MODE:2, REG0:YMM0, REG1:YMM1, REG2:YMM13, SMODE:2
OPERAND ORDER: REG0 REG1 REG2
Encodable! C4C17458C5
.byte 0xc4,0xc1,0x74,0x58,0xc5
I'm a little confused. Why is the second byte 0xc1? This would imply VEX.X = 1 but shouldn't it be 0 because REX.X for YMM13 would be 1?
And even more confusing:
% printf '\xc4\xc1\x74\x58\xc5' > vaddps.bin
% printf '\xc4\x81\x74\x58\xc5' >> vaddps.bin
% ./xed -64 -ir vaddps.bin
In raw...
XDIS 0: AVX AVX C4C17458C5 vaddps ymm0, ymm1, ymm13
XDIS 5: AVX AVX C4817458C5 vaddps ymm0, ymm1, ymm13
...
How can those two byte sequences decode to the same opcode? Shouldn't the former have REG2:YMM2 and only the latter REG2:YMM15?
Regards,
Mathias
C4 C1 74 58 C5 and C4 81 74 58 C5 differ in the VEX.X bit. The VEX.X bit is not used in the encoding of these forms of these instructions. VEX.X is typically used to extend the index-register operand if there is one.
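A quick way to see this is to pull the fields out of the second prefix byte. The following is just an illustrative Python sketch (the function name `decode_vex3_byte1` is made up here); it returns R, X and B in the REX sense, i.e. after undoing the inversion.

```python
def decode_vex3_byte1(byte):
    """Split the byte following C4 into its fields.

    R, X and B are stored inverted, so they are flipped back here:
    a returned value of 1 means "extension bit set", as in REX.
    """
    return {
        "R": ((byte >> 7) & 1) ^ 1,
        "X": ((byte >> 6) & 1) ^ 1,
        "B": ((byte >> 5) & 1) ^ 1,
        "mmmmm": byte & 0x1F,
    }


a = decode_vex3_byte1(0xC1)
b = decode_vex3_byte1(0x81)
print(a)  # {'R': 0, 'X': 0, 'B': 1, 'mmmmm': 1}
print(b)  # {'R': 0, 'X': 1, 'B': 1, 'mmmmm': 1}
print([k for k in a if a[k] != b[k]])  # ['X']
```

Since the ModRM byte C5 has mod=11 (register direct), there is no SIB byte and hence no index register for X to extend, which is why both sequences name the same operands.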
Oh yeah, and in answer to the first part of your original question: VEX.X is stored inverted. (As are VEX.R and VEX.B). The reason for that has to do with how we re-used the LDS/LES instructions in 32b mode.
Thanks. I think I got it. Since the vvvv field of the VEX prefix can encode a full YMM register without any further extension bits, the VEX.{R,B,X} bits are used for the possible registers 2 to 4, right?
Hi. Not sure what you mean by "registers 2 to 4". Each instruction description now has a box on the instruction page that specifies where the operands are encoded. Different instructions take their operands from the available fields in slightly different orders.
Given that there are 16 xmm/ymm registers on 64b, we need to have 4 register specifier bits per register operand. You are correct that the VEX.VVVV field is self-sufficient, being 4b wide. In AVX, the other places that registers can be specified (MODRM.REG, MODRM.RM, SIB.BASE, SIB.INDEX) are 3b wide and thus all require another bit. The 4th register specifier bit comes from the VEX.{R,X,B} fields, inverted. In SSE, the 4th bit came from the REX prefix fields.
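As a rough illustration of how the 4-bit register numbers are put together, here is a small Python sketch that recovers the three register operands from the byte sequences discussed earlier in this thread. It is only a sketch under narrow assumptions: the name `decode_reg_operands` is invented, it expects a VEX prefix followed by a one-byte opcode and a ModRM byte, and it rejects anything that is not a register-direct (mod=11) form.

```python
def decode_reg_operands(code):
    """Return (MODRM.REG, VEX.vvvv, MODRM.RM) register numbers, 0..15."""
    if code[0] == 0xC4:                      # 3-byte VEX
        r = ((code[1] >> 7) & 1) ^ 1         # inverted in the prefix
        b = ((code[1] >> 5) & 1) ^ 1
        vvvv = ~(code[2] >> 3) & 0xF         # also stored inverted
        modrm = code[4]
    elif code[0] == 0xC5:                    # 2-byte VEX: no X or B bits
        r = ((code[1] >> 7) & 1) ^ 1
        b = 0
        vvvv = ~(code[1] >> 3) & 0xF
        modrm = code[3]
    else:
        raise ValueError("not a VEX prefix")
    if modrm >> 6 != 0b11:
        raise ValueError("only register-direct forms are handled")
    reg = (r << 3) | ((modrm >> 3) & 7)      # 4th bit from VEX.R
    rm = (b << 3) | (modrm & 7)              # 4th bit from VEX.B
    return reg, vvvv, rm


print(decode_reg_operands(bytes.fromhex("c4c17458c5")))  # (0, 1, 13)
print(decode_reg_operands(bytes.fromhex("c4817458c5")))  # (0, 1, 13)
print(decode_reg_operands(bytes.fromhex("c5f458c3")))    # (0, 1, 3)
```

For VADDPS these three fields map to the destination, first source and second source respectively; other instructions may assign the fields differently, as the operand-encoding box on each instruction page describes.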
Quoting Mark Charney (Intel)
Hi. Not sure what you mean by "registers 2 to 4". Each instruction description now has a box on the instruction page that specifies where the operands are encoded. Different instructions take their operands from the available fields in slightly different orders.
Oh, somehow I used to ignore that second box in the manual. I only used to look at the first one, describing the different encodings for one instruction. Thanks for making me look a little closer. Now it's clear to me how to encode the different instructions. :)
Quoting Mark Charney (Intel)
Given that there are 16 xmm/ymm registers on 64b, we need to have 4 register specifier bits per register operand. You are correct that the VEX.VVVV field is self-sufficient, being 4b wide. In AVX, the other places that registers can be specified (MODRM.REG, MODRM.RM, SIB.BASE, SIB.INDEX) are 3b wide and thus all require another bit. The 4th register specifier bit comes from the VEX.{R,X,B} fields, inverted. In SSE, the 4th bit came from the REX prefix fields.
Yeah, that's what I meant by only needing 3 bits (VEX.{B,R,X}) to encode the missing bits for a maximum of four register arguments.
Thanks again! You helped me a lot!
Thank you for a very interesting article. Please continue writing. These facts are amazing. I was searching for at least 5 weeks and didn't get the perfect answer, but in the end I found it on your site. Thanks for posting such an interesting topic.
