Reading the "Knights Corner Instruction Set Reference Manual", I see that there are 32 vector registers where the x64 instruction set has only 16. The V register field and the R register field in the MVEX prefix are extended with an extra bit (V', R') to code the extra registers. But the B and X fields are not extended. How do you code registers zmm16-zmm31 in an instruction with three or more register operands? Is this impossible, or are you using some other bits, like the pp bits, which are mostly unused anyway, or the unused bit to the left of the pp bits? Maybe you are using the X bit, which is not needed anyway if there is no memory operand, to extend the B bits. Then the only limitation would be that registers zmm16-zmm31 cannot be used with VSIB addressing. Are the extra bits inverted?
I would like to update my disassembler (named "objconv") to cover this instruction set so I need this info.
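For reference, the way a disassembler might assemble a 5-bit register number from a 3-bit field plus two prefix bits can be sketched as follows. This is only a minimal sketch: the function name and parameters are made up for illustration, and the `inverted` flag is an assumption. The VEX prefix stores R, X, B and vvvv inverted, but whether the extra MVEX bits follow the same convention is exactly the open question above.

```python
def full_register_number(low3, ext, ext_hi, inverted=True):
    """Combine a 3-bit register field with two one-bit prefix extensions.

    low3     : 3-bit field, e.g. the reg field of the mod/reg/rm byte
    ext      : raw prefix bit supplying bit 3 of the number (e.g. R)
    ext_hi   : raw prefix bit supplying bit 4 of the number (e.g. R')
    inverted : ASSUMPTION, treat both prefix bits as stored inverted,
               the way the VEX prefix stores R, X, B and vvvv
    """
    if inverted:
        ext ^= 1
        ext_hi ^= 1
    return (ext_hi << 4) | (ext << 3) | (low3 & 7)


# With inverted storage, raw bits 1,1 mean "no extension": register 5.
print(full_register_number(0b101, ext=1, ext_hi=1))
# Raw high bit 0 selects the upper bank: register 16.
print(full_register_number(0b000, ext=1, ext_hi=0))
```

The same helper would apply to the V'/vvvv pair, or to whatever bits turn out to extend the B field.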
The X bits are extended with one of the V bits when there is a VSIB memory operand, but it is not clear which of the V bits. Pages 42 and 43 say VX; should that be V'X?
Page 42 says that the vector mask register is coded in the MVEX.aaa bits. Should that be MVEX.kkk?
The assembly syntax on p. 45 does not explain clearly how to indicate swizzle, etc. It says:
mnemonic vreg{masking modifier}, source1, transform_modifier(vreg/mem)
Perhaps that should be:
mnemonic vreg{masking modifier}, source1, vreg/mem{transform_modifier} ?
How are the JKZD and JKNZD instructions coded? No 0F escape code is indicated for the short jump version. Does that mean mmmmm=0? This is contradicted on page 44 saying mmmmm=0 will cause an exception. Is the mask register coded in the vvvv bits or is there a mod/reg/rm byte?
Thank you for catching these doc issues. I don't have hardware to test, but I believe:
1. In VSIB encoding, the index operand would be encoded with MVEX.V'X.
2. There was a latent notation change that led to two different notations expressing the same feature. MVEX.aaa is the correct notation that replaces MVEX.kkk.
3. I believe the notation convention of transform_modifier is consistent with the table listed on p. 47.
4. It turns out the mmmm=0000 map was used to encode some of the scalar mask instructions.
With updated correction.
@3. The table on p. 47 refers to the kind of swizzle, not the actual value. Each entry in table 3.1 points to another table of 8 possible values. For example, Sf32 refers to table 2.2 listing 8 possible values for register operands and table 2.4 listing 7 possible values for memory operands. So I think the value to list in assembly would be e.g. zmm3{cdab}. The notation Sf32(zmm3) gives only the kind of swizzle, not the chosen value.
@4. The codes for JKZD and JKNZD are identical to the codes for JZ and JNZ with a VEX prefix added. JZ and JNZ have a short version without 0F and a near version with 0F. Neither has a mod/reg/rm byte. I think, for the sake of decoder efficiency, that the instructions with VEX prefix will have the same composition as the corresponding codes without VEX prefix. This would mean no 0F escape code and no mod/reg/rm byte for the short version. A VEX prefixed code without 0F escape code is unprecedented, so we don't know what the value of mmmmm should be, but 0 is a logical guess. This information is missing in the manual.
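To make the composition of the unprefixed codes concrete, here is a minimal sketch of decoding the legacy JZ/JNZ forms: the short version is 74/75 plus an 8-bit displacement, the near version is 0F 84/0F 85 plus a displacement, and neither has a mod/reg/rm byte. The function name is hypothetical, and the sketch assumes a 32-bit displacement for the near form (the default operand size in 32-bit and 64-bit mode). It deliberately does not cover the VEX/MVEX prefixed JKZD/JKNZD, since their layout is the part that is missing in the manual.

```python
import struct


def decode_legacy_jz_jnz(code, address):
    """Decode an unprefixed JZ/JNZ at `address`.

    Returns (mnemonic, instruction length, branch target).
    Assumes the near form carries a 32-bit displacement.
    """
    if code[0] in (0x74, 0x75):
        # Short form: one opcode byte, then rel8. No 0F, no mod/reg/rm.
        mnemonic = "jz" if code[0] == 0x74 else "jnz"
        (disp,) = struct.unpack_from("<b", code, 1)
        length = 2
    elif code[0] == 0x0F and code[1] in (0x84, 0x85):
        # Near form: 0F escape, one opcode byte, then rel32. No mod/reg/rm.
        mnemonic = "jz" if code[1] == 0x84 else "jnz"
        (disp,) = struct.unpack_from("<i", code, 2)
        length = 6
    else:
        raise ValueError("not an unprefixed JZ/JNZ")
    # The displacement is relative to the end of the instruction.
    return mnemonic, length, address + length + disp


print(decode_legacy_jz_jnz(bytes([0x74, 0x05]), 0x1000))
print(decode_legacy_jz_jnz(bytes([0x0F, 0x85, 0x10, 0, 0, 0]), 0x2000))
```

If the prefixed versions really keep the same composition, a decoder would only need to strip the prefix and reuse this path, with the mask register taken from wherever the prefix encodes it.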
And BTW, I have more questions:
5. Which CPUID bit indicates support for Knights Corner/MIC instructions? Does this instruction set have an official name yet?
6. How many zmm registers are there in 32-bit mode? I understand that the preferred mode is 64 bits, but the first line in chapter 2.4 page 36 says that 32-bit mode is also supported. The MVEX prefix is carefully designed to be compatible with 32-bit mode in the same way as the VEX prefixes. The R and X bits are not available in 32-bit mode because they are used for another instruction (BOUND) in 32-bit mode. So bit number 3 in the 5-bit register number is fixed at 0 in 32-bit mode, while bits 0, 1, 2 and 4 are free to be 0 or 1. So the possible register numbers in 32-bit mode are 0-7 and 16-23. This gives three possibilities in 32-bit mode:
a. 8 zmm registers named zmm0-zmm7
b. 16 zmm registers named zmm0-zmm7 and zmm16-zmm23
c. 16 zmm registers renamed to zmm0-zmm15
I would prefer c, but which one is correct?
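The arithmetic behind question 6 can be checked with a few lines. This is just an illustration of the reasoning above (bit 3 fixed at 0, bits 0, 1, 2 and 4 free), not a statement about what the hardware does. The renaming function for possibility c is hypothetical: it simply squeezes out the fixed bit.

```python
def encodable_registers_32bit():
    """5-bit register numbers whose bit 3 is 0 (bits 0, 1, 2, 4 free)."""
    return [n for n in range(32) if (n & 0b01000) == 0]


def rename_option_c(encoded):
    """Hypothetical renaming for possibility c: drop the fixed bit 3,
    so encoded numbers 0-7 and 16-23 become names zmm0-zmm15."""
    return ((encoded >> 4) << 3) | (encoded & 0b111)


regs = encodable_registers_32bit()
print(regs)                                # 0-7 and 16-23
print([rename_option_c(n) for n in regs])  # 0-15
```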
7. There are rumors that Knights Corner instructions will be supported in mainline Intel chips in the future, perhaps in Broadwell, and that SSE-AVX will be supported in a later generation of Knights. Can you comment on this, or are these just unconfirmed rumors?
I think it suffices to infer from the CPUID section of this doc that the instruction set differences (additions/subtractions) in Knights Corner that are not covered by feature flags are captured by the Family/Model.
The 32-bit mode question is treading into tech support scope outside of my interest. I should leave that for more qualified folks.
I consider it absolutely necessary that you implement a CPUID bit for the Knights Corner instruction set.
The Knights Corner Instruction Set Reference Manual has been updated with a correction for the mmmm=0 question, but not the other questions.
I found one more possible typo in the manual:
On pages 640 - 654, the legacy instruction called PREFETCH0 should correctly be called PREFETCHT0, according to previous x86 manuals. Accordingly, vprefetch0, vprefetch1 and vprefetch2 might preferably be named vprefetcht0, vprefetcht1 and vprefetcht2 to match the names of the same instructions without VEX prefix.