Solved: Possible errors in instruction semantics

Dasgupta__Sandeep · ‎04-04-2018

Dear Team,

I would like to report some discrepancies between the instruction specifications documented in the Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 2, and the actual execution behaviour.

Bug Report 1: vpsravd %xmm3, %xmm2, %xmm1

Semantics as per the above manual:

%ymm1  : 0x0₁₂₈ ∘ ((%ymm2[127:96] sign_shift_right (0x0₂₇ ∘ %ymm3[100:96])) ∘
                  ((%ymm2[95:64] sign_shift_right %ymm3[95:64]) ∘
                  ((%ymm2[63:32] sign_shift_right %ymm3[63:32]) ∘
                  (%ymm2[31:0] sign_shift_right %ymm3[31:0]))))

Here ∘ is the concatenation symbol.

Note that the first term, (%ymm2[127:96] sign_shift_right (0x0₂₇ ∘ %ymm3[100:96])), selects only 5 bits from %ymm3.

But the actual execution behaviour seems to expect all 32 bits from %ymm3, i.e., (%ymm2[127:96] sign_shift_right %ymm3[127:96]).

The following is the pseudocode from the manual:

VPSRAVD (VEX.128 version)
COUNT_0 = SRC2[31 : 0]
(* Repeat Each COUNT_i for the 2nd through 4th dwords of SRC2*)
COUNT_3 = SRC2[100 : 96]; //<------------------------------------- Possibly a bug
DEST[31:0] = SignExtend(SRC1[31:0] >> COUNT_0);
(* Repeat shift operation for 2nd through 4th dwords *)
DEST[127:96] = SignExtend(SRC1[127:96] >> COUNT_3);
DEST[MAXVL-1:128] = 0;

I believe the line marked above is a bug and should read COUNT_3 = SRC2[127 : 96].

Test Input (in Hex):

%ymm2: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - **80 00 00 00** - 00 00 00 00 - 00 00 00 00 - 00 00 00 00

%ymm3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - **00 00 00 20** - 00 00 00 00 - 00 00 00 00 - 00 00 00 00

As per the manual, we should select just 5 bits from `00 00 00 20`, whereas the hardware execution semantics require all 32 bits.

Output as per manual:

 0x0₁₂₈ ∘ ((0x80000000₃₂ sign_shift_right 0x0₃₂) ∘
          ((0x0₃₂ sign_shift_right 0x0₃₂) ∘
          ((0x0₃₂ sign_shift_right 0x0₃₂) ∘
          (0x0₃₂ sign_shift_right 0x0₃₂))))

Output as per actual Intel hardware (Intel(R) Xeon(R) CPU E3-1505M):

 0x0₆₄ ∘ 0x0₆₄ ∘ 0xffffffff00000000₆₄ ∘ 0x0₆₄
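
The two results above can be reproduced in software. The following is a minimal Python sketch, not Intel's reference model: the helper names (`sra_lane`, `vpsravd128_manual`, `vpsravd128_observed`) are made up for illustration, only the low 128 bits are modeled, and the "observed" variant assumes that a count above 31 fills the lane with the sign bit, which is consistent with the hardware output reported here.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def sra_lane(value, count):
    """Arithmetic right shift of one dword; a count above 31 fills the lane with the sign bit."""
    return (to_signed32(value) >> min(count, 31)) & MASK32

def split_dwords(x):
    """Split a 128-bit integer into four dwords, least significant first."""
    return [(x >> (32 * i)) & MASK32 for i in range(4)]

def join_dwords(dwords):
    """Inverse of split_dwords."""
    result = 0
    for i, d in enumerate(dwords):
        result |= d << (32 * i)
    return result

def vpsravd128_manual(src1, src2):
    """Literal reading of the printed pseudocode: COUNT_3 = SRC2[100:96]."""
    counts = split_dwords(src2)
    counts[3] &= 0x1F  # only bits 100:96 survive
    return join_dwords(sra_lane(v, c) for v, c in zip(split_dwords(src1), counts))

def vpsravd128_observed(src1, src2):
    """Every lane uses its full 32-bit count, as the hardware appears to do."""
    return join_dwords(
        sra_lane(v, c) for v, c in zip(split_dwords(src1), split_dwords(src2))
    )

# Low 128 bits of the test input: only dword 3 (bits 127:96) is non-zero.
src1 = 0x80000000 << 96
src2 = 0x00000020 << 96

print(f"{vpsravd128_manual(src1, src2):032x}")    # 80000000000000000000000000000000
print(f"{vpsravd128_observed(src1, src2):032x}")  # ffffffff000000000000000000000000
```

With the 5-bit reading, the count 0x20 is truncated to 0 and the lane is left unchanged; with the full 32-bit count, the lane becomes all ones.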

The same probable typo appears in the pseudocode for these instructions:

VPSLLVD (VEX.128 version)
VPSLLVD (VEX.256 version)
VPSLLVQ (VEX.256 version)
VPSRAVD (VEX.256 version)
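
For the logical left shifts in this list, a truncated count would be just as visible. The sketch below is a hedged illustration in Python, assuming the typo takes the same form as in VPSRAVD (a 5-bit field such as SRC2[100:96] where the full dword is meant); the function names are hypothetical. It relies on the documented VPSLLVD rule that a count greater than 31 writes 0 to the destination dword.

```python
MASK32 = 0xFFFFFFFF

def sll_lane_full(value, count):
    """Logical left shift of one dword using the full 32-bit count."""
    return 0 if count > 31 else (value << count) & MASK32

def sll_lane_5bit(value, count):
    """Same shift if only the low 5 bits of the count were consulted."""
    return (value << (count & 0x1F)) & MASK32

value, count = 0x00000001, 0x00000020
print(hex(sll_lane_full(value, count)))  # 0x0
print(hex(sll_lane_5bit(value, count)))  # 0x1
```

Any count with a bit set above bit 4 separates the two readings, so a count of 0x20 is a convenient test value for these instructions as well.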

Also, there seems to be a typo in the description text of VPSRAVW/VPSRAVD/VPSRAVQ. There are two paragraphs starting with "The count values..."; the second one should be deleted.

Bug Report 2: packsswb

There seems to be a bug in the descriptive text:

If the signed doubleword value is beyond the range of an unsigned word (i.e. greater than 7FH or less than 80H), ...

In my opinion, the description should say "range of a signed word" instead.

MarkC_Intel · ‎04-05-2018

Thx. There was more going on in the 2nd example that needs fixing. I'll pass these along to our documentation person. Thanks for reporting the issues.

Dasgupta__Sandeep · ‎04-05-2018

Thanks Mark! Really appreciate your reply.

Do you mind sharing what else is going wrong with the second example? That might help my current work.

MarkC_Intel · ‎04-05-2018

The 2nd sentence of that paragraph talks about doublewords and words when it should be talking about words and bytes. The operation section makes it a little clearer.
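
To make the word-to-byte reading concrete, here is a small Python sketch of signed saturation as PACKSSWB applies it to each source word. It is an illustration only; the function names are hypothetical, and the bounds 7FH and 80H are the signed-byte limits quoted from the description text.

```python
def to_signed16(x):
    """Reinterpret the low 16 bits of x as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def saturate_word_to_signed_byte(word):
    """Clamp a signed word to [-128, 127] and return its byte encoding."""
    clamped = max(-128, min(127, to_signed16(word)))
    return clamped & 0xFF

for word in (0x0005, 0x0080, 0xFF7F, 0xFFFF):
    print(f"{word:04X} -> {saturate_word_to_signed_byte(word):02X}")
# 0005 -> 05
# 0080 -> 7F
# FF7F -> 80
# FFFF -> FF
```

A word above 127 saturates to 7FH and a word below -128 saturates to 80H, which only makes sense if the source elements are signed words and the results are signed bytes.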

Dasgupta__Sandeep · ‎04-05-2018

Indeed. Thanks again.