For instructions such as VCVTSI2SD, your doc is clear. It says that in non-64-bit mode, W=1 will behave the same as W=0. That is, the second source will be a 32-bit memory operand or a 32-bit GPR.
HOWEVER, AMD's doc says something different. I have very rarely seen any difference between the Intel and AMD docs, and this is one such occasion. To me, it is very important for reasons of software compatibility to resolve any such differences.
In AMD Volume 4, page 101, it says:
When VEX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose
register or a 64-bit memory location to a double-precision floating-point value and writes the
converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the first
source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
There is no mention here of non-64-bit mode. Certainly this cannot be entirely correct, because there are no 64-bit general-purpose registers in that mode. Hence the behavior is in doubt. It could generate a #UD (though no such exception condition is listed), or it could just use a 32-bit register. However, it might still use a 64-bit memory operand.
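For anyone who wants to probe this on real hardware, here is a minimal sketch of how the two encodings in question differ. It only builds the instruction bytes for the register-register form using the 3-byte VEX prefix (the 2-byte form cannot express W=1), and it assumes registers 0 through 7 so that the inverted R, X, and B bits are all 1. The function name `encode_vcvtsi2sd_reg` is my own, not from either manual. It does not execute anything; the bytes would have to be placed in a 32-bit test program to see what a given processor does.

```python
def encode_vcvtsi2sd_reg(dest_xmm, src1_xmm, src2_gpr, w):
    """Build the 3-byte-VEX encoding of VCVTSI2SD xmm, xmm, r32/r64.

    Hypothetical helper for illustration only. Handles the register
    form (ModRM.mod = 11) with register numbers 0..7.
    """
    for reg in (dest_xmm, src1_xmm, src2_gpr):
        if not 0 <= reg <= 7:
            raise ValueError("this sketch only handles registers 0..7")
    if w not in (0, 1):
        raise ValueError("VEX.W must be 0 or 1")

    # VEX byte 1: inverted R, X, B all set to 1, opcode map 00001 (0F).
    byte1 = 0xE0 | 0x01
    # VEX byte 2: W, inverted vvvv (first source), L = 0, pp = 11 (F2).
    byte2 = (w << 7) | ((~src1_xmm & 0xF) << 3) | 0x03
    # ModRM: mod = 11 (register), reg = destination, rm = second source.
    modrm = 0xC0 | (dest_xmm << 3) | src2_gpr

    return bytes([0xC4, byte1, byte2, 0x2A, modrm])


# VCVTSI2SD xmm0, xmm1, eax with W=0 and the same bytes with W=1.
w0 = encode_vcvtsi2sd_reg(0, 1, 0, 0)
w1 = encode_vcvtsi2sd_reg(0, 1, 0, 1)
print(w0.hex(" "))
print(w1.hex(" "))
```

The two encodings differ only in the top bit of the third byte, so a test in non-64-bit mode would need to compare the results (or the exception raised) for both, ideally with a memory operand as well as a register operand.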
Could you please check with someone at AMD, find out what the AMD device actually does with W=1, and get them to make a correction to the doc if applicable? Also, please find out from your engineers what the Intel device actually does with W=1. Then please report back here with that information.
It would be interesting to know which documentation is incorrect, and especially interesting if AMD and Intel devices are not compatible.
If Intel and AMD devices behave differently, then perhaps both documents should say that the behavior is undefined or implementation-specific, so that software will not generate W=1.