Is there a way to use pmovzxbd with a memory operand from intrinsics? Currently I have either
_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset])); //(movd)
_mm_cvtepu8_epi32(_mm_insert_epi32(_mm_setzero_si128(),ptr[offset],0)); //(pinsrd)
The movd or pinsrd should not be needed; in assembly I can write something like
__asm{
pmovzxbd xmm0,[rax+rdx*4]
}
Is there a way I can make this call using intrinsics instead of assembly?
Christopher H. wrote:
; in assembly I can write something like
__asm{
pmovzxbd xmm0,[rax+rdx*4]
}
Is there a way I can make this call using intrinsics instead of assembly.
_mm_cvtepu8_epi32((__m128i &)ptr[offset]);
will do the trick with the Intel compiler
Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?
andysem wrote:
Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?
No, the reg,mem variant is xmm1, m32: it reads only 4 bytes and has no 16-byte alignment requirement.
bronxzv wrote:
Quote:
Christopher H. wrote:
; in assembly I can write something like
__asm{
pmovzxbd xmm0,[rax+rdx*4]
}
Is there a way I can make this call using intrinsics instead of assembly.
_mm_cvtepu8_epi32((__m128i &)ptr[offset]);
will do the trick with the Intel compiler
Unfortunately this does not seem to work with clang; it just tries to do a full aligned __m128i load and then do the conversion.
Christopher H. wrote:
Unfortunately this does not seem to work with clang, it just tries to do a full aligned __m128i load and then do the conversion.
Ouch!
something else that works well with the Intel compiler is
_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset]));
where a single vpmovzxbd reg,mem is generated
Arguably we are missing an intrinsic (or an intrinsic signature) for the reg,mem variant of vpmovzxbd, which would allow more portable and cleaner code (AFAIK).
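For anyone who wants something that does not depend on one compiler's handling of the (__m128i &) cast, here is a minimal sketch of the 4-byte load written out explicitly. The helper name widen4_u8_to_u32, the TARGET_SSE41 macro, and the byte-pointer signature are my own assumptions for illustration, not anything from the posts above. It copies exactly 4 bytes with memcpy (so no alignment assumption and no 16-byte read), moves them into a register with _mm_cvtsi32_si128, and zero-extends with _mm_cvtepu8_epi32. Whether a given compiler folds the load into a single pmovzxbd reg,mem is up to its optimizer and is not guaranteed; check the generated assembly.

```c
#include <stdint.h>
#include <string.h>
#include <smmintrin.h>  /* SSE4.1: _mm_cvtepu8_epi32 */

/* Assumption: GCC/clang need SSE4.1 enabled per function (or via -msse4.1);
   other compilers are assumed to accept the intrinsics without an attribute. */
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: zero-extend the 4 bytes at p into four 32-bit lanes. */
TARGET_SSE41
static void widen4_u8_to_u32(const uint8_t *p, uint32_t out[4])
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);        /* reads exactly 4 bytes, any alignment */
    __m128i v = _mm_cvtsi32_si128(bits);  /* movd: low 32 bits set, rest zeroed */
    __m128i w = _mm_cvtepu8_epi32(v);     /* pmovzxbd: bytes 0..3 -> dwords 0..3 */
    _mm_storeu_si128((__m128i *)out, w);  /* unaligned store of the four dwords */
}
```

With the int array from the original question, the call would look like widen4_u8_to_u32((const uint8_t *)&ptr[offset], out).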