
Christopher_H_ · ‎12-16-2014

Is there a way to use pmovzxbd with a memory operand from intrinsics? Currently I have either

_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset])); //(movd)

_mm_cvtepu8_epi32(_mm_insert_epi32(_mm_setzero_si128(),ptr[offset],0)); //(pinsrd)

The movd or pinsrd should not be needed; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly?

bronxzv · ‎12-16-2014

Christopher H. wrote:

; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly?

_mm_cvtepu8_epi32((__m128i &)ptr[offset]);

will do the trick with the Intel compiler

andysem · ‎12-17-2014

Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?

bronxzv · ‎12-17-2014

andysem wrote:
Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?

no, the reg,mem variant is xmm1,m32: it reads only 4 bytes and has no alignment requirement
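
To make the m32 semantics concrete, here is a minimal scalar sketch of what pmovzxbd xmm1,m32 computes: it reads exactly four bytes and zero-extends each one into a 32-bit lane. The function name pmovzxbd_m32_model is hypothetical and only serves as an illustration; it is not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model (illustrative only) of pmovzxbd xmm1, m32:
 * reads exactly 4 bytes from mem, with no alignment requirement,
 * and zero-extends each byte into one 32-bit lane of the result. */
void pmovzxbd_m32_model(const uint8_t *mem, uint32_t out[4])
{
    assert(mem != 0);
    for (int i = 0; i < 4; ++i)
        out[i] = mem[i]; /* uint8_t -> uint32_t conversion zero-extends */
}
```

Because only four bytes are touched, the source buffer can be exactly four bytes long and placed at any address.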

Christopher_H_ · ‎12-18-2014

bronxzv wrote:

Quote:

Christopher H. wrote:

; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly?

_mm_cvtepu8_epi32((__m128i &)ptr[offset]);

will do the trick with the Intel compiler

Unfortunately this does not seem to work with clang; it just does a full aligned __m128i load and then the conversion.

bronxzv · ‎12-18-2014

Christopher H. wrote:
Unfortunately this does not seem to work with clang; it just does a full aligned __m128i load and then the conversion.

ouch!

something else that works well with the Intel compiler is

_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset]));

where a single vpmovzxbd reg,mem is generated

arguably we are missing an intrinsic (or an intrinsic signature) for the reg,mem variant of vpmovzxbd, which would allow more portable and cleaner code (AFAIK)
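
Until such an intrinsic exists, one possible workaround is a small wrapper that does an explicit 4-byte load and then the conversion. The sketch below assumes an x86 target with SSE4.1 and a GCC- or clang-style compiler; the names SSE41_FN, load4_cvtepu8_epi32 and widen_u8_to_u32 are hypothetical. Whether the compiler folds the load into a single pmovzxbd reg,mem is up to its optimizer and is not guaranteed, so it is worth checking the generated assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <smmintrin.h> /* SSE4.1: _mm_cvtepu8_epi32 */

/* Assumption: GCC/clang. The target attribute lets these functions use
 * SSE4.1 even when -msse4.1 is not passed on the command line. */
#if defined(__GNUC__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

/* Hypothetical helper: read exactly 4 bytes at src (any alignment)
 * and zero-extend them to four 32-bit lanes. */
SSE41_FN static inline __m128i load4_cvtepu8_epi32(const uint8_t *src)
{
    int32_t bits;
    assert(src != NULL);
    memcpy(&bits, src, sizeof bits); /* 4-byte load without aliasing issues */
    return _mm_cvtepu8_epi32(_mm_cvtsi32_si128(bits));
}

/* Widen n bytes into n 32-bit values; n must be a multiple of 4. */
SSE41_FN void widen_u8_to_u32(const uint8_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i += 4)
        _mm_storeu_si128((__m128i *)(dst + i), load4_cvtepu8_epi32(src + i));
}
```

Unlike the (__m128i &) cast, this never asks the compiler for a 16-byte load, so it does not depend on how a particular compiler treats the cast.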

pmovzxbd using memory operands