Christopher_H_
Beginner
69 Views

pmovzxbd using memory operands

Is there a way to use pmovzxbd with a memory operand from intrinsics? Currently I have either

_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset])); //(movd)

_mm_cvtepu8_epi32(_mm_insert_epi32(_mm_setzero_si128(),ptr[offset],0));  //(pinsrd)

The movd or pinsrd should not be needed; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly?

5 Replies
bronxzv
New Contributor II
69 Views

Christopher H. wrote:

; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly.

_mm_cvtepu8_epi32((__m128i &)ptr[offset]);

will do the trick with the Intel compiler

 

andysem
New Contributor III
69 Views

Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?

 

bronxzv
New Contributor II
69 Views

andysem wrote:
Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?

No, the reg,mem variant is pmovzxbd xmm1, m32: it reads only 4 bytes and does not require 16-byte alignment.
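
To make the m32 point concrete, here is a minimal scalar sketch of what the memory form computes: it touches exactly four source bytes and zero-extends each one into a 32-bit lane. The function name pmovzxbd_model is hypothetical and only for illustration; it is a model of the semantics, not something a compiler will turn into the instruction.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of "pmovzxbd xmm, m32" (hypothetical helper name).
// Reads exactly 4 bytes from src, with no alignment requirement,
// and zero-extends each byte into one 32-bit output lane.
inline void pmovzxbd_model(const std::uint8_t* src, std::uint32_t out[4]) {
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = src[i];  // uint8_t -> uint32_t conversion is a zero extension
    }
}
```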

Christopher_H_
Beginner
69 Views

bronxzv wrote:

Christopher H. wrote:

; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly.

_mm_cvtepu8_epi32((__m128i &)ptr[offset]);

will do the trick with the Intel compiler

Unfortunately this does not seem to work with clang; it just does a full aligned __m128i load and then the conversion.

bronxzv
New Contributor II
69 Views

Christopher H. wrote:
Unfortunately this does not seem to work with clang, it just tries to do a full aligned __m128i load and then do the conversion. 

oughh !

something else that works well with the Intel compiler is

_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset]));

where a single vpmovzxbd reg,mem is generated

Arguably we are missing an intrinsic (or an intrinsic signature) for the reg,mem variant of vpmovzxbd, which would allow more portable and cleaner code (AFAIK)
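
Until such an intrinsic exists, one possible workaround is a small wrapper that loads exactly four bytes with memcpy and then goes through _mm_cvtsi32_si128, as in the snippet above. This is only a sketch: the names load_u8x4_as_epi32 and SSE41_TARGET are made up for illustration, and the target attribute is assumed to be needed only when building with GCC or clang without -msse4.1. Unlike the (__m128i &) cast, it never claims that a 16-byte aligned object lives at the address. Whether the movd is folded into a single pmovzxbd reg,mem depends on the compiler and version, so the generated assembly should be checked.

```cpp
#include <cstdint>
#include <cstring>
#include <smmintrin.h>  // SSE4.1: _mm_cvtepu8_epi32

// Hypothetical macro: lets GCC/clang compile this one function for SSE4.1
// even when the rest of the translation unit is built without -msse4.1.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// Hypothetical helper: zero-extend the 4 bytes at p into four 32-bit lanes.
SSE41_TARGET
inline __m128i load_u8x4_as_epi32(const void* p) {
    std::int32_t bits;
    std::memcpy(&bits, p, sizeof bits);  // 4-byte load, no alignment or aliasing assumptions
    return _mm_cvtepu8_epi32(_mm_cvtsi32_si128(bits));
}
```

A call then looks like load_u8x4_as_epi32(&ptr[offset]), and the source only has to be readable for four bytes.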