Intel® ISA Extensions

pmovzxbd using memory operands

Christopher_H_
Beginner
633 Views

Is there a way to use pmovzxbd with a memory operand from intrinsics? Currently I have either

_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset])); //(movd)

_mm_cvtepu8_epi32(_mm_insert_epi32(_mm_setzero_si128(),ptr[offset],0));  //(pinsrd)

The movd or pinsrd should not be needed; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly?

5 Replies
bronxzv
New Contributor II

Christopher H. wrote:

; in assembly I can write something like

__asm{

pmovzxbd xmm0,[rax+rdx*4]

}

Is there a way I can make this call using intrinsics instead of assembly.

_mm_cvtepu8_epi32((__m128i &)ptr[offset]);

will do the trick with the Intel compiler

 

andysem
New Contributor III

Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?

 

bronxzv
New Contributor II

andysem wrote:
Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?

No, the reg,mem variant is pmovzxbd xmm1,m32: it reads only 4 bytes from memory and does not require 16-byte alignment.

Christopher_H_
Beginner

bronxzv wrote:

_mm_cvtepu8_epi32((__m128i &)ptr[offset]);

will do the trick with the Intel compiler

Unfortunately, this does not seem to work with clang: it just tries to do a full aligned __m128i load and then does the conversion.

bronxzv
New Contributor II

Christopher H. wrote:
Unfortunately, this does not seem to work with clang: it just tries to do a full aligned __m128i load and then does the conversion.

oughh !

Something else that works well with the Intel compiler is

_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset]));

where a single vpmovzxbd reg,mem is generated.

Arguably, we are missing an intrinsic (or an intrinsic signature) for the reg,mem variant of vpmovzxbd, which would allow more portable and cleaner code (AFAIK).
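For anyone landing here later, below is a minimal sketch of one way to express the 4-byte load portably. It is an assumption on my part, not something confirmed in this thread: the helper names load_u8x4_as_epi32 and widen_u8x4 are hypothetical, and the source is taken as a byte pointer rather than the ptr[offset] indexing used above. The memcpy reads exactly 4 bytes with no alignment assumption, so a compiler is free to fold it into the m32 operand of (v)pmovzxbd, but that folding is not guaranteed; check the generated assembly for your compiler and flags.

```c
#include <stdint.h>
#include <string.h>
#include <smmintrin.h>  /* SSE4.1: _mm_cvtepu8_epi32 */

/* Lets GCC/clang compile these functions for SSE4.1 without -msse4.1. */
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* Hypothetical helper: read 4 unaligned bytes and zero-extend each
   to a 32-bit lane. Only 4 bytes of memory are touched. */
SSE41_TARGET
static inline __m128i load_u8x4_as_epi32(const void *p)
{
    int32_t bits;
    memcpy(&bits, p, sizeof bits);           /* 4-byte unaligned load */
    return _mm_cvtepu8_epi32(_mm_cvtsi32_si128(bits));
}

/* Hypothetical wrapper: widen 4 bytes from src into 4 ints in out. */
SSE41_TARGET
void widen_u8x4(const uint8_t *src, int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, load_u8x4_as_epi32(src));
}
```

Calling widen_u8x4 on the bytes {1, 128, 255, 0} should give the four 32-bit values 1, 128, 255, 0, since the conversion zero-extends rather than sign-extends.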
