Is there a way to use pmovzxbd with a memory operand from intrinsics? Currently I have either
_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset])); // (movd)
_mm_cvtepu8_epi32(_mm_insert_epi32(_mm_setzero_si128(), ptr[offset], 0)); // (pinsrd)
The movd or pinsrd should not be needed; in assembly I can write something like
__asm{
pmovzxbd xmm0,[rax+rdx*4]
}
Is there a way I can make this call using intrinsics instead of assembly?
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
Christopher H. wrote:
; in assembly I can write something like
__asm{
pmovzxbd xmm0,[rax+rdx*4]
}
Is there a way I can make this call using intrinsics instead of assembly?
_mm_cvtepu8_epi32((__m128i &)ptr[offset]);
will do the trick with the Intel compiler.
Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?
andysem wrote:
Doesn't pmovzxbd with a memory operand require 16-byte alignment and actually access 16 bytes?
No, the reg,mem variant is xmm1, m32: it reads only 4 bytes and has no 16-byte alignment requirement.
bronxzv wrote:
Quote:
Christopher H. wrote:
; in assembly I can write something like
__asm{
pmovzxbd xmm0,[rax+rdx*4]
}
Is there a way I can make this call using intrinsics instead of assembly?
_mm_cvtepu8_epi32((__m128i &)ptr[offset]);
will do the trick with the Intel compiler.
Unfortunately, this does not seem to work with clang: it just does a full aligned __m128i load and then the conversion.
Christopher H. wrote:
Unfortunately, this does not seem to work with clang: it just does a full aligned __m128i load and then the conversion.
Ouch!
Something else that works well with the Intel compiler is
_mm_cvtepu8_epi32(_mm_cvtsi32_si128(ptr[offset]));
where a single vpmovzxbd reg,mem is generated.
Arguably, we are missing an intrinsic (or an intrinsic signature) for the reg,mem variant of vpmovzxbd, which would allow cleaner and more portable code (AFAIK).
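For what it's worth, here is a minimal sketch of a more portable way to express the 4-byte load, assuming GCC or Clang on x86-64. The helper names (load_u8x4_as_epi32, widen_u8x4_to_u32) and the SSE41_TARGET macro are made up for illustration and are not part of any intrinsics header. Going through memcpy avoids the alignment and aliasing assumptions of the (__m128i &) cast. Whether the compiler then folds the load into a single pmovzxbd xmm, m32 is up to the optimizer, so the generated assembly is worth checking.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang-style target attribute, so the file compiles even
   without -msse4.1. With other compilers the macro expands to nothing and
   SSE4.1 has to be enabled through the compiler's own options. */
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* Hypothetical helper: read exactly 4 bytes at p and zero-extend each byte
   to a 32-bit lane. p needs no particular alignment. */
SSE41_TARGET
static inline __m128i load_u8x4_as_epi32(const uint8_t *p)
{
    int32_t tmp;
    memcpy(&tmp, p, sizeof tmp);  /* plain 4-byte load */
    return _mm_cvtepu8_epi32(_mm_cvtsi32_si128(tmp));
}

/* Hypothetical wrapper: widen 4 bytes from src into 4 unsigned 32-bit values. */
SSE41_TARGET
static void widen_u8x4_to_u32(const uint8_t *src, uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, load_u8x4_as_epi32(src));
}
```

Only 4 bytes are ever read from src, which matches the xmm1, m32 form of the instruction discussed above.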