Hi all,
I have two 512-bit vector registers and two 16-bit masks (__mmask16). The two registers contain sparse 32-bit data:
    idx:      15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
    zmmVidx: 120 112 104 096 088 080 072 064 056 048 040 032 024 016 008 000
    zmm0:     00  11  00  00  10  00  00  00  00  00  01  00  00  00  00  00
    zmm1:     00  00  A0  00  00  B0  00  00  00  00  00  F0  00  00  00  00
    mask1:     0   1   0   0   1   0   0   0   0   0   1   0   0   0   0   0
    mask2:     0   0   1   0   0   1   0   0   0   0   0   1   0   0   0   0
I want to interleave the two vector registers and store them contiguously in memory (up to 32 elements in total), starting with zmm0:
    idx: ... 15  14  13  12  11  10  09  08  07  06  05  04  03  02  01  00
    mem: ... 00  00  00  00  00  00  00  00  00  00  A0  11  B0  10  F0  01
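To pin down exactly what the mem row above means, here is a plain scalar reference that any SIMD version could be checked against. It is a sketch under stated assumptions: elements are 32-bit, lane i of a register is element i, the k-th lane selected by mask1 goes to out[2k] and the k-th lane selected by mask2 goes to out[2k+1]. Pairing stops when either side runs out, so it expects both masks to select the same number of lanes. The function name interleave_masked_ref is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical scalar reference for the masked interleave.
 * Assumption: the k-th lane of a selected by mask1 goes to out[2k] and the
 * k-th lane of b selected by mask2 goes to out[2k+1]; only complete pairs
 * are written. Returns the number of uint32_t values written (at most 32). */
static size_t interleave_masked_ref(const uint32_t a[16], uint16_t mask1,
                                    const uint32_t b[16], uint16_t mask2,
                                    uint32_t out[32])
{
    uint32_t ca[16], cb[16];   /* compressed copies of a and b */
    size_t na = 0, nb = 0;

    for (int i = 0; i < 16; ++i) {
        if (mask1 & (1u << i)) ca[na++] = a[i];
        if (mask2 & (1u << i)) cb[nb++] = b[i];
    }

    size_t n = na < nb ? na : nb;  /* number of complete pairs */
    for (size_t k = 0; k < n; ++k) {
        out[2 * k]     = ca[k];    /* zmm0 element first ... */
        out[2 * k + 1] = cb[k];    /* ... then the matching zmm1 element */
    }
    return 2 * n;
}
```

Fed the example above (mask1 = 0x4820, mask2 = 0x2410), it returns 6 and writes 01 F0 10 B0 11 A0, matching the mem row.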
I implemented this behaviour with the following code:
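    /* ... */
    /* Pack the lanes selected by each mask to the bottom of the register. */
    __m512i zmm2 = _mm512_maskz_compress_epi32( mask1, zmm0 );
    __m512i zmm3 = _mm512_maskz_compress_epi32( mask2, zmm1 );
    /* Nonzero lanes of zmm3 (_MM_CMPINT_NE == 4); assumes the selected values are
       nonzero and that mask1 selects as many lanes as mask2. */
    __mmask16 mask3 = _mm512_cmp_epi32_mask( zmm3, _mm512_setzero_epi32(), _MM_CMPINT_NE );
    /* zmmVidx holds byte offsets 0, 8, 16, ... (scale 1); assumes mem is a char/uint8_t
       pointer, so mem + 4 targets the odd 32-bit slots. */
    _mm512_mask_i32scatter_epi32( mem, mask3, zmmVidx, zmm2, 1 );
    _mm512_mask_i32scatter_epi32( mem + sizeof( uint32_t ), mask3, zmmVidx, zmm3, 1 );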
I first pack the sparse data contiguously in each vector register and then scatter it to memory. Is it possible to perform a masked interleave directly to memory, so that the scatter store can be replaced by a contiguous store?
Sincerely yours
Why not .OR. the two (or add them, or use a masked add or masked OR), then store (or masked-store) the result?
Scatter should be reserved for cases where the data to be written will .NOT. reside within 64 bytes of each other (or within the same aligned cache line, depending on the instruction).
Jim Dempsey
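A minimal sketch of the OR-then-store suggestion, assuming mask1 and mask2 select disjoint lanes and that lanes outside each mask are zero (as in the example at the top), so the OR merges without losing data. The helper name or_then_compress_store is hypothetical. The AVX-512 path uses _mm512_or_si512 and _mm512_mask_compressstoreu_epi32; a scalar fallback computes the same result when __AVX512F__ is not defined.

```c
#include <stdint.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper sketching OR-then-contiguous-store.
 * Assumes mask1 and mask2 are disjoint and lanes outside each mask are zero.
 * Stores the selected lanes in ascending lane order; returns the count. */
static size_t or_then_compress_store(const uint32_t a[16], uint16_t mask1,
                                     const uint32_t b[16], uint16_t mask2,
                                     uint32_t out[16])
{
    uint16_t sel = (uint16_t)(mask1 | mask2);
    size_t n = 0;
#ifdef __AVX512F__
    __m512i merged = _mm512_or_si512(_mm512_loadu_si512(a), _mm512_loadu_si512(b));
    _mm512_mask_compressstoreu_epi32(out, sel, merged);  /* contiguous store */
    for (uint16_t m = sel; m; m &= (uint16_t)(m - 1)) ++n;  /* popcount */
#else
    for (int i = 0; i < 16; ++i)
        if (sel & (1u << i)) out[n++] = a[i] | b[i];
#endif
    return n;
}
```

Because the compress-store writes selected lanes in ascending lane order, the example data comes out as F0 01 B0 10 A0 11, i.e. ordered by lane index rather than alternating zmm0/zmm1.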
Well, thanks for your reply, but I don't think it is that easy, because the data can (!) overlap between the two vectors: the set bits of the two masks can (!) be at the same indices. So it may be necessary to interleave all values of both vectors (32 elements of resulting data) or any subset of them.
Sincerely
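One possible way to get a contiguous store that also tolerates overlapping masks is sketched below: compress each vector by its own mask (as in the original code), interleave the two compressed registers in-register with _mm512_permutex2var_epi32, then write the result with two masked contiguous stores instead of two scatters. This is a sketch, not a tested drop-in. It assumes 32-bit elements and that mask1 and mask2 select the same number of lanes (as in the example). The helper names interleave_masked_store and popcount16 are hypothetical, and a scalar fallback is used when the code is not compiled with AVX-512F.

```c
#include <stdint.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Portable bit count (avoids depending on -mpopcnt). */
static unsigned popcount16(uint16_t m)
{
    unsigned c = 0;
    for (; m; m &= (uint16_t)(m - 1)) ++c;
    return c;
}

/* Hypothetical helper: out[2k] = k-th lane of a selected by mask1,
 * out[2k+1] = k-th lane of b selected by mask2, stored contiguously.
 * Assumes popcount(mask1) == popcount(mask2) == n; writes 2n values. */
static size_t interleave_masked_store(const uint32_t a[16], uint16_t mask1,
                                      const uint32_t b[16], uint16_t mask2,
                                      uint32_t out[32])
{
    unsigned n = popcount16(mask1);
#ifdef __AVX512F__
    /* Pack the selected lanes to the bottom of each register. */
    __m512i ca = _mm512_maskz_compress_epi32(mask1, _mm512_loadu_si512(a));
    __m512i cb = _mm512_maskz_compress_epi32(mask2, _mm512_loadu_si512(b));

    /* Index bit 4 picks the source: 0..15 -> ca, 16..31 -> cb. */
    static const int32_t lo_idx[16] = {0, 16, 1, 17, 2, 18, 3, 19,
                                       4, 20, 5, 21, 6, 22, 7, 23};
    static const int32_t hi_idx[16] = {8, 24, 9, 25, 10, 26, 11, 27,
                                       12, 28, 13, 29, 14, 30, 15, 31};
    __m512i lo = _mm512_permutex2var_epi32(ca, _mm512_loadu_si512(lo_idx), cb);
    __m512i hi = _mm512_permutex2var_epi32(ca, _mm512_loadu_si512(hi_idx), cb);

    /* Bit j of valid marks out[j] as written; avoid shifting by 32. */
    uint32_t valid = (2 * n >= 32) ? 0xFFFFFFFFu : ((1u << (2 * n)) - 1u);
    _mm512_mask_storeu_epi32(out,      (__mmask16)(valid & 0xFFFFu), lo);
    _mm512_mask_storeu_epi32(out + 16, (__mmask16)(valid >> 16),     hi);
#else
    /* Scalar fallback producing the same layout. */
    unsigned ia = 0, ib = 0;
    for (int i = 0; i < 16; ++i) {
        if (mask1 & (1u << i)) out[2 * ia++] = a[i];
        if ((mask2 & (1u << i)) && ib < n) out[2 * ib++ + 1] = b[i];
    }
#endif
    return 2 * (size_t)n;
}
```

Since each vector is compressed by its own mask, it does not matter whether the set bits of mask1 and mask2 coincide. Whether the two permutes plus two masked stores beat the scatter version would have to be measured on the target CPU.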