Intel® ISA Extensions

Using scatterstore to interleave two (sparse) 512-bit vector registers

Johannes_P_

Hi all,

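I have two 512-bit vector registers holding sparse 32-bit data and two 16-bit masks (mask1, mask2) marking the valid elements of each register; zmmVidx holds the byte offsets used for the scatter: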

idx:       15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
zmmVidx:  120 112 104 096 088 080 072 064 056 048 040 032 024 016 008 000
zmm0:      00  11  00  00  10  00  00  00  00  00  01  00  00  00  00  00
zmm1:      00  00  A0  00  00  B0  00  00  00  00  00  F0  00  00  00  00
mask1:      0   1   0   0   1   0   0   0   0   0   1   0   0   0   0   0
mask2:      0   0   1   0   0   1   0   0   0   0   0   1   0   0   0   0

I want to interleave the selected elements of the two registers and store them contiguously in memory (up to 32 elements), starting with zmm0:

idx:  ...15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
mem:  ...00 00 00 00 00 00 00 00 00 00 A0 11 B0 10 F0 01
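For reference, the intended result can be pinned down with a plain scalar loop. This is a minimal sketch, assuming 32-bit elements (as in the `epi32` intrinsics of the post) and that `mask1` and `mask2` select the same number of elements; the function name `interleave_ref` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (assumption: both masks select the same
   number of elements). Takes the lanes of a selected by m1 and the lanes of
   b selected by m2 in ascending lane order and writes them interleaved,
   a first: out = a_sel[0], b_sel[0], a_sel[1], b_sel[1], ...
   out must have room for 32 elements. Returns the number written. */
static int interleave_ref(const uint32_t a[16], uint16_t m1,
                          const uint32_t b[16], uint16_t m2,
                          uint32_t out[32])
{
    int na = 0, nb = 0;
    for (int i = 0; i < 16; ++i) {
        if (m1 & (1u << i))
            out[2 * na++] = a[i];      /* even slots: selected lanes of a */
        if (m2 & (1u << i))
            out[2 * nb++ + 1] = b[i];  /* odd slots: selected lanes of b */
    }
    assert(na == nb);                  /* the equal-count assumption */
    return na + nb;
}
```

With the example data (zmm0 lanes 5, 11, 14 = 0x01, 0x10, 0x11 and zmm1 lanes 4, 10, 13 = 0xF0, 0xB0, 0xA0) this writes 01 F0 10 B0 11 A0 from the lowest address up, which is the `mem` row read right to left.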

I implemented this behaviour with the following code:

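#include <immintrin.h>
#include <stdint.h>
/* ... */
/* Move the selected elements of each register into its low lanes (rest zeroed). */
__m512i zmm2 = _mm512_maskz_compress_epi32( mask1, zmm0 );
__m512i zmm3 = _mm512_maskz_compress_epi32( mask2, zmm1 );
/* Lanes with non-zero data; assumes valid values are never 0 and that
   mask1 and mask2 select the same number of elements. */
__mmask16 mask3 = _mm512_cmp_epi32_mask( zmm3, _mm512_setzero_epi32(), _MM_CMPINT_NE );

/* Scale 1: zmmVidx holds byte offsets 0, 8, ..., 120, so zmm2 fills the even
   32-bit slots; the (char *) cast makes the zmm3 base one uint32_t later in bytes. */
_mm512_mask_i32scatter_epi32( mem, mask3, zmmVidx, zmm2, 1 );
_mm512_mask_i32scatter_epi32( (char *)mem + sizeof( uint32_t ), mask3, zmmVidx, zmm3, 1 );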

I first compress the sparse data into contiguous lanes of each register and then scatter it. Is it possible to perform the masked interleave directly, so that the scatter can be replaced by a contiguous store?

Sincerely yours

jimdempseyatthecove

Why not .OR. the two vectors (or add them, or use a masked add or masked OR), then store or masked-store the result?

Scatter should be reserved for cases where the data to be written will .NOT. reside within 64 bytes of each other (or within one aligned cache line, depending on the instruction).

Jim Dempsey

Johannes_P_

Well, thanks for your reply, but I don't think it is that easy, because the data of the two vectors can (!) overlap: the indices selected by the two masks can (!) coincide. So, for example, all values of both vectors (32 elements of resulting data), or any subset of them, may have to be interleaved.

Sincerely
