Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Large bit vector left shift with AVX512

Thorax581
Beginner
1,219 Views
I'm new to AVX/AVX512 and I'm trying to implement a left shift (1 to 7 bits) on a bit vector that is thousands of bits in length (Skylake processor).
 
From the Intrinsics Guide, I can make repeated calls to "__m512i _mm512_shldi_epi64" to process 512 bits at a time (but I must manually handle the upper bits shifted out of each 64-bit value).
 
Is there a better way to accomplish this?  Thanks in advance for all tips and pointers.
 
S.
7 Replies
AndrewG_Intel
Moderator
1,186 Views

Hello @Thorax581

Thank you for posting on the Intel® communities.


In order to provide you with the correct information or route this request to the proper channel of support, could you please confirm/provide the following information?


1- The exact model (SKU) of the Skylake processor you are using:

2- Are you developing software or hardware using Intel® components and/or Intel® tools? If yes, please provide more details and elaborate more on the type of project you are developing:

3- Just to make sure, when you said "Intrinsics Guide", do you mean the Intel® Intrinsics Guide?

Are you working with Intel® C++ Compiler and Intel® C++ Compiler Classic Developer Guide and Reference?


Best regards,

Andrew G.

Intel Customer Support Technician


Thorax581
Beginner
1,168 Views

Hi Andrew,

1) The processor is an "Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz", Stepping 4, and its flags include avx512f, avx512dq, avx512cd, avx512bw and avx512vl.  (IceLake?)

2) Code is getting blocks of LDPC encoded data from a Mount Bryce, then rotating (cyclic shift) the resultant bit buffer left by 1-7 bits.  The buffer can be 1k to 1M bits and gcc is the compiler.  Speed critical.

3) Yes, I've been searching the online Intrinsics Guide that you mention.  I'm planning to process the buffer 512 bits at a time using _mm512_maskz_and_epi64, _mm512_rol_epi64, _mm512_sll_epi64 and _mm512_or_epi64 to do the rotation and manual fixups of the bits that are lost across lanes as a result of the shift.  The leftmost bits will then be carried into the next buffer chunk and the process repeated.
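Before vectorizing, the rotation described in point 3 can be pinned down with a scalar reference that any AVX-512 version should reproduce. The following is a minimal sketch in C, not code from this thread. It assumes the buffer is stored as 64-bit words with words[0] holding the least significant bits, and that "left" means toward higher bit positions. The function name rotl_bits_scalar and this layout are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate an n-word bit buffer left by s bits, in place.
 * Assumed layout (hypothetical): words[0] is the least significant word.
 * Intended for 1 <= s <= 7; any s in 1..63 works. */
void rotl_bits_scalar(uint64_t *words, size_t n, unsigned s)
{
    if (n == 0 || s == 0 || s >= 64)
        return;

    /* Bits leaving the top of the last word wrap around into word 0. */
    uint64_t carry = words[n - 1] >> (64 - s);

    for (size_t i = 0; i < n; ++i) {
        /* Save the bits this word loses before overwriting it. */
        uint64_t next_carry = words[i] >> (64 - s);
        words[i] = (words[i] << s) | carry;
        carry = next_carry;
    }
}
```

Each word needs only the top s bits of the word below it, which is the same "fixup" a vector version has to perform across 64-bit lanes and across 512-bit chunks.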

 

As I'm new to AVX, and I'm sure many others have done this sort of shift before, I just wanted to make sure there was not some more obvious way of doing this, or better/faster intrinsics than the ones I've chosen.

 

Thanks in advance for any feedback.


S.

AndrewG_Intel
Moderator
1,160 Views

Hello Thorax581

Thank you for your response and for all the details. Please allow us to review this further and we will be posting back in the thread as soon as possible.


Best regards,

Andrew G.

Intel Customer Support Technician


AndrewG_Intel
Moderator
1,153 Views

Hello Thorax581


After reviewing this further, we recommend the Intel® C++ Compiler board for questions about Intel intrinsics. We are therefore moving this question to that forum so it can be answered more quickly: https://community.intel.com/t5/Intel-C-Compiler/bd-p/c-compiler


Please kindly wait for an answer from the proper team.

Best regards,

Andrew G.

Intel Customer Support Technician


SantoshY_Intel
Moderator
1,081 Views

Hi,


Because an AVX-512 register holds at most 512 bits, the way to handle a longer bit vector is to split it into 512-bit chunks and apply the intrinsics to each chunk in turn. For this, _mm512_shldi_epi64 is the best intrinsic to choose, provided the processor supports AVX512_VBMI2, which that intrinsic requires and which is not among the flags you listed. Without it, the shift-and-OR approach you described is a sound way to implement this. We are not aware of any faster intrinsics for this task.
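The chunk-by-chunk approach can be sketched as follows. This is a hedged illustration, not a tuned implementation. Because _mm512_shldi_epi64 belongs to AVX512_VBMI2, the sketch uses only AVX-512F intrinsics (_mm512_alignr_epi64, _mm512_sll_epi64, _mm512_srl_epi64, _mm512_or_si512). It assumes gcc or clang on x86-64, a buffer of 64-bit words with index 0 least significant, and a word count that is a multiple of 8. The function names are hypothetical, and a scalar fallback is used when AVX-512F is not available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX512F_PATH 1
#endif

/* Portable fallback: the same rotation, one 64-bit word at a time. */
static void rotl_words_scalar(uint64_t *w, size_t n, unsigned s)
{
    uint64_t carry = w[n - 1] >> (64 - s);   /* wraps into word 0 */
    for (size_t i = 0; i < n; ++i) {
        uint64_t next_carry = w[i] >> (64 - s);
        w[i] = (w[i] << s) | carry;
        carry = next_carry;
    }
}

#ifdef HAVE_AVX512F_PATH
/* AVX-512F path; n must be a nonzero multiple of 8 (one chunk = 8 words). */
__attribute__((target("avx512f")))
static void rotl_words_avx512f(uint64_t *w, size_t n, unsigned s)
{
    const __m128i left  = _mm_cvtsi32_si128((int)s);
    const __m128i right = _mm_cvtsi32_si128((int)(64 - s));

    /* Original top chunk: its highest lane wraps into lane 0 of chunk 0. */
    __m512i prev = _mm512_loadu_si512(w + n - 8);

    for (size_t i = 0; i < n; i += 8) {
        __m512i cur = _mm512_loadu_si512(w + i);
        /* Lane k of 'below' holds the word just below lane k of 'cur':
         * lane 0 comes from the top lane of the previous chunk. */
        __m512i below = _mm512_alignr_epi64(cur, prev, 7);
        __m512i out = _mm512_or_si512(_mm512_sll_epi64(cur, left),
                                      _mm512_srl_epi64(below, right));
        _mm512_storeu_si512(w + i, out);
        prev = cur;   /* keep the unshifted chunk for the next iteration */
    }
}
#endif

/* Rotate an n-word bit buffer left by s bits (1..63), in place. */
void rotl_bits(uint64_t *w, size_t n, unsigned s)
{
    if (n == 0 || s == 0 || s >= 64)
        return;
#ifdef HAVE_AVX512F_PATH
    if (n % 8 == 0 && __builtin_cpu_supports("avx512f")) {
        rotl_words_avx512f(w, n, s);
        return;
    }
#endif
    rotl_words_scalar(w, n, s);
}
```

On a processor that does support AVX512_VBMI2, the shift, shift and OR on cur and below could presumably be replaced by a single _mm512_shldi_epi64(cur, below, imm) with a compile-time shift count; the lane realignment is still needed because that intrinsic only combines two 64-bit values per lane.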


Thanks & Regards,

Santosh




SantoshY_Intel
Moderator
1,027 Views

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator
979 Views

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards,

Santosh

