Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

Large bit vector left shift with AVX512

Thorax581
Beginner
I'm new to AVX/AVX512 and I'm trying to implement a left shift (1 to 7 bits) on a bit vector that is thousands of bits in length (Skylake processor).
 
From the Intrinsics Guide, I can make repeated calls to "__m512i _mm512_shldi_epi64" to process 512 bits at a time (but I must manually handle the upper bits shifted out of each 64-bit value).
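For reference, each 64-bit lane of _mm512_shldi_epi64(a, b, imm) computes a funnel shift: it concatenates a (high) and b (low) into 128 bits, shifts left by imm and keeps the upper 64 bits. If b holds each word's lower neighbor, the bits that would otherwise be lost are shifted in automatically. The scalar sketch below shows that per-lane operation and the word-by-word shift it implies. It assumes the bit vector is an array of uint64_t with word 0 holding the least significant bits (the thread does not state a bit order), and funnel_shl64 and shl_words are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of _mm512_shldi_epi64(hi, lo, n): the upper 64 bits of (hi:lo) << n.
   Restricted to 1 <= n <= 63 so that both C shifts are well defined. */
static inline uint64_t funnel_shl64(uint64_t hi, uint64_t lo, unsigned n)
{
    return (hi << n) | (lo >> (64 - n));
}

/* Shift the bit vector w[0..count-1] left by n bits (1..63).
   carry_in supplies the n bits shifted into the bottom of w[0];
   the return value is the n bits shifted out of the top of w[count-1]. */
static uint64_t shl_words(uint64_t *w, size_t count, unsigned n, uint64_t carry_in)
{
    assert(n >= 1 && n <= 63);
    uint64_t lower = carry_in << (64 - n);   /* acts as the "word below" w[0] */
    for (size_t i = 0; i < count; i++) {
        uint64_t cur = w[i];                 /* keep the original for the next word */
        w[i] = funnel_shl64(cur, lower, n);
        lower = cur;
    }
    return lower >> (64 - n);                /* top n bits of the original last word */
}
```

The carry between words is exactly the "lower neighbor" operand, so a vector version only needs a way to build the vector of lower neighbors for each 512-bit chunk.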
 
Is there a better way to accomplish this?  Thanks in advance for all tips and pointers.
 
S.
AndrewG_Intel
Moderator

Hello @Thorax581

Thank you for posting on the Intel® communities.


In order to provide you with the correct information or route this request to the proper channel of support, could you please confirm/provide the following information?


1- The exact model (SKU) of the Skylake processor you are using:

2- Are you developing software or hardware using Intel® components and/or Intel® tools? If yes, please describe the type of project you are developing:

3- Just to make sure, when you said "Intrinsics Guide", do you mean the Intel® Intrinsics Guide?

Are you working with the Intel® C++ Compiler (documented in the Intel® C++ Compiler Classic Developer Guide and Reference)?


Best regards,

Andrew G.

Intel Customer Support Technician


Thorax581
Beginner

Hi Andrew,

1) The processor is an "Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz", Stepping 4, and its flags include avx512f, avx512dq, avx512cd, avx512bw and avx512vl (a Skylake-SP part, not Ice Lake).

2) Code is getting blocks of LDPC encoded data from a Mount Bryce, then rotating (cyclic shift) the resultant bit buffer left by 1-7 bits.  The buffer can be 1k to 1M bits and gcc is the compiler.  Speed critical.

3) Yes, I've been searching the online Intrinsics Guide that you mention.  I'm planning to process the buffer 512 bits at a time using _mm512_maskz_and_epi64, _mm512_rol_epi64, _mm512_sll_epi64 and _mm512_or_epi64 to do the rotation and manual fixups of the bits that are lost across lanes as a result of the shift.  The leftmost bits will then be carried into the next buffer chunk and the process repeated.
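The buffer sizes in item 2 (1k to 1M bits) need not be multiples of 64, which matters for a rotation: the bits that wrap around come from the end of the valid data, not from the end of the last word. Below is a scalar reference sketch that a vectorized version can be checked against. It assumes bit 0 is the least significant bit of w[0], the unused high bits of the last word are zero, and 1 <= n <= 63; get_bits and rotl_bitvec are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read n (1..63) bits starting at bit position pos (bit 0 = LSB of w[0]). */
static uint64_t get_bits(const uint64_t *w, size_t pos, unsigned n)
{
    size_t q = pos / 64;
    unsigned r = (unsigned)(pos % 64);
    uint64_t v = w[q] >> r;
    if (r + n > 64)                     /* field straddles into the next word */
        v |= w[q + 1] << (64 - r);
    return v & ((UINT64_C(1) << n) - 1);
}

/* Rotate an nbits-long bit vector, stored in ceil(nbits / 64) words with
   the unused high bits of the last word zero, left by n bits (1..63). */
static void rotl_bitvec(uint64_t *w, size_t nbits, unsigned n)
{
    assert(n >= 1 && n <= 63 && nbits >= n);
    size_t words = (nbits + 63) / 64;
    /* The top n valid bits wrap around; place them where the "word below"
       w[0] would be so the normal shift loop moves them into bits 0..n-1. */
    uint64_t lower = get_bits(w, nbits - n, n) << (64 - n);
    for (size_t i = 0; i < words; i++) {
        uint64_t cur = w[i];
        w[i] = (cur << n) | (lower >> (64 - n));
        lower = cur;
    }
    unsigned tail = (unsigned)(nbits % 64);
    if (tail != 0)                      /* clear bits pushed past nbits */
        w[words - 1] &= (UINT64_C(1) << tail) - 1;
}
```

For example, in a 100-bit vector a single set bit at position 98 ends up at position 1 after rotl_bitvec(w, 100, 3).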

 

As I'm new to AVX, and I'm sure many others have done this sort of shift before, I just wanted to make sure there was not some more obvious way of doing this, or better/faster intrinsics than the ones I've chosen.
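A sketch of a Skylake-compatible alternative to the maskz_and/rol plan, assuming uint64_t words with word 0 least significant and a buffer length that is a multiple of 64 bits (all function names are hypothetical). It uses only AVX512F intrinsics, which the Xeon Gold 6148 listed above supports: _mm512_alignr_epi64 builds the vector of lower-neighbor words across the 512-bit chunk boundary, and a left shift, a right shift and an OR then do the per-lane work that _mm512_shldi_epi64 would do on AVX512_VBMI2 hardware. A runtime check falls back to scalar code on CPUs without AVX-512F, and the rotation feeds the top n bits of the buffer back in at word 0.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar shift, used for the tail and on CPUs without AVX-512F.
   lower is the word below w[0]; returns the original value of the last word. */
static uint64_t shl_scalar(uint64_t *w, size_t count, unsigned n, uint64_t lower)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t cur = w[i];
        w[i] = (cur << n) | (lower >> (64 - n));
        lower = cur;
    }
    return lower;
}

/* AVX512F-only path (no VBMI2 needed): 8 words per iteration. */
__attribute__((target("avx512f")))
static uint64_t shl_avx512f(uint64_t *w, size_t count, unsigned n, uint64_t lower)
{
    const __m128i left = _mm_cvtsi32_si128((int)n);
    const __m128i right = _mm_cvtsi32_si128((int)(64 - n));
    __m512i prev = _mm512_set1_epi64((long long)lower);   /* only lane 7 is read */
    size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        __m512i cur = _mm512_loadu_si512((const void *)(w + i));
        /* lane j of below = word i+j-1; lane 0 comes from lane 7 of prev */
        __m512i below = _mm512_alignr_epi64(cur, prev, 7);
        __m512i res = _mm512_or_si512(_mm512_sll_epi64(cur, left),
                                      _mm512_srl_epi64(below, right));
        lower = w[i + 7];                   /* original top word, for the tail */
        _mm512_storeu_si512((void *)(w + i), res);
        prev = cur;
    }
    return shl_scalar(w + i, count - i, n, lower);
}

/* Shift left by n (1..63); carry_in fills the low n bits of w[0].
   Returns the n bits shifted out of the top of w[count-1]. */
static uint64_t shl_bits(uint64_t *w, size_t count, unsigned n, uint64_t carry_in)
{
    assert(n >= 1 && n <= 63);
    uint64_t lower = carry_in << (64 - n);
    uint64_t last = __builtin_cpu_supports("avx512f")
                        ? shl_avx512f(w, count, n, lower)
                        : shl_scalar(w, count, n, lower);
    return last >> (64 - n);
}

/* Cyclic rotation of count*64 bits: the top n bits re-enter at bit 0. */
static void rotl_bits(uint64_t *w, size_t count, unsigned n)
{
    assert(n >= 1 && n <= 63);
    if (count > 0)
        shl_bits(w, count, n, w[count - 1] >> (64 - n));
}
```

For example, rotl_bits(buf, nwords, 3) rotates nwords * 64 bits left by 3. Compared with rotating each lane and then fixing up across lanes, valignq moves each word's neighbor into place in one step; whether this is faster on a given part should be measured.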

 

Thanks in advance for any feedback.


S.

AndrewG_Intel
Moderator

Hello Thorax581

Thank you for your response and for all the details. Please allow us to review this further and we will be posting back in the thread as soon as possible.


Best regards,

Andrew G.

Intel Customer Support Technician


AndrewG_Intel
Moderator

Hello Thorax581


After reviewing this further: questions about Intel intrinsics are best handled on the Intel® C++ Compiler board, so we are moving this question to that forum, where it can be answered more quickly: https://community.intel.com/t5/Intel-C-Compiler/bd-p/c-compiler


Please kindly wait for an answer from the proper team.

Best regards,

Andrew G.

Intel Customer Support Technician


SantoshY_Intel
Moderator

Hi,


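Because an AVX-512 register holds at most 512 bits, it is best to split the input bit vector into 512-bit chunks and apply the intrinsics repeatedly. For this task _mm512_shldi_epi64 is the best intrinsic to choose: each 64-bit lane shifts in the top bits of its second operand, so passing the vector of lower-neighbor words handles the bits that cross lanes. Note that _mm512_shldi_epi64 requires AVX512_VBMI2, which Ice Lake and later processors support but the Xeon Gold 6148 (Skylake-SP) does not; on that CPU the same per-lane result needs a left shift, a right shift and an OR. The chunked approach you describe is a sound way to implement this, and we are not aware of faster intrinsics for this task.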
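A sketch of the funnel-shift approach on AVX512_VBMI2 hardware, assuming the bit vector is an array of uint64_t with word 0 least significant; all function names are hypothetical. Because the shift count (1 to 7) is only known at run time, it uses _mm512_shldv_epi64, the variable-count form (VPSHLDVQ) of the same funnel shift; with a compile-time count, _mm512_shldi_epi64(cur, below, N) can replace it. _mm512_alignr_epi64 supplies each lane's lower neighbor, including the last word of the previous chunk, so no per-lane fixup is needed. The code checks for AVX512_VBMI2 at run time and otherwise falls back to a scalar loop.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar fallback and tail handler: lower is the word below w[0].
   Returns the original value of the last word. */
static uint64_t shl_tail(uint64_t *w, size_t count, unsigned n, uint64_t lower)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t cur = w[i];
        w[i] = (cur << n) | (lower >> (64 - n));
        lower = cur;
    }
    return lower;
}

/* VBMI2 path: each lane is the upper half of (cur:below) << n. */
__attribute__((target("avx512f,avx512vbmi2")))
static uint64_t shl_vbmi2(uint64_t *w, size_t count, unsigned n, uint64_t lower)
{
    const __m512i cnt = _mm512_set1_epi64((long long)n);
    __m512i prev = _mm512_set1_epi64((long long)lower);   /* only lane 7 is read */
    size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        __m512i cur = _mm512_loadu_si512((const void *)(w + i));
        __m512i below = _mm512_alignr_epi64(cur, prev, 7); /* lane j = word i+j-1 */
        lower = w[i + 7];                   /* save before the store overwrites it */
        _mm512_storeu_si512((void *)(w + i), _mm512_shldv_epi64(cur, below, cnt));
        prev = cur;
    }
    return shl_tail(w + i, count - i, n, lower);
}

/* Shift left by n (1..63); carry_in fills the low n bits of w[0].
   Returns the n bits shifted out of the top of w[count-1]. */
static uint64_t shl_bits_vbmi2(uint64_t *w, size_t count, unsigned n, uint64_t carry_in)
{
    assert(n >= 1 && n <= 63);
    uint64_t lower = carry_in << (64 - n);
    uint64_t last = __builtin_cpu_supports("avx512vbmi2")
                        ? shl_vbmi2(w, count, n, lower)
                        : shl_tail(w, count, n, lower);
    return last >> (64 - n);
}
```

To process a large buffer in pieces, the value returned for one piece can be passed as carry_in for the next, which is the "carry into the next buffer chunk" step described earlier in the thread.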


Thanks & Regards,

Santosh




SantoshY_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards,

Santosh

