Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

__m512i array

lin__chiungliang
Beginner
3,414 Views

I'm trying to write code that uses VNNI.

There is a data type, __m512i,

which I think maps to registers on the CPU.

I'd like to allocate an array of registers.

Here is the code:

#include <immintrin.h>
int main()
{
        const int size = 5;

        __m512i zero[size];

        for(int i=0; i<size; i++)
        {
                zero[i] = _mm512_setzero_si512();
        }

        return 0;
}

It works.

But when I try to allocate the memory dynamically,

it doesn't work:

#include <immintrin.h>
int main()
{
        const int size = 5;

        __m512i *zero = new __m512i[size];

        for(int i=0; i<size; i++)
        {
                zero[i] = _mm512_setzero_si512();
        }

        delete [] zero;

        return 0;
}

Is there any way to create registers dynamically?

Thanks a lot

BR,

chiungliang

0 Kudos
1 Solution
jimdempseyatthecove
Honored Contributor III
3,414 Views

>>There is a data type - __m512i...which I think is mapping to registers on CPU....I'd like to locate an array of registers

__m512i is a data type, not a register. Whether a variable declared with this type can stay in a register for its whole lifetime depends on the optimization options and the source code. Each hardware thread on a CPU that supports AVX512 has 32 of these registers. In your sample that uses stack-local storage, the array size (5) is known at compile time, so the optimizer may place the elements in registers rather than on the stack (provided the optimization level permits this). The code using operator new ensures that the data lives in memory, rather than being permitted to live solely in registers.

Consider coding this way:

#include <immintrin.h>
int main()
{
        // ... some code here before the performance-critical section
        { // create nested scope
                const int size = 5;

                __m512i zero[size];

                for(int i=0; i<size; i++)
                {
                        zero[i] = _mm512_setzero_si512();
                }
                // ... code using the (hopefully) register-resident array
        } // end scope
        // ... remainder of the code
        return 0;
}

*** If your array zero is intended to always contain vectors of zero, then do not create such an array. Instead, use _mm512_setzero_si512() wherever a zero vector is needed. This is not a function call; rather, it inserts an instruction that zeroes the targeted variable (of __m512i type).

IOW, your array zero might instead be __m512i sum[size], which you pre-zero before accumulating a sum (of other __m512i values).

Jim Dempsey
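
To illustrate the accumulator pattern described above, here is a minimal sketch, not code from this thread. It keeps a small local array of __m512i partial sums whose size is fixed at compile time, pre-zeroes it with _mm512_setzero_si512(), and adds 32-bit integers into it. The function name sum_i32, the accumulator count kAcc, and the GCC/Clang target attribute are assumptions made for illustration. Whether the accumulators really stay in registers still depends on the compiler and the optimization level.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: sum n 32-bit integers using a few vector accumulators.
// Assumption: GCC or Clang. The target attribute enables AVX-512F for this
// function only; with -march=cascadelake on the command line it is redundant.
// Assumption: the total fits in a 32-bit integer.
__attribute__((target("avx512f")))
std::int32_t sum_i32(const std::int32_t* data, std::size_t n)
{
    assert(data != nullptr || n == 0);

    constexpr int kAcc = 4;          // fixed, compile-time number of accumulators
    constexpr std::size_t kLanes = 16; // 16 x 32-bit lanes per __m512i

    __m512i sum[kAcc];               // local array, candidate for registers
    for (int i = 0; i < kAcc; ++i)
        sum[i] = _mm512_setzero_si512();

    // Main loop: each pass consumes kAcc full vectors.
    std::size_t j = 0;
    for (; j + kLanes * kAcc <= n; j += kLanes * kAcc) {
        for (int i = 0; i < kAcc; ++i) {
            __m512i v = _mm512_loadu_si512(data + j + kLanes * i); // unaligned load
            sum[i] = _mm512_add_epi32(sum[i], v);
        }
    }

    // Combine the accumulators, then reduce the 16 lanes to one scalar.
    __m512i total = sum[0];
    for (int i = 1; i < kAcc; ++i)
        total = _mm512_add_epi32(total, sum[i]);
    std::int32_t result = _mm512_reduce_add_epi32(total);

    // Scalar tail for the elements that did not fill a whole pass.
    for (; j < n; ++j)
        result += data[j];
    return result;
}
```

The function must only be called on a CPU that supports AVX-512F; on GCC and Clang, __builtin_cpu_supports("avx512f") can be used to check this at run time.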


0 Kudos
8 Replies
GouthamK_Intel
Moderator
3,414 Views

Hi Chiungliang,

Could you please elaborate on the issue you are facing? Please also attach the logs and the steps to reproduce it so that we can investigate.

 

Regards

Goutham

0 Kudos
lin__chiungliang
Beginner
3,414 Views

Hi,

I wrote the code,

compiled it with

"g++-9 *.cpp  -march=cascadelake"

and executed it.

I get this error:

"Segmentation fault (core dumped)"

There is no other message.

 

Thanks,

chiungliang

0 Kudos
GouthamK_Intel
Moderator
3,414 Views

Hi Chiungliang,

Can you please try compiling your code with the Intel compiler and let us know whether the problem persists?

 

Thanks

Goutham

0 Kudos
lin__chiungliang
Beginner
3,414 Views

Hi,

Would you please let me know where I can download the Intel compiler?

My OS version is Ubuntu 18.04.

Thanks a lot

chiungliang

0 Kudos
GouthamK_Intel
Moderator
3,414 Views

Hi Chiungliang,

Please install the oneAPI Base Toolkit and the oneAPI HPC Toolkit so that you can use the Intel C++ compiler.

Below are the download link and the installation guide for the Base Toolkit and the HPC Toolkit.

 

Download link: https://software.intel.com/en-us/oneapi

Installation Guide: https://software.intel.com/en-us/articles/installation-guide-for-intel-oneapi-toolkits

Let us know if you face any further issues.

 

Regards

Goutham

 

0 Kudos
GouthamK_Intel
Moderator
3,414 Views

Hi Chiungliang,

Please confirm if your issue is resolved.

 

Thanks

Goutham

0 Kudos
GouthamK_Intel
Moderator
3,414 Views

Hi Chiungliang,

We are closing this thread.

Please feel free to raise a new thread in case of any further issues. 

 

Thanks

Goutham

0 Kudos