
Problem with _mm256_and_ps instruction

Jesmin_Jahan_T_1
Beginner

Hi, 

I was trying to do a very simple exercise using vector instructions, but I am getting wrong results.

In the following program I am trying to do a bitwise AND operation using the _mm256_and_ps intrinsic.
/////////////////////////////////////////////////////
#include <immintrin.h>
#include <iostream>
using namespace std;

int main(){
  __m256 x1,x2;
  // _mm256_load_ps/_mm256_store_ps need 32-byte alignment; 64 satisfies that
  float* x=(float*) _mm_malloc(8*sizeof(float),64);
  float* mask = (float*) _mm_malloc(8*sizeof(float),64);
  for(int i=0;i<8;i++)
    x[i] = (float)i;
  for(int i=0;i<8;i++)
    mask[i] = (float)3;
  x1 =_mm256_load_ps(x);
  x2 = _mm256_load_ps(mask);
  x2 = _mm256_and_ps (x1,x2);   // ANDs the raw 32-bit patterns of each lane
  _mm256_store_ps(x,x2);
  for(int i=0;i<8;i++)
    cout<<x[i]<<endl;
  _mm_free(x);
  _mm_free(mask);
  return 0;
}
///////////////////////////////////////////////////////////
If you run the program, you will see that some of the results of the AND operation are wrong.
For example, 1 & 3 = 1, but the program gives 0; similarly, 6 & 3 = 2, but it gives 3.

Could anyone explain why this happens? Is it because I am using floating-point data?

Thanks and best regards, 
Jesmin

Andrey_Vladimirov
New Contributor III

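Yes, that is because you are using floating-point values. _mm256_and_ps ANDs the raw bits of each 32-bit lane, and floating-point numbers are stored in the IEEE-754 format (sign bit, 8-bit exponent, 23-bit mantissa), not as plain binary integers. This is how the numbers 0.0 through 7.0, the mask 3.0, and the results look in your calculation: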

#include <immintrin.h>
#include <cstring>
#include <iostream>

using namespace std;

// Print the 32 raw bits of a float, most significant bit first.
void printbits(float* v) {
  unsigned int n;
  memcpy(&n, v, sizeof n);            // reinterpret the bits without pointer-aliasing issues
  unsigned int bitmask = (1u << 31);  // unsigned shift: 1 << 31 on a signed int is undefined
  for (int i = 0; i < 32; i++) {
    cout << ( (n & bitmask) ? '1' : '0' );
    bitmask = bitmask >> 1;
  }
}

int main(){
  __m256 x1,x2;
  float* x=(float*) _mm_malloc(8*sizeof(float),64);
  float* xorig=(float*) _mm_malloc(8*sizeof(float),64);
  float* mask = (float*) _mm_malloc(8*sizeof(float),64);
  for(int i=0;i<8;i++)  {
    x[i] = (float)i;
    xorig[i] = x[i];
  }
  for(int i=0;i<8;i++)
    mask[i] = (float)3;

  x1 =_mm256_load_ps(x);
  x2 = _mm256_load_ps(mask);
  x2 = _mm256_and_ps (x1,x2);
  _mm256_store_ps(x,x2);

  for(int i=0;i<8;i++) {
    cout << "i=" << i << ": ";
    printbits(&xorig[i]);
    cout << " & ";
    printbits(&mask[i]);
    cout << " = ";
    printbits(&x[i]);
    cout << endl;
  }
  _mm_free(x);
  _mm_free(xorig);
  _mm_free(mask);
  return 0;
}

 

[cfxuser@c001-n001 ~]$ ./a.out 
i=0: 00000000000000000000000000000000 & 01000000010000000000000000000000 = 00000000000000000000000000000000
i=1: 00111111100000000000000000000000 & 01000000010000000000000000000000 = 00000000000000000000000000000000
i=2: 01000000000000000000000000000000 & 01000000010000000000000000000000 = 01000000000000000000000000000000
i=3: 01000000010000000000000000000000 & 01000000010000000000000000000000 = 01000000010000000000000000000000
i=4: 01000000100000000000000000000000 & 01000000010000000000000000000000 = 01000000000000000000000000000000
i=5: 01000000101000000000000000000000 & 01000000010000000000000000000000 = 01000000000000000000000000000000
i=6: 01000000110000000000000000000000 & 01000000010000000000000000000000 = 01000000010000000000000000000000
i=7: 01000000111000000000000000000000 & 01000000010000000000000000000000 = 01000000010000000000000000000000
[cfxuser@c001-n001 ~]$

 

Jesmin_Jahan_T_1
Beginner
Thanks Vladimirov
jimdempseyatthecove
Honored Contributor III

Jesmin,

A follow-on question is: do you need to perform something on floating-point values that is equivalent to a bitwise AND on integers (without converting the floats to ints)? If so, you can (as a hack) add the appropriate power of 2 so that the integer bits line up in the mantissa where your AND mask can reach them, apply the binary AND, and then subtract the same power of 2. Note that only the 23 mantissa bits of a float can be manipulated this way.

Jim Dempsey

Jesmin_Jahan_T_1
Beginner

Hi Jim,

Thank you! This will be very helpful!

Best Regards,

Jesmin
