Solved: Problem with _mm256_and_ps instruction

Jesmin_Jahan_T_1 · ‎05-05-2014

Hi,

I was trying to do a very simple exercise using vector instructions. But I am getting wrong results.

In the following program I am trying to do a bitwise AND operation using the _mm256_and_ps intrinsic.
/////////////////////////////////////////////////////
#include <immintrin.h>
#include <iostream>
using namespace std;
int main(){
    __m256 x1, x2;
    // 8 floats fill one 256-bit AVX register; the buffers are 64-byte aligned
    float* x = (float*) _mm_malloc(8*sizeof(float), 64);
    float* mask = (float*) _mm_malloc(8*sizeof(float), 64);
    for(int i=0; i<8; i++)
        x[i] = (float)i;
    for(int i=0; i<8; i++)
        mask[i] = (float)3;
    x1 = _mm256_load_ps(x);
    x2 = _mm256_load_ps(mask);
    x2 = _mm256_and_ps(x1, x2);
    _mm256_store_ps(x, x2);
    for(int i=0; i<8; i++)
        cout << x[i] << endl;
    _mm_free(x);
    _mm_free(mask);
    return 0;
}
///////////////////////////////////////////////////////////
If you run the program, you will see that some of the results of the AND operation are wrong.
For example, 1 & 3 = 1, but the result from the program is 0. Similarly, 6 & 3 = 2, but it's giving 3.

Could anyone explain why that happened? Is this happening because I am using floating point data?

Thanks and best regards,
Jesmin

Andrey_Vladimirov · ‎05-05-2014

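Yes, that is because you are using floating point values. _mm256_and_ps ANDs the raw bit patterns of its operands, and floating point numbers are stored in the IEEE-754 format (sign bit, 8 exponent bits, 23 mantissa bits), so this is how the numbers from 0.0 through 7.0 look in your calculation: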

#include <immintrin.h>
#include <cstring>
#include <iostream>

using namespace std;

// Print the 32 bits of a float, most significant bit first.
void printbits(const float* v) {
  unsigned int n;
  memcpy(&n, v, sizeof(n));  // copy the bits out without breaking aliasing rules
  unsigned int bitmask = 1u << 31;
  for (int i = 0; i < 32; i++) {
    cout << ((n & bitmask) ? '1' : '0');
    bitmask = bitmask >> 1;
  }
}

int main() {
  __m256 x1, x2;
  float* x = (float*) _mm_malloc(8*sizeof(float), 64);
  float* xorig = (float*) _mm_malloc(8*sizeof(float), 64);
  float* mask = (float*) _mm_malloc(8*sizeof(float), 64);
  for (int i = 0; i < 8; i++) {
    x[i] = (float)i;
    xorig[i] = x[i];  // keep a copy, since x is overwritten with the result
  }
  for (int i = 0; i < 8; i++)
    mask[i] = (float)3;

  x1 = _mm256_load_ps(x);
  x2 = _mm256_load_ps(mask);
  x2 = _mm256_and_ps(x1, x2);  // ANDs the raw IEEE-754 bit patterns
  _mm256_store_ps(x, x2);

  for (int i = 0; i < 8; i++) {
    cout << "i=" << i << ": ";
    printbits(&xorig[i]);
    cout << " & ";
    printbits(&mask[i]);
    cout << " = ";
    printbits(&x[i]);
    cout << endl;
  }
  _mm_free(x);
  _mm_free(xorig);
  _mm_free(mask);
  return 0;
}

[cfxuser@c001-n001 ~]$ ./a.out 
i=0: 00000000000000000000000000000000 & 01000000010000000000000000000000 = 00000000000000000000000000000000
i=1: 00111111100000000000000000000000 & 01000000010000000000000000000000 = 00000000000000000000000000000000
i=2: 01000000000000000000000000000000 & 01000000010000000000000000000000 = 01000000000000000000000000000000
i=3: 01000000010000000000000000000000 & 01000000010000000000000000000000 = 01000000010000000000000000000000
i=4: 01000000100000000000000000000000 & 01000000010000000000000000000000 = 01000000000000000000000000000000
i=5: 01000000101000000000000000000000 & 01000000010000000000000000000000 = 01000000000000000000000000000000
i=6: 01000000110000000000000000000000 & 01000000010000000000000000000000 = 01000000010000000000000000000000
i=7: 01000000111000000000000000000000 & 01000000010000000000000000000000 = 01000000010000000000000000000000
[cfxuser@c001-n001 ~]$

Jesmin_Jahan_T_1 · ‎05-05-2014

Thanks, Vladimirov.

jimdempseyatthecove · ‎05-06-2014

Jesmin,

A follow-on question: do you need to perform an operation on floating point values that is equivalent to a bitwise AND on integers, without converting the floats to ints? If so, you can use a hack: add the appropriate power of 2 to align the bits of the mantissa in a position suitable for your AND mask, apply the binary AND (with a mask that leaves the sign and exponent bits intact), then subtract the same power of 2. Note that the portion of the mantissa you can manipulate in a float is 23 bits.

Jim Dempsey
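
One possible reading of this hack, as a sketch rather than a definitive implementation: assume both operands hold non-negative whole numbers below 2^23, and use 2^23 (8388608.0f) as the power of 2. After the add, every value lies in [2^23, 2^24), so all operands share the same sign and exponent bits and the integer sits in the low bits of the mantissa. The function names float_and_scalar and float_and_avx are made up for this illustration, and the AVX variant is only compiled when the compiler defines __AVX__ (for example with -mavx on GCC or Clang).

```cpp
#include <cstdint>
#include <cstring>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Hypothetical helper: integer-style AND of two floats that hold whole
// numbers in the range 0 <= v < 2^23. Assumes 32-bit IEEE-754 floats.
float float_and_scalar(float a, float b) {
    const float magic = 8388608.0f;   // 2^23
    float sa = a + magic;             // integer bits of a now sit in the mantissa
    float sb = b + magic;             // same for b; sign and exponent match sa
    std::uint32_t ua, ub;
    std::memcpy(&ua, &sa, sizeof ua);
    std::memcpy(&ub, &sb, sizeof ub);
    std::uint32_t ur = ua & ub;       // exponent survives, mantissas are ANDed
    float r;
    std::memcpy(&r, &ur, sizeof r);
    return r - magic;                 // remove the offset again
}

#ifdef __AVX__
// Same three steps on 8 lanes at once; a, b and out each point to 8 floats.
void float_and_avx(const float* a, const float* b, float* out) {
    const __m256 magic = _mm256_set1_ps(8388608.0f);
    __m256 sa = _mm256_add_ps(_mm256_loadu_ps(a), magic);
    __m256 sb = _mm256_add_ps(_mm256_loadu_ps(b), magic);
    __m256 r  = _mm256_sub_ps(_mm256_and_ps(sa, sb), magic);
    _mm256_storeu_ps(out, r);
}
#endif
```

Under those assumptions, float_and_scalar(1.0f, 3.0f) gives 1.0f and float_and_scalar(6.0f, 3.0f) gives 2.0f, which are the integer results the original program expected. Negative, fractional, or larger inputs are outside what this sketch handles.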

Jesmin_Jahan_T_1 · ‎05-06-2014

Hi Jim,

Thank you! This will be very helpful!

Best Regards,

Jesmin

Problem with _mm256_and_ps instruction