Seg. fault when using _mm512_mul_pd

Keegan_S_ · ‎06-10-2014

I'm an undergrad student working on some code that is supposed to do simple matrix multiplication. The professor I'm working for has purchased a MIC, and I'm trying to get things working using some of the C++ intrinsics. In particular, I'm using _mm512_mul_pd to multiply two vectors together and storing the result in another vector. However, whenever any code accesses the variable that holds the result of the multiplication, I get a seg. fault. Any ideas why this is happening and what I can do to fix it?

Here are the lines of code I'm talking about:

__m512d tmp = _mm512_mul_pd(matrix[0].v, m.matrix[0].v); // works fine

return Matrix(tmp); // causes seg fault

Patrick_S_ · ‎06-10-2014

Is your data aligned on a 64-byte boundary?

I think you should post more of your code. It is unclear to me what "return Matrix(tmp)" does.

Keegan_S_ · ‎06-10-2014

In the header file I have a union, defined as follows, which I use to hold the matrix data so that it is 64-byte aligned.

union dvec{
     __m512d v;
     double d[8];
};

union dvec matrix[5] __attribute__((align(64)));

I then have one constructor that takes doubles and another that takes a __m512d to create a Matrix object; the latter is what Matrix(tmp) in my first post should be calling.

In the function where I get the seg. fault, I take a single Matrix object as a parameter and then try to multiply the two together.

Matrix Matrix::Multiply(const Matrix& m){
     union dvec tmp __attribute__((align(64)));
     tmp = _mm512_setzero_pd();
     tmp = _mm512_mul_pd(matrix[0].v, m.matrix[0].v);
     return Matrix(tmp);
}

I realize that as I have it written now it doesn't actually multiply the matrices together by the standard definition of matrix multiplication. This is more of just a test at the moment to make sure I can get the multiply intrinsic to work correctly. Also the reason I declare matrix to be an array of 5 union dvec types is so that I can work with larger matrices than those that just hold 8 elements.

Patrick_S_ · ‎06-10-2014

You should try

__declspec(align(64)) union dvec {

        __m512d v;
        double d[8];
};

However, I think you shouldn't use unions. There are better ways to do this, e.g.

__m512d m[8];

as a private member of the class. Then the matrix double constructor has to use

 m[0] = _mm512_set_pd( 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 );

Referring to your "Multiply" method:

tmp = _mm512_setzero_pd();

is not needed. Also, the union tmp holds only one __m512d. I guess your constructor needs more than one __m512d as input?

Patrick_S_ · ‎06-10-2014

By the way, you should have a look at a vector class library, e.g. this one:

http://www.agner.org/optimize/vectorclass.zip

The class Vec8f is located in vectorf256.h. Keep in mind that this is an AVX library, so you can use it only as inspiration.

Keegan_S_ · ‎06-10-2014

If I get rid of the union how do I access individual elements of the vector for printing or other such things?

Sorry if this is something basic that I should know. I'm still new to this, and in the only example code I have to go on, the author used unions to access the individual elements.

Patrick_S_ · ‎06-10-2014

You should avoid accessing individual elements of a vector register. This is a very expensive operation, because there is no corresponding assembler instruction. You should always use masked vector operations instead, e.g.

Add two __m512d vectors, but only in the first 4 elements:

__m512d a, b;

a = _mm512_mask_add_pd( a, 0x0F, a, b);


// equivalent C operation

double a[8], b[8];

a[0] = a[0] + b[0];
a[1] = a[1] + b[1];
a[2] = a[2] + b[2];
a[3] = a[3] + b[3];
a[4] = a[4];
a[5] = a[5];
a[6] = a[6];
a[7] = a[7];
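
To check the masking semantics without a coprocessor at hand, the following is a minimal scalar sketch of what the masked add above computes. The names mask_add_pd_scalar and demo_mask_add are made up for illustration and are not intrinsics; the sketch assumes that bit i of the mask controls element i, as in the example above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a masked add on 8 doubles (not a real intrinsic).
// Bit i of `mask` set   -> result[i] = a[i] + b[i]
// Bit i of `mask` clear -> result[i] = src[i] (element passes through unchanged)
inline std::array<double, 8> mask_add_pd_scalar(const std::array<double, 8>& src,
                                                std::uint8_t mask,
                                                const std::array<double, 8>& a,
                                                const std::array<double, 8>& b) {
    std::array<double, 8> result{};
    for (int i = 0; i < 8; ++i) {
        const bool selected = ((mask >> i) & 1u) != 0;
        result[i] = selected ? a[i] + b[i] : src[i];
    }
    return result;
}

// Usage mirroring the example above: mask 0x0F updates elements 0..3 only.
inline void demo_mask_add() {
    const std::array<double, 8> a{{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0}};
    const std::array<double, 8> b{{10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0}};
    const std::array<double, 8> r = mask_add_pd_scalar(a, 0x0F, a, b);
    assert(r[0] == 11.0 && r[3] == 14.0);  // low four elements were added
    assert(r[4] == 5.0 && r[7] == 8.0);    // high four elements kept a's values
}
```

Passing a as both the source and the first operand, as the intrinsic call does, is what makes the unselected elements keep their old values.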

If you really want to access one element you can cast a __m512d type to a double pointer (__m512d is a union).

__m512d a;

// print element 0 (the lowest element, i.e. the last argument passed to _mm512_set_pd)
std::cout << ((double*)&a)[0] << std::endl;

However, you shouldn't do that in computationally intensive parts of your program.

Keegan_S_ · ‎06-10-2014

So I'm trying to work without the union, and now I'm getting a seg. fault from the _mm512_set_pd function. The code I have is as follows:

In the header

private:
     __m512d matrix[5];

and in the .cpp file

Matrix::Matrix(double d0, double d1, double d2, double d3,
                       double d4, double d5, double d6, double d7, ... more doubles){
     matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7);  // line where seg. fault occurs
     // try to set values for the other 4 indices but the code doesn't get this far
}

I'm not sure why this would give me a seg. fault. Isn't __m512d 64-byte aligned by definition?

jimdempseyatthecove · ‎06-10-2014

Insert a sanity test to assert that the things you think are aligned really are aligned. If they aren't, figure out why and how to fix it.

Also, Matrix Matrix::Multiply may end up returning its value on the stack (which may not be aligned).
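
A sanity test of this kind could look roughly like the sketch below. It avoids the intrinsics headers so it builds anywhere; FakeMatrix is a hypothetical stand-in for a class with __m512d members (assumed here to need 64-byte alignment and to hold 5 vectors of 8 doubles), and the helper names is_aligned and check_matrix_alignment are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `p` sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical stand-in for a class holding __m512d members
// (assumption: 64-byte alignment requirement, 5 vectors of 8 doubles).
struct alignas(64) FakeMatrix {
    double data[5][8];
};

// Call this before any code that relies on aligned vector loads or stores.
inline void check_matrix_alignment(const FakeMatrix& m) {
    assert(is_aligned(&m, 64) && "Matrix object is not 64-byte aligned");
    assert(is_aligned(m.data[1], 64) && "second vector is not 64-byte aligned");
}
```

In a real class the same check can go at the top of each constructor as assert(is_aligned(this, 64)), which should flag a misaligned object before the first vector store touches it.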

Jim Dempsey

Patrick_S_ · ‎06-10-2014

The constructor should work as you posted it.

Matrix::Matrix(double d0, double d1, double d2, double d3,
                       double d4, double d5, double d6, double d7) {

     matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7);  
}

I guess you have a problem within your class implementation. Can you post a program that I can compile?

And yes, you are correct about the alignment of the __m512d type. It is always aligned and should never cause a seg fault, even as a return value. Here is the corresponding Intel implementation:

#ifdef __INTEL_CLANG_COMPILER

typedef float   __m512  __attribute__((__vector_size__(64)));
typedef double  __m512d __attribute__((__vector_size__(64)));
typedef __int64 __m512i __attribute__((__vector_size__(64)));

#else
#if !defined(__INTEL_COMPILER) && defined(_MSC_VER)
# define _MM512INTRIN_TYPE(X) __declspec(intrin_type)
#else
# define _MM512INTRIN_TYPE(X) _MMINTRIN_TYPE(X)
#endif


typedef union _MM512INTRIN_TYPE(64) __m512 {
    float       __m512_f32[16];
} __m512;

typedef union _MM512INTRIN_TYPE(64) __m512d {
    double      __m512d_f64[8];
} __m512d;

typedef union _MM512INTRIN_TYPE(64) __m512i {
    int         __m512i_i32[16];
} __m512i;

#endif /* __INTEL_CLANG_COMPILER */

Keegan_S_ · ‎06-11-2014

Here are the constructors and some test code I have. I compiled using

icpc -g -mmic -o Test Test.cpp Matrix.cpp

Let me know how it goes for you.

jimdempseyatthecove · ‎06-11-2014

I think you will find:

Matrix* m1 = new Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);

did not return a 64-byte aligned object.

You could consider using "placement new".

Matrix* m1 = new(_mm_malloc(sizeof(Matrix), CACHE_LINE_SIZE)) Matrix;

...

_mm_free(m1);

Caution: do not use delete on an object allocated with _mm_malloc. You could also overload new for objects of type Matrix.

The above does not test for allocation failure.
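
The suggestion to overload new for the class could be sketched roughly as follows. This is only an illustration under stated assumptions: AlignedMatrix is a hypothetical stand-in for the Matrix class, a plain double array replaces the __m512d members so the code builds without the intrinsics headers, and POSIX posix_memalign/free stand in for _mm_malloc/_mm_free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign, free (POSIX; assumed available)

// Hypothetical stand-in for the Matrix class discussed in this thread.
class alignas(64) AlignedMatrix {
public:
    AlignedMatrix() : data_() {}

    // Class-specific allocation: every `new AlignedMatrix` is 64-byte aligned.
    static void* operator new(std::size_t size) {
        void* p = nullptr;
        if (posix_memalign(&p, 64, size) != 0) {
            throw std::bad_alloc();  // report allocation failure
        }
        return p;
    }

    // Matching deallocation, so plain `delete` is safe for these objects.
    static void operator delete(void* p) noexcept { free(p); }

    // Sanity check on the object's own address.
    bool is_64_byte_aligned() const {
        return reinterpret_cast<std::uintptr_t>(this) % 64 == 0;
    }

private:
    double data_[5][8];  // 5 vectors of 8 doubles, as in the thread
};
```

With the pair of operators defined inside the class, new AlignedMatrix and delete can be used as usual, and the allocation and the release always match. Arrays created with new[] would need operator new[] and operator delete[] overloaded in the same way.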

Jim Dempsey

Patrick_S_ · ‎06-11-2014

Jim is correct.

Matrix* m1 = new(_mm_malloc( sizeof(Matrix), 64 ))Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);

cout << "Matrix m1 initialized!" << endl;

_mm_free(m1);

This works. I tested it. However, I'm not sure what happens when you allocate a __m512 type on the heap and then access it via your class interface. Does anyone know?

I would recommend allocating your matrix array like this:

double *m = (double*)_mm_malloc( 8*5 * sizeof(double), 64 );

and whenever you want to access those elements, use

__m512d v = _mm512_load_pd( m ); // loads the first 8 doubles
__m512d w = _mm512_load_pd( m + 8 ); // loads the next 8 doubles
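
The reason m + 8 is a valid address for an aligned load is that 8 doubles occupy exactly 64 bytes, so every 8-element chunk of a 64-byte aligned buffer starts on a 64-byte boundary. The sketch below checks that property without intrinsics; the helper names are illustrative, and POSIX posix_memalign/free are assumed as stand-ins for _mm_malloc/_mm_free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX; stand-ins for _mm_malloc/_mm_free)

constexpr std::size_t kLanes = 8;       // doubles per 512-bit vector
constexpr std::size_t kVectors = 5;     // vectors per matrix, as in the thread
constexpr std::size_t kAlignment = 64;  // bytes

static_assert(kLanes * sizeof(double) == kAlignment,
              "one chunk of 8 doubles must span exactly 64 bytes");

// Allocates kVectors * kLanes doubles on a 64-byte boundary; nullptr on failure.
inline double* alloc_matrix_storage() {
    void* p = nullptr;
    if (posix_memalign(&p, kAlignment, kVectors * kLanes * sizeof(double)) != 0) {
        return nullptr;
    }
    return static_cast<double*>(p);
}

// True if every 8-double chunk starts on a 64-byte boundary, i.e. each
// m + 8*i would be a legal address for an aligned 512-bit load.
inline bool all_chunks_aligned(const double* m) {
    for (std::size_t i = 0; i < kVectors; ++i) {
        if (reinterpret_cast<std::uintptr_t>(m + i * kLanes) % kAlignment != 0) {
            return false;
        }
    }
    return true;
}

// Usage: allocate, verify every chunk, release.
inline void demo_chunks() {
    double* m = alloc_matrix_storage();
    assert(m != nullptr);
    assert(all_chunks_aligned(m));
    free(m);
}
```

An offset that is not a multiple of 8 elements, such as m + 1, fails this check and would call for an unaligned load instead.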

Keegan_S_ · ‎06-11-2014

Yep that fixed it!

Thanks for the help!