
Seg. fault when using _mm512_mul_pd

Keegan_S_
Beginner
3,236 Views

I'm an undergrad student working on some code that is supposed to do some simple matrix multiplication. The professor I'm working for has purchased a MIC (Intel Xeon Phi), and I'm trying to get things working using some of the C++ intrinsics. In particular, I'm using _mm512_mul_pd to multiply two vectors together and storing the result in another vector. However, whenever I have any code that accesses the variable holding the result of the multiplication, I get a seg. fault. Any ideas why this is happening and what I can do to fix it?

Here are the lines of code I'm talking about:

__m512d tmp = _mm512_mul_pd(matrix[0].v, m.matrix[0].v);  // works fine

return Matrix(tmp);  // causes seg fault

13 Replies
Patrick_S_
New Contributor I

Is your data aligned on a 64-byte boundary?

I think you should post more of your code. It is unclear to me what "return Matrix(tmp)" does.

Keegan_S_
Beginner

In the header file I have a union, defined as follows, which I use to hold the matrix data so that it's 64-byte aligned.

union dvec{
     __m512d v;
     double d[8];
};
union dvec matrix[5] __attribute__((aligned(64)));  // the GCC/ICC attribute is spelled "aligned", not "align"

I then have a constructor that takes doubles and another that takes a __m512d to create a Matrix object; the second one is what Matrix(tmp) in my first post calls.

In the function where the seg. fault occurs, I take a single Matrix object as a parameter and then try to multiply the two together.

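Matrix Matrix::Multiply(const Matrix& m){
     union dvec tmp __attribute__((aligned(64)));
     tmp.v = _mm512_setzero_pd();                        // assign to the __m512d member, not the union
     tmp.v = _mm512_mul_pd(matrix[0].v, m.matrix[0].v);  // element-wise product of block 0
     return Matrix(tmp.v);                               // uses the __m512d constructor
}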

I realize that, as written, this doesn't actually multiply the matrices by the standard definition of matrix multiplication. For now it's just a test to make sure I can get the multiply intrinsic to work correctly. Also, the reason I declare matrix as an array of 5 union dvec elements is so that I can work with matrices larger than 8 elements.
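For reference, `_mm512_mul_pd` computes an element-wise product: lane i of the result is `a[i] * b[i]` for each of the 8 doubles. The scalar sketch below models that per-block operation and applies it to all 5 blocks, which can serve as a host-side check of the intrinsic's output. `Block`, `kLanes`, `mul_block` and `mul_all` are hypothetical names, and `Block` (8 doubles with `alignas(64)`) only stands in for `__m512d` so the code builds without MIC/AVX-512 support.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one __m512d: 8 doubles on a 64-byte boundary.
constexpr std::size_t kLanes = 8;
struct alignas(64) Block {
    double d[kLanes];
};

// Scalar model of _mm512_mul_pd: lane i of the result is a.d[i] * b.d[i].
Block mul_block(const Block& a, const Block& b) {
    Block r{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        r.d[i] = a.d[i] * b.d[i];
    }
    return r;
}

// Apply the element-wise product block by block, like one
// _mm512_mul_pd(matrix[k].v, m.matrix[k].v) per k.
template <std::size_t N>
std::array<Block, N> mul_all(const std::array<Block, N>& a,
                             const std::array<Block, N>& b) {
    std::array<Block, N> r{};
    for (std::size_t k = 0; k < N; ++k) {
        r[k] = mul_block(a[k], b[k]);
    }
    return r;
}
```

This is element-wise, not the row-by-column product of matrix multiplication, which matches what the post describes as a test.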

Patrick_S_
New Contributor I

You should try

__declspec(align(64)) union dvec {

        __m512d v;
        double d[8];
};
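Standard C++11 also offers `alignas(64)`, which expresses the same request as `__declspec(align(64))` or GCC's `__attribute__((aligned(64)))` without a compiler-specific extension. A minimal sketch, assuming a C++11 compiler; `dvec8` is a hypothetical name, and it holds only `double d[8]` so it builds without MIC/AVX-512 headers (with them, the `__m512d v;` member would go back in):

```cpp
#include <cassert>

// Hypothetical stand-in for the thread's union dvec. With the MIC
// intrinsics available it would also contain "__m512d v;".
union alignas(64) dvec8 {
    double d[8];
};

// The type itself now demands 64-byte alignment...
static_assert(alignof(dvec8) == 64, "dvec8 must be 64-byte aligned");
// ...and is exactly one 512-bit register wide, so every element of an
// array such as "dvec8 matrix[5]" also starts on a 64-byte boundary.
static_assert(sizeof(dvec8) == 64, "dvec8 must be 64 bytes");
```

With this definition, `dvec8 matrix[5];` on the stack or as a static gives five consecutive 64-byte-aligned blocks; heap allocation is a separate question, discussed further down the thread.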

 

However, I think you shouldn't use unions. There are better ways to do this, e.g.

__m512d m[8];

as a private member of the class. The constructor that takes doubles would then use

m[0] = _mm512_set_pd( 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 );  // assign to the member, don't redeclare it
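Note the argument order of `_mm512_set_pd`: like the SSE/AVX `_mm_set_pd`/`_mm256_set_pd` intrinsics, its first argument goes to the highest element (7) and its last argument to element 0, the one at the lowest address. So in the line above, element 0 of `m[0]` receives 8.0. The hypothetical `set_pd_order` function below is only a scalar model of that mapping, not an intrinsic:

```cpp
#include <array>
#include <cassert>

// Scalar model of the argument order of _mm512_set_pd(e7, ..., e0):
// the first argument lands in the highest element (index 7), the last
// argument in element 0, i.e. at the lowest memory address.
std::array<double, 8> set_pd_order(double e7, double e6, double e5, double e4,
                                   double e3, double e2, double e1, double e0) {
    return {{e0, e1, e2, e3, e4, e5, e6, e7}};
}
```

To place the first value of a list in element 0, pass the values in reverse order.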

Referring to your "Multiply" method:

tmp = _mm512_setzero_pd();

is not needed. Also, the union tmp holds only one __m512d. I guess your constructor needs more than one __m512d as input?

Patrick_S_
New Contributor I

By the way, you should have a look at a vector class library, e.g.

this one: http://www.agner.org/optimize/vectorclass.zip

Class Vec8f is located in vectorf256.h. Keep in mind that this is an AVX library, so you can only use it as inspiration.

Keegan_S_
Beginner

If I get rid of the union, how do I access individual elements of the vector for printing and similar things?

Sorry if this is something basic. I'm still new to this, and the only example code I have to work from uses unions to access the individual elements.

Patrick_S_
New Contributor I

You should avoid accessing individual elements of a vector register. This is an expensive operation because there is no corresponding assembler instruction. Use masked vector operations instead, e.g.

Add two __m512d values, but only in the first 4 elements:

__m512d a, b;

a = _mm512_mask_add_pd( a, 0x0F, a, b );  // mask 0x0F selects elements 0..3


// equivalent C operation

double a[8], b[8];

a[0] = a[0] + b[0];
a[1] = a[1] + b[1];
a[2] = a[2] + b[2];
a[3] = a[3] + b[3];
a[4] = a[4];  // elements 4..7 keep their old values
a[5] = a[5];
a[6] = a[6];
a[7] = a[7];
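The general rule behind the masked form: bit i of the 8-bit mask decides whether lane i receives `a[i] + b[i]` or keeps the value from the first (`src`) argument, so `0x0F` selects lanes 0 to 3. A scalar model of `_mm512_mask_add_pd(src, k, a, b)` under those semantics; `mask_add_model` and `Vec8` are hypothetical names, and `std::uint8_t` stands in for the mask type:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec8 = std::array<double, 8>;

// Scalar model of _mm512_mask_add_pd(src, k, a, b):
// element i = a[i] + b[i] if bit i of k is set, otherwise src[i].
Vec8 mask_add_model(const Vec8& src, std::uint8_t k,
                    const Vec8& a, const Vec8& b) {
    Vec8 r{};
    for (std::size_t i = 0; i < 8; ++i) {
        r[i] = ((k >> i) & 1u) ? a[i] + b[i] : src[i];
    }
    return r;
}
```

Calling it as `mask_add_model(a, 0x0F, a, b)` reproduces the eight C assignments above.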

If you really want to access one element, you can cast the address of a __m512d to a double pointer (__m512d is a union).

__m512d a;

// print element 0 (the one at the lowest address)
std::cout << ((double*)&a)[0] << std::endl;

However, you shouldn't do that in computationally intensive parts of your program.
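When you do need an element outside the hot loops, copying the whole vector into an ordinary array is an alternative to the pointer cast that does not depend on how the compiler defines the vector type. A sketch, assuming only that the vector type is trivially copyable and made of whole doubles (as `__m512d` is); `lane` is a hypothetical helper, not an Intel intrinsic:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: read element i of a vector value (e.g. __m512d)
// by copying its bytes into an ordinary double array first.
template <class V>
double lane(const V& v, std::size_t i) {
    static_assert(std::is_trivially_copyable<V>::value,
                  "V must be trivially copyable");
    static_assert(sizeof(V) % sizeof(double) == 0,
                  "V must consist of whole doubles");
    constexpr std::size_t n = sizeof(V) / sizeof(double);
    assert(i < n);
    double tmp[n];
    std::memcpy(tmp, &v, sizeof(V));
    return tmp[i];
}
```

For example, `std::cout << lane(a, 0)` prints the same element as the cast above. Like the cast, it goes through memory, so it also belongs outside the performance-critical loops.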

Keegan_S_
Beginner

So I'm trying to work without the union, and now I'm getting a seg. fault from _mm512_set_pd. The code I have is as follows:

In the header

private:
     __m512d matrix[5];

and in the .cpp file

Matrix::Matrix(double d0, double d1, double d2, double d3,
                       double d4, double d5, double d6, double d7 /* , ... more doubles */){
     matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7);  // line where seg. fault occurs (d0 lands in element 7)
     // try to set values for the other 4 indices, but the code doesn't get this far
}

I'm not sure why this would give me a seg. fault. Isn't __m512d 64-byte aligned by definition?

jimdempseyatthecove
Honored Contributor III

Insert a sanity test to assert that things you think are aligned really are aligned. If they aren't, figure out why and how to fix it.
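A sanity test can be a simple check on the address. The background for why it matters here: the `__m512d` type requires 64-byte alignment, but before C++17 a plain `new` is only required to return memory aligned for `alignof(std::max_align_t)`, typically 16 bytes on x86-64, so a heap-allocated object with a `__m512d` member may start at an address that is not a multiple of 64. A minimal sketch; `is_aligned` and `check_alignment` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// True if p lies on an `alignment`-byte boundary; alignment must be a
// power of two (64 for __m512d).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Sanity check for code that assumes 64-byte alignment: prints the
// offending address and, unless NDEBUG is defined, aborts via assert.
inline void check_alignment(const void* p, const char* what) {
    if (!is_aligned(p, 64)) {
        std::fprintf(stderr, "%s at %p is not 64-byte aligned\n",
                     what, const_cast<void*>(p));
    }
    assert(is_aligned(p, 64));
}
```

For example, calling `check_alignment(this, "Matrix")` at the top of each `Matrix` constructor would flag a misaligned object before the first vector store touches it.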

Also, Matrix Matrix::Multiply may end up returning its value on the stack (which may not be aligned).

Jim Dempsey

Patrick_S_
New Contributor I

The constructor should work like you posted it.

Matrix::Matrix(double d0, double d1, double d2, double d3,
                       double d4, double d5, double d6, double d7) {

     matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7);  
}

I guess you have a problem within your class implementation. Can you post a program that I can compile?

And yes, you are correct about the alignment of a __m512d type. It is always aligned and should never cause a seg fault, even as a return value. Here is the corresponding Intel implementation:

#ifdef __INTEL_CLANG_COMPILER

typedef float   __m512  __attribute__((__vector_size__(64)));
typedef double  __m512d __attribute__((__vector_size__(64)));
typedef __int64 __m512i __attribute__((__vector_size__(64)));

#else
#if !defined(__INTEL_COMPILER) && defined(_MSC_VER)
# define _MM512INTRIN_TYPE(X) __declspec(intrin_type)
#else
# define _MM512INTRIN_TYPE(X) _MMINTRIN_TYPE(X)
#endif


typedef union _MM512INTRIN_TYPE(64) __m512 {
    float       __m512_f32[16];
} __m512;

typedef union _MM512INTRIN_TYPE(64) __m512d {
    double      __m512d_f64[8];
} __m512d;

typedef union _MM512INTRIN_TYPE(64) __m512i {
    int         __m512i_i32[16];
} __m512i;

#endif /* __INTEL_CLANG_COMPILER */

 

Keegan_S_
Beginner

Here are the constructors and some test code I have. I compiled using

icpc -g -mmic -o Test Test.cpp Matrix.cpp

Let me know how it goes for you.

 

jimdempseyatthecove
Honored Contributor III

I think you will find:

Matrix* m1 = new Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);

did not return a 64-byte aligned object.

You could consider using "placement new".

Matrix* m1 = new(_mm_malloc(sizeof(Matrix), CACHE_LINE_SIZE)) Matrix;

...

m1->~Matrix();  // run the destructor explicitly; _mm_free only releases the memory
_mm_free(m1);

Caution: do not use delete on an object allocated with _mm_malloc. You could also overload operator new for the Matrix type.

The above does not test for allocation failure.
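Overloading `new` for `Matrix` moves the aligned allocation into the class, so callers can go back to a plain `new Matrix(...)` and `delete`. One way to sketch it: the class-specific `operator new` below uses POSIX `posix_memalign` in place of `_mm_malloc` (so it also builds off the MIC toolchain) and throws `std::bad_alloc` when allocation fails. `Matrix` here is a reduced stand-in with hypothetical members (`at`, `Block`), not the class from this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign (POSIX, not standard C++)

// Reduced stand-in for the thread's Matrix: 5 blocks of 8 doubles,
// where each block plays the role of one __m512d.
class Matrix {
public:
    explicit Matrix(double fill = 0.0) {
        for (Block& blk : blocks_)
            for (double& x : blk.d) x = fill;
    }

    // Class-specific allocation: `new Matrix(...)` gets 64-byte aligned
    // storage from posix_memalign instead of the default operator new.
    static void* operator new(std::size_t size) {
        void* p = nullptr;
        if (posix_memalign(&p, 64, size) != 0) throw std::bad_alloc();
        assert(reinterpret_cast<std::uintptr_t>(p) % 64 == 0);
        return p;
    }

    // Matching deallocation, so a plain `delete` is safe again.
    static void operator delete(void* p) noexcept { std::free(p); }

    double at(std::size_t block, std::size_t lane) const {
        return blocks_[block].d[lane];
    }

private:
    struct alignas(64) Block { double d[8]; };
    Block blocks_[5];
};
```

Array allocation (`new Matrix[n]`) goes through `operator new[]`, which would need its own overload. With a C++17 compiler, plain `new` already honors the 64-byte alignment of such a type through the aligned `operator new`, so an overload like this mainly matters for older compilers.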

Jim Dempsey

 

 

Patrick_S_
New Contributor I

Jim is correct.

Matrix* m1 = new(_mm_malloc( sizeof(Matrix), 64 ))Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);

cout << "Matrix m1 initialized!" << endl;

m1->~Matrix();  // destructor first; _mm_free only releases the memory
_mm_free(m1);

This works; I tested it. However, I'm not sure what happens when you allocate a __m512 type on the heap and then access it via your class interface. Does anyone know?
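The placement-new pattern from Jim's and Patrick's posts can be wrapped in a pair of helpers so that the destructor call and the matching free are not forgotten. A sketch under stated assumptions: `create_aligned` and `destroy_aligned` are hypothetical names, `posix_memalign`/`free` stand in for `_mm_malloc`/`_mm_free`, and a failed allocation returns `nullptr` (the check the earlier snippet left out):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <new>       // placement new
#include <stdlib.h>  // posix_memalign (POSIX)
#include <utility>   // std::forward

// Allocate aligned raw memory and construct a T in it (the
// new(_mm_malloc(...)) Matrix(...) pattern). `alignment` must be a power
// of two and a multiple of sizeof(void*). Returns nullptr on failure.
template <class T, class... Args>
T* create_aligned(std::size_t alignment, Args&&... args) {
    void* mem = nullptr;
    if (posix_memalign(&mem, alignment, sizeof(T)) != 0)
        return nullptr;                                   // allocation failed
    assert(reinterpret_cast<std::uintptr_t>(mem) % alignment == 0);
    try {
        return new (mem) T(std::forward<Args>(args)...);  // construct in place
    } catch (...) {
        std::free(mem);  // constructor threw: release the memory, rethrow
        throw;
    }
}

// Destroy and release an object made by create_aligned (never `delete` it).
template <class T>
void destroy_aligned(T* p) {
    if (!p) return;
    p->~T();        // placement new means the destructor is called by hand
    std::free(p);   // then free the raw memory
}
```

Usage mirrors the posts above: `Matrix* m1 = create_aligned<Matrix>(64, 1, 2, 3 /* ... */);`, check for `nullptr`, and finish with `destroy_aligned(m1);`.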

I would recommend allocating your matrix array like this:

double *m = (double*)_mm_malloc( 8*5 * sizeof(double), 64 );  // C++ needs the cast from void*

and whenever you want to access those elements, use

__m512d v = _mm512_load_pd( m );      // loads the first 8 doubles
__m512d w = _mm512_load_pd( m + 8 );  // loads the next 8 doubles
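Because 8 doubles are exactly 64 bytes, every block `m + 8*k` of a 64-byte-aligned buffer is itself 64-byte aligned, which is what the aligned load `_mm512_load_pd` requires. A portable sketch of that layout; `posix_memalign` stands in for `_mm_malloc`, a `std::memcpy` into a `std::array` stands in for the load, and `AlignedBuffer`, `make_buffer` and `load_block` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <cstring>   // std::memcpy
#include <memory>    // std::unique_ptr
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign (POSIX)

constexpr std::size_t kLanes = 8;   // doubles per 512-bit vector
constexpr std::size_t kBlocks = 5;  // blocks per matrix, as in the thread

// Releases memory from posix_memalign (the _mm_free role).
struct FreeDeleter {
    void operator()(double* p) const noexcept { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<double[], FreeDeleter>;

// Allocate kBlocks * kLanes doubles on a 64-byte boundary
// (the _mm_malloc(8*5 * sizeof(double), 64) role).
AlignedBuffer make_buffer() {
    void* p = nullptr;
    if (posix_memalign(&p, 64, kBlocks * kLanes * sizeof(double)) != 0)
        throw std::bad_alloc();
    return AlignedBuffer(static_cast<double*>(p));
}

// Copy block k (elements 8k .. 8k+7) out of the buffer; with the MIC
// intrinsics this would be _mm512_load_pd(m + kLanes * k).
std::array<double, kLanes> load_block(const double* m, std::size_t k) {
    assert(k < kBlocks);
    const double* src = m + kLanes * k;
    // 8 doubles = 64 bytes, so each block start keeps the 64-byte alignment.
    assert(reinterpret_cast<std::uintptr_t>(src) % 64 == 0);
    std::array<double, kLanes> out;
    std::memcpy(out.data(), src, kLanes * sizeof(double));
    return out;
}
```

The `unique_ptr` with a `free`-based deleter keeps the allocate/release pairing automatic; with the real intrinsics the deleter would call `_mm_free` instead.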

 

Keegan_S_
Beginner

Yep that fixed it!

Thanks for the help!
