I'm an undergrad student working on some code that is supposed to do some simple matrix multiplication. The professor I'm working for has purchased a MIC and I'm trying to get things working using some of the C++ intrinsics. In particular, I'm using _mm512_mul_pd to multiply two vectors together and I'm storing the result in another vector. However, whenever I have any code that accesses the variable I use to store the result of the multiplication, I get a seg fault. Any ideas why this is happening and what I can do to fix it?
Here are the lines of code I'm talking about:
__m512d tmp = _mm512_mul_pd(matrix[0].v, m.matrix[0].v); // works fine
return Matrix(tmp); // causes seg fault
Is your data aligned on a 64-byte boundary?
I think you should post more of your code. It is unclear to me what "return Matrix(tmp)" does.
In the header file I have a union, defined as follows, which I use to hold the matrix data so that it's 64-byte aligned.
union dvec{ __m512d v; double d[8]; };
union dvec matrix[5] __attribute__((align(64)));
I then have one constructor that takes doubles and another that takes a __m512d to create a Matrix object, which is what Matrix(tmp) should be doing in my first post.
In the function where I'm getting the seg fault, I take a single Matrix object as a parameter and then try to multiply the two together.
Matrix Matrix::Multiply(const Matrix& m){
    union dvec tmp __attribute__((align(64)));
    tmp = _mm512_setzero_pd();
    tmp = _mm512_mul_pd(matrix[0].v, m.matrix[0].v);
    return Matrix(tmp);
}
I realize that as I have it written now it doesn't actually multiply the matrices together by the standard definition of matrix multiplication. This is more of just a test at the moment to make sure I can get the multiply intrinsic to work correctly. Also the reason I declare matrix to be an array of 5 union dvec types is so that I can work with larger matrices than those that just hold 8 elements.
You should try
__declspec(align(64)) union dvec { __m512d v; double d[8]; };
However, I think you shouldn't use unions. There are better ways to do this, e.g.
__m512d m[8];
as a private member of the class. Then the constructor that takes doubles has to use
m[0] = _mm512_set_pd( 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 );
Referring to your "Multiply" method:
tmp = _mm512_setzero_pd();
is not needed. Also, the union tmp holds only one __m512d. I guess your constructor needs more than one __m512d as input?
By the way, you should have a look at a vector class library, e.g.
this one: http://www.agner.org/optimize/vectorclass.zip
The class Vec8f is located in vectorf256.h. Keep in mind that this is an AVX library, so you can use it only as inspiration.
If I get rid of the union, how do I access individual elements of the vector for printing or other such things?
Sorry if that should be basic knowledge. I'm still new to this, and in the only example code I have to go off, the author used unions to access the individual elements.
You should avoid accessing individual elements of a vector register. This is a very expensive operation, because there is no corresponding assembler instruction. You should always use masked vector operations, e.g.
add two __m512d values, but only in the first 4 elements:
__m512d a, b;
a = _mm512_mask_add_pd( a, 0x0F, a, b);
// equivalent C operation
double a[8], b[8];
a[0] = a[0] + b[0];
a[1] = a[1] + b[1];
a[2] = a[2] + b[2];
a[3] = a[3] + b[3];
a[4] = a[4];
a[5] = a[5];
a[6] = a[6];
a[7] = a[7];
If you really want to access one element, you can cast the address of a __m512d to a double pointer (__m512d is a union).
__m512d a;
// print element 0 (the lowest lane)
std::cout << ((double*)&a)[0] << std::endl;
However, you shouldn't do that in computationally intensive parts of your program.
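To make the mask semantics above concrete, here is a minimal scalar sketch of what a write-masked add does lane by lane. The function name masked_add_model is hypothetical and this is plain C++, not an Intel intrinsic; it only models the behaviour described above, where bit i of the mask decides whether lane i receives the sum or keeps the value from the source operand.

```cpp
#include <array>
#include <cstdint>

// Scalar model of a write-masked add over 8 double lanes.
// Lane i of the result is a[i] + b[i] when bit i of `mask` is set,
// otherwise it is copied unchanged from `src`.
// masked_add_model is a hypothetical helper, not an Intel intrinsic.
std::array<double, 8> masked_add_model(const std::array<double, 8>& src,
                                       std::uint8_t mask,
                                       const std::array<double, 8>& a,
                                       const std::array<double, 8>& b) {
    std::array<double, 8> dst{};
    for (int i = 0; i < 8; ++i) {
        const bool lane_enabled = ((mask >> i) & 1u) != 0;
        dst[i] = lane_enabled ? a[i] + b[i] : src[i];
    }
    return dst;
}
```

With mask 0x0F the low four bits are set, so lanes 0 to 3 are summed and lanes 4 to 7 pass through from src, which matches the element-by-element listing above.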
So I'm trying to work without the union, and I'm getting a seg fault from the _mm512_set_pd function. The code I have is as follows:
In the header
private: __m512d matrix[5];
and in the .cpp file
Matrix::Matrix(double d0, double d1, double d2, double d3, double d4, double d5, double d6, double d7, ... more doubles){
    matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7); // line where seg. fault occurs
    // try to set values for the other 4 indices but the code doesn't get this far
}
I'm not sure why this would give me a seg. fault. Isn't __m512d 64 byte aligned by definition?
Insert a sanity test to assert that the things you think are aligned really are aligned. If they aren't, figure out why and how to fix it.
Also, Matrix Matrix::Multiply may end up returning its value on the stack, which may not be aligned.
Jim Dempsey
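A sanity test of the kind suggested here could look roughly like the sketch below. The helper name is_aligned and the stand-in type Vec8d are hypothetical (Vec8d only mimics the size and alignment requirement of a 512-bit vector so the check can be tried without the intrinsics headers), and the helper assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when `p` sits on an `alignment`-byte boundary.
// is_aligned is a hypothetical helper; `alignment` is assumed to be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical stand-in for a 512-bit vector: 8 doubles, 64-byte alignment.
struct alignas(64) Vec8d {
    double d[8];
};

// Example check: aborts (in debug builds) if the object is misaligned.
inline void check_vec(const Vec8d& v) {
    assert(is_aligned(&v, 64));
}
```

In the Matrix class, a call such as assert(is_aligned(this, 64)) at the top of each constructor would show whether the object itself, and therefore its vector members, landed on a 64-byte boundary.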
The constructor should work like you posted it.
Matrix::Matrix(double d0, double d1, double d2, double d3, double d4, double d5, double d6, double d7) { matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7); }
I guess you have a problem within your class implementation. Can you post a program that I can compile?
And yes, you are correct about the alignment of the __m512d type. It is always aligned and should never cause a seg fault, even as a return value. Here is the corresponding Intel implementation:
#ifdef __INTEL_CLANG_COMPILER
typedef float __m512 __attribute__((__vector_size__(64)));
typedef double __m512d __attribute__((__vector_size__(64)));
typedef __int64 __m512i __attribute__((__vector_size__(64)));
#else
#if !defined(__INTEL_COMPILER) && defined(_MSC_VER)
# define _MM512INTRIN_TYPE(X) __declspec(intrin_type)
#else
# define _MM512INTRIN_TYPE(X) _MMINTRIN_TYPE(X)
#endif
typedef union _MM512INTRIN_TYPE(64) __m512 {
    float __m512_f32[16];
} __m512;
typedef union _MM512INTRIN_TYPE(64) __m512d {
    double __m512d_f64[8];
} __m512d;
typedef union _MM512INTRIN_TYPE(64) __m512i {
    int __m512i_i32[16];
} __m512i;
#endif /* __INTEL_CLANG_COMPILER */
I think you will find:
Matrix* m1 = new Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
did not return a 64-byte aligned object.
You could consider using "placement new".
Matrix* m1 = new(_mm_malloc(sizeof(Matrix), CACHE_LINE_SIZE)) Matrix;
...
_mm_free(m1);
Caution: do not use delete on an object allocated with _mm_malloc. You could also overload new for objects of Matrix type.
The code above does not test for allocation failure.
Jim Dempsey
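The "overload new" option mentioned above might be sketched as follows. This is only an illustration under assumptions: AlignedMatrix is a hypothetical stand-in for the Matrix class (plain doubles instead of __m512d members), and it uses POSIX posix_memalign and free in place of _mm_malloc and _mm_free so that it builds without the intrinsics headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>        // std::bad_alloc
#include <stdlib.h>   // posix_memalign, free (POSIX; assumed available)

// Hypothetical stand-in for the thread's Matrix class: 5 blocks of 8 doubles.
class AlignedMatrix {
public:
    AlignedMatrix() : data_{} {}

    // Every `new AlignedMatrix` goes through here and gets 64-byte aligned storage.
    static void* operator new(std::size_t size) {
        void* p = nullptr;
        if (posix_memalign(&p, 64, size) != 0) {
            throw std::bad_alloc();   // report allocation failure
        }
        return p;
    }

    // Matching deallocation, so plain `delete` is safe for these objects.
    static void operator delete(void* p) noexcept {
        free(p);
    }

    // True when the internal storage starts on a 64-byte boundary.
    bool storage_is_aligned() const {
        return reinterpret_cast<std::uintptr_t>(data_) % 64 == 0;
    }

private:
    alignas(64) double data_[5][8];
};
```

With this in place the calling code can stay as new AlignedMatrix() and delete, and the allocation failure case is reported through std::bad_alloc. Arrays created with new[] would need operator new[] and operator delete[] overloaded in the same way.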
Jim is correct.
Matrix* m1 = new(_mm_malloc( sizeof(Matrix), 64 ))Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
cout << "Matrix m1 initialized!" << endl;
_mm_free(m1);
This works; I tested it. However, I'm not sure what happens when you allocate a __m512 type on the heap and then access it via your class interface. Does anyone know?
I would recommend allocating your matrix array like this:
double *m = (double*)_mm_malloc( 8*5 * sizeof(double), 64 );
and whenever you want to access those elements, use
__m512d v = _mm512_load_pd( m ); // loads the first 8 doubles
__m512d w = _mm512_load_pd( m + 8 ); // loads the next 8 doubles
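Putting the flat-buffer idea together, a rough sketch of an element-wise multiply over aligned buffers could look like this. The helper names alloc_aligned_doubles and multiply_blocks are hypothetical, posix_memalign stands in for _mm_malloc, and the __AVX512F__ guard is an assumption about the build: it selects the intrinsic path on AVX-512 compilers and a scalar fallback elsewhere, and first-generation coprocessor toolchains may need a different check.

```cpp
#include <cstddef>
#include <stdlib.h>   // posix_memalign, free (POSIX; stands in for _mm_malloc/_mm_free)
#ifdef __AVX512F__
#include <immintrin.h>
#endif

// Allocates `count` doubles on a 64-byte boundary; returns nullptr on failure.
// alloc_aligned_doubles is a hypothetical helper. Release the buffer with free().
inline double* alloc_aligned_doubles(std::size_t count) {
    void* p = nullptr;
    if (posix_memalign(&p, 64, count * sizeof(double)) != 0) {
        return nullptr;
    }
    return static_cast<double*>(p);
}

// Element-wise product of two buffers, 8 doubles (one 512-bit vector) at a time.
// Assumes all three pointers are 64-byte aligned and n is a multiple of 8.
inline void multiply_blocks(const double* a, const double* b,
                            double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 8) {
#ifdef __AVX512F__
        const __m512d va = _mm512_load_pd(a + i);
        const __m512d vb = _mm512_load_pd(b + i);
        _mm512_store_pd(out + i, _mm512_mul_pd(va, vb));
#else
        // Scalar fallback so the sketch also builds without AVX-512 support.
        for (std::size_t j = 0; j < 8; ++j) {
            out[i + j] = a[i + j] * b[i + j];
        }
#endif
    }
}
```

Because the class then only holds a double pointer, the Matrix object itself no longer has a 64-byte alignment requirement, so it can be created with ordinary new or returned by value; only the buffer has to be aligned.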
Yep that fixed it!
Thanks for the help!
