I'm an undergrad student working on some code that is supposed to do simple matrix multiplication. The professor I'm working for has purchased a MIC and I'm trying to get things working using some of the C++ intrinsics. In particular, I'm using _mm512_mul_pd to multiply two vectors together and storing the result in another vector. However, whenever any code accesses the variable that holds the result of the multiplication, I get a seg fault. Any ideas why this is happening and what I can do to fix it?
Here are the lines of code I'm talking about:
__m512d tmp = _mm512_mul_pd(matrix[0].v, m.matrix[0].v); // works fine
return Matrix(tmp); // causes seg fault
Is your data aligned on a 64-byte boundary?
I think you should post more of your code. It is unclear to me what "return Matrix(tmp)" does.
In the header file I have a union defined as follows, which I then use to hold the matrix data so that it's 64-byte aligned.
union dvec{
__m512d v;
double d[8];
};
union dvec matrix[5] __attribute__((align(64)));
I then have one constructor that takes doubles and another that takes a __m512d to create a Matrix object; the latter is what Matrix(tmp) should be calling in my first post.
The function where I get the seg fault takes a single Matrix object as a parameter and then tries to multiply the two together.
Matrix Matrix::Multiply(const Matrix& m){
union dvec tmp __attribute__((align(64)));
tmp = _mm512_setzero_pd();
tmp = _mm512_mul_pd(matrix[0].v, m.matrix[0].v);
return Matrix(tmp);
}
I realize that as I have it written now it doesn't actually multiply the matrices together by the standard definition of matrix multiplication. This is more of just a test at the moment to make sure I can get the multiply intrinsic to work correctly. Also the reason I declare matrix to be an array of 5 union dvec types is so that I can work with larger matrices than those that just hold 8 elements.
You should try
__declspec(align(64)) union dvec {
__m512d v;
double d[8];
};
However, I don't think you should use unions. There are better ways to do this, e.g.
__m512d m[8];
as a private member of the class. The constructor that takes doubles then has to use
m[0] = _mm512_set_pd( 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 );
Referring to your "Multiply" method:
tmp = _mm512_setzero_pd();
is not needed. Also, the union tmp holds only one __m512d. I guess your constructor needs more than one __m512d as input?
By the way, you should have a look at a vector class library, e.g.
this one: http://www.agner.org/optimize/vectorclass.zip
The class Vec8f is located in vectorf256.h. Keep in mind that this is an AVX library, so you can use it only as inspiration.
If I get rid of the union, how do I access individual elements of the vector for printing and similar things?
Sorry if this is a basic question. I'm still new to this, and the only example code I have to go on used unions to access the individual elements.
You should avoid accessing individual elements of a vector register. This is a very expensive operation because there is no corresponding assembler instruction. Use masked vector operations wherever possible, e.g.
add two __m512d but only the first 4 elements
__m512d a, b;
a = _mm512_mask_add_pd( a, 0x0F, a, b );
// equivalent C operation
double a[8], b[8];
a[0] = a[0] + b[0];
a[1] = a[1] + b[1];
a[2] = a[2] + b[2];
a[3] = a[3] + b[3];
a[4] = a[4];
a[5] = a[5];
a[6] = a[6];
a[7] = a[7];
If you really want to access one element you can cast the address of a __m512d to a double pointer (__m512d is a union).
__m512d a;
// print element 0 (the lowest lane)
std::cout << ((double*)&a)[0] << std::endl;
However, you shouldn't do that in computationally intensive parts of your program.
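To make the mask semantics above concrete, here is a scalar model of the masked add in plain C++. It is only a sketch for reasoning about which lanes change; the names Lanes and mask_add_model are made up for this illustration and are not part of any intrinsics header.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Eight doubles, standing in for the lanes of one 512-bit vector.
using Lanes = std::array<double, 8>;

// Scalar model of a masked add (hypothetical helper, not an intrinsic):
// lane i becomes a[i] + b[i] when bit i of k is set, otherwise it is copied from src[i].
inline Lanes mask_add_model(const Lanes& src, std::uint8_t k,
                            const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (int i = 0; i < 8; ++i) {
        r[i] = ((k >> i) & 1u) ? a[i] + b[i] : src[i];
    }
    return r;
}
```

With k = 0x0F and src equal to a, lanes 0 to 3 receive the sum and lanes 4 to 7 keep the old values of a, which matches the element-by-element listing above.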
So I'm trying to work without the union, and now I'm getting a seg fault from the _mm512_set_pd function. The code I have is as follows:
In the header
private:
__m512d matrix[5];
and in the .cpp file
Matrix::Matrix(double d0, double d1, double d2, double d3,
double d4, double d5, double d6, double d7, ... more doubles){
matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7); // line where seg. fault occurs
// try to set values for the other 4 indices but the code doesn't get this far
}
I'm not sure why this would give me a seg fault. Isn't __m512d 64-byte aligned by definition?
Insert a sanity test to assert that the things you think are aligned really are aligned. If they aren't, figure out why and how to fix it.
Also: Matrix Matrix::Multiply
may end up returning its value on the stack (which may not be aligned).
Jim Dempsey
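One possible shape for such a sanity test, as a minimal sketch in portable C++. The helper is_aligned, the struct Vec8 (a stand-in for a type holding one 512-bit vector) and check_vec8 are names assumed for this example; they do not come from the poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment = 64) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical stand-in for a type that holds one 512-bit vector (eight doubles).
struct alignas(64) Vec8 {
    double d[8];
};

// Example check to place at the top of a constructor or member function.
inline void check_vec8(const Vec8* v) {
    assert(is_aligned(v) && "Vec8 is not 64-byte aligned");
}
```

Calling a check like this on `this` at the start of the constructor would show right away whether the object itself, and not only its members, landed on a 64-byte boundary.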
The constructor should work like you posted it.
Matrix::Matrix(double d0, double d1, double d2, double d3,
double d4, double d5, double d6, double d7) {
matrix[0] = _mm512_set_pd(d0,d1,d2,d3,d4,d5,d6,d7);
}
I guess you have a problem within your class implementation. Can you post a program that I can compile?
And yes, you are correct about the alignment of the __m512d type. It is always aligned and should never cause a seg fault, even as a return value. Here is the corresponding Intel implementation:
#ifdef __INTEL_CLANG_COMPILER
typedef float __m512 __attribute__((__vector_size__(64)));
typedef double __m512d __attribute__((__vector_size__(64)));
typedef __int64 __m512i __attribute__((__vector_size__(64)));
#else
#if !defined(__INTEL_COMPILER) && defined(_MSC_VER)
# define _MM512INTRIN_TYPE(X) __declspec(intrin_type)
#else
# define _MM512INTRIN_TYPE(X) _MMINTRIN_TYPE(X)
#endif
typedef union _MM512INTRIN_TYPE(64) __m512 {
float __m512_f32[16];
} __m512;
typedef union _MM512INTRIN_TYPE(64) __m512d {
double __m512d_f64[8];
} __m512d;
typedef union _MM512INTRIN_TYPE(64) __m512i {
int __m512i_i32[16];
} __m512i;
#endif /* __INTEL_CLANG_COMPILER */
I think you will find:
Matrix* m1 = new Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
did not return a 64-byte aligned object.
You could consider using "placement new".
Matrix* m1 = new(_mm_malloc(sizeof(Matrix), CACHE_LINE_SIZE)) Matrix;
...
_mm_free(m1);
Caution: do not use delete on an object allocated with _mm_malloc. You could also overload new for objects of type Matrix.
The above does not test for allocation failure.
Jim Dempsey
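The "overload new" option could look roughly like the sketch below. It assumes C++17 and uses the standard std::aligned_alloc in place of _mm_malloc so that it compiles without Intel-specific headers. AlignedMatrix is a hypothetical stand-in for the poster's Matrix class (40 doubles, i.e. five 64-byte vectors), and is_aligned64 is a helper added only for checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the thread's Matrix: 40 doubles = five 64-byte vectors.
class alignas(64) AlignedMatrix {
public:
    AlignedMatrix() : data{} {}

    // Class-specific allocation: every `new AlignedMatrix` gets 64-byte-aligned storage.
    static void* operator new(std::size_t size) {
        // std::aligned_alloc requires the size to be a multiple of the alignment.
        std::size_t rounded = (size + 63) / 64 * 64;
        void* p = std::aligned_alloc(64, rounded);
        if (p == nullptr) throw std::bad_alloc();  // report allocation failure
        return p;
    }

    // Matching deallocation, so a plain `delete` releases the memory correctly.
    static void operator delete(void* p) noexcept { std::free(p); }

    double data[40];
};

// Helper for checking the result of an allocation.
inline bool is_aligned64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}
```

Compared with placement new, this keeps call sites unchanged (`new AlignedMatrix` and `delete`), because allocation and deallocation are paired inside the class.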
Jim is correct.
Matrix* m1 = new(_mm_malloc( sizeof(Matrix), 64 ))Matrix(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
cout << "Matrix m1 initialized!" << endl;
_mm_free(m1);
This works; I tested it. However, I'm not sure what happens when you allocate a __m512 type on the heap and then access it via your class interface. Does anyone know?
I would recommend allocating your matrix array like this:
double *m = (double*)_mm_malloc( 8*5 * sizeof(double), 64 );
and whenever you want to access those elements use
__m512d v = _mm512_load_pd( m ); // loads the first 8 doubles
__m512d w = _mm512_load_pd( m + 8 ); // loads the next 8 doubles
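Putting the flat-buffer suggestion together, a hedged sketch might look like this. It assumes C++17 and swaps _mm_malloc for the standard std::aligned_alloc. The intrinsics path is compiled only when the compiler defines __AVX512F__, with a scalar loop as fallback, so the exact headers and flags for the MIC toolchain may differ. The function names alloc_aligned_doubles and multiply_blocks are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Allocates n doubles on a 64-byte boundary. n must be a multiple of 8 so that
// the byte count is a multiple of the alignment. Release with std::free.
inline double* alloc_aligned_doubles(std::size_t n) {
    assert(n % 8 == 0);
    return static_cast<double*>(std::aligned_alloc(64, n * sizeof(double)));
}

// Element-wise product out[i] = a[i] * b[i], eight doubles (one vector) at a time.
// All three pointers must be 64-byte aligned and n must be a multiple of 8.
inline void multiply_blocks(const double* a, const double* b,
                            double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 8) {
#if defined(__AVX512F__)
        __m512d va = _mm512_load_pd(a + i);  // aligned load of 8 doubles
        __m512d vb = _mm512_load_pd(b + i);
        _mm512_store_pd(out + i, _mm512_mul_pd(va, vb));
#else
        // Scalar fallback with the same result, for builds without AVX-512.
        for (std::size_t j = 0; j < 8; ++j) out[i + j] = a[i + j] * b[i + j];
#endif
    }
}
```

Because the buffer is a plain double array, individual elements can be read or printed directly (for example out[3]) without casting a vector type or going through a union.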
Yep that fixed it!
Thanks for the help!