I have this simple matrix multiply for offload on the Xeon Phi, but I get an offload error (SIGSEGV) when I run the program below:
#include <stdlib.h>
#include <math.h>
int main()
{
    double *a, *b, *c;
    int i, j, k, ok, n = 100;
    // allocate memory on the heap aligned to a 64-byte boundary
    ok  = posix_memalign((void**)&a, 64, n*n*sizeof(double));
    ok |= posix_memalign((void**)&b, 64, n*n*sizeof(double));
    ok |= posix_memalign((void**)&c, 64, n*n*sizeof(double));
    if (ok != 0) return 1; // allocation failed
    // initialize matrices
    for (i = 0; i < n*n; i++)
    {
        a[i] = (double) rand();
        b[i] = (double) rand();
        c[i] = 0.0;
    }
    // offload code
#pragma offload target(mic) in(a,b:length(n*n)) inout(c:length(n*n))
    // parallelize via OpenMP on MIC
#pragma omp parallel for private(j,k)
    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++)
#pragma vector aligned
#pragma ivdep
            for (j = 0; j < n; j++)
                // compute c = c + a*b
                c[i*n+j] = c[i*n+j] + a[i*n+k]*b[k*n+j];
    return 0;
}
What am I doing wrong?
I read in a previous post that there might be a known bug in this release; could this be it?
Here is the program output:
[Offload] [MIC 0] [File] matmul_offload.cpp
[Offload] [MIC 0] [Line] 19
[Offload] [MIC 0] [Tag] Tag 0
offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)
>> c[i*n+j] = c[i*n+j] + a[i*n+k]*b[k*n+j];
n = 100
When i=1 and j=0 (the start of the inner loop), c[i*n+j] is not 64-byte aligned as you have asserted with #pragma vector aligned: row 1 begins n*sizeof(double) = 800 bytes into the array, and 800 is not a multiple of 64. Do not make false declarations to the compiler.
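A quick standalone check makes this visible (my sketch, not part of your program):
#include <stdio.h>
#include <stdlib.h>
int main()
{
    double *c;
    int i, n = 100;
    // same 64-byte aligned allocation as in your program
    if (posix_memalign((void**)&c, 64, n*n*sizeof(double)) != 0) return 1;
    // row i starts at byte offset i*n*sizeof(double) = i*800;
    // 800 % 64 == 32, so every odd-numbered row is only 32-byte aligned
    for (i = 0; i < 4; i++)
        printf("row %d: start offset mod 64 = %d\n", i,
               (int)(((char*)&c[i*n] - (char*)c) % 64));
    free(c);
    return 0;
}
This prints 0, 32, 0, 32 for rows 0 through 3.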
Jim Dempsey
If you use "#pragma vector aligned" on the Xeon Phi, then, in addition to using an aligned allocator, you have to pad the inner loop dimension (in your case, "n") to a multiple of 8 in double precision or a multiple of 16 in single precision, so that every row occupies a whole number of 64-byte cache lines. Otherwise, as Jim Dempsey explained above, your declaration becomes false for i>0.
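With n = 100, for example, this works out to a padded leading dimension of 104 (a tiny sketch of just the padding computation):
#include <stdio.h>
int main()
{
    int n = 100;
    // round n up to the next multiple of 8 doubles (8 * 8 bytes = one 64-byte cache line);
    // for single precision, round up to a multiple of 16 floats instead
    int nPadded = ( n%8 == 0 ? n : n + (8 - n%8) );
    printf("n = %d, nPadded = %d\n", n, nPadded);  // prints n = 100, nPadded = 104
    return 0;
}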
Thanks Andrey - it's my first offload code for Xeon Phi; I usually compile code for native runs.
Could you kindly give me an example, or point me to a resource?
Much thanks
Dave
Hi Dave,
in order to fix your code you can do something like the version below.
A nice paper about it is http://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization . A comprehensive resource with practical examples that addresses vectorization, data alignment and Xeon Phi optimization in general is http://www.colfax-intl.com/nd/xeonphi/book.aspx . Of course, asking me about resources is like asking Ronald McDonald to point out a good burger place in town.
Andrey
#include <stdlib.h>
#include <math.h>
int main()
{
    double *a, *b, *c;
    int i, j, k, ok, n = 100;
    // pad the row length to a multiple of 8 doubles (one 64-byte cache line)
    int nPadded = ( n%8 == 0 ? n : n + (8 - n%8) );
    // allocate memory on the heap aligned to a 64-byte boundary
    ok  = posix_memalign((void**)&a, 64, n*nPadded*sizeof(double));
    ok |= posix_memalign((void**)&b, 64, n*nPadded*sizeof(double));
    ok |= posix_memalign((void**)&c, 64, n*nPadded*sizeof(double));
    if (ok != 0) return 1; // allocation failed
    // initialize matrices (padding elements included, for simplicity)
    for (i = 0; i < n*nPadded; i++)
    {
        a[i] = (double) rand();
        b[i] = (double) rand();
        c[i] = 0.0;
    }
    // offload code
#pragma offload target(mic) in(a,b:length(n*nPadded)) inout(c:length(n*nPadded))
    // parallelize via OpenMP on MIC; j and k must be private to each thread
#pragma omp parallel for private(j,k)
    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++)
            // every row now starts on a 64-byte boundary, so the alignment claim holds
#pragma vector aligned
#pragma ivdep
            for (j = 0; j < n; j++)
                c[i*nPadded+j] = c[i*nPadded+j] + a[i*nPadded+k]*b[k*nPadded+j];
    free(a); free(b); free(c);
    return 0;
}
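One note on building: the offload version is compiled like an ordinary host program, without -mmic (that switch is for native builds), e.g. icc -O3 -qopenmp matmul_offload.cpp. The compiler generates both the host and the MIC code, and the offload runtime transfers the arrays at the offload pragma.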
I am using that code to see whether the Xeon Phi gives better performance than the Xeon alone. For the Xeon-only run I commented out the lines #pragma offload target(mic) in(a,b:length(n*nPadded)) inout(c:length(n*nPadded)), #pragma vector aligned and #pragma ivdep, and for the Xeon Phi run I uncommented them, but the Xeon-only performance is better than the Xeon Phi. To compile I use
icc -O3 -qopenmp matrixmatrix_mul.c -o matrixmatrix_mul.mic -mmic
for the Xeon Phi, and
icc -O3 -qopenmp matrixmatrix_mul.c -o matrixmatrix_mul
for the Xeon only. Could you please help me with a simple example where, using parallelization and vectorization, the Xeon Phi performs better than the Xeon alone?
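For reference, here is how both variants can be built from one source, guarding the offload pragma with a preprocessor macro instead of commenting it in and out by hand (a sketch based on Andrey's corrected program; USE_OFFLOAD is just a name I made up, and as I understand it -mmic is only for native builds, so the offload build does not use it):
// offload build:   icc -O3 -qopenmp -DUSE_OFFLOAD matrixmatrix_mul.c -o mm_offload
// host-only build: icc -O3 -qopenmp matrixmatrix_mul.c -o mm_host
#include <stdio.h>
#include <stdlib.h>
int main()
{
    double *a, *b, *c;
    int i, j, k, n = 100;
    int nPadded = ( n%8 == 0 ? n : n + (8 - n%8) );
    if (posix_memalign((void**)&a, 64, n*nPadded*sizeof(double)) ||
        posix_memalign((void**)&b, 64, n*nPadded*sizeof(double)) ||
        posix_memalign((void**)&c, 64, n*nPadded*sizeof(double)))
        return 1;
    for (i = 0; i < n*nPadded; i++)
    {
        a[i] = (double) rand();
        b[i] = (double) rand();
        c[i] = 0.0;
    }
#ifdef USE_OFFLOAD
#pragma offload target(mic) in(a,b:length(n*nPadded)) inout(c:length(n*nPadded))
#endif
#pragma omp parallel for private(j,k)
    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++)
            // with padded, 64-byte aligned rows the alignment claim holds on the host too
#pragma vector aligned
#pragma ivdep
            for (j = 0; j < n; j++)
                c[i*nPadded+j] += a[i*nPadded+k]*b[k*nPadded+j];
    printf("c[0] = %f\n", c[0]);
    free(a); free(b); free(c);
    return 0;
}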