SimonWang
Beginner
83 Views

Matrix Multiplication via ND-Range and USM

 

// data allocation
int* w_device = malloc_device<int>(M * N, Q);
int* b_device = malloc_device<int>(M, Q);
int* temp = malloc_device<int>(SIZE, Q);
int* x_shared = malloc_shared<int>(N * K, Q);
int* result_shared = malloc_shared<int>(M * K, Q);

//two events: copy data to device
auto e1 = ... cgh.memcpy(w_device...)..;
auto e2 = ..cgh.memcpy(b_device...)...;

// compute graph dependencies
cgh.depends_on({e1, e2});

// two alternative for-loops in the kernel function (tested one at a time)
// dimension: nd_range<2>{{M, K}, {1, GROUP_SIZE}}
int m = item.get_global_id(0);
int n = item.get_global_id(1);
int i = item.get_local_id(1);
int sum = 0;
//first
for (int k = 0; k < N; k += GROUP_SIZE) {
  temp[i] = w_device[m * N + k + i];
  item.barrier();
  for(int kk = 0; kk < GROUP_SIZE; kk++) {
    sum += temp[kk] * x_shared[(k + kk) * K + n];
    item.barrier();
  }
}

//second
for (int k = 0; k < N; k++) {
  sum += w_device[m * N + k] * x_shared[k * K + n];
}
//
result_shared[m * K + n] = sum + b_device[m];

 

I am learning SYCL and decided to write matrix multiplication from scratch. While testing my code I ran into some questions. I am not sure whether this is a bug or a mistake on my part, so I have posted the key parts of my code.

With the first for-loop, the result occasionally contains wrong elements (only some of them, not all), while the second for-loop never gives a wrong result.
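One way to pin down which elements go wrong is to compare the device output against a plain host computation. The sketch below is only an illustration: `reference_matmul` and `count_mismatches` are hypothetical helper names that are not part of the original code, and it assumes the row-major layouts implied by the indexing in the second for-loop (W is M x N, X is N x K, b has M entries).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: out = W * X + b, all matrices row-major.
// Mirrors the indexing of the second for-loop in the post.
std::vector<int> reference_matmul(const std::vector<int>& w,
                                  const std::vector<int>& x,
                                  const std::vector<int>& b,
                                  std::size_t M, std::size_t N, std::size_t K) {
    assert(w.size() == M * N && x.size() == N * K && b.size() == M);
    std::vector<int> out(M * K);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < K; ++n) {
            int sum = 0;
            for (std::size_t k = 0; k < N; ++k) {
                sum += w[m * N + k] * x[k * K + n];
            }
            out[m * K + n] = sum + b[m];
        }
    }
    return out;
}

// Hypothetical helper: counts positions where two equally sized results differ.
std::size_t count_mismatches(const std::vector<int>& got,
                             const std::vector<int>& want) {
    assert(got.size() == want.size());
    std::size_t bad = 0;
    for (std::size_t idx = 0; idx < got.size(); ++idx) {
        if (got[idx] != want[idx]) {
            ++bad;
        }
    }
    return bad;
}
```

Copying `result_shared` into a `std::vector<int>` after the kernel finishes and passing it to `count_mismatches` together with the reference would show how many elements differ on a given run.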

If I did not make a mistake, I suspect there is a problem with using item.barrier() to synchronize data accessed through USM?
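One possible explanation, offered as a guess because the full source is not shown: `item.barrier()` synchronizes only the work-items of a single work-group. If `temp` is one `malloc_device` allocation that every work-group indexes as `temp[i]`, work-groups running at the same time could overwrite each other's tile, which would match results that are only sometimes wrong. A common remedy is to give each work-group its own tile, for example through work-group local memory (`sycl::local_accessor` in SYCL 2020). The sketch below is not SYCL code. It emulates the work-groups one after another in plain C++ so the per-group tile is explicit. `tiled_matmul_emulated` is a hypothetical name, and the sketch assumes N and K are multiples of GROUP_SIZE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host emulation of nd_range<2>{{M, K}, {1, GROUP_SIZE}}: each work-group
// covers one row m and GROUP_SIZE consecutive columns n.
std::vector<int> tiled_matmul_emulated(const std::vector<int>& w,
                                       const std::vector<int>& x,
                                       const std::vector<int>& b,
                                       std::size_t M, std::size_t N,
                                       std::size_t K, std::size_t GROUP_SIZE) {
    assert(GROUP_SIZE > 0 && N % GROUP_SIZE == 0 && K % GROUP_SIZE == 0);
    assert(w.size() == M * N && x.size() == N * K && b.size() == M);
    std::vector<int> out(M * K);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t g = 0; g < K / GROUP_SIZE; ++g) {
            // Private to this work-group; stands in for local memory.
            std::vector<int> tile(GROUP_SIZE);
            // One accumulator per work-item in the group.
            std::vector<int> sum(GROUP_SIZE, 0);
            for (std::size_t k = 0; k < N; k += GROUP_SIZE) {
                // Phase 1: work-item i loads one element of row m.
                for (std::size_t i = 0; i < GROUP_SIZE; ++i) {
                    tile[i] = w[m * N + k + i];
                }
                // On a device, a work-group barrier would go here.
                // Phase 2: every work-item reads the whole tile.
                for (std::size_t i = 0; i < GROUP_SIZE; ++i) {
                    const std::size_t n = g * GROUP_SIZE + i;
                    for (std::size_t kk = 0; kk < GROUP_SIZE; ++kk) {
                        sum[i] += tile[kk] * x[(k + kk) * K + n];
                    }
                }
                // On a device, a second barrier would go here, before the
                // next iteration overwrites the tile.
            }
            for (std::size_t i = 0; i < GROUP_SIZE; ++i) {
                out[m * K + g * GROUP_SIZE + i] = sum[i] + b[m];
            }
        }
    }
    return out;
}
```

In this layout one barrier after the load and one after the inner loop are enough. The barrier inside the `kk` loop of the posted code is not needed for correctness within a group, and it does nothing to protect a buffer shared between groups.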

 

3 Replies
RahulV_intel
Moderator
64 Views

Hi,


Could you please share the complete source code if possible?

Also, specify your oneAPI base toolkit version and OS.


Thanks,

Rahul


RahulV_intel
Moderator
51 Views

Hi @SimonWang,

 

Just a quick reminder to share your complete source code.

 

Thanks,

Rahul

 

RahulV_intel
Moderator
21 Views

Hi,


I have not heard back from you. So, I will go ahead and close this thread from my end. Feel free to post a new query if you require further assistance from Intel.


Thanks,

Rahul