Hello,
I am porting my code to DPC++ but I have run into a problem. I have narrowed down the problem to this unit test.
#include <CL/sycl.hpp>
#include <array>
#include <iostream>
#if FPGA || FPGA_EMULATOR
#include <CL/sycl/INTEL/fpga_extensions.hpp>
#endif
using namespace sycl;
#define M 4
#define N 5
#define M_LEN (M + 2)
#define N_LEN (N + 2)
#define DOMAIN_SIZE M_LEN*N_LEN
#define DIM 1
void VecAdd(queue &q, range<DIM> R, const int a[DOMAIN_SIZE], const int b[DOMAIN_SIZE], int sum[DOMAIN_SIZE]) {
auto e = q.parallel_for(R, [=](auto i) {
sum[i] = a[i] + b[i];
});
e.wait();
}
int main() {
auto R = range<1>{DOMAIN_SIZE};
default_selector d_selector;
queue q(d_selector);
std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;
int **u = malloc_shared<int *>(3*DOMAIN_SIZE, q);
int **v = malloc_shared<int *>(3*DOMAIN_SIZE, q);
int **p = malloc_shared<int *>(3*DOMAIN_SIZE, q);
int u_[3][DOMAIN_SIZE]; int *_u_[3] = {u_[0], u_[1], u_[2]}; u = _u_;
int v_[3][DOMAIN_SIZE]; int *_v_[3] = {v_[0], v_[1], v_[2]}; v = _v_;
int p_[3][DOMAIN_SIZE]; int *_p_[3] = {p_[0], p_[1], p_[2]}; p = _p_;
auto e = q.parallel_for(R, [=](auto i) {
u[0][i] = i;
v[0][i] = 2*i;
});
VecAdd(q, R, u[0], v[0], p[0]);
for (int i=0; i<DOMAIN_SIZE; i++)
std::cout << "p[0][" << i << "] = " << p[0][i] << std::endl;
free(u, q);
free(v, q);
free(p, q);
return 0;
}
This code compiles but throws the following error:
terminate called after throwing an instance of 'cl::sycl::runtime_error'
what(): Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
Aborted
As discussed previously here, I decided to change from the buffer model to USM. This kind of array declaration was tested and worked fine with the buffer model. Moreover, this code gives me the correct output on CPU while still throwing the same error.
I don't understand what I am doing wrong here and what the error says.
Could you please help me with this?
Thanks,
Leila
Hi,
Thanks for reaching out to us.
We are also able to reproduce the same issue on our end.
We are looking into your issue internally. We will get back to you soon.
Meanwhile, could you please provide the following environment details:
Compiler version
OS and its version
Thanks & Regards
Noorjahan.
Hi,
Thank you for looking into this.
I am running the code on Intel DevCloud. I don't know where to look up the versions.
Thanks,
Leila
Hi,
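The main cause of your error is the way you are allocating memory. Dynamic allocation uses heap memory, whereas static allocation uses stack memory, and you are trying to merge both methods. malloc_shared returns USM memory that the device can access, while u_ and _u_ are ordinary stack arrays on the host. The assignment u = _u_ copies no data; it overwrites the USM pointer with the address of the stack array, so the kernel dereferences host stack memory and free(u, q) receives a pointer that did not come from malloc_shared.
Instead of this >> int u_[3][DOMAIN_SIZE]; int *_u_[3] = {u_[0], u_[1], u_[2]}; u = _u_; you can allocate each row in USM with this line >> u[0] = malloc_shared<int>(DOMAIN_SIZE, q);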
We need to call e.wait(); after each parallel_for before the host or a later kernel uses the results. The wait blocks until the kernel has finished, so the data is synchronized before we proceed to any other operation on it.
>> I don't know where to look up the versions.
You can check the version with the compiler's --version option, for example: dpcpp --version
If your input size is small, you can instead use a single 1D pointer and access element (row, column) at index row*array_width+column.
You can find the complete snippet below:
#include <CL/sycl.hpp>
#include <array>
#include <iostream>
#if FPGA || FPGA_EMULATOR
#include <CL/sycl/INTEL/fpga_extensions.hpp>
#endif
using namespace sycl;
#define M 4
#define N 5
#define M_LEN (M + 2)
#define N_LEN (N + 2)
constexpr size_t DOMAIN_SIZE = M_LEN*N_LEN;
#define DIM 1
// Element-wise sum of two USM arrays; returns after the kernel has finished.
void VecAdd(queue &q, size_t size, const int a[DOMAIN_SIZE], const int b[DOMAIN_SIZE], int sum[DOMAIN_SIZE]) {
    range<1> num_items{size};
    auto e = q.parallel_for(num_items, [=](auto i) {
        sum[i] = a[i] + b[i];
    });
    e.wait();
}
int main() {
    auto R = range<1>{DOMAIN_SIZE};
    default_selector d_selector;
    queue q(d_selector);
    std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;
    // The pointer tables live in USM so the kernel can read u[0], v[0], p[0].
    // Only 3 row pointers are needed per table.
    int **u = malloc_shared<int *>(3, q);
    int **v = malloc_shared<int *>(3, q);
    int **p = malloc_shared<int *>(3, q);
    // Each row is its own USM allocation.
    for (int i = 0; i < 3; i++) {
        u[i] = malloc_shared<int>(DOMAIN_SIZE, q);
        v[i] = malloc_shared<int>(DOMAIN_SIZE, q);
        p[i] = malloc_shared<int>(DOMAIN_SIZE, q);
    }
    auto e = q.parallel_for(R, [=](auto i) {
        u[0][i] = i;
        v[0][i] = 2*i;
    });
    e.wait();  // finish the initialization before VecAdd reads u[0] and v[0]
    VecAdd(q, DOMAIN_SIZE, u[0], v[0], p[0]);
    for (size_t i = 0; i < DOMAIN_SIZE; i++)
        std::cout << "p[0][" << i << "] = " << p[0][i] << std::endl;
    // Free the rows first, then the pointer tables.
    for (int i = 0; i < 3; i++) {
        free(u[i], q);
        free(v[i], q);
        free(p[i], q);
    }
    free(u, q);
    free(v, q);
    free(p, q);
    return 0;
}
Let us know if it helps.
Thanks & Regards
Noorjahan
Hello Noorjahan,
Thank you for taking the time and debugging my code. It did resolve the issue.
All the best,
Leila
Hi,
Thank you for accepting this as a solution.
As this issue has been resolved, we will no longer respond to this thread.
If you require any additional assistance from Intel, please start a new thread.
Thanks & Regards
Noorjahan.