Solved: Many errors when compiling SYCL code for FPGA on a local system

Vladislav-Butko-bvo · ‎04-04-2023

I have installed the oneAPI Base Toolkit on my local Windows system. I ran the compilation in the Intel oneAPI command prompt for IA32 for Visual Studio 2022.

I followed the FPGA flow described in https://www.intel.com/content/dam/develop/external/us/en/documents/oneapiprogrammingguide-8.pdf (pages 32-33).

I got stuck at the very start of the flow, with the command "dpcpp -fintelfpga <source_file>.cpp":

Source code of the file being compiled (oneAPI sample "vector-add-buffers.cpp"):

//==============================================================
// Vector Add is the equivalent of a Hello, World! sample for data parallel
// programs. Building and running the sample verifies that your development
// environment is setup correctly and demonstrates the use of the core features
// of SYCL. This sample runs on both CPU and GPU (or FPGA). When run, it
// computes on both the CPU and offload device, then compares results. If the
// code executes on both CPU and offload device, the device name and a success
// message are displayed. And, your development environment is setup correctly!
//
// For comprehensive instructions regarding SYCL Programming, go to
// https://software.intel.com/en-us/oneapi-programming-guide and search based on
// relevant terms noted in the comments.
//
// SYCL material used in the code sample:
// • A one dimensional array of data.
// • A device queue, buffer, accessor, and kernel.
//==============================================================
// Copyright © Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>
#include <string>
#if FPGA || FPGA_EMULATOR
#include <sycl/ext/intel/fpga_extensions.hpp>
#endif

using namespace sycl;

// num_repetitions: How many times to repeat the kernel invocation
size_t num_repetitions = 1;
// Vector type and data size for this example.
size_t vector_size = 10000;
typedef std::vector<int> IntVector;

// Create an exception handler for asynchronous SYCL exceptions
static auto exception_handler = [](sycl::exception_list e_list) {
  for (std::exception_ptr const &e : e_list) {
    try {
      std::rethrow_exception(e);
    }
    catch (std::exception const &e) {
#if _DEBUG
      std::cout << "Failure" << std::endl;
#endif
      std::terminate();
    }
  }
};

//************************************
// Vector add in SYCL on device: returns sum in 4th parameter "sum_parallel".
//************************************
void VectorAdd(queue &q, const IntVector &a_vector, const IntVector &b_vector,
               IntVector &sum_parallel) {
  // Create the range object for the vectors managed by the buffer.
  range<1> num_items{a_vector.size()};

  // Create buffers that hold the data shared between the host and the devices.
  // The buffer destructor is responsible for copying the data back to the host
  // when it goes out of scope.
  buffer a_buf(a_vector);
  buffer b_buf(b_vector);
  buffer sum_buf(sum_parallel.data(), num_items);

  for (size_t i = 0; i < num_repetitions; i++) {

    // Submit a command group to the queue by a lambda function that contains the
    // data access permission and device computation (kernel).
    q.submit([&](handler &h) {
      // Create an accessor for each buffer with access permission: read, write or
      // read/write. The accessor is a means to access the memory in the buffer.
      accessor a(a_buf, h, read_only);
      accessor b(b_buf, h, read_only);

      // The sum accessor is used to store (with write permission) the sum data.
      accessor sum(sum_buf, h, write_only, no_init);

      // Use parallel_for to run vector addition in parallel on device. This
      // executes the kernel.
      //    1st parameter is the number of work items.
      //    2nd parameter is the kernel, a lambda that specifies what to do per
      //    work item. The parameter of the lambda is the work item id.
      // SYCL supports unnamed lambda kernel by default.
      h.parallel_for(num_items, [=](auto i) { sum[i] = a[i] + b[i]; });
    });
  }
  // Wait until the compute tasks on the device are done
  q.wait();
}

//************************************
// Initialize the vector from 0 to vector_size - 1
//************************************
void InitializeVector(IntVector &a) {
  for (size_t i = 0; i < a.size(); i++) a.at(i) = i;
}

//************************************
// Demonstrate vector add both in sequential on CPU and in parallel on device.
//************************************
int main(int argc, char* argv[]) {
  // Change num_repetitions if it was passed as argument
  if (argc > 2) num_repetitions = std::stoi(argv[2]);
  // Change vector_size if it was passed as argument
  if (argc > 1) vector_size = std::stoi(argv[1]);
  // Create device selector for the device of your interest.
#if FPGA_EMULATOR
  // Intel extension: FPGA emulator selector on systems without FPGA card.
  ext::intel::fpga_emulator_selector d_selector;
#elif FPGA
  // Intel extension: FPGA selector on systems with FPGA card.
  ext::intel::fpga_selector d_selector;
#else
  // The default device selector will select the most performant device.
  auto d_selector{default_selector_v};
#endif

  // Create vector objects with "vector_size" to store the input and output data.
  IntVector a, b, sum_sequential, sum_parallel;
  a.resize(vector_size);
  b.resize(vector_size);
  sum_sequential.resize(vector_size);
  sum_parallel.resize(vector_size);

  // Initialize input vectors with values from 0 to vector_size - 1
  InitializeVector(a);
  InitializeVector(b);

  try {
    queue q(d_selector, exception_handler);

    // Print out the device information used for the kernel code.
    std::cout << "Running on device: "
              << q.get_device().get_info<info::device::name>() << "\n";
    std::cout << "Vector size: " << a.size() << "\n";

    // Vector addition in SYCL
    VectorAdd(q, a, b, sum_parallel);
  } catch (exception const &e) {
    std::cout << "An exception is caught for vector add.\n";
    std::terminate();
  }

  // Compute the sum of two vectors in sequential for validation.
  for (size_t i = 0; i < sum_sequential.size(); i++)
    sum_sequential.at(i) = a.at(i) + b.at(i);

  // Verify that the two vectors are equal.
  for (size_t i = 0; i < sum_sequential.size(); i++) {
    if (sum_parallel.at(i) != sum_sequential.at(i)) {
      std::cout << "Vector add failed on device.\n";
      return -1;
    }
  }

  int indices[]{0, 1, 2, (static_cast<int>(a.size()) - 1)};
  constexpr size_t indices_size = sizeof(indices) / sizeof(int);

  // Print out the result of vector add.
  for (int i = 0; i < indices_size; i++) {
    int j = indices[i];
    if (i == indices_size - 1) std::cout << "...\n";
    std::cout << "[" << j << "]: " << a[j] << " + " << b[j] << " = "
              << sum_parallel[j] << "\n";
  }

  a.clear();
  b.clear();
  sum_sequential.clear();
  sum_parallel.clear();

  std::cout << "Vector add successfully completed on device.\n";
  return 0;
}

Compilation flow and part of output with many errors:

Part of output in the end:

BoonBengT_Intel · ‎04-05-2023

Hi @Vladislav-Butko-bvo,

Thank you for posting in the Intel community forum; I hope all is well.

Sorry for the inconvenience. Based on your explanation, there seem to be two possible causes of the errors: either the oneAPI installation on the host machine or the sample code itself. I lean toward the host machine. May I know where you got the sample code from?

I would suggest trying to build the same sample code in Intel DevCloud, following the instructions at the link below:

- https://devcloud.intel.com/oneapi/get_started/

If the code builds and runs there, that would rule out a problem with the code.

As for the host machine installation, I would recommend referring to the Get Started guide below:

- https://www.intel.com/content/www/us/en/docs/oneapi-base-toolkit/get-started-guide-windows/2023-1/overview.html

Assuming you are using the CLI to run the sample, I suspect that the system variables are either not set or corrupted.

Hope that clarifies.

Best Wishes

BB

Vladislav-Butko-bvo · ‎04-07-2023

It's most likely that the system variables are not set, because the sample code compiles successfully on DevCloud. However, I did not test this guess; I migrated to DevCloud instead.

BoonBengT_Intel · ‎04-10-2023

Hi @Vladislav-Butko-bvo,

I assume that you have managed to resolve the issue with the system variables not being set, as mentioned. Since there has been no further clarification on this thread, it will be transitioned to community support for further help. Please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support.

Thank you for the questions and, as always, it is a pleasure having you here.

Best Wishes

BB