Re: Re:Error: fpga compiler command failed with exit code 1 (use -v to see invocation)

ruma · ‎09-24-2024

Hello everyone!!

I have the next problem when i try to use the stratix 10 node 143, with the next line: make fpga.

I can emulated with the next line: make fpga_emu, and the program es correct, but I can´t use "make fpga" because exist error.

Can you help me please?

Thanks a lot.

HubertG · ‎09-25-2024

Hi Ruma,

I am sorry for the bad experience that you are facing. Before we proceed, can I have some of the details below ?

Operating System :

OS version :

Quartus (Lite/Standard/Pro) :

Quartus Version :

Possible if you could provide your design file for us to better debug on the issue ?

Best Regards

Hubert

ruma · ‎09-25-2024

Hello HubertG

Operating System : Ubuntu

OS version : 20.04.4 LTS

Quartus (Lite/Standard/Pro) : Standard

Quartus Version : Intel Tiber Developer Cloud for onaAPI - standar

Possible if you could provide your design file for us to better debug on the issue ?

Example "loop_coalesce":

//==============================================================
// Copyright Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
#include <iomanip>
#include <iostream>

#include "exception_handler.hpp"

using namespace sycl;

// Matrix dimensions
constexpr size_t kNumRows = 4;
constexpr size_t kNumCols = 4;
constexpr size_t kNumElements = kNumRows * kNumCols;

// Forward declare the kernel name in the global scope.
// This FPGA best practice reduces name mangling in the optimization reports.
template <int N> class KernelCompute;

// The kernel implements a matrix multiplication.
// This is not meant to be a high performance implementation on FPGA!
// It's just a simple kernel with nested loops to illustrate loop coalescing.
template <int coalesce_factor>
void MatrixMultiply(const std::vector<float> &matrix_a,
const std::vector<float> &matrix_b,
std::vector<float> &res) {
#if FPGA_SIMULATOR
auto selector = sycl::ext::intel::fpga_simulator_selector_v;
#elif FPGA_HARDWARE
auto selector = sycl::ext::intel::fpga_selector_v;
#else // #if FPGA_EMULATOR
auto selector = sycl::ext::intel::fpga_emulator_selector_v;
#endif

try {
auto prop_list = property_list{property::queue::enable_profiling()};

queue q(selector, fpga_tools::exception_handler, prop_list);

auto device = q.get_device();

std::cout << "Running on device: "
<< device.get_info<sycl::info::device::name>().c_str()
<< std::endl;

buffer buffer_in_a(matrix_a);
buffer buffer_in_b(matrix_b);
buffer buffer_out(res);

event e = q.submit([&](handler &h) {
accessor accessor_matrix_a(buffer_in_a, h, read_only);
accessor accessor_matrix_b(buffer_in_b, h, read_only);
accessor accessor_res(buffer_out, h, write_only, no_init);

// The kernel_args_restrict promises the compiler that this kernel's
// accessor arguments won't alias (i.e. non-overlapping memory regions).
h.single_task<class KernelCompute<coalesce_factor>>(
[=]() [[intel::kernel_args_restrict]] {
size_t idx = 0;
float a[kNumRows][kNumCols];
float b[kNumRows][kNumCols];
float tmp[kNumRows][kNumCols];

// The loop_coalesce instructs the compiler to attempt to "merge"
// coalesce_factor loop levels of this nested loop together.
// For example, a coalesce_factor of 2 turns this into a single loop.
[[intel::loop_coalesce(coalesce_factor)]]
for (size_t i = 0; i < kNumRows; ++i) {
for (size_t j = 0; j < kNumCols; ++j) {
a[i][j] = accessor_matrix_a[idx];
b[i][j] = accessor_matrix_b[idx];
tmp[i][j] = 0.0;
idx++;
}
}

// Applying loop_coalesce to the outermost loop of a deeply nested
// loop results coalescing from the outside in.
// For example, a coalesce_factor of 2 coalesces the "i" and "j" loops,
// making a doubly nested loop.
[[intel::loop_coalesce(coalesce_factor)]]
for (size_t i = 0; i < kNumRows; ++i) {
for (size_t j = 0; j < kNumCols; ++j) {
float sum = 0.0f;
for (size_t k = 0; k < kNumCols; ++k) {
sum += a[i][k] * b[k][j];
}
tmp[i][j] = sum;
}
}

idx = 0;
[[intel::loop_coalesce(coalesce_factor)]]
for (size_t i = 0; i < kNumRows; ++i) {
for (size_t j = 0; j < kNumCols; ++j) {
accessor_res[idx] = tmp[i][j];
idx++;
}
}

});
});

} catch (exception const &exc) {
std::cerr << "Caught synchronous SYCL exception:\n" << exc.what() << '\n';
if (exc.code().value() == CL_DEVICE_NOT_FOUND) {
std::cerr << "If you are targeting an FPGA, please ensure that your "
"system has a correctly configured FPGA board.\n";
std::cerr << "Run sys_check in the oneAPI root directory to verify.\n";
std::cerr << "If you are targeting the FPGA emulator, compile with "
"-DFPGA_EMULATOR.\n";
}
std::terminate();
}
}

int main() {
std::vector<float> matrix_a(kNumElements);
std::vector<float> matrix_b(kNumElements);
std::vector<float> matrix_output_no_col(kNumElements);
std::vector<float> matrix_output(kNumElements);

// Specify the matrices to be multiplied
for (size_t i = 0; i < kNumRows; i++) {
size_t pos = i * kNumCols;
// Initialize A as identity matrix
matrix_a[i + pos] = 1.0;
for (size_t j = 0; j < kNumCols; j++) {
matrix_b[pos + j] = i * j + 1;
}
}

// Two versions of the simple matrix multiply kernel will be enqueued:
// - with coalesce_factor=1 (i.e. no loop coalescing)
// - with coalesce_factor=2 (coalesce two nested levels)
MatrixMultiply<1>(matrix_a, matrix_b, matrix_output_no_col);
MatrixMultiply<2>(matrix_a, matrix_b, matrix_output);

// Correctness check
bool passed = true;
for (size_t i = 0; i < kNumRows; i++) {
size_t pos = i * kNumCols;
for (size_t j = 0; j < kNumCols; j++) {
float val_noCol = matrix_output_no_col[pos + j];
float val = matrix_output[pos + j];
if (val_noCol != i * j + 1 || val != i * j + 1) {
std::cout << "FAILED: The results are incorrect\n";
passed = false;
}
}
}

if (passed) {
std::cout << "PASSED: The results are correct\n";
return 0;
} else {
std::cout << "FAILED\n";
return -1;
}
}

---------------------------------------------------------------------------------------------------------------------------

Step by step:

1.- qsub -I -l walltime=24:00:00 -l nodes=1:fpga_runtime:stratix10:ppn=2 -d .

2.- cd oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/loop_coalesce/build/

3.- make fpga_emu

4.- ./loop_coalesce.fpga_emu
Running on device: Intel(R) FPGA Emulation Device
Running on device: Intel(R) FPGA Emulation Device
PASSED: The results are correct

--------------------------------------------------------------------------------------------------------

Then when I try to compile and run in the FPGA stratix 10 with the next steps...

1.1.- make fpga

llvm-foreach:
icpx: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
CMakeFiles/fpga.dir/build.make:94: recipe for target 'loop_coalesce.fpga' failed
make[3]: *** [loop_coalesce.fpga] Error 1
CMakeFiles/Makefile2:131: recipe for target 'CMakeFiles/fpga.dir/all' failed
make[2]: *** [CMakeFiles/fpga.dir/all] Error 2
CMakeFiles/Makefile2:138: recipe for target 'CMakeFiles/fpga.dir/rule' failed
make[1]: *** [CMakeFiles/fpga.dir/rule] Error 2
Makefile:144: recipe for target 'fpga' failed
make: *** [fpga] Error 2

My problem ocurred when I want to used a node to FPGA, Can you help me please???

Thanks...

BoonBengT_Altera · ‎09-26-2024

Hi @ruma,

Noted with thanks on the details provided.

As mention you are using the devcloud environment to run the mention design.

Just to check does the simulation compilation completed successfully as well?

And we would like to replicate the same to narrow down the issues, could you provided the URL of references design that you are referring to? That would be a great help.

Best Wishes

BB

ruma · ‎09-26-2024

Hello BoonBeng

Yes, the simulation compilation was completed successfully.

https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/Features/loop_coalesce

This is the URL of referense.

Thanks. Regards.-

Ruma

BoonBengT_Altera · ‎10-01-2024

Hi @ruma,

Noted on the URL of reference and simulation success.

Please do give us some time to try the same and will get back to you once we have any updates.

Thank you for the patients

Best Wishes

BB

BoonBengT_Altera · ‎10-08-2024

Hi @ruma,

Thanks for patients, I would suspect that the failure of the full compilation is due to incorrect device set for the compilation.

If you are using the default cmake, it would be setting to Agilex devices instead and that would cause a failure in the quartus compilation.

Hence for that, would suggest to compile with the following cmake to changed the target devices and perform full compilation again:

- cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10

If it fails again, would suggest to check on the quartus compilation logs in the following locations as below:

- ../loop_coalesce.fpga.prj/quartus_sh_compile.log

Hope that clarify.

Best Wishes

BB

BoonBengT_Altera · ‎10-10-2024

Hi @ruma,

Greetings, just checking in to see if there is any further doubts in regards to this matter.

Hope your doubts have been clarified.

Best Wishes

BB

BoonBengT_Altera · ‎10-13-2024

Hi @ruma,

Greetings, as we do not receive any further clarification/updates on the matter, hence would assume challenge are overcome. Please login to ‘https://supporttickets.intel.com’, view details of the desire request, and post a feed/response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. For new queries, please feel free to open a new thread and we will be right with you. Pleasure having you here.

Best Wishes

BB