Use oneMKL with FPGA

jfra1397 · ‎03-07-2023

Hello everyone,

I'm currently working on optimizing my code using the BLAS Level 1 and Level 2 routines of the Intel oneAPI Math Kernel Library (oneMKL) on an FPGA in the DevCloud environment. However, I'm running into an issue where the compilation takes longer than the maximum walltime of 24 hours that I'm able to set.

Has anyone encountered a similar issue and found a solution to this problem? I would appreciate any suggestions or advice on how to resolve this.

By the way, I also tried running the code on the FPGA Emulator. It compiles, but at runtime I get an error message stating that the MKL routine is not supported on the FPGA Emulator.

Thank you in advance for your help.

BoonBengT_Intel · ‎03-09-2023

Hi @jfra1397,

Thank you for posting in the Intel community forum; I hope all is well.

There seem to be two parts to the question. Regarding the wall time, the DevCloud default is 6 hours and, as you mentioned, the maximum permitted is 24 hours. I would therefore suggest modifying the project so that it builds faster, for example by building only the minimal code required and eliminating the parts that take a long time to compile.

As for the issue encountered during emulation, could you provide more details, such as logs or a screenshot of the error message? That would help us understand the issue.

Hope that clarifies.

Best Wishes

BB

BoonBengT_Intel · ‎03-12-2023

Hi @jfra1397,

Good day, just following up on the previous clarification.

Did you by any chance manage to look into it?

Best Wishes

BB

BoonBengT_Intel · ‎03-15-2023

Hi @jfra1397,

Greetings. As we have not received any further clarification or updates on this matter, we will assume the challenge has been overcome. Please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.

Best Wishes

BB

jfra1397 · ‎03-17-2023

Hey, sorry for my late response.

For the emulation I get the following message:

```oneapi::mkl::oneapi::mkl::blas::dgemv: unsupported device: Intel(R) FPGA Emulation Device```

A simple example for the FPGA Arria 10 does compile but gets a similar error:

```
terminate called after throwing an instance of 'oneapi::mkl::unsupported_device'
what(): oneapi::mkl::oneapi::mkl::blas::dgemv: unsupported device: pac_a10 : Intel PAC Platform (pac_ee00000)
/var/spool/torque/mom_priv/jobs/2254885.v-qsvr-1.aidevcloud.SC: line 16: 7687 Aborted ./main
```

Here is a simple example:

```
#include <cstdlib>
#include <iostream>
#include <vector>

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
#include "oneapi/mkl.hpp"

int main()
{
    std::size_t N = 100;

#if FPGA_EMULATOR
    // Intel extension: FPGA emulator selector on systems without FPGA card.
    ::sycl::ext::intel::fpga_emulator_selector d_selector;
#elif FPGA
    // Intel extension: FPGA selector on systems with FPGA card.
    ::sycl::ext::intel::fpga_selector d_selector;
#else
    // The default device selector will select the most performant device.
    auto d_selector{::sycl::default_selector_v};
#endif

    ::sycl::queue q(d_selector);

    std::vector<double> A(N * N);
    std::vector<double> x(N);
    std::vector<double> y(N);

    // Fill A and x with random values in [-1, 1]; y starts at zero.
    for (auto& v : A) v = (((double) std::rand() / (double) RAND_MAX) - 0.5) * 2.0;
    for (auto& v : x) v = (((double) std::rand() / (double) RAND_MAX) - 0.5) * 2.0;
    for (auto& v : y) v = 0;

    // Print y before the call.
    for (const auto& v : y) std::cout << v;
    std::cout << std::endl;

    {
        // The buffers copy their contents back to the host vectors
        // when they go out of scope at the end of this block.
        ::sycl::buffer<double> b_A(A.data(), N * N);
        ::sycl::buffer<double> b_x(x.data(), N);
        ::sycl::buffer<double> b_y(y.data(), N);

        // Additional values
        double alpha = 1.0;
        double beta = 0.0;

        // y = alpha * A^T * x + beta * y
        oneapi::mkl::blas::column_major::gemv(q, oneapi::mkl::transpose::trans, N, N, alpha, b_A, N, b_x, 1, beta, b_y, 1);
        q.wait_and_throw();
    }

    // Print y after the call.
    for (const auto& v : y) std::cout << v;
    std::cout << std::endl;
}
```

The CMake file is also attached.
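For reference, the `gemv` call in the example above, with `transpose::trans` and column-major storage, computes y = alpha * A^T * x + beta * y. The following is a minimal plain C++ sketch of that same operation, which could be used to check results on the host when the oneMKL routine is not available on a device. The helper name `gemv_trans_ref` is hypothetical and not part of oneMKL, and the sketch assumes a square matrix and unit strides as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not a oneMKL function.
// Computes y = alpha * A^T * x + beta * y for an n x n matrix A stored in
// column-major order with leading dimension lda, using unit vector strides.
std::vector<double> gemv_trans_ref(std::size_t n, double alpha,
                                   const std::vector<double>& A, std::size_t lda,
                                   const std::vector<double>& x, double beta,
                                   std::vector<double> y) {
    assert(lda >= n);
    assert(A.size() >= lda * n);
    assert(x.size() >= n && y.size() >= n);
    for (std::size_t j = 0; j < n; ++j) {
        double acc = 0.0;
        // Element A(i, j) lives at A[i + j * lda], so column j is contiguous.
        // Entry j of A^T * x is the dot product of column j with x.
        for (std::size_t i = 0; i < n; ++i) {
            acc += A[i + j * lda] * x[i];
        }
        y[j] = alpha * acc + beta * y[j];
    }
    return y;
}
```

Calling it with the same `A`, `x`, `alpha`, and `beta` as in the example and comparing against the device result is one way to confirm that a fallback path produces the expected values.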

BoonBengT_Intel · ‎03-28-2023

Hey @jfra1397,

Noted, and I appreciate the details of the errors.

Based on the emulation message shown, I suspect there is a code issue, as the error is thrown in main.

If you do not mind me asking, where did you get the code from? Do you have a reference design that you are referring to?

I would suggest trying the following:

1) If you want to run emulation or a hardware compile, please ensure the node selected in DevCloud has the correct hardware. You can check by running the 'pbsnodes' command.

2) I would suggest looking into the reference design below, as it is a good starting point for compiling oneAPI designs for FPGA:

- https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/GettingStarted/fpga_compile

Hope that clarifies.

Best Wishes

BB

BoonBengT_Intel · ‎04-02-2023

Hi @jfra1397,

Good day, just following up on the previous clarification.

Did you by any chance manage to look into it?

Best Wishes

BB

jfra1397 · ‎04-03-2023

Hey,

I am currently working on running a conjugate gradient (CG) solver on an FPGA. I used the reference you posted as a template and succeeded in implementing the CG. Unfortunately, I am facing some challenges with the performance. After further investigation, I found that the vector addition performs much slower on the FPGA than on the GPU or CPU. While the matrix multiplication and the dot product were relatively okay, the overall performance is quite disappointing.

As I am not very experienced in optimizing code, I considered using oneMKL for the CG implementation to see if I can improve the performance. However, when using oneMKL, I get the 'unsupported device' error I already posted.
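To make the three kernels mentioned above concrete (vector addition, dot product, and matrix-vector product), here is a minimal host-only C++ sketch of an unpreconditioned conjugate gradient iteration. It is not the poster's implementation and uses neither SYCL nor oneMKL. It assumes a small dense symmetric positive definite matrix (so row-major versus column-major storage does not matter), and the helper names `dot`, `axpy`, `matvec`, and `conjugate_gradient` are chosen for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dot product of two equally sized vectors.
double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Scaled vector addition: y += alpha * x.
void axpy(double alpha, const Vec& x, Vec& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// Dense matrix-vector product y = A * x for an n x n matrix.
Vec matvec(const Vec& A, const Vec& x) {
    const std::size_t n = x.size();
    assert(A.size() == n * n);
    Vec y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) y[i] += A[i * n + j] * x[j];
    return y;
}

// Unpreconditioned CG for A * x = b, starting from x = 0.
// Stops when the residual norm drops to tol or after max_iter iterations.
Vec conjugate_gradient(const Vec& A, const Vec& b, double tol, std::size_t max_iter) {
    const std::size_t n = b.size();
    assert(A.size() == n * n);
    Vec x(n, 0.0);
    Vec r = b;  // residual b - A * x with x = 0
    Vec p = r;  // initial search direction
    double rr = dot(r, r);
    for (std::size_t k = 0; k < max_iter && std::sqrt(rr) > tol; ++k) {
        const Vec Ap = matvec(A, p);
        const double alpha = rr / dot(p, Ap);
        axpy(alpha, p, x);    // x += alpha * p
        axpy(-alpha, Ap, r);  // r -= alpha * A * p
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        // New search direction: p = r + beta * p.
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}
```

Each iteration performs one matrix-vector product, two dot products, and three vector updates, so a slow vector update is paid several times per iteration. A host version like this can also serve as a correctness baseline when comparing against device runs.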

BoonBengT_Intel · ‎04-03-2023

Hi @jfra1397,

Greetings, and thanks for the update. From your explanation, I understand that one of the challenges is optimizing the code. For that, we have a full-scale guide available below:

- https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/overview.html

This guide covers various areas to look at when optimizing oneAPI programs, and guides us through understanding the reports and the next steps to optimize them.

As for the second challenge, the unsupported device error, do you have a link to the oneMKL reference design you are referring to? Based on the error, I suspect that the node you are working on does not have the right hardware. Checking the reference design for the required hardware and accessing a node with that hardware might solve the problem.

Hope that clarifies.

Best Wishes

BB

BoonBengT_Intel · ‎04-04-2023

Hi @jfra1397,

Following up on the steps mentioned previously.

Please let us know if there are any further clarifications or doubts we can help with.

Best Wishes

BB

BoonBengT_Intel · ‎04-07-2023

Hi @jfra1397,

Greetings. As we have not received any further clarification or updates on this matter, we will assume the challenge has been overcome. Please get in touch with us if it has not. After 15 days, this thread will be transitioned to community support. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.

Best Wishes

BB