- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
the official Intel fpga requirement page says the Cyclone10gx fpga is supported by oneAPI so I downloaded the latest version on my Ubuntu20 (Quartus Prime also installed), I tried to compile a sample-adder, the compiler (targeting the fpga) works but then when I run simple-add-buffer.fpga I get:
tetto@ubuntuoffice:~/simple-add/build$ ./simple-add-buffers.fpga
An exception is caught while computing on device.
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html -1 (PI_ERROR_DEVICE_NOT_FOUND)
Aborted (core dumped)
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hey Stefano,
When you compile using an FPGA family (such as "Cyclone10GX"), the compiler enters an HLS flow: it only generates an IP that you need to manually integrate into your own RTL pipeline.
The fpga binary that you obtained is not executable: it is only produced for you to inspect the performance of the IP after quartus compiled it (fmax, resource usage, etc.)
Here is a code sample demonstrating how one can integrate an HLS IP into an RTL pipeline to be able to run such IP on an FPGA: https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/Tools/platform_designer
You were expecting the compiler to produce a binary that could be directly executed on the FPGA: to do so, the compiler needs to understand the interface between the IP and your FPGA. This is what we call the "BSP".
Some FPGA board vendors do provide BSPs with their FPGA boards, which would have allowed you to compile your program using "icpx ... -Xstarget=<path to your BSP> ..." rather than "-Xstarget=Cyclone10GX".
In that case, and in that case only, the FPGA binary produced could have been run on the FPGA natively.
You can have a look at this documentation page to better understand the difference between the "FPGA acceleration flow" and the "HLS flow": https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-0/intel-oneapi-fpga-development-flow.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@yuguen that really clarified everything, thanks!
In the document you pointed:
something is unclear to me, you create a project in Quartus but it seems you do not create a Top-level entity, since when I do "Start Analysis and Elaboration" I get an error about that, I wonder if a top-level entity should be created anyway and what should look like. In one screenshot from the previous github I see that the top level entity is named "add" when you create the project, but then nothing is mentioned about it (what should just contain?)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hey @StefanoC
This sample demonstrates how to integrate a generated IP into an existing RTL pipeline.
In this case, the existing RTL files are located in https://github.com/oneapi-src/oneAPI-samples/blob/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/Tools/platform_designer/add-quartus-sln
The "add" top level entity is found in the add.sv file.
"Step 2." tells you to copy this "add.sv" file into your Quartus project folder:
cp add-quartus-sln/add.sv add-quartus
Then, to add it to your Quartus project (step 2.v):
Following these steps (with all the other steps), Quartus should be able to understand that the "add" top level module is in this file.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @yuguen for clarifying the content of add.sv
However I had followed the instructions and after I issue "make report" the following files are generated (there is no add.sv but rather add_report_di.sv which internally has a different module name and subsequently Quartus complains about missing top node. Why do I get add_report_di.sv rather than add.sv?
-rw-rw-r-- 1 tetto tetto 527 Feb 19 17:08 sys_description.txt
-rw-rw-r-- 1 tetto tetto 3831 Feb 19 17:08 sys_description.legend.txt
-rw-rw-r-- 1 tetto tetto 1860 Feb 19 17:08 sys_description.hex
-rw-rw-r-- 1 tetto tetto 105 Feb 19 17:08 opencl.ipx
-rw-rw-r-- 1 tetto tetto 4758 Feb 19 17:08 kernel_system.v
-rw-rw-r-- 1 tetto tetto 3388 Feb 19 17:08 kernel_system.tcl
-rw-rw-r-- 1 tetto tetto 7882 Feb 19 17:08 kernel_system.qip
-rw-rw-r-- 1 tetto tetto 31 Feb 19 17:08 kernel_system_import.tcl
-rw-rw-r-- 1 tetto tetto 70 Feb 19 17:08 kernel_report.tcl
-rw-rw-r-- 1 tetto tetto 1197 Feb 19 17:08 ipinterfaces.xml
-rw-rw-r-- 1 tetto tetto 72 Feb 19 17:08 ip_include.tcl
-rw-rw-r-- 1 tetto tetto 28 Feb 19 17:08 compiler_metrics.out
-rw-rw-r-- 1 tetto tetto 1197 Feb 19 17:08 board_spec.xml
-rw-rw-r-- 1 tetto tetto 0 Feb 19 17:08 add_report.v
-rw-rw-r-- 1 tetto tetto 3759 Feb 19 17:08 add_report_sys.v
-rw-rw-r-- 1 tetto tetto 18924 Feb 19 17:08 add_report_sys_hw.tcl
-rw-rw-r-- 1 tetto tetto 231 Feb 19 17:08 add_report.log
-rw-rw-r-- 1 tetto tetto 19180 Feb 19 17:08 add_report_di.sv
-rw-rw-r-- 1 tetto tetto 1307 Feb 19 17:08 add_report_di_inst.v
-rw-rw-r-- 1 tetto tetto 17892 Feb 19 17:08 add_report_di_hw.tcl
-rw-rw-r-- 1 tetto tetto 8393 Feb 19 17:08 add_report.bc.xml
drwxrwxr-x 2 tetto tetto 4096 Feb 19 17:08 ip
drwxrwxr-x 3 tetto tetto 4096 Feb 19 17:08 reports
drwxrwxr-x 3 tetto tetto 4096 Feb 19 17:08 kernel_hdl
drwxrwxr-x 3 tetto tetto 4096 Feb 19 17:08 include
drwxrwxr-x 3 tetto tetto 4096 Feb 19 17:08 linux64
Also in the instruction it is sad to copy these:
cp add-quartus-sln/add.sv add-quartus $> cp add-quartus-sln/jtag.sdc add-quartus
but after "make report" the folder add-quartus-sln is not there, so also miss jtag.sdc
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hey @StefanoC,
When doing "make report", you are generating RTL for the SYCL code that is in the add-oneapi folder.
This RTL top module can indeed be found in "add_report_di.sv".
This is the IP that you need to integrate into an existing RTL pipeline.
The "add_quartus_sln" folder already contains RTL, and is there to mimic your own RTL pipeline. So the "add.sv" file is already there, before you do "make report" as this is not a generated file, this is the existing RTL pipeline. You can peak into this file and see it is making a led turn on on the FPGA based on another signal. This is not possible to express using SYCL.
This tutorial shows how to connect the generated RTL from SYCL (the add_report_di.sv IP) with the existing "add.sv" RTL pipeline.
So "add" from add.sv is the top level module, that depends on the SYCL generated RTL.
The steps in the README tells you to:
1/ generate the SYCL IP
Create a Quartus project with the existing RTL files:
This also sets the top level module to "add" which is contained in the add.sv that was just copied from add-quartus-sln
Then, import the SYCL generated IP:
Then connects the two in the following steps, etc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok @yuguen now I better understand the workflow, and I could make progress thanks to your explanation.
Unfortunately I am stuck at the last step of the tutorial as I am on the Cyclone10 (while the tutorial is for Arria) so when setting the pins I really don't know which ones to select (PIN_AM10 does not exist on the Cyclone10). Also I feel I would need the a file jtag.sdc made for Cyclone10 (I cannot seem to find it here: https://github.com/altera-opensource/ghrd-socfpga)
On a slightly different note, I tried to modify the C++ source file adding arrays to be added (rather than primitive int), in my new add_kernel_wrapper I observe I got an "Avalon Memory Mapped Host" while previously (like in the tutorial) I only had "Avalon Memory Mapping Agent"); I wonder how the new "Avalon Memory Mapped Host" should be connected, as generating the HDL I see warning about "Avalon Memory Mapped Host" must be connected to an Avalon-MM agent or exported.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Stefano,
Can you please share what board you are using?
I will assume you are using the Terasic Cyclone 10 GX devkit. There are some sample designs here: https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=253&No=1147&PartNo=3#contents
According to the user guide in there, there are 2 100MHz clocks you can use, C10_CLKUSR and C10_REFCLK2:
As far as the JTAG.sdc, I can't find one to use for that devkit, but it is not strictly necessary for functionality. The timing analyzer will complain about timing failure since it will try to constrain the JTAG lines but I think you can ignore those warnings.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @whitepau
yes I am using the Cyclone 10GX, the link you provided contains various things but I do not see any oneAPI example. So far I think I am near to having a working sample using the "add" tutorial (https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/Tools/platform_designer/add-oneapi/src based on the Arria board), I will just provide the pin clock you suggested and bypass the jtag.sdc
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
if you use arrays as your kernel arguments, the compiler will map them to an mm host interface, that reaches out to a mm agent interface. for more info see this code sample: https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/Features/ip_authoring_interfaces/component_interfaces_comparison
You need to add some kind of memory for the host to connect to. You are getting into system design that is beyond the scope of oneAPI SYCL HLS /IP creation
There is a sample here that shows a simple system with an on-chip memory:
https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/ReferenceDesigns/niosv
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@whitepau my source code is exactly as the "Naive" solution:
I can compile it with oneAPI and produce IPs; I "just" miss how to how to deal with the avalon thing, as the link below fairly well describes what to do for primitive data type only: https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/Tools/platform_designer
is there an equivalent platform design, module connection for the vectorial add ("naive implementation")?
StefanoC
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
is there an equivalent platform design, module connection for the vectorial add ("naive implementation")?
No, as I mentioned, the naive solution in that link requires some memory-mapped agent to read the input vectors from and write the output vectors to. You can add an on-chip memory IP to the Platform Designer sample. The IP will have one or more Avalon Agent (slave) interfaces. You need to connect the host (master) interface from your vector_add IP to this agent, and then connect the master interface from the jtag avalon IP to the on-chip memory agent as well; you can use the JTAG interface to fill the on-chip memory with data and then configure the vector_add ip to access it. This is of course a lot of JTAG commands!
An easier solution is the design in the Nios® V softcore processor sample that I shared:
https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/ReferenceDesigns/niosv
This uses a soft CPU in place of all the JTAG commands i mentioned, so it fills the on-chip memory with some data and configures and starts the oneAPI IP. In the niosv sample, the oneAPI IP does a simple memory copy rather than a vector add.
The JTAG UART IP in this design is similar to the JTAG Avalon Master IP in the Platform Designer sample. It allows the Nios soft processor to be controlled through a JTAG interface.
of course, you don't have to use on-chip memory; if you want you can use an EMIF IP to connect to the DRAM chips on your Cyclone 10 GX board (but I have no experience using that so someone else will need to help you with that :))
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@whitepau I think the niosV will fit my case, I will give a try today, one last clarification, as that userguide mention "..demonstrates how to simulate an FPGA IP produced with the Intel® oneAPI DPC++/C++" and scrolling down I see that towards the end it actually "Generate Testbench System". Will I be able to synthetize on the fpga rather than just simulate?
thanks!
StefanoC
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
yeah that project has no board-specific settings.
like I said: watch out for the on-chip block RAM getting too big.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@whitepau I will check it out, assuming simulation will work, I could them synthetize on my board, right?
StefanoC
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you set the pins and the BRAM size, it should work on a board. You will need to use the nios tools to connect to the niosv and monitor it over the JTAG bus. There are guides for that though
https://cdrdv2-public.intel.com/784469/an-784468-784469.pdf
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@whitepauthe simulation does work. I will edit the sample dma to actually do something useful rather than copying/comparing elements. You mentioned I will have to use the nios tools to connect to the soft core; however I was planning on running my code on the computer CPU with some part of my algorithm memory mapped to an IP that will result from oneAPI compiler. So in this case, if I understand correctly, I don't need to connect to the niosV (unless I want to inspect/debug), am I right?
thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @whitepau I am still running the plain niosv sample simple dma (as it is in the repository without any modification), trying to run in on the hardware and have the cpu (C) interact with the fpga (I confirm the simulation, as detailed in the github repository works).
I added this top node entity:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity test_system is
port (
clk : in STD_LOGIC;
rst : in STD_LOGIC
);
end entity test_system;
architecture Behavioral of test_system is
component pd_system is
port (
clk_clk : in std_logic := 'X';
reset_reset : in std_logic := 'X';
simple_dma_accelerator_device_exception_bus_data : out std_logic_vector(63 downto 0)
);
end component pd_system;
signal pd_system_clk : std_logic;
signal pd_system_rst : std_logic;
begin
u0 : pd_system
port map (
clk_clk => pd_system_clk,
reset_reset => pd_system_rst,
simple_dma_accelerator_device_exception_bus_data => open
);
pd_system_clk <= clk;
pd_system_rst <= rst;
end architecture Behavioral;
I synthesized the project onto my board, I got a few warnings:
1) Critical Warning(12677): No exact pin location assignment(s) for 1 pins of 2 total pins. For the list of pins please refer to the I/O Assignment Warnings table in the fitter report
2) No user constrained base clocks found in the design. Calling "derive_clocks -period 1.0"
3)Timing requirements not met
clk -6.334 -18110.791 7371 Slow 900mV 100C Model 1
altera_reserved_tck -1.788 -410.899 380 Slow 900mV 100C Model 2
About 1) I think it's referring to the clock/reset signals, I tried to put clock location in the pin planner as you suggested "C10_CLKUSR" but that value it's not accepted, scrolling the dropdown menu I selected "PIN_C10 I/O Bank 2k"; I haven't yet assigned the reset that's why one location is not assigned.
I then used the USB blaster JTAG to program the board (successfully).
then I compiled /kernels/simple_dma/ using "make fpga" rather than "make report" and I got an executable, but when I run it I got this:
tetto@ubuntuoffice:~/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/niosv/kernels/simple_dma/build$ ./simple_dma.fpga
Running on device: SimulatorDevice : Multi-process Simulator (aclmsim0)
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Invalid device program image: size is zero -30 (PI_ERROR_INVALID_VALUE)
Aborted (core dumped)
tetto@ubuntuoffice:~/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/niosv/kernels/simple_dma/build$ sudo env "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" ./simple_dma.fpga
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html -1 (PI_ERROR_DEVICE_NOT_FOUND)
Aborted
I tried with sudo as I suspected it couldn't find the board; I exported a variable because without it could complain about a missing library, however it screams about a runtime error.
Am I missing something to run this sample on the board and have the C interact with it?
thanks!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I successfully synthesized the niosV example (as it is), adding this top node:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity test_system is
port (
clk : in STD_LOGIC;
rst : in STD_LOGIC
);
end entity test_system;
architecture Behavioral of test_system is
component pd_system is
port (
clk_clk : in std_logic := 'X';
reset_reset : in std_logic := 'X';
simple_dma_accelerator_device_exception_bus_data : out std_logic_vector(63 downto 0)
);
end component pd_system;
signal pd_system_clk : std_logic;
signal pd_system_rst : std_logic;
begin
u0 : pd_system
port map (
clk_clk => pd_system_clk,
reset_reset => pd_system_rst,
simple_dma_accelerator_device_exception_bus_data => open
);
pd_system_clk <= clk;
pd_system_rst <= rst;
end architecture Behavioral;
However when I tried (a few things) to run the software companion produced by oneAPI I got:
tetto@ubuntuoffice:~/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/niosv/kernels/simple_dma/build$ ./simple_dma.fpga
Running on device: SimulatorDevice : Multi-process Simulator (aclmsim0)
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
tetto@ubuntuoffice:~/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/niosv/kernels/simple_dma/build$
tetto@ubuntuoffice:~/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/niosv/kernels/simple_dma/build$ sudo ./simple_dma.fpga
[sudo] password for tetto:
./simple_dma.fpga: error while loading shared libraries: libdspba_mpir.so.23: cannot open shared object file: No such file or directory
tetto@ubuntuoffice:~/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/niosv/kernels/simple_dma/build$ sudo LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./simple_dma.fpga
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html -1 (PI_ERROR_DEVICE_NOT_FOUND)
Aborted
tetto@ubuntuoffice:~/oneAPI-samples/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/niosv/kernels/simple_dma/build$
Coming back to this example, I was also playing with:
oneAPI seems to produce a quartus project completed with the top node (that I can compile and synthesize), but again when I run the executable I encounter a runtime error. In that vector_add.src it seems this "unified memory" would just do the job, but I recall you mentioned earlier one has to edit the RTL design (although in this part3 the top node seems properly generated and needed IPs/avalon devices are there and instantiated).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Stefano, let me refer you once again to this comment: https://community.intel.com/t5/Intel-High-Level-Design/oneAPI-on-Cyclone10gx/m-p/1573813/highlight/true#M3497
The code you got from the code sample is the dark green boxes in this picture (Host code and Application Kernel). without all the other stuff in the middle, that generated executable file will not work.
The executable that oneAPI emits will not work without a supported BSP. Since you have selected -Xstarget=Cyclon10GX, you have created an IP, which is just the Application Kernel in that picture. This means that when you use the Application Kernel, it is treated like any other IP that was written using Verilog or VHDL.
To 'run' your IP, you will need to program the Nios-based design you created onto your board and execute Nios code to control the IP.
oneAPI seems to produce a quartus project completed with the top node (that I can compile and synthesize), but again when I run the executable I encounter a runtime error. In that vector_add.src it seems this "unified memory" would just do the job, but I recall you mentioned earlier one has to edit the RTL design (although in this part3 the top node seems properly generated and needed IPs/avalon devices are there and instantiated).
That generated Intel® Quartus® Prime project is only for estimating fMAX of your IP. The point of it is to assign the pins of the IP to virtual pins so that Intel® Quartus® Prime will place and route it without actually connecting it to physical pins. Here is an explanation of virtual pin assignments https://www.youtube.com/watch?v=QET0lC-jdAQ
If you wish to build a custom BSP so that you can install your C10GX PCIe card into a computer and communicate with it through the OpenCL runtime (and have oneAPI host code control it), we have guidance on custom BSP creation here:
Getting started : https://ofs.github.io/latest/hw/common/user_guides/oneapi_asp/ug_oneapi_asp/
Reference Manual: https://ofs.github.io/latest/hw/common/reference_manual/oneapi_asp/oneapi_asp_ref_mnl/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Stefano,
Following through the previous clarification to see if there are any further doubts in regards to this matter.
Hope your doubts have been clarified.
Best Wishes
BB
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page