The two screenshots say it all. The old screenshot was generated with aoc 21.2.0; note how it coalesces the 16 float reads into a single 512-bit DDR read. The new screenshot was generated with aoc 2024.2.1; it does not coalesce the 16 float reads and instead creates 16 individual read ports. As far as I can tell, that is quite bad for performance and wastes a lot of hardware resources.
Is there a way to make aoc 2024.2.1 coalesce these reads exactly like the old compiler did?
The OpenCL SDK for Intel FPGAs has not been distributed since 22.4.
Therefore, 22.4 is the last version of the compiler to officially support OpenCL as an input language.
How did you get a 2024.2.1 version of aoc? I'm guessing that you got that binary from an install of the oneAPI SYCL compiler for FPGA.
The SYCL compiler uses aoc internally, but aoc is not expected to work as a standalone tool.
Hi @Björne2,
Good day, just following up on the previous clarification.
By any chance, did you manage to look into it?
Hope to hear from you soon.
Best Wishes
BB
Hi @Björne2,
Greetings, just checking in to see if there are any further questions regarding this matter.
I hope your questions have been answered.
Best Wishes
BB
Hi @Björne2,
Greetings. As we have not received any further clarification or updates on this matter, we assume the issue has been resolved. Please log in to ‘https://supporttickets.intel.com’, view the details of the relevant request, and post a response within the next 15 days to allow me to continue supporting you. After 15 days, this thread will be transitioned to community support. For new queries, please feel free to open a new thread and we will be right with you. Pleasure having you here.
Best Wishes
BB
Hi BoonBengT and yuguen. I know that oneAPI only supports SYCL, but SYCL is horrible for FPGA work, so I'm sticking with OpenCL. Besides, SYCL is just a thin layer on top of OpenCL anyway. I solved my issue by removing the "volatile" keyword. Apparently, in recent versions the volatile qualifier prevents memory coalescing.
There is no guarantee that anything coming out of aoc will be functional, since this tool is now deprecated.
What are your complaints about SYCL compared to OpenCL?
I know that, but the SYCL tools that replace it are that bad. The main problem is that icpx embeds the FPGA image into the host code, so you can't have one binary that switches between multiple images via command-line parameters. Nor can you have one binary with kernels for multiple different devices. icpx also takes 10 seconds for simple examples that compile instantly in OpenCL. It wouldn't be so bad if you could use a regular C++ compiler for the host code and use icpx only for the device code, but I haven't found any (easy) way of accomplishing that.
@Björne2 wrote: "The main problem is that icpx embeds the FPGA image into the host code so you can't have one binary that switches between multiple images via command line parameters."
You can extract the aocx from the produced binary:
https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-1/extracting-the-fpga-hardware-configuration-aocx.html
@Björne2 wrote: "Nor one binary with kernels for multiple different devices."
This is the purpose of oneAPI (one API to target multiple devices), so I'm not sure what you are referring to here.
@Björne2 wrote: "icpx also takes 10 seconds for simple examples which compile instantly in OpenCL."
The legacy OpenCL compiler uses the same internal compiler as the SYCL compiler. So it should not be much slower.
I have two example designs:
- In SYCL: "time icpx -fsycl -fsycl-link=early [...]": real 0m9.221s, user 0m7.064s, sys 0m0.814s
- In OpenCL: "time aoc -rtl [...]": real 0m7.636s, user 0m3.490s, sys 0m0.724s
There is a difference, but nowhere near "10 seconds versus instant".
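Worked out from the timings above for these two example designs, the wall-clock gap is about 1.6 s (roughly 21% longer for icpx), while the user CPU time roughly doubles:

```latex
\begin{aligned}
\Delta t_{\text{real}} &= 9.221\,\text{s} - 7.636\,\text{s} = 1.585\,\text{s},
  & \frac{9.221}{7.636} &\approx 1.21,\\
\Delta t_{\text{user}} &= 7.064\,\text{s} - 3.490\,\text{s} = 3.574\,\text{s},
  & \frac{7.064}{3.490} &\approx 2.02.
\end{aligned}
```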
@Björne2 wrote: "It wouldn't be so bad if you could use a regular C++ compiler for the host code and just use icpx for the device code, but I haven't found any (easy) way of accomplishing that."
SYCL is a superset of C++, so the host code written in SYCL is basically C++.
You can have a look at the SYCL code samples: https://github.com/oneapi-src/oneAPI-samples/tree/development/DirectProgramming/C%2B%2BSYCL_FPGA
In particular, you can have a look at the GettingStarted/fpga_compile sample to see a step-by-step C++ to SYCL example.
The SYCL example is 90% C++: https://github.com/oneapi-src/oneAPI-samples/blob/development/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/GettingStarted/fpga_compile/part2_dpcpp_functor_usm/src/vector_add.cpp