Hi everyone,
I have an issue when I run my oneAPI kernel with arguments passed as "stable" annotated_arg. When I use those "stable" arguments as the bounds of a "for" loop, the simulation is very slow and does not work well. When I instead use a plain "int" declared inside the kernel, without going through a kernel argument, the "for" loop has no such issue and the simulation runs correctly and quickly. Do you have any idea what the issue could be? Thank you!
DorianL
Hi DorianL,
You can check the loop optimization reference in the guide below:
https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/loops.html
Thanks.
Regards,
Aik Eu
Hi @aikeu,
Thank you for your reply. I have already checked the documentation, and I think the issue is that my inner loop bound is dynamic, because it is an input of my IP (the number of columns of the picture to process). I tried to solve the issue by bounding the loop with a random number, as is done in the documentation example, but my kernel does not accept the rand() function because it is an external SYCL function. Do you have an idea of how I could make my kernel work?
Thank you !
DorianL
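The loop-bounding idea discussed in this post can be sketched without any random number. The following is plain host C++ rather than SYCL, so it compiles anywhere; it only illustrates the shape of the transformation. kMaxColumns is a hypothetical stand-in for NB_COLONNE_MAX (whose real value is not given in this thread), and the function names are invented for the illustration. The run-time bound is replaced by a compile-time maximum plus a guard, so the trip count is known at compile time. Whether this actually improves a given design would have to be confirmed in the compiler reports.

```cpp
#include <cassert>
#include <vector>

// Hypothetical compile-time upper bound; stands in for NB_COLONNE_MAX.
constexpr int kMaxColumns = 64;

// Run-time bound: the trip count is only known when the function is called.
long SumRowDynamic(const std::vector<int>& row, int columns) {
  assert(columns >= 0 && columns <= kMaxColumns);
  long sum = 0;
  for (int col = 0; col < columns; ++col) {
    sum += row[col];
  }
  return sum;
}

// Compile-time bound: the loop always runs kMaxColumns iterations, and a
// guard skips the iterations beyond the run-time width.
long SumRowBounded(const std::vector<int>& row, int columns) {
  assert(columns >= 0 && columns <= kMaxColumns);
  long sum = 0;
  for (int col = 0; col < kMaxColumns; ++col) {
    if (col < columns) {
      sum += row[col];
    }
  }
  return sum;
}
```

Both functions return the same result for any width up to kMaxColumns; the bounded version trades some wasted iterations for a fixed trip count.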
Hi @DorianL,
To better understand the problem: you mention that you have an issue with your inner loop.
Would you be able to share the kernel code and the errors or issues you are seeing? Is it correct to assume that the error happens at compile time?
If you can also provide the compilation command, that would be very helpful.
This will help us understand the situation better.
Thanks,
Regards
BB
Hi @BoonBengT_Intel,
template <typename flux_in, typename flux_tempo>
struct travail_sur_voisinage {
  sycl::ext::oneapi::experimental::annotated_arg<
      int, decltype(sycl::ext::oneapi::experimental::properties{stable})>
      taille_h;
  sycl::ext::oneapi::experimental::annotated_arg<
      int, decltype(sycl::ext::oneapi::experimental::properties{stable})>
      taille_v;

  auto get(sycl::ext::oneapi::experimental::properties_tag) {
    return sycl::ext::oneapi::experimental::properties{streaming_interface<>};
  }

  void operator()() const {
    // Row/pixel counters
    // Input / output
    [[intel::fpga_register]]
    unsigned int pixel_a_traiter;
    [[intel::fpga_register]]
    unsigned int pixel_a_envoyer;
    [[intel::fpga_register]]
    unsigned int pixel_apres_traitement;

    // Delay line (line buffer)
    [[intel::fpga_memory("BLOCK_RAM")]]
    unsigned int line_buffer[5][NB_COLONNE_MAX];

    // Neighborhood
    [[intel::fpga_register]]
    unsigned int fenetre[5][5];

    [[intel::initiation_interval(1)]]
    for (int num_lig = 0; num_lig < taille_v + 2; num_lig++)
    {
      [[intel::initiation_interval(1)]]
      //[[intel::speculated_iterations(0)]]
      for (int num_col = 0; (num_col < taille_h + 2); num_col++)
      {
        if (num_lig < taille_v && num_col < taille_h)
        {
          pixel_a_traiter = flux_in::read();

          // Delay-line management: shift the column up by one row
          fpga_tools::UnrolledLoop<0,4>([&](auto l)
          {
            line_buffer[l][num_col] = line_buffer[l + 1][num_col];
          });
          line_buffer[4][num_col] = pixel_a_traiter;
          // End of delay-line management

          // Sliding video window: shift left, then load the new column
          fpga_tools::UnrolledLoop<0,5>([&](auto li)
          {
            // #pragma unroll
            fpga_tools::UnrolledLoop<0,4>([&](auto co)
            {
              fenetre[li][co] = fenetre[li][co + 1];
            });
            fenetre[li][4] = line_buffer[li][num_col];
          });
          // End of sliding video window
        }

        pixel_apres_traitement = traitement_5x5(fenetre);

        if ((num_lig >= 2) && (num_col >= 2))
        {
          pixel_a_envoyer = 0;
          if (((num_lig >= 4) && (num_lig < taille_v) && (num_col >= 4) && (num_col < taille_h)))
          {
            pixel_a_envoyer = pixel_apres_traitement;
          }
          flux_tempo::write(pixel_a_envoyer);
        }
      }
    }
  }
};
The compilation commands are:

tp3_video.fpga_sim: kernel_sim.o
    icpx -fsycl -fintelfpga -Xsclock=400MHz -Xsoptimize=latency -Xssimulation -Xsghdl=0 -Xstarget=Agilex7 -Xsv $^ -o $@

kernel_sim.o: src/tp3_video.cpp
    icpx -fsycl -O3 -g -std=c++17 -Wall -I include -v -fintelfpga -Xsprofile -Xssimulation -DFPGA_SIMULATOR -o $@ -c src/tp3_video.cpp
Have a great day,
DorianL
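One restructuring sometimes tried for nested loops whose bounds are run-time values, such as the num_lig / num_col loops in the kernel above, is to flatten them into a single loop and update the row and column counters by hand. The sketch below is plain host C++, not SYCL, and is not taken from the thread or from Intel's samples: the function names are hypothetical, and rows and cols play the roles of taille_v and taille_h. It only checks that the flattened loop visits the same (row, column) sequence as the nested one; whether it helps the FPGA design is an assumption to verify in the loop reports.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Coord = std::pair<int, int>;

// Nested form, mirroring the loop structure of the kernel above
// (both bounds are extended by 2, as in the original loops).
std::vector<Coord> VisitNested(int rows, int cols) {
  std::vector<Coord> visited;
  for (int r = 0; r < rows + 2; ++r) {
    for (int c = 0; c < cols + 2; ++c) {
      visited.emplace_back(r, c);
    }
  }
  return visited;
}

// Flattened form: a single loop, with the row and column counters
// maintained manually instead of by two loop headers.
std::vector<Coord> VisitFlattened(int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
  std::vector<Coord> visited;
  const int total = (rows + 2) * (cols + 2);
  int r = 0;
  int c = 0;
  for (int i = 0; i < total; ++i) {
    visited.emplace_back(r, c);
    if (c == cols + 1) {  // last column of the extended row: wrap around
      c = 0;
      ++r;
    } else {
      ++c;
    }
  }
  return visited;
}
```

In the kernel, the body of the inner loop would move into the single loop unchanged, using r and c where num_lig and num_col appear.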
Hi DorianL,
Based on your previous feedback, I am not sure whether this is directly related to your application, but you can check out the method below for getting random numbers:
Thanks.
Regards,
Aik Eu
Hi DorianL,
As we have not received any response from you to the previous reply that we provided, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days so that I can continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help you with your follow-up questions.
Thanks.
Regards,
Aik Eu
Hi @aikeu ,
I was waiting for an answer to my last post, which contains my code and compilation commands as requested by @BoonBengT_Intel in a previous post. Could we wait for an answer from him before closing the thread? Thank you!
DorianL
Hi @DorianL,
Thank you for your patience. Regarding the first part, where you are having difficulty optimizing the loop: the Loop Analysis Report gives good insight into the structure of the loops you have written and can indicate where the bottleneck is. More details on the report can be found at the link below:
- https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-1/loop-analysis.html
Regarding the second part, where the inner loop optimization fails because of the external function: you can use 'std::default_random_engine' from C++ instead.
More details on implementing the inner loop optimization can be found in our Git repository below:
- https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop
I hope that clarifies things.
Regards
BB
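A minimal sketch of generating random values with std::default_random_engine, as suggested above. It assumes the engine is used on the host to prepare test input for the kernel, not inside the kernel itself; the thread does not state this. The function name MakeRandomImage, the 8-bit pixel range, and the seed parameter are all hypothetical choices made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fills a rows x cols image with pseudo-random pixel values on the host.
// The 0..255 range is an assumption; adjust it to the real pixel format.
std::vector<unsigned int> MakeRandomImage(int rows, int cols, unsigned seed) {
  assert(rows >= 0 && cols >= 0);
  std::default_random_engine engine(seed);
  std::uniform_int_distribution<unsigned int> pixel(0, 255);
  std::vector<unsigned int> image(static_cast<std::size_t>(rows) *
                                  static_cast<std::size_t>(cols));
  for (auto& value : image) {
    value = pixel(engine);
  }
  return image;
}
```

Using a fixed seed keeps simulation runs reproducible, which makes it easier to compare the kernel output against a software reference.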