Re: Stable argument doesn't work in simulation

DorianL · ‎06-10-2024

Hi everyone,

I have an issue when I run my oneAPI kernel with arguments passed as "stable annotated_arg". When I use those "stable" arguments as variables in a "for" loop, the simulation is very slow and does not behave as expected. When I instead use a plain "int" declared inside the kernel, without going through a kernel argument, the same "for" loop has no such issue and the simulation runs correctly and quickly. Do you have any idea what could be causing this? Thank you!

DorianL

aikeu · ‎06-12-2024

Hi DorianL,

You can check out the loop optimization reference in the guide below:

https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/loops.html

Thanks.

Regards,

Aik Eu

DorianL · ‎06-14-2024

Hi @aikeu,

Thank you for your reply. I already checked the documentation, and I think the issue is that my inner loop has a dynamic trip count, because its bound is an input of my IP (the number of columns of the picture to process). I tried to solve the issue by bounding the loop with a random number, as is done in the documentation example, but my kernel doesn't accept the rand() function because it is treated as an external function in SYCL device code. Do you have any idea how I could make my kernel work?

Thank you!

DorianL

BoonBengT_Intel · ‎06-17-2024

Hi @DorianL,

Greetings. Just to better understand the issue: you mention that you have a problem with an inner loop.

Would you be able to share a code example of the kernel, along with the errors or issues that you are seeing? Also, is it correct to assume that the error happens at compilation?

If you can provide the compilation command, that would also be very helpful.

That would help us better understand the situation.

Thanks,

Regards

BB

DorianL · ‎06-19-2024

Hi @BoonBengT_Intel,

Thank you for your answer. Here is my code with my kernel. My issue is with the inner loop (line 49) in the kernel that processes pixels: it should have an II of 1, which is not the case in simulation. The ready signals are not continuously high, whereas the report indicates that the II should be 1. In my opinion, the issue comes from the dynamic trip count of this loop: the stop condition of my "for" loop is an input of the kernel, and I think the compiler treats it as a dynamic input. How can I change it to be a fixed input?

Here is the code of my kernel:

template <typename flux_in, typename flux_tempo>
struct travail_sur_voisinage {

  sycl::ext::oneapi::experimental::annotated_arg<
      int, decltype(sycl::ext::oneapi::experimental::properties{
               stable})>
      taille_h;

  sycl::ext::oneapi::experimental::annotated_arg<
      int, decltype(sycl::ext::oneapi::experimental::properties{
               stable})>
      taille_v;

  auto get(sycl::ext::oneapi::experimental::properties_tag) {
    return sycl::ext::oneapi::experimental::properties{

        streaming_interface<>};
  }

  void operator()() const {
    // Row and pixel counters

    // Input / output
    [[intel::fpga_register]]
    unsigned int pixel_a_traiter;

    [[intel::fpga_register]]
    unsigned int pixel_a_envoyer;



    [[intel::fpga_register]]
    unsigned int pixel_apres_traitement;

    // Delay line (line buffer)
    [[intel::fpga_memory("BLOCK_RAM")]]
    unsigned int line_buffer[5][NB_COLONNE_MAX];

    // Neighborhood window
    [[intel::fpga_register]]
    unsigned int fenetre[5][5];

    [[intel::initiation_interval(1)]]
    for (int num_lig = 0; num_lig < taille_v + 2; num_lig++)
    {

      [[intel::initiation_interval(1)]]
      //[[intel::speculated_iterations(0)]]
      for (int num_col = 0; (num_col < taille_h + 2); num_col++)
      {

        if (num_lig < taille_v && num_col < taille_h)
        {

          pixel_a_traiter = flux_in::read();

          // Delay line management

          fpga_tools::UnrolledLoop<0,4>([&](auto l)
          {
            line_buffer[l][num_col] = line_buffer[l + 1][num_col];
          });
          line_buffer[4][num_col] = pixel_a_traiter;

          // End of delay line management

          // Sliding video window

          fpga_tools::UnrolledLoop<0,5>([&](auto li)
          {
            // #pragma unroll

            fpga_tools::UnrolledLoop<0,4>([&](auto co)
            {
              fenetre[li][co] = fenetre[li][co + 1];
            });
            fenetre[li][4] = line_buffer[li][num_col];
          });
          // End of sliding video window
        }
        pixel_apres_traitement = traitement_5x5(fenetre);



        if ((num_lig >= 2) && (num_col >= 2))
        {
          pixel_a_envoyer = 0;

          if ((num_lig >= 4) && (num_lig < taille_v) && (num_col >= 4) && (num_col < taille_h))
          {
            pixel_a_envoyer = pixel_apres_traitement;
          }
          flux_tempo::write(pixel_a_envoyer);
        }

      }
    }
  }
      
};
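
For readers following the kernel above, here is a plain C++ model (no SYCL, no FPGA attributes) of the line buffer and 5x5 sliding window that the inner loop body maintains. It is only a sketch for reasoning about the data movement on a CPU: `WindowModel`, `Push`, `kRows` and `kMaxCols` are names invented for this illustration, and `kMaxCols` stands in for `NB_COLONNE_MAX`, whose real value is not given in the thread.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sizes for this sketch only.
constexpr std::size_t kRows = 5;     // window height and width
constexpr std::size_t kMaxCols = 8;  // stand-in for NB_COLONNE_MAX

struct WindowModel {
  std::array<std::array<unsigned, kMaxCols>, kRows> line_buffer{};
  std::array<std::array<unsigned, kRows>, kRows> window{};

  // Push one pixel for column `col`, mirroring the guarded part of the
  // inner loop body in the kernel.
  void Push(unsigned pixel, std::size_t col) {
    // Shift this column of the line buffer up by one row;
    // the newest pixel goes into the last row.
    for (std::size_t r = 0; r + 1 < kRows; ++r) {
      line_buffer[r][col] = line_buffer[r + 1][col];
    }
    line_buffer[kRows - 1][col] = pixel;

    // Shift the window one column to the left, then load the
    // current line-buffer column into the rightmost window column.
    for (std::size_t r = 0; r < kRows; ++r) {
      for (std::size_t c = 0; c + 1 < kRows; ++c) {
        window[r][c] = window[r][c + 1];
      }
      window[r][kRows - 1] = line_buffer[r][col];
    }
  }
};
```

After five full rows have been pushed, `window` holds the last five columns of those five rows, with the oldest row at index 0.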

The compilation commands (from my Makefile) are:

tp3_video.fpga_sim: kernel_sim.o
	icpx -fsycl -fintelfpga -Xsclock=400MHz -Xsoptimize=latency -Xssimulation -Xsghdl=0 -Xstarget=Agilex7 -Xsv $^ -o $@

kernel_sim.o: src/tp3_video.cpp
	icpx -fsycl -O3 -g -std=c++17 -Wall -I include -v -fintelfpga -Xsprofile -Xssimulation -DFPGA_SIMULATOR -o $@ -c src/tp3_video.cpp

Have a great day,

DorianL
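
Editorial note on the nested loop with runtime bounds: a pattern sometimes used in HLS-style code is to flatten the two loops into a single loop with one trip count and hand-managed row and column counters. The plain C++ sketch below only shows that the flattened form visits the same (row, column) pairs in the same order as the nested form. Whether it restores an II of 1 for this particular kernel is not verified here, and `NestedOrder`, `FlattenedOrder` and `Visit` are names invented for the illustration. The Intel FPGA documentation also describes a `[[intel::loop_coalesce(N)]]` attribute that asks the compiler to perform a similar transformation.

```cpp
#include <utility>
#include <vector>

using Visit = std::pair<int, int>;  // (row, column)

// Reference: the nested form used in the kernel, with runtime bounds
// rows + 2 and cols + 2 (rows and cols are assumed to be >= 0).
std::vector<Visit> NestedOrder(int rows, int cols) {
  std::vector<Visit> out;
  for (int r = 0; r < rows + 2; ++r) {
    for (int c = 0; c < cols + 2; ++c) {
      out.emplace_back(r, c);
    }
  }
  return out;
}

// Flattened form: a single loop whose trip count is computed once,
// with the row and column counters updated by hand.
std::vector<Visit> FlattenedOrder(int rows, int cols) {
  std::vector<Visit> out;
  const int total = (rows + 2) * (cols + 2);
  int r = 0;
  int c = 0;
  for (int i = 0; i < total; ++i) {
    out.emplace_back(r, c);
    if (c == cols + 1) {  // last column of the padded row: wrap around
      c = 0;
      ++r;
    } else {
      ++c;
    }
  }
  return out;
}
```

In a kernel, the body of the single loop would use `r` and `c` where the original code uses `num_lig` and `num_col`.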

aikeu · ‎06-18-2024

Hi DorianL,

I am not sure whether this is directly related to your application, based on your previous feedback, but you can check out the method below for getting a random number:

https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Quick-random-number-on-SYCL-device/td-p/1262724

Thanks.

Regards,

Aik Eu

DorianL · ‎06-19-2024

Hi @aikeu,

Thank you for your help, I'll check that now!

DorianL

aikeu · ‎06-23-2024

Hi DorianL,

I will close the thread if there are no further follow-up questions.

Thanks.

Regards,

Aik Eu

aikeu · ‎06-25-2024

Hi DorianL,

Since we have not received a response from you to the previous question/reply/answer that we provided, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a feed/response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. Community users will then be able to help you with your follow-up questions.

Thanks.

Regards,

Aik Eu

DorianL · ‎06-26-2024

Hi @aikeu ,

I was waiting for an answer to my last post, which contains the code and compilation commands requested by @BoonBengT_Intel in a previous post. Could we wait for an answer from him before closing the thread? Thank you!

DorianL

BoonBengT_Intel · ‎07-01-2024

Hi @DorianL,

Thank you for waiting. Regarding the first part of your explanation above, where you face challenges with loop optimization: the Loop Analysis report gives good insight into the structure of the loops you have written and can indicate where the bottleneck is. More details on the report can be found at the link below:
- https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-1/loop-analysis.html

Regarding the inner loop optimization and the issue you hit when calling the external function: you can use C++'s 'std::default_random_engine' instead.
More details on implementing the inner loop optimization can be found in our Git repository below:
- https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/C%2B%2BSYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop

Hope that clarifies.
Regards
BB
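
Editorial note on the 'std::default_random_engine' suggestion: a minimal sketch, assuming the random value is only needed to pick a test column count on the host, which is then passed to the kernel as an ordinary argument. `PickColumnCount` and its parameters are hypothetical and are not part of the thread's code or of any Intel sample.

```cpp
#include <random>

// Hypothetical host-side helper: returns a pseudo-random column count in
// the inclusive range [min_cols, max_cols]. Requires min_cols <= max_cols.
// A fixed seed makes the choice repeatable from one run to the next.
int PickColumnCount(unsigned seed, int min_cols, int max_cols) {
  std::default_random_engine engine(seed);
  std::uniform_int_distribution<int> dist(min_cols, max_cols);
  return dist(engine);
}
```

The returned value would be computed on the host before the kernel is submitted, so no random-number call appears inside the device code.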
