
Stable argument doesn't work in simulation

DorianL
Novice

Hi everyone,

 

I have an issue when I run my oneAPI kernel with arguments passed as "stable annotated_arg". When I use a "for" loop with those "stable" arguments as variables, the simulation is very slow and does not behave correctly. When I instead use a plain "int" declared inside the kernel, without any kernel argument, the same "for" loop simulates correctly and quickly. Do you have any idea what the issue could be? Thank you!

 

DorianL

aikeu
Employee

Hi DorianL,


You can check the loop optimization reference in the guide below:

https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/loops.html


Thanks.

Regards,

Aik Eu


DorianL
Novice

Hi @aikeu

Thank you for your reply. I had already checked the documentation, and I think the issue is that my inner loop has a dynamic trip count, because its bound is an input of my IP (the number of columns of the picture to process). I tried to solve this by bounding the loop with a random number, as in the documentation example, but my kernel does not accept the rand() function because it is an external SYCL function. Do you have any idea how I could make my kernel work?

Thank you !

 

DorianL

BoonBengT_Intel
Moderator

Hi @DorianL,


To better understand the issue: you mention a problem with an inner loop.

Could you share the kernel code and describe the errors or issues you are seeing? Is it correct to assume that the error happens at compilation?

The compilation command would also be very helpful.

That will help us understand the situation.


Thanks,

Regards

BB


DorianL
Novice

Hi @BoonBengT_Intel,

Thank you for your answer. Here is my code with my kernel. My issue is with the inner loop of the kernel (the "for" loop over num_col, line 49), which processes the pixels. It should have an II of 1, which is not the case in simulation: the rdy signals are not continuously high, whereas the report indicates that the II should be 1. In my opinion the issue comes from the dynamic trip count of this loop, because the stop condition of my "for" loop is an input of the kernel and I think the compiler treats it as a dynamic input. How can I change that so it is treated as a fixed input?
 
Here is the code of my kernel :
 

 

 

template <typename flux_in, typename flux_tempo>
struct travail_sur_voisinage {

  sycl::ext::oneapi::experimental::annotated_arg<
      int, decltype(sycl::ext::oneapi::experimental::properties{
               stable})>
      taille_h;

  sycl::ext::oneapi::experimental::annotated_arg<
      int, decltype(sycl::ext::oneapi::experimental::properties{
               stable})>
      taille_v;

  auto get(sycl::ext::oneapi::experimental::properties_tag) {
    return sycl::ext::oneapi::experimental::properties{

        streaming_interface<>};
  }

  void operator()() const {
    // Row and pixel counters

    // Input / output
    [[intel::fpga_register]]
    unsigned int pixel_a_traiter;

    [[intel::fpga_register]]
    unsigned int pixel_a_envoyer;



    [[intel::fpga_register]]
    unsigned int pixel_apres_traitement;

    // Delay line (line buffer)
    [[intel::fpga_memory("BLOCK_RAM")]]
    unsigned int line_buffer[5][NB_COLONNE_MAX];

    // Neighborhood
    [[intel::fpga_register]]
    unsigned int fenetre[5][5];

    [[intel::initiation_interval(1)]]
    for (int num_lig = 0; num_lig < taille_v + 2; num_lig++)
    {

      [[intel::initiation_interval(1)]]
      //[[intel::speculated_iterations(0)]]
      for (int num_col = 0; (num_col < taille_h + 2); num_col++)
      {

        if (num_lig < taille_v && num_col < taille_h)
        {

          pixel_a_traiter = flux_in::read();

          // Delay-line management

          fpga_tools::UnrolledLoop<0,4>([&](auto l)
          {
            line_buffer[l][num_col] = line_buffer[l + 1][num_col];
          });
          line_buffer[4][num_col] = pixel_a_traiter;

          // End of delay-line management

          // Sliding video window

          fpga_tools::UnrolledLoop<0,5>([&](auto li)
          {
            // #pragma unroll

            fpga_tools::UnrolledLoop<0,4>([&](auto co)
            {
              fenetre[li][co] = fenetre[li][co + 1];
            });
            fenetre[li][4] = line_buffer[li][num_col];
          });
          // End of sliding video window
        }
        pixel_apres_traitement = traitement_5x5(fenetre);



        if ((num_lig >= 2) && (num_col >= 2))
        {
          pixel_a_envoyer = 0;

          if (((num_lig >= 4) && (num_lig < taille_v) && (num_col >= 4) && (num_col < taille_h)))
          {
            pixel_a_envoyer = pixel_apres_traitement;
          }
          flux_tempo::write(pixel_a_envoyer);
        }

      }
    }
  }

};
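Editorial sketch, not part of the original post: one restructuring that is often tried when a nested loop with runtime bounds stalls between rows is to collapse both loops into a single loop with hand-maintained counters, so the pipeline does not exit and re-enter the inner loop for every row. The code below is plain host C++ (no SYCL) that only shows the flattened loop visits the same (num_lig, num_col) pairs in the same order. The function names visite_imbriquee and visite_aplatie are hypothetical, and the thread does not confirm that this restores an II of 1 for this kernel.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Coord = std::pair<int, int>;  // (num_lig, num_col)

// Reference version: nested loops with runtime bounds, as in the kernel.
std::vector<Coord> visite_imbriquee(int taille_v, int taille_h) {
  assert(taille_v >= 0 && taille_h >= 0);
  std::vector<Coord> ordre;
  for (int num_lig = 0; num_lig < taille_v + 2; num_lig++) {
    for (int num_col = 0; num_col < taille_h + 2; num_col++) {
      ordre.emplace_back(num_lig, num_col);
    }
  }
  return ordre;
}

// Flattened version: a single loop whose trip count is computed once,
// with the row and column counters updated by hand.
std::vector<Coord> visite_aplatie(int taille_v, int taille_h) {
  assert(taille_v >= 0 && taille_h >= 0);
  const long long total =
      static_cast<long long>(taille_v + 2) * (taille_h + 2);
  std::vector<Coord> ordre;
  int num_lig = 0;
  int num_col = 0;
  for (long long i = 0; i < total; i++) {
    ordre.emplace_back(num_lig, num_col);  // the loop body would go here
    if (num_col == taille_h + 1) {
      num_col = 0;  // last column of the row: wrap to the next row
      num_lig++;
    } else {
      num_col++;
    }
  }
  return ordre;
}
```

In the kernel, the body of the inner loop would replace the emplace_back call. The Intel FPGA optimization guide also describes a loop_coalesce attribute that asks the compiler to perform this kind of merge; whether it applies here is an assumption to check against the report.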

 

 

 

The compilation command is:

 

 

tp3_video.fpga_sim: kernel_sim.o
	icpx -fsycl -fintelfpga -Xsclock=400MHz -Xsoptimize=latency  -Xssimulation -Xsghdl=0 -Xstarget=Agilex7 -Xsv $^ -o $@ 


kernel_sim.o: src/tp3_video.cpp
	icpx -fsycl  -O3 -g -std=c++17 -Wall -I include -v -fintelfpga -Xsprofile -Xssimulation -DFPGA_SIMULATOR -o $@ -c src/tp3_video.cpp

 

 

Have a great day,

 

DorianL

aikeu
Employee

Hi DorianL,


I am not sure whether this is directly related to your application, based on your previous feedback, but you can check the method below for generating random numbers:

https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Quick-random-number-on-SYCL-device/td-p/1262724
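Editorial sketch, not taken from the linked thread: the general idea behind a device-side random number is to write the generator as plain integer arithmetic, so that no external library function such as rand() is called. The example below uses Marsaglia's xorshift32 recurrence as an illustration. The names xorshift32 and tirage_borne, and the choice of generator, are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// One step of the xorshift32 recurrence (shifts 13, 17, 5).
// The state must be seeded with a non-zero value and is updated in place.
inline std::uint32_t xorshift32(std::uint32_t& etat) {
  assert(etat != 0);
  etat ^= etat << 13;
  etat ^= etat >> 17;
  etat ^= etat << 5;
  return etat;
}

// Hypothetical helper: draw a value in [0, borne).
// The modulo keeps the sketch short; it introduces a slight bias and
// may be costly in hardware.
inline std::uint32_t tirage_borne(std::uint32_t& etat, std::uint32_t borne) {
  assert(borne > 0);
  return xorshift32(etat) % borne;
}
```

The state would be seeded on the host and passed to the kernel as an ordinary argument; the same seed always produces the same sequence.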


Thanks.

Regards,

Aik Eu


DorianL
Novice

Hi @aikeu,

 

Thank you for your help, I'll check that now!

 

DorianL
