Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™
676 Discussions

Deadlock while filling pipe for simulation

Ric2
Beginner
419 Views

Hi, following best practice as descibed here we run into trouble with stalled writes. 

 

When you create a testbench for a oneAPI kernel that you intend to compile as an IP core, write all your data to the host pipe before invoking the kernel.

 

There is a reproducer attached based on streaming data interface example.

As reducing the amount of data to process for simulation seems an easy workaround, this will not work if the design is more complex or bigger and the interface width is larger. Both conditions reduce the number of data samples before the buffer access stalls.

For our relevant design this means a too small size of simulateable data to get reasonable simulation results.

 

I have following questions:

1) Is this intented behaviour?

2) Is there a parameter to increase the accepted number of samples for simulation pipes?

3) Will this solution solve the simulation issue too?

 

Thanks for any feedback!

Ric.

 

oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308), Ubuntu 22.04.4 LTS

0 Kudos
7 Replies
Ric2
Beginner
378 Views

Having modified the write command to be non-blocking I notice that the command still blocks (the if condition is never satisfied):

      bool success = false;
      InPixelPipe::write(q, in_beat, success);
      if (!success){
        sleep(1);
        std::cerr << i << "### STALL ###\n";  
      } 

Documentation reads this:

Non-blocking writes add a bool argument in both host and device APIs that is passed by reference and returns true in this argument if the write was successful, and false if it was unsuccessful.

On the host:
// attempt non-blocking write from host to pipe until successful
while (!success) MyPipeInstance::write(q, data_element, success);

Remember, it's while running simulation. In emulation, the pipe is never stalling.

Btw, with i++ HLS we didn't run in this problem, even with very large simulation data sets.

 

Any advice? Thanks!

0 Kudos
Kevin_Xu_Intel
Employee
305 Views

Hello,

I was able to reproduce your issue, I think there is a buffer limit to host pipe in simulation, and since the host testbench is writing all of its data into the buffer before processing it, it stalled when the limit was hit. 

 

I found that a work around for this is to perform the host pipe read and write in separated threads. 

 

Attached design use pthread for threading the host testbench, and it should not effecting the IP you are working on.

 

Thanks,

0 Kudos
Ric2
Beginner
269 Views

Hi Kevin,

thanks for testing and proposing a work around. Unfortunatelly it helps for half of the problem only. The buffer depth limitation can be addressed this way.

However, as soon as we had implemented the changes we run into trouble again seeing this error massage sporadically:

 

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Aborted (core dumped)

 

Digging into the issue we found that the non-blocking pipe read causes the error as soon as an empty pipe is accessed to read.

To reproduce just generate e.g. 9 write samples and try to read 10.

With the print modification from below the output should look like this:

 

Running on device: Intel(R) FPGA Emulation Device
Input: 0         9
Input Done!
try read: 0      Output: 0       0      success: 1       
try read: 1      Output: 1       1      success: 1       
try read: 2      Output: 2       2      success: 1       
try read: 3      Output: 3       3      success: 1       
try read: 4      Output: 4       4      success: 1       
try read: 5      Output: 5       5      success: 1       
try read: 6      Output: 6       6      success: 1       
try read: 7      Output: 7       7      success: 1       
try read: 8      Output: 8       8      success: 1       
try read: 9      terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Aborted (core dumped)
while (!read_success){ 
    std::cerr << "try read: " << i << "\t "; 
    out_beat = OutPixelPipe::read(*(out_args->q), read_success);
    std::cerr << "Output: " << i << "\t " << out_beat.data << "\tsuccess: " << (int)read_success << "\t " << std::endl;
} 

 

Remember my second post here where some odd behaviour was reported for the non-blocking write command too.

 

Thanks for your help!

Ric.

 

0 Kudos
Kevin_Xu_Intel
Employee
251 Views

Hi Ric,

 

I did some testing, it dose looks like in emulation, a non-blocking read of a empty pipe causes crash. I will look into this problem. 

 

Dose the crash happens to you in simulation? I was not able to reproduce the crash in simulation.

 

Thanks,

 

-Kevin

0 Kudos
Ric2
Beginner
160 Views

Hi Kevin,

I guess there are race conditions. I sporadically see issues in simulation too:

 

.../build$ ./streaming.fpga_sim
Running on device: SimulatorDevice : Multi-process Simulator (aclmsim0)
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Enqueue process failed. -59 (PI_ERROR_INVALID_OPERATION)
Aborted (core dumped)

 

Please check the host pipe write command in your code. Modify the core to not accept data at it's input for a longer time (backpressure condition). Then the host will throw an error when the pipe is full but the host write is not handling this properly.

To reproduce, just modify line 57:

 

57   // while (!end_of_packet) {
57    while (end_of_packet) {
58      // Read in next pixel

 

As the reproducer is a bit of artificial, I see this issue in our production design too.

I hope this helps you to find the root cause of these problems!

Regards, Ric.

 

0 Kudos
Kevin_Xu_Intel
Employee
135 Views

Hi Ric,

 

I was able to reproduce this error. But i think it might be due to different reason. 

 

If we change the while loop condition to be never to enter it, the compiler would optimized it way.

 

If I change the while condition to:

    // while (!end_of_packet) {
    for (int i = 0; i < 1; i ++){

 To allows one packet to go though, the error goes away.

Do you have a different reproducer for the sporadically write issue?

 

Thanks,

 

-Kevin

 

0 Kudos
Ric2
Beginner
69 Views

Hi Kevin, you are right, the very simple reproducer did not show the effect. Same for the simple i++ loop as this will end the component without having consumed more data from the input pipe at a later time.

The updated reproducer is consuming input data with a large gap. I designed the gap to be large enougth to show that:
1) TB is pushing data to the input pipe until the related tread stalls as the write command is not correctly signalling FULL (the WR STALL print is never shown).
2) Simulation continuosly generates output data, restarts consuming input data after the gap but comes to a dead end as the input pipe is corrupted.
3) The final error comes from vsim but I guess the root cause is in the TB WR handling.

WR ... Input: 17000      Output: 51      Diff: 16949
WR ... Input: 18000      Output: 60      Diff: 17940
^CExiting simulation due to Interrupt
vsimk: src/hls_cosim_ipc_socket.cpp:133: virtual void IPCSocket::send(const void*, int): Assertion `0 && "send() failed"' failed.

Expected behavioural is that the TB WR thread hold on as long as WR pipe is full but restarts pushing data to it when the kernel has consumed some.

Regards, Ric.

Ric2_0-1718357943500.png

 

0 Kudos
Reply