Honored Contributor I

Non-Kernel Functions called from a Kernel

I'm looking to better understand how the offline compiler handles function calls. For example, I have a function declared as:

void computeValue(specialStruct *mystruct, ushort *answer); 


I'm calling it from a while loop containing a switch/case statement in a single-work-item kernel. The loop and switch are intended to work like a synchronous state machine. The II (initiation interval) of the loop is being reported as 2, and I suspect it has something to do with the function call. I've tried several variations: passing by reference vs. by value, and returning the answer vs. assigning it through a pointer parameter.


Could anyone point me to documentation or references on calling functions from device kernels, or on building a state machine in OpenCL? I'd like to get the II back to 1 and have the while loop essentially stall when the function is called until the value is returned. Any suggestions are welcome.


__kernel __attribute__((task)) void dostuff(.....)
{
    structType mystruct;
    ushort answer;

    while (someCondition) {
        switch (currentState) {
        case somevalue1:
            .....
            if (abc)
                currentState = somevalue1;
            else
                currentState = somevalue2;
            break;
        case somevalue2:
            computeValue(&mystruct, &answer);
            if (answer)
                currentState = somevalue1;
            else
                currentState = somevalue2;
            break;
        }
    }
}
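The control flow of a loop like the one above can be checked on the host before it goes through the offline compiler. This is a minimal plain-C sketch, not the original kernel: `specialStruct` with a single `counter` field, the countdown behavior of `computeValue`, the `rounds` parameter, and the `STATE_*` names are all hypothetical stand-ins.

```c
/* Hypothetical stand-in for the poster's struct: only a countdown field. */
typedef struct {
    unsigned short counter;
} specialStruct;

/* Hypothetical computeValue: decrement the counter and report, through
 * *answer, whether it has reached zero. */
void computeValue(specialStruct *mystruct, unsigned short *answer)
{
    if (mystruct->counter > 0)
        mystruct->counter--;
    *answer = (mystruct->counter == 0);
}

enum state { STATE_1, STATE_2, STATE_DONE };

/* Plain-C model of the single-work-item loop: one switch per iteration,
 * with the state variable choosing which branch's update is kept.
 * Returns the number of loop iterations taken. */
int run_state_machine(unsigned short start, int rounds)
{
    specialStruct mystruct;
    unsigned short answer = 0;
    enum state currentState = STATE_1;
    int iterations = 0;

    while (currentState != STATE_DONE) {
        iterations++;
        switch (currentState) {
        case STATE_1:
            if (rounds == 0) {
                currentState = STATE_DONE;
            } else {
                mystruct.counter = start;  /* load a new countdown */
                rounds--;
                currentState = STATE_2;
            }
            break;
        case STATE_2:
            computeValue(&mystruct, &answer);
            currentState = answer ? STATE_1 : STATE_2;
            break;
        default:
            break;
        }
    }
    return iterations;
}
```

For example, `run_state_machine(3, 2)` takes 9 iterations: each round spends one iteration in `STATE_1` and three in `STATE_2`, plus one final iteration to reach `STATE_DONE`.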
Honored Contributor I

There is no such thing as a "function call" on FPGAs, and hence there is no call overhead. When you call a separate function from the kernel, the compiler inserts a copy of the function body in place of every call. The reason your II goes up to 2 must be something else. Can you archive and attach your reports folder?
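To make "a copy of the function in place of every call" concrete, here is a plain-C sketch (an illustration, not actual compiler output). `scale_add` is a hypothetical helper standing in for `computeValue`; `kernel_inlined` is written as if its body had been pasted at each call site, so it should compute exactly what `kernel_with_calls` computes.

```c
/* Hypothetical helper standing in for computeValue. */
static unsigned short scale_add(unsigned short x, unsigned short k)
{
    return (unsigned short)(x * 3u + k);
}

/* Version written with two calls to the helper. */
unsigned short kernel_with_calls(unsigned short a, unsigned short b)
{
    unsigned short r1 = scale_add(a, 1);
    unsigned short r2 = scale_add(b, 2);
    return (unsigned short)(r1 ^ r2);
}

/* Same computation with the helper body copied into each call site:
 * two independent copies of the datapath and no call/return mechanism. */
unsigned short kernel_inlined(unsigned short a, unsigned short b)
{
    unsigned short r1 = (unsigned short)(a * 3u + 1);
    unsigned short r2 = (unsigned short)(b * 3u + 2);
    return (unsigned short)(r1 ^ r2);
}
```

Each call site gets its own hardware copy, so calling a function twice costs area, not an extra cycle of call overhead.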


Furthermore, I don't think it is possible to infer state machines in OpenCL, since the compiler pipelines everything; e.g., every if/else or switch/case statement is implemented as multiple same-latency pipelines, each implementing one of the branches, with a MUX at the end to choose the correct output.
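The "one pipeline per branch plus a MUX" description can be mimicked in plain C. In this sketch the three case bodies are hypothetical; `next_value_muxed` evaluates every case on every call and uses the state only to pick one result, which should always match the ordinary `switch`.

```c
/* Hypothetical three-state switch, written the usual way. */
int next_value_switch(int state, int x)
{
    switch (state) {
    case 0:  return x + 5;
    case 1:  return x * 2;
    default: return x - 1;
    }
}

/* Dataflow view of the same switch: every case body is evaluated on each
 * call, and the state value only drives the final select (the MUX). */
int next_value_muxed(int state, int x)
{
    int candidates[3];
    candidates[0] = x + 5;  /* datapath for case 0 */
    candidates[1] = x * 2;  /* datapath for case 1 */
    candidates[2] = x - 1;  /* datapath for default */
    int sel = (state == 0 || state == 1) ? state : 2;
    return candidates[sel]; /* N-way multiplexer indexed by the state */
}
```

In this view every additional case adds another datapath and widens the final select.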


You might be able to achieve your intended behavior using multiple kernels and blocking/non-blocking channel calls, which give you implicit stalling/synchronization when the channel is empty/full.
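The stalling behavior of blocking channels can be modeled on the host. This sketch is plain C with POSIX threads, not OpenCL: each thread stands in for a kernel, and `channel_t` is a hypothetical depth-1 blocking channel whose read stalls while it is empty and whose write stalls while it is full. An actual Intel FPGA OpenCL design would use the SDK's channel built-ins (e.g. `read_channel_intel` / `write_channel_intel`) instead. The squaring step is a hypothetical stand-in for `computeValue`, and the 0 value is a hypothetical shutdown token.

```c
#include <pthread.h>
#include <stddef.h>

/* Depth-1 blocking channel (host-side model of an FPGA channel). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned short  value;
    int             full;
} channel_t;

static void channel_init(channel_t *ch)
{
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->changed, NULL);
    ch->full = 0;
}

static void channel_destroy(channel_t *ch)
{
    pthread_mutex_destroy(&ch->lock);
    pthread_cond_destroy(&ch->changed);
}

static void channel_write(channel_t *ch, unsigned short v)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->full)                 /* stall while the channel is full */
        pthread_cond_wait(&ch->changed, &ch->lock);
    ch->value = v;
    ch->full = 1;
    pthread_cond_broadcast(&ch->changed);
    pthread_mutex_unlock(&ch->lock);
}

static unsigned short channel_read(channel_t *ch)
{
    pthread_mutex_lock(&ch->lock);
    while (!ch->full)                /* stall while the channel is empty */
        pthread_cond_wait(&ch->changed, &ch->lock);
    unsigned short v = ch->value;
    ch->full = 0;
    pthread_cond_broadcast(&ch->changed);
    pthread_mutex_unlock(&ch->lock);
    return v;
}

typedef struct {
    channel_t to_worker;  /* requests: main kernel -> worker kernel */
    channel_t to_main;    /* answers:  worker kernel -> main kernel */
} channels_t;

/* "Worker kernel": owns the computeValue logic and answers one request
 * per read; exits when it receives the shutdown token 0. */
static void *worker_kernel(void *arg)
{
    channels_t *ch = (channels_t *)arg;
    for (;;) {
        unsigned short req = channel_read(&ch->to_worker);
        if (req == 0)
            break;
        channel_write(&ch->to_main, (unsigned short)(req * req));
    }
    return NULL;
}

/* "Main kernel": sends inputs 1..n and sums the answers. */
unsigned int run_two_kernels(unsigned short n)
{
    channels_t ch;
    pthread_t worker;
    unsigned int sum = 0;

    channel_init(&ch.to_worker);
    channel_init(&ch.to_main);
    pthread_create(&worker, NULL, worker_kernel, &ch);

    for (unsigned int i = 1; i <= n; i++) {
        channel_write(&ch.to_worker, (unsigned short)i); /* issue request */
        sum += channel_read(&ch.to_main);                /* stall for answer */
    }
    channel_write(&ch.to_worker, 0);                     /* shutdown token */
    pthread_join(worker, NULL);

    channel_destroy(&ch.to_worker);
    channel_destroy(&ch.to_main);
    return sum;
}
```

`run_two_kernels(3)` returns 14 (1 + 4 + 9). The main loop cannot move past `channel_read` until the worker has written its answer, which gives the "stall until the value is returned" behavior from the original question.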
Honored Contributor I

Thanks for your reply, HRZ! The compiler inserting a copy of the function in place (like a macro) makes a lot of sense from the compiler's standpoint. I probably can't archive and attach the full reports folder. Right now I can force the II to 1 by using "#pragma ii 1" and take the fmax hit. I'm avoiding that in the hope of writing a loop that is easier for the compiler to pipeline. As I add more logic, I think I'll reach a point where forcing the II to 1 fails to compile.


Thanks for the information about how the switch/case is implemented in logic. I guess that could have the same effect I'm looking for, with the state variable acting as the mux select. Unfortunately, that probably infers a larger and larger mux every time I add a state, driving up the II if the first-stage compiler decides it's better to increase the II rather than decrease the fmax. I may also be inferring latches if I don't explicitly assign each output in every case.
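One common way to address the concern about outputs that are not assigned in every case is to give every output a default at the top of each iteration. This is a minimal plain-C sketch of that pattern. The state names, the `start`/`done` inputs, and the `load`/`compute` outputs are hypothetical.

```c
enum { ST_IDLE, ST_LOAD, ST_COMPUTE };

/* Hypothetical outputs of one state-machine iteration. */
typedef struct {
    int next_state;
    int load;     /* 1 when a new value should be loaded */
    int compute;  /* 1 when computeValue should run */
} step_result;

/* One iteration: every field gets a default before the switch, so no case
 * depends on a value left over from an earlier iteration. */
step_result step(int state, int start, int done)
{
    step_result r;
    r.next_state = state;  /* default: stay in the current state */
    r.load = 0;
    r.compute = 0;

    switch (state) {
    case ST_IDLE:
        if (start)
            r.next_state = ST_LOAD;
        break;
    case ST_LOAD:
        r.load = 1;
        r.next_state = ST_COMPUTE;
        break;
    case ST_COMPUTE:
        r.compute = 1;
        if (done)
            r.next_state = ST_IDLE;
        break;
    default:
        r.next_state = ST_IDLE;  /* recover from an unexpected state */
        break;
    }
    return r;
}
```

With the defaults in place, adding a state only requires writing the outputs that differ from the default in that state.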


Your suggestion of using multiple kernels with blocking channel calls might be the way to stall the loop effectively.