I have a fixed multi-cycle custom instruction, written in Verilog, that I've added to my system in Platform Designer. I can call the instruction from C++ and it produces the correct output. The instruction is fully pipelined: its latency is more than one cycle, but its throughput is one instruction per cycle. How do I take advantage of this when calling the instruction as a function in a tight loop in C++? Is there a way to indicate to the system that it has a one-cycle throughput?
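
To make the question concrete, here is a minimal sketch of the kind of loop I mean. The names `my_custom_op` and `run_loop` are hypothetical, and `my_custom_op` is only a software stand-in with placeholder arithmetic so the snippet compiles on a host machine; it is not the real instruction or the real calling macro.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the custom instruction. On the target this
// would be replaced by the macro or builtin that invokes the real
// instruction; the arithmetic below is a placeholder only.
static inline std::uint32_t my_custom_op(std::uint32_t a, std::uint32_t b) {
    return a * 3u + b;
}

// Tight loop: no iteration depends on the result of a previous one, so in
// principle a new operation could enter the pipeline every cycle.
void run_loop(const std::uint32_t* a, const std::uint32_t* b,
              std::uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = my_custom_op(a[i], b[i]);
    }
}
```

Since the iterations are independent, I would hope back-to-back calls could overlap in the pipeline instead of each call waiting out the full latency of the one before it.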