How to get the latency of arithmetic operations and global memory access?

Altera_Forum · 04-06-2014

Hi, All,

Does Altera publish any documentation for its OpenCL tools that states the latency of global memory access, local memory access, and arithmetic operations, such as floating point (adder, multiplier, divider, and sqrt) and integer (multiplier, divider, and sqrt)? Thanks.

Altera_Forum · 04-08-2014

The short answer is no. The longer answer is that some of the latencies are not fixed and are instead kernel specific, so there is no single value to publish. Also, if you have an algorithm that is sensitive to latency, the compiler does everything it can to hide that latency, since users are not supposed to have to worry about it. For example, you mentioned floating point operators: the kernel hides the latency of those operators by scheduling multiple work-items through that hardware to keep the pipeline full. So if the operator took 32 cycles to complete, the compiler just needs to schedule at least 32 work-items through that hardware to hide the latency (keep in mind it's a deep pipeline, so other operations are occurring concurrently). GPUs do something similar: they schedule work-items in warps/wavefronts so that by the time a result is needed, the work-items waiting on it are scheduled again to execute the next instruction.
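
To put rough numbers on the 32-cycle example above, here is a minimal sketch of a toy pipeline model. It is not Altera's scheduler and not a measurement: the function names are made up for illustration, and it assumes an idealized operator that accepts one new work-item every cycle (initiation interval of 1) and never stalls.

```python
def pipeline_cycles(num_items: int, latency: int, initiation_interval: int = 1) -> int:
    """Total cycles for num_items work-items to pass through one pipelined operator.

    Assumption (not from any Altera document): the operator is fully pipelined,
    accepts a new work-item every `initiation_interval` cycles, and never stalls.
    """
    if num_items <= 0:
        return 0
    # The first result appears after `latency` cycles; each later work-item
    # finishes `initiation_interval` cycles after the one before it.
    return latency + (num_items - 1) * initiation_interval


def cycles_per_item(num_items: int, latency: int, initiation_interval: int = 1) -> float:
    """Amortized cost per work-item; approaches the initiation interval as num_items grows."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return pipeline_cycles(num_items, latency, initiation_interval) / num_items


if __name__ == "__main__":
    LATENCY = 32  # the hypothetical operator latency used in the post above
    for n in (1, 32, 1024):
        total = pipeline_cycles(n, LATENCY)
        print(f"{n:5d} work-items: {total:5d} cycles total, "
              f"{cycles_per_item(n, LATENCY):.2f} cycles per work-item")
```

Under these assumptions a single work-item pays the full 32 cycles, while with many work-items in flight the amortized cost per work-item approaches one cycle. That is why the raw operator latency tells you little about kernel throughput.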