Measuring offload processing time with clock_gettime() and SCIF API

Jun_Hyun_S_ · ‎06-14-2015

Hi, I recently built an app that sends data to the MIC, processes it, and returns it.

I implemented the whole thing with just pthreads to get as much transparency as possible.

Problem is, I'm not sure I'm measuring the offload latency right.

I currently built it so that it takes 4 timestamps:

offload begin (from host) - (scif transfer) - remote processing begin (from mic) - (actual processing) - remote processing end (from mic) - (scif transfer back to host) - offload end (from host)

Would calling clock_gettime() each time with one of these clock IDs (CLOCK_MONOTONIC_RAW, CLOCK_MONOTONIC, CLOCK_REALTIME) get me a correct, consistent measurement across host and device with microsecond precision?

If so, which parameter of the three should I use?

Frances_R_Intel · ‎06-18-2015

I wouldn't specify any of these options, but if you really want to set one, I would try CLOCK_REALTIME. Comparing times between the host and the coprocessor is problematic at best: the clock on the coprocessor is set from the host when the card is booted, but nothing continually checks that the two clocks stay in sync. More generally, comparing clock ticks between any two different systems is always problematic, given that the clock rates are not necessarily the same.

Personally, I would set the OFFLOAD_REPORT environment variable - 2 would be a good value in this case - rather than putting in my own timing statements. It gives you the host time from start of offload to end of offload and the coprocessor cpu time - in other words, compares host clock to host clock and coprocessor clock to coprocessor clock.

Finally, remember that both the time between starting the offload process and starting execution on the coprocessor, and the time between finishing execution on the coprocessor and exiting the offload process, include overhead. I used to think that the overhead was small enough to ignore when computing data transfer rates, but now I am wiser.
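One way to see how much the overhead matters for a transfer-rate estimate is to time two different payload sizes and separate the fixed cost from the per-byte cost. The sketch below assumes a simple linear model (time = fixed overhead + bytes / bandwidth), which is an illustration only and not something measured in this thread; the struct and function names are hypothetical.

```c
/* Assumed linear cost model: time_ns = overhead_ns + bytes / bytes_per_ns */
struct transfer_model {
    double overhead_ns;   /* fixed cost per transfer, independent of size */
    double bytes_per_ns;  /* sustained bandwidth once the transfer is running */
};

/* Fit the model through two (size, time) measurements of different sizes. */
static struct transfer_model fit_two_points(double bytes_small, double ns_small,
                                            double bytes_large, double ns_large)
{
    struct transfer_model m;
    m.bytes_per_ns = (bytes_large - bytes_small) / (ns_large - ns_small);
    m.overhead_ns  = ns_small - bytes_small / m.bytes_per_ns;
    return m;
}
```

With made-up numbers, a 1000-byte transfer taking 6000 ns and a 5000-byte transfer taking 10000 ns fit a bandwidth of 1 byte/ns with 5000 ns of fixed overhead. Dividing bytes by total time for the small transfer alone would suggest about 0.17 bytes/ns, which understates the bandwidth roughly sixfold.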