In the Intel VTune Amplifier profiler, there is no counter for the number of instructions executed on integrated GPUs.
Instead, the profiler provides three metrics indicating the fraction of time the EUs (execution units) spend in the active, stalled, and idle states.
So if my kernel (written in OpenCL) is highly divergent and the divergence is input-dependent, it is difficult to measure the GFLOPS it achieves.
Unfortunately, this is not possible in the current version of VTune. I have forwarded this request to the VTune team, and they will consider it for a future release.
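
Until such a counter exists, one possible workaround outside VTune is to count floating-point operations manually and divide by the measured kernel time. The sketch below is only an illustration under assumptions: it assumes the kernel has been instrumented by hand so that each work-item writes its own operation count to a buffer, and that the kernel time comes from some timer such as OpenCL event profiling. The function name `estimate_gflops` and the sample numbers are hypothetical, not part of VTune or OpenCL.

```python
def estimate_gflops(flops_per_work_item, elapsed_seconds):
    """Estimate achieved GFLOPS from manually counted operations.

    flops_per_work_item: counts read back from a buffer that the
        instrumented kernel filled in (an assumption, not a VTune feature).
    elapsed_seconds: measured kernel execution time in seconds.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_flops = sum(flops_per_work_item)
    # GFLOPS = floating-point operations per second, in units of 1e9.
    return total_flops / elapsed_seconds / 1e9


# Hypothetical counts: divergent work-items executed different amounts of work.
counts = [1000, 250, 4000, 750]
print(estimate_gflops(counts, 2e-6))
```

Because the counts are collected per work-item, the estimate reflects the input-dependent divergence, at the cost of the extra instrumentation perturbing the kernel's timing.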