The floating-point arithmetic counters are disabled on Haswell processors -- perhaps because of accuracy problems with these events on Sandy Bridge and Ivy Bridge cores. One of these days I will find a Broadwell or Skylake system to check the new floating-point counters that have been added on those processors.
In theory, you can count the AVX instructions that are executed and use that count to assess how many GFLOPS were achieved. Of course, this will be only a crude approximation of the real result.
Yes, I know that. For simple loops, such as accumulating into a single variable, it will work, but for complex nested loops with embedded control statements it will be a difficult and inaccurate task.
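As a rough illustration of the instruction-counting idea above, here is a minimal Python sketch. It assumes you already have counts of executed double-precision arithmetic instructions, grouped by vector width and by whether the instruction is a fused multiply-add; how those counts are obtained (static inspection of the loop body, an instrumentation tool, or hardware counters where available) is left open. The names `LANES_F64` and `estimate_gflops` are hypothetical and only serve this example.

```python
# Double-precision elements processed per instruction, by vector width in bits.
# 64 = scalar, 128 = SSE/AVX-128, 256 = AVX-256.
LANES_F64 = {64: 1, 128: 2, 256: 4}


def estimate_gflops(counts, elapsed_seconds):
    """Estimate GFLOPS from instruction counts (hypothetical helper).

    counts maps (width_bits, is_fma) -> number of executed FP arithmetic
    instructions. An FMA is counted as two operations per element.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    flops = 0
    for (width_bits, is_fma), n in counts.items():
        ops_per_instruction = LANES_F64[width_bits] * (2 if is_fma else 1)
        flops += ops_per_instruction * n
    return flops / elapsed_seconds / 1e9


# Simple accumulation loop: one 256-bit add per iteration, 10**9 iterations,
# with an assumed wall-clock time of 0.5 s.
simple_loop = {(256, False): 10**9}
print(estimate_gflops(simple_loop, 0.5))
```

This only works when the instruction mix per iteration is known and fixed, which is the simple-loop case. With data-dependent branches inside nested loops, the counts themselves become uncertain, and the estimate inherits that error.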