Hi, I'm running a custom analysis with Intel VTune and I want to count the floating-point operations in a section of code that I mark with "ittnotify".
As a starting point I'm using the "Performance snapshot" analysis with "Analyze user tasks, events and counters" enabled. The benchmark dynamically allocates two vectors and multiplies them. With large vectors the floating-point instructions are reported correctly, but with smaller vectors the metric "FP_ARITH_INS_RETIRED.SCALAR_SINGLE" reports zero.
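For reference, a minimal sketch of the kind of benchmark described above. This is not the original code: the function name `run_benchmark`, the domain and task names, the fill values, and the `USE_ITT` build switch are all assumptions made for illustration. The ITT calls are the standard task API from `ittnotify.h`. When `USE_ITT` is not defined, they are compiled out so the sketch builds without VTune installed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: build with -DUSE_ITT and link against the ittnotify library
   to emit task markers. Without the define, no ITT code is compiled. */
#ifdef USE_ITT
#include <ittnotify.h>
#endif

/* Hypothetical benchmark: allocates two float vectors of length n,
   multiplies them element-wise inside an ITT task, and returns the sum of
   the products so the compiler cannot discard the work.
   Returns -1.0f if an allocation fails. */
float run_benchmark(size_t n)
{
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *c = malloc(n * sizeof *c);
    if (!a || !b || !c) {
        free(a);
        free(b);
        free(c);
        return -1.0f;
    }

    /* Illustrative fill values, not taken from the original benchmark. */
    for (size_t i = 0; i < n; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

#ifdef USE_ITT
    /* Domain and task names are placeholders. */
    __itt_domain *domain = __itt_domain_create("benchmark.domain");
    __itt_string_handle *task = __itt_string_handle_create("vector_multiply");
    __itt_task_begin(domain, __itt_null, __itt_null, task);
#endif

    /* Region of interest: the single-precision multiplications. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] * b[i];
        sum += c[i];
    }

#ifdef USE_ITT
    __itt_task_end(domain);
#endif

    free(a);
    free(b);
    free(c);
    return sum;
}
```

Calling `run_benchmark(n)` from `main` with a small and a large `n` corresponds to the two cases described above.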
Is there a way to analyze these metrics even for less CPU-intensive tasks? I've tried "-knob sampling-interval=0.01", but it does not seem to have any effect. Thanks in advance.
Thank you for posting in Intel Communities.
Could you please share the following details:
1. VTune version
2. OS and hardware details
3. Sample reproducer code and commands you followed