Do you actually know that this code gets vectorized, that is, have you seen a vectorization report in the compilation log? The VTune analyzer data you published, flawed as it is, suggests otherwise.
Have you looked at any of the documentation in the VTune analyzer help file? There are whole pages devoted to the sampling method. It uses a feature of the Intel processor performance monitoring registers to raise an interrupt only after a set number of occurrences of an event, the sample-after value (SAV). This limits how often samples are collected, and therefore the impact of collection on the program(s) under test, while the number of samples still reflects the frequency of the event. For hot spot sampling, the SAV is nominally picked to cause interrupts around once per millisecond.
There is a dialog in the collector configuration that lets you set the SAV for each event. You can also look there to see what the current setting is.
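To make the sampling arithmetic concrete, here is a rough sketch in Python. It is not taken from the VTune analyzer itself: the function names, the 2 GHz clock rate, and the sample count are made up for illustration. It assumes that one sample is taken every SAV occurrences of an event, so the event count can be estimated as samples times SAV.

```python
# Hypothetical helpers illustrating sample-after value (SAV) arithmetic.
# None of these names come from the VTune analyzer; they are for illustration only.

def sample_after_value(events_per_second, target_interrupts_per_second=1000):
    """Pick an SAV so that interrupts occur roughly once per millisecond."""
    return max(1, round(events_per_second / target_interrupts_per_second))

def estimated_event_count(samples, sav):
    """Each sample stands for roughly `sav` occurrences of the event."""
    return samples * sav

# Assumed example: a clock-tick event on a 2 GHz processor.
clock_hz = 2_000_000_000
sav = sample_after_value(clock_hz)
print(sav)                               # 2000000
print(estimated_event_count(1500, sav))  # 3000000000
```

The point is that a raw sample count means little until it is multiplied by the SAV for that event, which is why it is worth checking the current setting before comparing numbers across events.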