I looked into some of these reports. It appears to mean simply that at least 25% of the samples fell in the remainder loop (for an Intel compilation with vectorization). I checked one case with VTune, and its sampling looked more believable: a few samples fell in the vector remainder loop and none in the scalar remainder where Advisor complained. Is a method needed to give Advisor better granularity for short runs, or for runs with time distributed over many vectorized loops?
If you use "Basic Hotspots" analysis in VTune, it uses the same collection technology as Advisor. Yes, you can control "sampling interval" in both VTune and Advisor to make more precise measurements. You can find it in project properties.
If you use Advanced Hotspots in VTune, the collection mechanism is different and the default granularity is finer. Also, do you see run-to-run reproducibility in VTune results? What exact analysis type and options do you use? What is the CPU time of your loop?
Has the spam-filter blocking been fixed? Until today, it was not possible to reply on this forum. I've wasted my breath for years suggesting that Intel should not use these broken methods but instead use well-accepted ones, such as requiring a working email account for registration.
I tried setting the sampling interval to 4 ms, where "inefficient" runs show 16 ms spent in the remainder loop. That yields only about four samples in the loop, which apparently isn't enough to be significant. At smaller sampling intervals, Advisor appears to interfere with application performance.
I used to set VTune to the minimum expected duration, but today it crashes at that setting. Still, VTune seemed to give adequate repeatability, both with Advanced Hotspots and general analysis.
I'm familiar with the need to set a high expected duration when running 60 or more threads, but here I'm running only 2 threads for 6 seconds at most. I suppose both the number of threads and setting VTune to use multiple runs count against the expected duration.