Following the tutorial at https://software.intel.com/en-us/videos/data-alignment-padding-and-peel-remainder-loops on Linux with Intel Advisor Update 1 (build 435553), I don't get any peeled loop even though OFFSET is set to 16, which makes the array unaligned. The loop length is 35, which should give 4 peeled elements at the beginning, and those should show up.
In fact, I never seem to get anything for peeled loops. Have they been merged into the "remainder" loops?
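For reference, the arithmetic behind the expected 4 peeled elements can be sketched as below. This is only an illustration under assumptions the post does not state: 4-byte float elements, 32-byte (AVX) vectors, and OFFSET being a byte offset from an aligned base. The names peel_count and split_loop are hypothetical helpers, not part of the tutorial or of any Intel tool, and the compiler is free to pick a different split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar iterations needed before `addr` reaches the next vec_bytes boundary.
   Assumes vec_bytes is a power of two and addr is a multiple of elem_bytes. */
static size_t peel_count(uintptr_t addr, size_t vec_bytes, size_t elem_bytes)
{
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
    size_t misalign = (size_t)(addr & (vec_bytes - 1));
    return misalign ? (vec_bytes - misalign) / elem_bytes : 0;
}

/* Hypothetical breakdown of a trip count into peel / vector body / remainder. */
struct loop_split {
    size_t peel;         /* scalar iterations before the aligned body */
    size_t body_vectors; /* full vector iterations */
    size_t remainder;    /* scalar iterations left at the end */
};

static struct loop_split split_loop(size_t n, uintptr_t addr,
                                    size_t vec_bytes, size_t elem_bytes)
{
    struct loop_split s;
    size_t lanes = vec_bytes / elem_bytes;

    s.peel = peel_count(addr, vec_bytes, elem_bytes);
    if (s.peel > n)
        s.peel = n; /* loop shorter than the peel: everything runs scalar */
    s.body_vectors = (n - s.peel) / lanes;
    s.remainder = (n - s.peel) % lanes;
    return s;
}
```

Under these assumptions, split_loop(35, 16, 32, 4) yields 4 peeled iterations, 3 full vector iterations of 8 floats, and 7 remainder iterations.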
I should have given this more thought. Here is the solution:
- In my examples, the loop count was small enough that the compiler decided to skip loop peeling and use unaligned loads.
I was fooled by the fact that the report generated with -qopt-report=3 still shows "<Peeled loop for vectorization>" even though no peeled loop is generated. Should this be considered a bug?
Are you using update 1 of the 2016 Intel compiler? I've tested against the latest release of Advisor XE and it shows the peeled loop for the tutorial code. I'll continue to investigate the optimization report.
I am using the 2016 compiler with update 1.
I just found out that the peeled loop appears in Advisor only if that path is taken. It turns out that my implementation aligns the arrays to the SIMD width even though I did not ask for it, which is why the peeled loops did not appear. The peeled loop shows up again when I explicitly misalign the data.
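For anyone who wants to reproduce this, here is a minimal sketch of one way to force the misalignment and check it at runtime. It is not the tutorial's code: SIMD_BYTES (32, i.e. AVX), is_aligned, and alloc_misaligned are names I made up for illustration, and it relies on C11 aligned_alloc being available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_BYTES 32 /* assumed vector width in bytes (AVX); adjust for your target */

/* Returns nonzero when p sits on a `bytes` boundary. */
static int is_aligned(const void *p, size_t bytes)
{
    return ((uintptr_t)p % bytes) == 0;
}

/* Allocates room for n floats starting `skew` elements past a SIMD_BYTES
   boundary, so the returned pointer is deliberately misaligned.
   *base receives the pointer that must later be passed to free(). */
static float *alloc_misaligned(size_t n, size_t skew, void **base)
{
    /* A skew of zero or a full vector would land back on a boundary. */
    assert(skew > 0 && skew * sizeof(float) < SIMD_BYTES);

    size_t bytes = (n + skew) * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    bytes = (bytes + SIMD_BYTES - 1) / SIMD_BYTES * SIMD_BYTES;

    *base = aligned_alloc(SIMD_BYTES, bytes);
    if (*base == NULL)
        return NULL;
    return (float *)*base + skew;
}
```

Calling alloc_misaligned(35, 4, &base) gives an array that starts 16 bytes past a 32-byte boundary. Checking is_aligned(a, SIMD_BYTES) before the loop tells you whether the peel path can be taken at all, which is what decides whether Advisor has anything to show for it.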