I am trying to vectorize a loop and have generated the optrpt for a piece of code that looks like this.
real (kind=RKIND), dimension(:,:), pointer, contiguous :: layer, layerEdge
integer :: cell1, cell2, nEdges, i, k
!$omp do schedule(runtime) private(cell1, cell2, k)
do i = 1, nEdges
   cell1 = numCellsEdge(1,i)
   cell2 = numCellsEdge(2,i)
   !DIR$ IVDEP
   do k = 1, maxEdge(i)
      layerEdge(k,i) = 0.5_RKIND * (layer(k,cell1) + layer(k,cell2))
   end do
end do
!$omp end do
The optrpt shows the loop as being multiversioned, but both versions look exactly the same (see the attached txt file).
My question is: why would the loop be multiversioned only to produce two identical vectorized versions? Is there any benefit to doing so? Please clarify.
The attached file is the vectorization report and does not show the code generated. Multi-versioning will (should) produce multiple (two here) different code sequences for different instruction sets. In this case the two code paths are reported as anticipating the same benefit when compared using the same alternate code paths elsewhere.
Only an examination of the disassembly will tell whether both code paths are generated and whether they differ.
If the two code paths contain exactly the same instruction sequences, then this is something the compiler optimization team should address, so as to remove the redundant code and the test for which code path to take. This will require a simple reproducer.
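A reproducer for this could be as small as the sketch below. It wraps the loop from the original post in a standalone program. The array sizes, the fill values, the program name and the file name repro.f90 are all assumptions made for illustration, not taken from the original code. The build line in the comment uses -qopt-report and -S so that the report and the assembly can be compared side by side.

```fortran
! Hypothetical standalone reproducer; sizes and data are assumptions.
! One possible build line (file name repro.f90 is assumed):
!   ifort -O2 -qopenmp -qopt-report=5 -qopt-report-phase=vec -S repro.f90
program multiversion_repro
  implicit none
  integer, parameter :: RKIND = selected_real_kind(12)
  ! Assumed problem sizes, chosen only to make the program runnable
  integer, parameter :: nCells = 1000, nEdges = 3000, nLevels = 64
  real (kind=RKIND), dimension(:,:), pointer, contiguous :: layer, layerEdge
  integer, dimension(:,:), allocatable :: numCellsEdge
  integer, dimension(:), allocatable :: maxEdge
  integer :: cell1, cell2, i, k

  allocate(layer(nLevels, nCells), layerEdge(nLevels, nEdges))
  allocate(numCellsEdge(2, nEdges), maxEdge(nEdges))

  ! Arbitrary test data (assumption)
  do i = 1, nCells
     do k = 1, nLevels
        layer(k,i) = real(k + i, RKIND)
     end do
  end do
  do i = 1, nEdges
     numCellsEdge(1,i) = mod(i - 1, nCells) + 1
     numCellsEdge(2,i) = mod(i, nCells) + 1
     maxEdge(i) = nLevels - mod(i, 4)   ! variable inner trip count
  end do
  layerEdge = 0.0_RKIND

  !$omp parallel
  !$omp do schedule(runtime) private(cell1, cell2, k)
  do i = 1, nEdges
     cell1 = numCellsEdge(1,i)
     cell2 = numCellsEdge(2,i)
     !DIR$ IVDEP
     do k = 1, maxEdge(i)
        layerEdge(k,i) = 0.5_RKIND * (layer(k,cell1) + layer(k,cell2))
     end do
  end do
  !$omp end do
  !$omp end parallel

  ! Use the result so the loop is not optimized away
  print *, sum(layerEdge)

  deallocate(layer, layerEdge, numCellsEdge, maxEdge)
end program multiversion_repro
```

Whether this small program triggers the same multiversioning as the full application is not guaranteed. If it does not, the declarations may need to be moved closer to how the arrays are passed in the original code.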
Intel - suggestion: Does the vectorization report contain header information stating which instruction set is used for each Multiversioned vn?
This information might be useful at the point in the report where you specify the Multiversioned...
The 17.0 compilers report compiler options in opt-report, but I think Advisor is required for reporting of instruction sets actually used (at run time). So, as Jim suggested, you probably need the -S option to see if the multiple paths are identical.
I've taken a break for many months from asking that the compiler not generate counter-productive multiple paths, since I haven't lately seen situations where this could be shown to be detrimental. In the past, it was sometimes said that such compiler actions might be taken if they were needed for some other architecture target, even though they aren't needed on the one in question (which I didn't see specified here).
I noticed that 17.0 update 1 dropped some esoteric SSE4 optimizations, without losing any performance on AVX2.