How to tell ICC to vectorize basic blocks?

Patrick_K_ · ‎01-16-2014

Please note that this is a cross-post from StackOverflow: http://stackoverflow.com/questions/21135281/how-to-make-the-intel-c-compiler-icc-vectorize-basic-blocks

I am currently using icc (version 13.1.0.146) to compile C programs running in native mode on the Intel Xeon Phi coprocessor.

Consider the following two code fragments:

    // fragment 1
    array[pos]     += 1;
    array[pos + 1] += 1;
    array[pos + 2] += 1;
    array[pos + 3] += 1;

    // fragment 2
    for (int i = 0; i < 4; ++i)
        array[pos + i] += 1;

Unfortunately, only the loop (fragment 2) is vectorized automatically. However, if I compile for the x86 platform, icc also vectorizes the "unrolled" version (fragment 1).

Is there a way to tell icc to vectorize basic blocks when compiling for the Xeon Phi, too?

Any help is appreciated. Thanks in advance!

Kevin_D_Intel · ‎01-17-2014

Developer’s guidance: Loop materialization (which is what leads to vectorization of fragment 1 for Xeon) is intentionally disabled for Xeon Phi™. It tends to create loops with a small number of iterations, which are not profitable to vectorize on Xeon Phi™, and vectorizing such innermost loops disables vectorization at the outer level, which has better profitability potential.

For Xeon Phi™, the recommendation is to use explicit vector programming constructs: OpenMP 4.0 SIMD and Cilk Plus (array notation, simd pragma, simd-enabled functions).

I found that fragment 1 vectorizes when written with array notation: array[pos:4:1] += 1;
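For comparison, the OpenMP 4.0 SIMD route could look roughly like the sketch below. The helper name increment_block and its count parameter are made up for illustration and do not come from the original question. Whether the compiler actually emits vector code for such a short loop still depends on the compiler, its version, and the target, so treat this as a starting point and check the vectorization report.

```c
/* Hypothetical helper (not from the thread): adds 1 to `count`
   consecutive elements starting at array[pos]. */
static void increment_block(int *array, int pos, int count)
{
    /* OpenMP 4.0 SIMD directive: asks the compiler to vectorize the loop
       that follows. It only takes effect when OpenMP (or OpenMP SIMD)
       support is enabled at compile time; otherwise the pragma is ignored
       and the loop runs as ordinary scalar code. */
    #pragma omp simd
    for (int i = 0; i < count; ++i)
        array[pos + i] += 1;
}
```

Calling increment_block(array, pos, 4) performs the same four updates as fragment 1. The option that enables the pragma varies by compiler and version (with GCC, for example, it is -fopenmp or -fopenmp-simd), so consult the documentation for the icc release in use.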

Manual unrolling is also not well suited for vectorization (refer to the Compiler Methodology - Avoid Manual Loop Unrolling). To encourage programmers to use vector programming constructs, no pragmas are provided to vectorize straight-line code, although some uses of IVDEP on top of straight-line code can encourage loop materialization for Xeon.

Hope that helps.