velvia
Beginner
51 Views

Compiler vectorize std::vector but not C vector

Hi,

I have been testing the compiler's auto-vectorization and ran into an odd result. In the following code

double test80_c(double* x, double* y, int n, int nb_loops) {
    double sum{ 0.0 };
    double a{ 0.0 };
    for (int i = 0; i < nb_loops; ++i) {
        a += 1.0;
        for (int k = 0; k < n; ++k) {
            sum += std::sqrt(x[k]*x[k] + y[k]*y[k] + a);
        }
    }
    return sum;
}

double test80_cpp(const std::vector<double>& x, const std::vector<double>& y,
        int nb_loops) {
    double sum{ 0.0 };
    double a{ 0.0 };
    for (int i = 0; i < nb_loops; ++i) {
        a += 1.0;
        for (std::size_t k = 0; k < x.size(); ++k) {
            sum += std::sqrt(x[k]*x[k] + y[k]*y[k] + a);
        }
    }
    return sum;
}

the first version (the C one) gets vectorized, but the second one does not. Note that the following Fortran code

    function test80_f(x, y, n, nb_loops) bind(c)
        use iso_c_binding
        real(dp), dimension(1:n), intent(in) :: x
        real(dp), dimension(1:n), intent(in) :: y
        integer, intent(in) :: n
        integer, intent(in) :: nb_loops
        real(dp) :: test80_f
        ! local variables
        integer :: i, k
        real(dp) :: a

        test80_f = 0.0_dp
        a = 0.0_dp
        do i = 1, nb_loops
            a = a + 1.0_dp
            do k = 1, n
                test80_f = test80_f + sqrt(x(k)**2 + y(k)**2 + a)
            end do
        end do
    end function test80_f

does not get vectorized either. Could you reproduce this with your compiler? I am using icpc 15.0.0 20140716 on Mac OS X.

Best regards,

Francois

 

velvia
Beginner

Same thing with this code. The C version and the corresponding Fortran version do not get vectorized, but the C++ version does.

double test30_c(double* x, int n, int nb_loops) {
    double sum{ 0.0 };
    double a{ 0.0 };
    for (int i = 0; i < nb_loops; ++i) {
        a += 1.0;
        for (int k = 0; k < n; ++k) {
            sum += x[k] + a;
        }
    }
    return sum;
}

double test30_cpp(std::vector<double>& x, int nb_loops) {
    double sum{ 0.0 };
    double a{ 0.0 };
    for (int i = 0; i < nb_loops; ++i) {
        a += 1.0;
        for (std::size_t k = 0; k < x.size(); ++k) {
            sum += x[k] + a;
        }
    }
    return sum;
}
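
When a loop over a std::vector is reported as not vectorized, one commonly suggested workaround is to read the size and the data pointers into locals before the loops, so the inner loop only sees plain local values. The sketch below applies that idea to test80_cpp. The name test80_cpp_hoisted and the size check are assumptions added for illustration; this variant was not compiled or measured in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical variant of test80_cpp: size() and data() are read once,
// outside the loops, instead of being evaluated in the loop condition.
double test80_cpp_hoisted(const std::vector<double>& x,
                          const std::vector<double>& y, int nb_loops) {
    assert(x.size() == y.size());  // assumption: both inputs have equal length
    const std::size_t n = x.size();
    const double* px = x.data();
    const double* py = y.data();
    double sum{ 0.0 };
    double a{ 0.0 };
    for (int i = 0; i < nb_loops; ++i) {
        a += 1.0;
        for (std::size_t k = 0; k < n; ++k) {
            sum += std::sqrt(px[k] * px[k] + py[k] * py[k] + a);
        }
    }
    return sum;
}
```

Whether this changes anything depends on the compiler and its options, so the vectorization report should be checked again after the change.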

 

Richard_A_Intel
Employee

Francois,

I tried your code on Mac OS X* and was unable to reproduce your findings. I tested the code you provided with this command:

icpc test.cpp -O2 -std=c++11 -S -vec-report3

My file, test.cpp, includes your functions: test80_c, test80_cpp, test30_c, and test30_cpp. In the vectorization report for this code, I saw the loops getting vectorized in all four functions. If your question comes from the message "loop was not vectorized: inner loop was already vectorized", that remark applies to the outer loop; look at the inner loop to see where the vectorization occurred. This was tested with the Intel® 15.0 Compiler.
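
For anyone who wants to repeat the experiment, a test file along the following lines can be compiled with the command above. It is only a sketch: it repeats test30_c and test30_cpp with the array index written out, and the helper test30_variants_agree, its input values, and its tolerance are assumptions added here, not part of the file that was actually tested.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

double test30_c(double* x, int n, int nb_loops) {
    double sum{ 0.0 };
    double a{ 0.0 };
    for (int i = 0; i < nb_loops; ++i) {
        a += 1.0;
        for (int k = 0; k < n; ++k) {
            sum += x[k] + a;
        }
    }
    return sum;
}

double test30_cpp(std::vector<double>& x, int nb_loops) {
    double sum{ 0.0 };
    double a{ 0.0 };
    for (int i = 0; i < nb_loops; ++i) {
        a += 1.0;
        for (std::size_t k = 0; k < x.size(); ++k) {
            sum += x[k] + a;
        }
    }
    return sum;
}

// Hypothetical helper: feeds the same data to both variants and compares
// the results with a relative tolerance, since a vectorized reduction may
// add the terms in a different order.
bool test30_variants_agree(int n, int nb_loops) {
    std::vector<double> x(static_cast<std::size_t>(n));
    for (std::size_t k = 0; k < x.size(); ++k) {
        x[k] = 0.5 * static_cast<double>(k);
    }
    const double c = test30_c(x.data(), n, nb_loops);
    const double cpp = test30_cpp(x, nb_loops);
    return std::fabs(c - cpp) <= 1e-9 * std::fabs(c);
}
```

Because the command uses -S, the file does not need a main() to produce the vectorization report; a main() that calls test30_variants_agree can be added to run the comparison.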

Thank you,
Richard

TimP
Black Belt

In the current compilers for Mac and Linux, ansi-alias is set by default. You should set it explicitly for ICL.
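
To illustrate why the aliasing setting can matter for container code, here is a minimal sketch. Buffer and fill_buffer are hypothetical names and this is not code from the thread. The loop stores doubles while its bound and its base pointer live in memory. Under the ANSI aliasing rules, a store through a double lvalue cannot modify an object of type std::size_t or double*, so the compiler may keep b.size and b.data in registers. Without that assumption, it has to consider that each store might change them, which can show up as an assumed dependence in the report.

```cpp
#include <cstddef>

// Hypothetical struct standing in for a container that keeps its
// data pointer and size in memory, as std::vector does internally.
struct Buffer {
    double* data;
    std::size_t size;
};

void fill_buffer(Buffer& b, double v) {
    for (std::size_t k = 0; k < b.size; ++k) {
        // A store of type double: under ANSI aliasing rules it cannot
        // change b.size or b.data, which have different types.
        b.data[k] = v;
    }
}
```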
Bernard
Black Belt

What was the ICL vectorization report?

Richard_A_Intel
Employee

I am attaching the optimization report here.

Bernard
Black Belt

@RICHARD A.

I received "page cannot be found" when I tried to download the optimization report.

Richard_A_Intel
Employee

Sorry about that; I'm not sure why it's not working. I'll post the contents of the file here:

Begin optimization report for: main()

    Report from: Vector optimizations [vec]


LOOP BEGIN at /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/c++/v1/vector(899,20) inlined into test.cpp(60,23)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   remark #15346: vector dependence: assumed OUTPUT dependence between __end_ line 1685 and __end_ line 897
LOOP END

LOOP BEGIN at /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/c++/v1/vector(899,20) inlined into test.cpp(61,23)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   remark #15346: vector dependence: assumed OUTPUT dependence between __end_ line 1685 and __end_ line 897
LOOP END

LOOP BEGIN at /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/c++/v1/vector(442,5) inlined into test.cpp(68,1)
   remark #15414: loop was not vectorized: nothing to vectorize since loop body became empty after optimizations
LOOP END

LOOP BEGIN at /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/c++/v1/vector(442,5) inlined into test.cpp(68,1)
   remark #15414: loop was not vectorized: nothing to vectorize since loop body became empty after optimizations
LOOP END
===========================================================================

Begin optimization report for: test30_c(double *, int, int)

    Report from: Vector optimizations [vec]


LOOP BEGIN at test.cpp(7,5)
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at test.cpp(9,9)
   <Peeled>
   LOOP END

   LOOP BEGIN at test.cpp(9,9)
      remark #15399: vectorization support: unroll factor set to 8
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15448: unmasked aligned unit stride loads: 1 
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 10 
      remark #15477: vector loop cost: 28.000 
      remark #15478: estimated potential speedup: 2.580 
      remark #15479: lightweight vector operations: 6 
      remark #15480: medium-overhead vector operations: 1 
      remark #15488: --- end vector loop cost summary ---
   LOOP END

   LOOP BEGIN at test.cpp(9,9)
   <Remainder>
      remark #15301: REMAINDER LOOP WAS VECTORIZED
   LOOP END

   LOOP BEGIN at test.cpp(9,9)
   <Remainder>
   LOOP END
LOOP END
===========================================================================

Begin optimization report for: test30_cpp(std::__1::vector<double, std::__1::allocator<double>> &, int)

    Report from: Vector optimizations [vec]


LOOP BEGIN at test.cpp(19,5)
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at test.cpp(21,39)
   <Peeled>
   LOOP END

   LOOP BEGIN at test.cpp(21,39)
      remark #15399: vectorization support: unroll factor set to 8
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15448: unmasked aligned unit stride loads: 1 
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 9 
      remark #15477: vector loop cost: 28.000 
      remark #15478: estimated potential speedup: 2.350 
      remark #15479: lightweight vector operations: 6 
      remark #15480: medium-overhead vector operations: 1 
      remark #15488: --- end vector loop cost summary ---
   LOOP END

   LOOP BEGIN at test.cpp(21,39)
   <Remainder>
      remark #15301: REMAINDER LOOP WAS VECTORIZED
   LOOP END

   LOOP BEGIN at test.cpp(21,39)
   <Remainder>
   LOOP END
LOOP END
===========================================================================

Begin optimization report for: test80_c(double *, double *, int, int)

    Report from: Vector optimizations [vec]


LOOP BEGIN at test.cpp(31,5)
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at test.cpp(33,9)
   <Peeled>
   LOOP END

   LOOP BEGIN at test.cpp(33,9)
      remark #15399: vectorization support: unroll factor set to 4
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15448: unmasked aligned unit stride loads: 2 
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 62 
      remark #15477: vector loop cost: 92.000 
      remark #15478: estimated potential speedup: 2.620 
      remark #15479: lightweight vector operations: 13 
      remark #15480: medium-overhead vector operations: 1 
      remark #15488: --- end vector loop cost summary ---
   LOOP END

   LOOP BEGIN at test.cpp(33,9)
      remark #25460: No loop optimizations reported
   LOOP END

   LOOP BEGIN at test.cpp(33,9)
   <Remainder>
      remark #15301: REMAINDER LOOP WAS VECTORIZED
   LOOP END

   LOOP BEGIN at test.cpp(33,9)
   <Remainder>
   LOOP END
LOOP END
===========================================================================

Begin optimization report for: test80_cpp(const std::__1::vector<double, std::__1::allocator<double>> &, const std::__1::vector<double, std::__1::allocator<double>> &, int)

    Report from: Vector optimizations [vec]


LOOP BEGIN at test.cpp(44,5)
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at test.cpp(46,39)
   <Peeled>
   LOOP END

   LOOP BEGIN at test.cpp(46,39)
      remark #15399: vectorization support: unroll factor set to 4
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15448: unmasked aligned unit stride loads: 2 
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 60 
      remark #15477: vector loop cost: 92.000 
      remark #15478: estimated potential speedup: 2.540 
      remark #15479: lightweight vector operations: 13 
      remark #15480: medium-overhead vector operations: 1 
      remark #15488: --- end vector loop cost summary ---
   LOOP END

   LOOP BEGIN at test.cpp(46,39)
      remark #25460: No loop optimizations reported
   LOOP END

   LOOP BEGIN at test.cpp(46,39)
   <Remainder>
      remark #15301: REMAINDER LOOP WAS VECTORIZED
   LOOP END

   LOOP BEGIN at test.cpp(46,39)
   <Remainder>
   LOOP END
LOOP END
===========================================================================

 
