Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Help with: loop was not vectorized: unsupported loop structure

joan_puig
Beginner
559 Views
The attached source code contains one loop inside a member function of a Matrix class and the exact same loop outside any class. One of them gets vectorized and the other does not. Because there is a significant speed difference, I was wondering whether there is anything I can do to keep the "good" object orientation and the performance at the same time.

Thanks

Joan

Compilation output:

[jpuig@new-host snipets]$ /opt/intel/cc/9.0/bin/icc compilerTest3.cxx -ipo -O3 -xN -vec-report3 -opt-report-levelmax -parallel; ./a.out
IPO: performing single-file optimizations
IPO: generating object file /home/jpuig/tmp/ipo_iccYuQuxV.o
compilerTest3.cxx(101) : (col. 4) remark: loop was not vectorized: unsupported loop structure.
compilerTest3.cxx(107) : (col. 2) remark: LOOP WAS VECTORIZED.

Execution times of the loops:

Elapsed time: 0.907152 seconds
Elapsed time: 0.73452 seconds

Compiler info:

[jpuig@new-host snipets]$ /opt/intel/cc/9.0/bin/icc -V
Intel C Compiler for 32-bit applications, Version 9.0 Build 20050809Z Package ID:
Copyright (C) 1985-2005 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY

Message Edited by gte237z@mail.gatech.edu on 01-06-2006 07:17 AM

4 Replies
Intel_C_Intel
Employee

Dear gte,

This is unfortunately an issue with the conservative assumptions the vectorizer sometimes has to make. One way of enabling vectorization is to make local copies of the upper bound and the pointer, as in:

    inline void setAllElements(double value) {
        int lup = numel;        // local copy of the upper bound
        double *ldat = data;    // local copy of the data pointer
        for (int i = 0; i < lup; i++) ldat[i] = value;
    }
    };
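
For readers without the attachment, the local-copy idea above can be sketched in a self-contained form. Only the member names numel and data come from this thread; the class layout, the constructor, and the setAllElementsDirect variant (a guess at the shape of the loop that was not vectorized) are assumptions made for illustration. Whether either loop is actually vectorized depends on the compiler and its version, so check the vectorization report rather than relying on this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for the Matrix class in the attachment.
// Only the member names `numel` and `data` appear in the thread.
class Matrix {
public:
    explicit Matrix(int n) : numel(n), data(nullptr) {
        assert(n > 0);
        data = new double[n];
    }
    ~Matrix() { delete[] data; }
    Matrix(const Matrix&) = delete;
    Matrix& operator=(const Matrix&) = delete;

    // Assumed original shape: the bound and the pointer are read
    // through `this` on every iteration.
    void setAllElementsDirect(double value) {
        for (int i = 0; i < numel; i++) data[i] = value;
    }

    // Suggested shape: copy the bound and the pointer into locals
    // so the loop no longer refers to any member.
    void setAllElements(double value) {
        const int lup = numel;
        double* const ldat = data;
        for (int i = 0; i < lup; i++) ldat[i] = value;
    }

    int size() const { return numel; }
    double at(int i) const { return data[i]; }

private:
    int numel;
    double* data;
};
```

Both member functions produce the same result; the only difference is what the optimizer has to prove about the loop bound and base pointer.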

When I am back from my sabbatical, I hope to have a look at how the compiler analysis itself can be further improved to minimize the amount of rewriting required.

Hope this helps you for now.

Aart Bik
http://www.aartbik.com/

joan_puig
Beginner
Hi Aart,

Thanks for the hint, I was able to get the loop vectorized. The bad news is that I still see a significant performance hit: (attached the new version of the source)

I added an outer loop so that I can time several executions of the same code. As expected, the numbers without an outer loop are not stable, but the same trend appears.

/opt/intel/cc/9.0/bin/icc compilerTest3.cxx -ipo -O3 -xN -vec-report3 -opt-report-levelmax; ./a.out
IPO: performing single-file optimizations
IPO: generating object file /home/jpuig/tmp/ipo_iccUjhhrH.o
compilerTest3.cxx(108) : (col. 2) remark: loop was not vectorized: not inner loop.
compilerTest3.cxx(110) : (col. 5) remark: LOOP WAS VECTORIZED.
compilerTest3.cxx(114) : (col. 3) remark: LOOP WAS VECTORIZED.

Elapsed time: 6.70821 seconds
Elapsed time: 2.59897 seconds
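
The outer-loop timing described above might look roughly like the following. The attached timing code is not shown in the thread, so the helper name timeRepeated and its use of std::chrono are assumptions, not the code that produced the numbers above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `body` `reps` times inside an outer loop
// and returns the total elapsed wall-clock time in seconds.
template <typename F>
double timeRepeated(int reps, F body) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++) {
        body();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A call such as timeRepeated(1000, [&] { m.setAllElements(1.0); }) would time a thousand fills of a hypothetical matrix m. An optimizer may remove work whose result is never read, so it is worth reading the filled data afterwards.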
Intel_C_Intel
Employee

Dear gte,

Because the compiler can statically determine the trip-count for the C-style loop, nontemporal stores are generated automatically, yielding better performance. If you add a #pragma vector nontemporal to the C++-style loop, then setAllElements() performs equally well.
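
As a rough sketch of where the pragma goes: it is placed directly before the loop it applies to. The function name fillNonTemporal is hypothetical, and the __INTEL_COMPILER guard is an assumption added so that other compilers do not warn about an unknown pragma. Nontemporal stores tend to help only when the written data is not read again soon, so measure before adopting this.

```cpp
#include <cassert>

// Hypothetical free-function version of the fill loop.
// The pragma asks the Intel compiler to use nontemporal (streaming)
// stores when it vectorizes the loop that follows.
void fillNonTemporal(double* dst, int n, double value) {
    assert(dst != nullptr || n == 0);
#if defined(__INTEL_COMPILER)
#pragma vector nontemporal
#endif
    for (int i = 0; i < n; i++) dst[i] = value;
}
```

Inside a member function such as setAllElements(), the pragma would sit in the same position, immediately above the for loop.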
Aart Bik
http://www.aartbik.com/

Message Edited by abik on 01-09-2006 07:08 PM

joan_puig
Beginner
Dear Aart:

Thanks a lot for your help. These little tweaks will have very big performance implications throughout the code I write.


Thanks,
Joan