Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

ICC -O3 generates code which takes more time than -O0

amit_b_
Beginner

I have been compiling some benchmarks with ICC. I am seeing results where the optimized version, i.e. the executable generated with -O3, takes more time than the executable generated with -O0. Yet when I generate the vectorization report with the flag -vec-report5, I see that the compiler chooses to vectorize because of these estimates:

scalar loop cost  : 28

vector loop cost : 7.680

estimated potential speedup: 3.630

 

But when I run the executables, the vectorized version takes more time than the non-vectorized one; the difference is about 10 seconds. Is this really possible, or am I overlooking something?

5 Replies
TimP
Honored Contributor III

We would need an actual example of the situation you are asking about. One known weakness of icc is that it does not resolve a misaligned vector store followed by a reload, except when targeting MIC.

jimdempseyatthecove
Honored Contributor III

When the loop trip count is unknown at compile time, the compiler may choose to vectorize a loop that turns out to run with small trip counts at run time, where the overhead of the vectorized version can outweigh its benefit.

When you know the representative loop counts, consider using

#pragma loop_count min(yourGuessAtMinimum), max(yourGuessAtMaximum), avg(YourGuessAtAverage)
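As a rough sketch of how that hint might be attached to a loop: the function name `sum_short` and the trip-count values below are hypothetical and chosen only for illustration, not taken from the original poster's benchmark. The pragma is guarded so that other compilers simply skip it.

```cpp
#include <cstddef>

// Sums a short array. The min/max/avg values are illustrative guesses;
// replace them with trip counts measured in your own workload.
float sum_short(const float* v, std::size_t n) {
    float total = 0.0f;
#if defined(__INTEL_COMPILER)
#pragma loop_count min(1), max(16), avg(4)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

The hint only informs the compiler's cost model; it does not change what the loop computes.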

Otherwise, when you have a mix of short and long loops, use

if (n < YourCutoff) {
  #pragma nosimd
  for (...) {...}
} else {
  #pragma simd
  for (...) {...}
}

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Perhaps it would be a good feature extension to have:

#pragma simd if(n > Cutoff)

Then you would not need to write the same code statements twice.

Jim Dempsey

QIAOMIN_Q_
New Contributor I

Hello,

It would be clearer if you could provide your sample code, or attach a screenshot of the hotspot area and the corresponding assembly if you have VTune at hand.

Thank you.
--
QIAOMIN.Q
Intel Developer Support

TimP
Honored Contributor III

The 2015 compiler improved the handling of many short vectorized loops.
