amit_b_
Beginner

ICC -O3 generates code that takes longer to run than -O0

I have been compiling some benchmarks with ICC. I am seeing results where the optimized version, i.e. the executable generated with -O3, takes more time than the executable generated with -O0. However, when I generate the vectorization report with the -vec-report5 flag, I see that the compiler chooses to vectorize because:

scalar loop cost  : 28

vector loop cost : 7.680

estimated potential speedup: 3.630

But when I run the executables, the vectorized version takes more time than the non-vectorized one; the difference is about 10 seconds. Is this really possible in a case like the one above, or am I missing something?

TimP
Black Belt

We would need an actual example of the situation you are asking about. One weakness of icc is that it does not resolve a misaligned vector store followed by a reload, except when targeting MIC.
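If misalignment turns out to be the cause (an assumption; the thread never confirms it), one common mitigation is to allocate the arrays on a vector-width boundary so vector loads and stores on them can start aligned. A minimal C11 sketch follows; the function name `alloc_aligned_floats` and the 64-byte alignment are hypothetical choices, not taken from the thread.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment: 64 bytes covers SSE (16), AVX (32) and AVX-512 (64)
 * vector widths and matches a typical cache line. */
#define VEC_ALIGN 64

/* Allocate n floats starting on a VEC_ALIGN boundary; release with free().
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. Returns NULL on failure or overflow. */
float *alloc_aligned_floats(size_t n)
{
    if (n > (SIZE_MAX - VEC_ALIGN) / sizeof(float))
        return NULL;
    size_t bytes = n * sizeof(float);
    bytes = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (bytes == 0)
        bytes = VEC_ALIGN; /* avoid a zero-size request */
    return aligned_alloc(VEC_ALIGN, bytes);
}
```

Aligning the allocation only helps if the loop actually starts at the beginning of the array; a loop that begins at an odd offset is still misaligned.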

jimdempseyatthecove
Black Belt

When the loop trip count is unknown at compile time, the compiler may choose to vectorize a loop that at run time executes only a few iterations, where the overhead of the vector setup and the remainder loop can outweigh the gain.

When you know the representative loop counts, consider using

#pragma loop_count min(yourGuessAtMinimum), max(yourGuessAtMaximum), avg(YourGuessAtAverage)
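As a minimal sketch, assuming a loop that usually runs about 8 iterations and never more than 64 (hypothetical numbers, as is the function name `sum_floats`), the pragma goes immediately before the loop it describes. It is wrapped in `#if defined(__INTEL_COMPILER)` so that other compilers never see an icc-specific pragma.

```c
#include <stddef.h>

/* Sum n floats. The trip counts in the pragma are illustrative guesses,
 * not measured values. */
float sum_floats(const float *x, size_t n)
{
    float s = 0.0f;
#if defined(__INTEL_COMPILER)
    /* Tell icc the expected trip counts so its cost model can account
     * for short loops when deciding whether to vectorize. */
#pragma loop_count min(1), max(64), avg(8)
#endif
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```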

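Otherwise, when you have a mix of short and long trip counts, use:

if (n < YourCutoff) {
  #pragma novector   /* few iterations: keep the loop scalar */
  for (int i = 0; i < n; i++) a[i] = b[i] + c[i];
} else {
  #pragma simd       /* many iterations: vectorize */
  for (int i = 0; i < n; i++) a[i] = b[i] + c[i];
}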
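A self-contained version of this split, as a sketch: the names `add_arrays` and `SIMD_CUTOFF` and the cutoff value 16 are hypothetical. icc's `#pragma novector` is guarded so only icc sees it, and the portable OpenMP `#pragma omp simd` stands in for icc's `#pragma simd`; it takes effect only when OpenMP SIMD support is enabled (for example gcc/clang `-fopenmp-simd`).

```c
#include <stddef.h>

/* Hypothetical threshold below which vectorization is not expected to pay
 * off; the right value has to be found by timing on the target machine. */
#define SIMD_CUTOFF 16

/* a[i] = b[i] + c[i] for i in [0, n). */
void add_arrays(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n)
{
    if (n < SIMD_CUTOFF) {
        /* Short trip count: icc keeps this loop scalar; other compilers
         * ignore the guarded pragma and may still auto-vectorize it. */
#if defined(__INTEL_COMPILER)
#pragma novector
#endif
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    } else {
        /* Long trip count: request vectorization. */
#pragma omp simd
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }
}
```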

Jim Dempsey

jimdempseyatthecove
Black Belt

Perhaps it would be a good feature extension to have:

#pragma simd if(n > Cutoff)

Then you would not need to write the same code statements twice.
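For reference, a conditional form close to this was later standardized: OpenMP 5.0 allows an `if` clause on the `simd` construct, and when the condition is false the loop runs without SIMD concurrency. A minimal sketch, assuming a compiler that implements this part of OpenMP 5.0 with SIMD support enabled; the function name `scale_floats` and the cutoff value are hypothetical. Without OpenMP support the pragma is ignored and the loop compiles as ordinary C.

```c
#include <stddef.h>

/* Hypothetical cutoff; choose it by timing on the target machine. */
#define SIMD_CUTOFF 16

/* Multiply each of the n floats in a by s. The loop body is written once;
 * the if clause requests SIMD execution only when n > SIMD_CUTOFF. */
void scale_floats(float *a, float s, size_t n)
{
#pragma omp simd if(simd: n > SIMD_CUTOFF)
    for (size_t i = 0; i < n; i++)
        a[i] *= s;
}
```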

Jim Dempsey

QIAOMIN_Q_
New Contributor I

Hello,

It would be clearer if you could provide your sample code or, if you have VTune at hand, attach a screenshot of the hotspot area and the corresponding assembly.

Thank you.
--
QIAOMIN.Q
Intel Developer Support

TimP
Black Belt

The 2015 compiler improved the handling of many short vectorized loops.
