Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

How should I optimize for speed and memory access with the Intel C++ Compiler?

Ömer_Faruk_Kalkan

My program contains many loops that access memory. I currently build with O2 (Maximize Speed). Should I use O3 (Highest Optimizations) instead? What other settings can I adjust?

1 Solution
TimP
Honored Contributor III
-O3 mainly adds optimizations for multi-level loop nests, at the possible expense of larger generated code. You can see what it adds for your application by comparing compiler reports, e.g. -Qopt-report-file=source.txt -Qopt-report4. Those reports are invaluable for showing the existence and nature of the compiler optimizations applied to your critical loops. The exact meaning of the numeric suffix on opt-report varies with compiler version. As you've no doubt read elsewhere, you should start with analysis to determine the location and nature of any performance bottlenecks. ICL offers the -Qprofile-loops option for that purpose. Recent VTune releases have made great improvements in the general analysis category.
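
To make "multi-level loops" concrete, here is a minimal sketch of a loop nest of the kind such optimizations target. The function names are hypothetical and not taken from the poster's program. The second version is a hand-written loop interchange; whether the compiler performs a comparable transformation on its own at a given optimization level is exactly what comparing the opt-reports would tell you.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrix stored in a flat vector (illustrative type alias).
using Matrix = std::vector<double>;

// Textbook i-j-k order: the innermost loop reads b with stride n,
// which is unfriendly to the cache and harder to vectorize.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Hand-interchanged i-k-j order: the innermost loop now walks
// b and c with unit stride, so consecutive iterations touch adjacent memory.
Matrix multiply_ikj(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];  // invariant in the inner loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    return c;
}
```

Both functions compute the same product; only the memory access pattern of the innermost loop differs. Building a file like this at two optimization levels with the report options above and comparing the two report files is one way to see what the higher level changes for a given loop nest.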

Ömer_Faruk_Kalkan

I'm already using VTune. But where are the optimization reports, and how do I use the -Qprofile-loops option?

 

TimP
Honored Contributor III

If you use the Visual Studio GUI, I suppose you must add the opt-report and profile-loops options to the additional command line options, and perhaps examine the results in a text editor.

If you want the opt-report results to appear in your build log, you would of course omit the opt-report-file option, but in my opinion that makes it harder to compare reports and see the effect of changing your compile options and source code.

Are you trying to get by without the user guide?

Ömer_Faruk_Kalkan

I am reading the user guide. I added /Qprofile-loops:all and /Qopt-report-file:$(IntDir)$(TargetName).rep to the compiler command line, set up as follows:

[Screenshot: a.PNG]

I now have a file named ParallelSearch.rep, but I do not see any log from profile-loops. The diagnostic file and the report file contain:

.diag

icl: command line warning #10333: Loop profiler cannot be used when generating parallel code. Disabling '/Qprofile-loops'

.rep

<;-1:-1;IPO UNREFERENCED VAR REMOVING;;0>
  UNREF VAR REMOVAL ROUTINE-SYMTAB (....)

  UNREF VAR REMOVAL ROUTINE-SYMTAB (....)

  UNREF VAR REMOVAL ROUTINE-SYMTAB (....)

  UNREF VAR REMOVAL ROUTINE-SYMTAB (_main):VARS(8),PACKS (8)

 

I do not understand any of this. What should I be analyzing?

TimP
Honored Contributor III
It's probably good to begin profiling and vectorization optimizations with threaded parallelization off. As you're using VTune, you probably don't need the profile-loops, but it's easy to be misled when starting out in VTune with parallelization. Learning the opt-report stuff is particularly important with parallelization.
QIAOMIN_Q_
New Contributor I

As the warning says, /Qprofile-loops is disabled when generating parallel code. The loop profiler inserts instrumentation calls at each function's entry and exit points and before and after instrumentable loops; these calls may not work well in a parallel context and would make the results hard to analyze.
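
As a rough illustration of what entry/exit and before/after-loop instrumentation means, here is a hand-written sketch. It is not the compiler's actual mechanism: `ScopedProbe`, `profile_table` and `sum_of_squares` are hypothetical names invented for this example. It also hints at one plausible reason such probes are awkward in threaded code: the shared table below is unsynchronized, so concurrent updates from several threads would race.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-region statistics: call count and accumulated time.
struct RegionStats {
    std::size_t calls = 0;
    long long total_ns = 0;
};

// Shared, unsynchronized table. Safe only in single-threaded code;
// updating it from several threads at once would be a data race.
inline std::map<std::string, RegionStats>& profile_table() {
    static std::map<std::string, RegionStats> table;
    return table;
}

// Records elapsed time between construction (region entry)
// and destruction (region exit).
class ScopedProbe {
public:
    explicit ScopedProbe(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedProbe() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        RegionStats& stats = profile_table()[name_];
        ++stats.calls;
        stats.total_ns +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    }
    ScopedProbe(const ScopedProbe&) = delete;
    ScopedProbe& operator=(const ScopedProbe&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

double sum_of_squares(const std::vector<double>& v) {
    ScopedProbe function_probe("sum_of_squares");       // function entry/exit
    double total = 0.0;
    {
        ScopedProbe loop_probe("sum_of_squares:loop");  // before/after the loop
        for (double x : v) total += x * x;
    }
    return total;
}
```

After calling `sum_of_squares` a few times in a serial build, `profile_table()` holds a call count and accumulated time for the function and for its loop. This manual approach only approximates what a profiler reports, which fits the advice above to start measuring with threaded parallelization turned off.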
