Optimization
------------
-O1       optimize for maximum speed, but disable some optimizations which increase code size for a small speed benefit
-O2       optimize for maximum speed (DEFAULT)
-O3       optimize for maximum speed and enable more aggressive optimizations that may not improve performance on some programs
-O        same as -O2
-Os       enable speed optimizations, but disable some optimizations which increase code size for a small speed benefit
-O0       disable optimizations
-Ofast    enable -O3 -no-prec-div -fp-model fast=2 optimizations
-fno-alias      assume no aliasing in program
-fno-fnalias    assume no aliasing within functions, but assume aliasing across calls
-f[no-]builtin  disable inline expansion of intrinsic functions
-fno-builtin-<name>  disable the <name> intrinsic
-ffunction-sections  separate functions for the linker (COMDAT)
-fdata-sections      place each data item into its own section
-f[no-]defer-pop     disable optimizations which may result in deferred clearance of stack arguments
-nolib-inline        disable inline expansion of intrinsic functions
-f[no-]optimize-sibling-calls  optimize sibling and tail recursive calls; enabled at levels -O2, -O3, -Os
-f[no-]protect-parens  enable/disable(DEFAULT) honoring of parentheses when reassociating REAL and COMPLEX expression evaluations
-qsimd-honor-fp-model  enforces the selected fp-model in SIMD loops. Specify -qno-simd-honor-fp-model (DEFAULT) to override the fp-model in SIMD loops
-qsimd-serialize-fp-reduction  serializes FP reductions for improved floating-point consistency in SIMD loops while allowing the rest of the loop to be vectorized. Default is -qno-simd-serialize-fp-reduction
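A minimal sketch of what the aliasing assertions above mean in practice (the file name and command line are illustrative, not part of this option reference); the same per-pointer guarantee can also be expressed with C99 restrict:

    /* scale.c -- illustrative only.  Compiling with -fno-alias (or marking the
     * pointers restrict) asserts that a and b never overlap, which typically
     * lets the loop vectorize without runtime overlap checks.  If the
     * assertion is false, behavior is undefined.
     *
     *     icc -O2 -fno-alias -c scale.c        (assumed invocation)
     */
    void scale(float *restrict a, const float *restrict b, int n)
    {
        for (int i = 0; i < n; ++i)
            a[i] = 2.0f * b[i];   /* iterations are independent once no-alias is known */
    }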
Code Generation
---------------
-x<code>  generate specialized code to run exclusively on processors indicated by <code> as described below
  SSE2          May generate Intel(R) SSE2 and SSE instructions for Intel processors. Optimizes for the Intel NetBurst(R) microarchitecture.
  SSE3          May generate Intel(R) SSE3, SSE2, and SSE instructions for Intel processors. Optimizes for the enhanced Pentium(R) M processor microarchitecture and Intel NetBurst(R) microarchitecture.
  SSSE3         May generate Intel(R) SSSE3, SSE3, SSE2, and SSE instructions for Intel processors. Optimizes for the Intel(R) Core(TM) microarchitecture.
  SSE4.1        May generate Intel(R) SSE4 Vectorizing Compiler and Media Accelerator instructions for Intel processors. May generate Intel(R) SSSE3, SSE3, SSE2, and SSE instructions and it may optimize for Intel(R) 45nm Hi-k next generation Intel Core(TM) microarchitecture.
  SSE4.2        May generate Intel(R) SSE4 Efficient Accelerated String and Text Processing instructions supported by Intel(R) Core(TM) i7 processors. May generate Intel(R) SSE4 Vectorizing Compiler and Media Accelerator, Intel(R) SSSE3, SSE3, SSE2, and SSE instructions and it may optimize for the Intel(R) Core(TM) processor family.
  AVX           May generate Intel(R) Advanced Vector Extensions (Intel(R) AVX), Intel(R) SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
  CORE-AVX2     May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
  CORE-AVX-I    May generate Intel(R) Advanced Vector Extensions (Intel(R) AVX), including instructions in Intel(R) Core 2(TM) processors in process technology smaller than 32nm, Intel(R) SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
  ATOM_SSE4.2   May generate MOVBE instructions for Intel(R) processors, depending on the setting of option -minstruction. May also generate Intel(R) SSE4.2, SSE3, SSE2, and SSE instructions for Intel processors. Optimizes for Intel(R) Atom(TM) processors that support Intel(R) SSE4.2 and MOVBE instructions.
  ATOM_SSSE3    May generate MOVBE instructions for Intel(R) processors, depending on the setting of option -minstruction. May also generate Intel(R) SSSE3, SSE3, SSE2, and SSE instructions for Intel processors. Optimizes for Intel(R) Atom(TM) processors that support Intel(R) SSSE3 and MOVBE instructions.
  MIC-AVX512    May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, Intel(R) AVX-512 Exponential and Reciprocal instructions, Intel(R) AVX-512 Prefetch instructions for Intel(R) processors, and the instructions enabled with CORE-AVX2. Optimizes for Intel(R) processors that support Intel(R) AVX-512 instructions.
  KNM           May generate Quad Fused Multiply Add (QFMA) and Quad Virtual Neural Network Instruction (QVNNI) and the instructions enabled with MIC-AVX512. Optimizes for Intel(R) Xeon Phi(TM) product family processor code named Knights Mill.
  CORE-AVX512   May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, Intel(R) AVX-512 Doubleword and Quadword instructions, Intel(R) AVX-512 Byte and Word instructions and Intel(R) AVX-512 Vector Length Extensions for Intel(R) processors, and the instructions enabled with CORE-AVX2. Optimizes for Intel(R) processors that support Intel(R) AVX-512 instructions.
  COMMON-AVX512 May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, as well as the instructions enabled with CORE-AVX2. Optimizes for Intel(R) processors that support Intel(R) AVX-512 instructions.
  BROADWELL CANNONLAKE HASWELL ICELAKE-CLIENT (or ICELAKE) ICELAKE-SERVER IVYBRIDGE KNL KNM SANDYBRIDGE SILVERMONT GOLDMONT GOLDMONT-PLUS TREMONT SKYLAKE SKYLAKE-AVX512 CASCADELAKE KABYLAKE COFFEELAKE AMBERLAKE WHISKEYLAKE
                May generate instructions for processors that support the specified Intel(R) microarchitecture code name. Optimizes for Intel(R) processors that support the specified Intel(R) microarchitecture code name. Keywords KNL and SILVERMONT are only available on Windows* and Linux* systems.
-xHost    generate instructions for the highest instruction set and processor available on the compilation host machine
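An illustrative build recipe for the targeting options above (commands and file name are assumptions, not part of this help text): -xHost ties the object to the build machine's instruction set, while -ax adds optimized code paths on top of a baseline that still runs on older processors:

    /* saxpy.c -- illustrative only.
     * Possible invocations (adjust to your environment):
     *     icc -O2 -xHost -c saxpy.c              use the build host's best instruction
     *                                            set; the object may not run on older CPUs
     *     icc -O2 -axCORE-AVX512,CORE-AVX2 -c saxpy.c
     *                                            add AVX-512 and AVX2 code paths while
     *                                            keeping a generic baseline path
     */
    void saxpy(float a, const float *x, float *y, int n)
    {
        for (int i = 0; i < n; ++i)
            y[i] += a * x[i];     /* simple loop the compiler can auto-vectorize */
    }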
-ax<code>[,<code>,...]  generate code specialized for processors specified by <code> while also generating generic IA-32 instructions. <code> includes one or more of the following:
  SSE2          May generate Intel(R) SSE2 and SSE instructions for Intel processors.
  SSE3          May generate Intel(R) SSE3, SSE2, and SSE instructions for Intel processors.
  SSSE3         May generate Intel(R) SSSE3, SSE3, SSE2, and SSE instructions for Intel processors.
  SSE4.1        May generate Intel(R) SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel processors.
  SSE4.2        May generate Intel(R) SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel processors.
  AVX           May generate Intel(R) Advanced Vector Extensions (Intel(R) AVX), Intel(R) SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
  CORE-AVX2     May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
  CORE-AVX-I    May generate Intel(R) Advanced Vector Extensions (Intel(R) AVX), including instructions in Intel(R) Core 2(TM) processors in process technology smaller than 32nm, Intel(R) SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
  CORE-AVX512   May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, Intel(R) AVX-512 Doubleword and Quadword instructions, Intel(R) AVX-512 Byte and Word instructions and Intel(R) AVX-512 Vector Length Extensions for Intel(R) processors, and the instructions enabled with CORE-AVX2.
  BROADWELL CANNONLAKE HASWELL ICELAKE-CLIENT (or ICELAKE) ICELAKE-SERVER IVYBRIDGE KNL KNM SANDYBRIDGE SILVERMONT GOLDMONT GOLDMONT-PLUS TREMONT SKYLAKE SKYLAKE-AVX512 CASCADELAKE KABYLAKE COFFEELAKE AMBERLAKE WHISKEYLAKE
                May generate instructions for processors that support the specified Intel(R) microarchitecture code name. Optimizes for Intel(R) processors that support the specified Intel(R) microarchitecture code name. Keywords KNL and SILVERMONT are only available on Windows* and Linux* systems.
  MIC-AVX512    May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, Intel(R) AVX-512 Exponential and Reciprocal instructions, Intel(R) AVX-512 Prefetch instructions for Intel(R) processors, and the instructions enabled with CORE-AVX2.
  KNM           May generate Quad Fused Multiply Add (QFMA) and Quad Virtual Neural Network Instruction (QVNNI) and the instructions enabled with MIC-AVX512.
-mcpu=<cpu>   same as -mtune=<cpu>
-mtune=<cpu>  optimize for a specific <cpu>
  generic - Optimizes code for the compiler's default behavior
  broadwell haswell ivybridge knl knm sandybridge silvermont cannonlake icelake skylake-avx512 skylake - Optimizes code for processors that support the specified Intel(R) microarchitecture code name. knl and silvermont are only available on Windows* and Linux* systems
  core-avx2 - Optimizes code for processors that support Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions
  core-avx-i - Optimizes code for processors that support Float-16 conversion instructions and the RDRND instruction, Intel(R) Advanced Vector Extensions (Intel(R) AVX), Intel(R) SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions
  corei7-avx - Optimizes code for processors that support Intel(R) Advanced Vector Extensions (Intel(R) AVX), Intel(R) SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions
  corei7 - Optimizes code for processors that support Intel(R) SSE4 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel(R) SSE4 Vectorizing Compiler and Media Accelerator, Intel(R) SSE3, SSE2, SSE, and SSSE3 instructions
  atom - Optimizes code for processors that support MOVBE instructions, depending on the setting of option -minstruction (Linux and macOS*) or /Qinstruction (Windows). May also generate code for SSSE3 instructions and Intel(R) SSE3, SSE2, and SSE instructions
  core2 - Optimizes for the Intel(R) Core(TM) 2 processor family, including support for MMX(TM), Intel(R) SSE, SSE2, SSE3, and SSSE3 instruction sets
  pentium-mmx - Optimizes for Intel(R) Pentium(R) with MMX technology
  pentiumpro - Optimizes for Intel(R) Pentium(R) Pro, Intel Pentium II, and Intel Pentium III processors
  pentium4m - Optimizes for Intel(R) Pentium(R) 4 processors with MMX technology
  pentium-m pentium4 pentium3 pentium - Optimizes code for Intel(R) Pentium(R) processors. Value pentium3 is only available on Linux systems
-march=<cpu>  generate code exclusively for a given <cpu>
  broadwell cannonlake haswell icelake ivybridge knl knm sandybridge silvermont skylake-avx512 skylake - Generates code for processors that support the specified Intel(R) microarchitecture code name. Keywords knl and silvermont are only available on Linux* systems
  core-avx2 - Generates code for processors that support Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions
  core-avx-i - Generates code for processors that support Float-16 conversion instructions and the RDRND instruction, Intel(R) Advanced Vector Extensions (Intel(R) AVX), Intel(R) SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions
  corei7-avx - Generates code for processors that support Intel(R) Advanced Vector Extensions (Intel(R) AVX), Intel(R) SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions
  corei7 - Generates code for processors that support Intel(R) SSE4 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel(R) SSE4 Vectorizing Compiler and Media Accelerator, Intel(R) SSE3, SSE2, SSE, and SSSE3 instructions
  atom - Generates code for processors that support MOVBE instructions, depending on the setting of option -minstruction (Linux and macOS*) or /Qinstruction (Windows). May also generate code for SSSE3 instructions and Intel(R) SSE3, SSE2, and SSE instructions
  core2 - Generates code for the Intel(R) Core(TM) 2 processor family
  pentium4m - Generates code for Intel(R) Pentium(R) 4 processors with MMX technology
  pentium-m pentium4 pentium3 pentium - Generates code for Intel(R) Pentium(R) processors. Value pentium3 is only available on Linux systems
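A short illustration of the difference between tuning and targeting (commands and file name are illustrative assumptions): -mtune influences scheduling while keeping the default baseline instruction set, whereas -march restricts which processors the resulting object can run on:

    /* tuning.c -- illustrative only.
     *     icc -O2 -mtune=skylake -c tuning.c      tune for Skylake, keep the default baseline
     *     icc -O2 -march=core-avx2 -c tuning.c    generate AVX2 code; the object requires a
     *                                             processor with AVX2 support
     */
    int dot(const int *a, const int *b, int n)
    {
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += a[i] * b[i];
        return s;
    }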
-msse3    May generate Intel(R) SSE3, SSE2, and SSE instructions
-mssse3   May generate Intel(R) SSSE3, SSE3, SSE2, and SSE instructions
-msse4    Enable -msse4.2
-msse4.1  May generate Intel(R) SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions
-msse4.2  May generate Intel(R) SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions
-mavx     May generate Intel(R) AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions
-masm=<dialect>  generate asm instructions specified by <dialect>, which may be att (DEFAULT) or intel
-minstruction=<keyword>  Refine instruction set output for the selected target processor
  [no]movbe - Do/do not generate MOVBE instructions with ATOM_SSSE3 (requires -xATOM_SSSE3)
-f[no-]omit-frame-pointer  enable(DEFAULT)/disable use of EBP as a general purpose register. -fno-omit-frame-pointer replaces -fp
-f[no-]exceptions  enable/disable exception handling table generation. The default for C++ is -fexceptions (enabled); the default for C is -fno-exceptions (disabled)
-f[no-]fat-lto-objects  enable/disable generation of true code/data when generating an IL object using -ipo -c. Objects generated with -ffat-lto-objects or -fno-fat-lto-objects are added unmodified to an archive when using xiar. xiar behavior remains unchanged for an IL object generated without specifying -f[no-]fat-lto-objects
-fnon-call-exceptions  enable/disable(DEFAULT) code that allows exceptions from trapping instructions to be caught
-regcall  make __regcall the default calling convention
-hotpatch[=n]  generate padding bytes for function entries to enable image hotpatching. If specified, use 'n' as the padding
-fasynchronous-unwind-tables  determines whether unwind information is precise at an instruction boundary or at a call boundary. -fno-asynchronous-unwind-tables is the default for IA-32 architecture
-fextend-arguments=[32|64]  By default, unprototyped scalar integer arguments are passed in 32 bits (sign-extended if necessary). On Intel(R) 64, unprototyped scalar integer arguments may be extended to 64 bits
-m32      generate code for IA-32 architecture
-m64      generate code for Intel(R) 64 architecture
-m[no-]omit-leaf-frame-pointer  determines whether the frame pointer is omitted or kept in leaf functions
-mregparm=<value>  specifies the number of registers to use when passing integer arguments
-m80387   specify whether the compiler may use x87 floating-point instructions
-mx87     same as -m80387
-mstringop-strategy=<alg>  Override the internal decision heuristic for the particular algorithm to use for inlining string operations. The allowed values for <alg>:
  rep - Expand using i386 "rep" prefix (DEFAULT for -Os)
  const_size_loop - Expand into an inline loop when size is known at compile time (DEFAULT)
  libcall - Always use a library call
-mstringop-inline-threshold=<val>  inline string operations only when their size is known at compile time and does not exceed <val>
-fcf-protection[=<arg>]  enable Intel(R) Control-flow Enforcement Technology (CET) protection of control-flow transfers
-mauto-arch=<arch>[,<arch>,...]
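A hedged illustration of the __regcall calling convention referenced by -regcall above (identifiers are illustrative); individual functions can be marked instead of changing the default for the whole translation unit:

    /* regcall_demo.c -- illustrative only.  __regcall asks the compiler to pass
     * as many arguments as possible in registers; with -regcall the same
     * convention becomes the default for every function in the translation unit.
     * Callers and callees must agree on the convention, so mixing objects built
     * with and without -regcall requires care.
     */
    __regcall int add3(int a, int b, int c)
    {
        return a + b + c;
    }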
Interprocedural Optimization (IPO)
----------------------------------
-[no-]ip  enable(DEFAULT)/disable single-file IP optimization within files
-ipo[n]   enable multi-file IP optimization between files
-ipo-c    generate a multi-file object file (ipo_out.o)
-ipo-S    generate a multi-file assembly file (ipo_out.S)
-ip-no-inlining   disable full and partial inlining
-ip-no-pinlining  disable partial inlining
-ipo-separate     create one object file for every source file (overrides -ipo[n])
-ipo-jobs<n>      specify the number of jobs to be executed simultaneously during the IPO link phase
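A sketch of a multi-file IPO build using the options above (file names and commands are illustrative); the option appears on both the compile and link steps so the intermediate-language objects can be combined at link time:

    /* main.c -- illustrative only.  With -ipo, the call to scale() defined in
     * another translation unit becomes a candidate for cross-file inlining:
     *     icc -O2 -ipo -c scale.c main.c        (assumed commands)
     *     icc -O2 -ipo scale.o main.o -o app
     */
    extern void scale(float *a, const float *b, int n);

    int main(void)
    {
        static float a[1024], b[1024];
        scale(a, b, 1024);          /* may be inlined across files under -ipo */
        return 0;
    }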
Advanced Optimizations
----------------------
-unroll[n]  set maximum number of times to unroll loops. Omit n to use default heuristics. Use n=0 to disable the loop unroller
-[no-]unroll-aggressive  enables more aggressive unrolling heuristics
-funroll-loops  unroll loops based on default heuristics
-[no-]scalar-rep  enable(DEFAULT)/disable scalar replacement (requires -O3)
-[no-]ansi-alias  enable(DEFAULT)/disable use of ANSI aliasing rules in optimizations; user asserts that the program adheres to these rules
-[no-]ansi-alias-check  enable(DEFAULT)/disable ANSI alias checking when using -ansi-alias
-[no-]complex-limited-range  enable/disable(DEFAULT) the use of basic algebraic expansions of some complex arithmetic operations. This can allow for some performance improvement in programs which use a lot of complex arithmetic, at the loss of some exponent range
-[no-]alias-const  enable/disable(DEFAULT) a heuristic stating that if two arguments to a function have pointer type, a pointer to const does not alias a pointer to non-const. Also known as the input/output buffer rule, it assumes that input and output buffer arguments do not overlap
-fargument-alias  arguments may alias each other and may alias global storage
-fargument-noalias  arguments do not alias each other but may alias global storage
-fargument-noalias-global  arguments do not alias each other and do not alias global storage
-ftls-model=<model>  change thread-local storage model, where <model> can be one of the following: global-dynamic, local-dynamic, initial-exec or local-exec
-q[no-]opt-multi-version-aggressive  enables more aggressive multi-versioning to check for pointer aliasing and scalar replacement
-qopt-ra-region-strategy[=<keyword>]  select the method that the register allocator uses to partition each routine into regions
  routine - one region per routine
  block - one region per block
  trace - one region per trace
  loop - one region per loop
  default - compiler selects best option
-[no-]vec  enables(DEFAULT)/disables vectorization
-[no-]vec-guard-write  enables cache/bandwidth optimization for stores under conditionals within vector loops
-vec-threshold[n]  sets a threshold for the vectorization of loops based on the probability of profitable execution of the vectorized loop in parallel
-vecabi=<keyword>  select vector function ABI
  legacy - use the legacy vector function ABI
  compat - use the compatibility vector function ABI (DEFAULT)
  cmdtarget - generate an extended set of vector functions
  gcc - use GCC compatible ABI
-qopt-malloc-options={0|1|2|3|4}  specify malloc configuration parameters. Specifying a non-zero value will cause alternate configuration parameters to be set for how malloc allocates and frees memory
-qopt-calloc  enable/disable(DEFAULT) calls to fast calloc function
-qopt-jump-tables=<arg>  control the generation of jump tables
  default - let the compiler decide when a jump table, a series of if-then-else constructs or a combination is generated
  large - generate jump tables up to a certain pre-defined size (64K entries)
  <n> - generate jump tables up to <n> in size
  Use -qno-opt-jump-tables to lower switch statements as chains of if-then-else constructs
-fno-jump-tables  do not generate jump tables for switches and if-then-else statements
-qopt-block-factor=<n>  specify blocking factor for loop blocking
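For readers unfamiliar with loop blocking, the transformation that -qopt-block-factor influences looks roughly like the hand-written sketch below; the compiler applies an analogous transformation automatically, and the block size of 64 here is purely illustrative:

    /* blocked_transpose.c -- illustrative of loop blocking (tiling) only. */
    #define N 1024
    #define B 64                        /* illustrative block (tile) size; N % B == 0 */

    void transpose(double dst[N][N], const double src[N][N])
    {
        for (int ii = 0; ii < N; ii += B)            /* iterate over tiles */
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; ++i)     /* work inside one cache-friendly tile */
                    for (int j = jj; j < jj + B; ++j)
                        dst[j][i] = src[i][j];
    }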
-ffreestanding  compile in a freestanding environment where the standard library may not be present
-qopt-streaming-stores=<arg>  specifies whether streaming stores are generated
  always - enables generation of streaming stores under the assumption that the application is memory bound. Also, the user is responsible for inserting the right memory fences for synchronization
  auto - compiler decides when streaming stores are used (DEFAULT)
  never - disables generation of streaming stores
-ipp[=<arg>]  link some or all of the Intel(R) Integrated Performance Primitives (Intel(R) IPP) libraries and bring in the associated headers
  common - link using the main libraries set. This is the default value when -ipp is specified
  crypto - link using the main libraries set and the crypto library
-ipp-link=<arg>  choose whether to link with static or dynamic libraries to support Intel(R) Integrated Performance Primitives (Intel(R) IPP)
  dynamic - link using the dynamic libraries set. This is the default value when -ipp is specified on Windows
  static - link using the static libraries set. This is the default value when -ipp is specified on Linux
  nonpic - link using the version of the libraries that do not have position independent code
  nonpic_crypto - link using the crypto library and the version of the libraries that do not have position independent code
-mkl[=<arg>]  link to the Intel(R) Math Kernel Library (Intel(R) MKL) and bring in the associated headers
  parallel - link using the threaded Intel(R) MKL libraries. This is the default when -mkl is specified
  sequential - link using the non-threaded Intel(R) MKL libraries
  cluster - link using the Intel(R) MKL Cluster libraries plus the sequential Intel(R) MKL libraries
-tbb  link to the Intel(R) Threading Building Blocks (Intel(R) TBB) libraries and bring in the associated headers
-daal[=<arg>]  link to the Intel(R) Data Analytics Acceleration Library (Intel(R) DAAL) libraries and bring in the associated headers
  parallel - link using the threaded Intel(R) DAAL (DEFAULT)
  sequential - link using the non-threaded Intel(R) DAAL
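A small sketch of what -mkl provides at build time (it assumes Intel(R) MKL is installed; the matrix sizes, file name, and command are illustrative): the MKL header and link line are picked up without being spelled out by hand:

    /* gemm_demo.c -- illustrative only.
     *     icc -O2 -mkl=sequential gemm_demo.c -o gemm_demo    (assumed command)
     */
    #include <mkl.h>

    int main(void)
    {
        double a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, c[4] = {0};

        /* C = 1.0 * A * B + 0.0 * C for 2x2 row-major matrices */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2);
        return (int)c[0];   /* keep the result live so the call is not optimized away */
    }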
-q[no-]opt-subscript-in-range  assumes no overflows in the intermediate computation of the subscripts
-[no-]use-intel-optimized-headers  enable/disable(DEFAULT) use of the performance headers
-[no-]intel-extensions  enable(DEFAULT)/disable C/C++ language extensions such as array notation, Intel(R) Cilk(TM) Plus language extensions, and support for decimal floating-point types
-q[no-]opt-matmul  replace matrix multiplication with calls to intrinsics and threading libraries for improved performance (DEFAULT at -O3 -parallel)
-[no-]simd  enables(DEFAULT)/disables vectorization using simd pragma
-[no-]simd-function-pointers  enables/disables(DEFAULT) pointers to simd-enabled functions
-guide-opts=<string>  tells the compiler to analyze certain code and generate recommendations that may improve optimizations
-guide-file[=<file>]  causes the results of guide to be output to a file
-guide-file-append[=<file>]  causes the results of guide to be appended to a file
-guide[=<level>]  lets you set a level (1 - 4) of guidance for auto-vectorization, auto-parallelization, and data transformation (DEFAULT is 4 when the option is specified)
-guide-data-trans[=<level>]  lets you set a level (1 - 4) of guidance for data transformation (DEFAULT is 4 when the option is specified)
-guide-par[=<level>]  lets you set a level (1 - 4) of guidance for auto-parallelization (DEFAULT is 4 when the option is specified)
-guide-vec[=<level>]  lets you set a level (1 - 4) of guidance for auto-vectorization (DEFAULT is 4 when the option is specified)
-guide-profile=<file|dir>[,<file|dir>,...]  specify a loop profiler data file (or set of files in a directory) when using the -guide option
-qopt-mem-layout-trans[=<level>]  controls the level of memory layout transformations performed by the compiler
  0 - disable memory layout transformations (same as -qno-opt-mem-layout-trans)
  1 - enable basic memory layout transformations
  2 - enable more memory layout transformations (DEFAULT when the option is specified)
  3 - enable aggressive memory layout transformations
-qopt-prefetch[=n]  enable levels of prefetch insertion, where 0 disables. n may be 0 through 5 inclusive. Default is 2
-qno-opt-prefetch  disable(DEFAULT) prefetch insertion. Equivalent to -qopt-prefetch=0
-qopt-prefetch-distance=n1[,n2]  specify the prefetch distance (how many iterations ahead, use n1 and n2 values such that n1>=n2) to be used for compiler generated prefetches inside loops. n1 indicates distance from memory to L2 cache and n2 indicates distance from L2 to L1
-qopt-prefetch-issue-excl-hint  generates PrefetchW instructions for Intel(R) microarchitecture code name Broadwell processors and beyond when -qopt-prefetch is also used
-qopt-threads-per-core=n  specifies the number of threads (1 - 4) per core to be used for an application (Intel(R) MIC Architecture specific)
-qopt-streaming-cache-evict=n  specifies the cache line eviction level (0 - 3) when streaming loads/stores are used (Intel(R) MIC Architecture specific)
-qopt-gather-scatter-unroll=n  specify an alternative loop unroll sequence for gather and scatter loops (Intel(R) MIC Architecture specific). Disable with -qno-opt-gather-scatter-unroll (equivalent to n=0)
-qopt-dynamic-align  enable(DEFAULT) dynamic data alignment optimizations. Specify -qno-opt-dynamic-align to disable
-falign-loops[=n]  specify code alignment of loops to improve performance. n is the number of bytes for the minimum alignment boundary. It must be a power of 2 between 1 and 4096. If n is not present, an alignment of 16 bytes is used. Use of -fno-align-loops (DEFAULT) sets alignment to 1
-qopt-zmm-usage=<keyword>  Specifies the level of zmm registers usage. You can specify one of the following:
  low - Tells the compiler that the compiled program is unlikely to benefit from zmm registers usage. It specifies that the compiler should avoid using zmm registers unless it can prove the gain from their usage
  high - Tells the compiler to generate zmm code without restrictions
-qoverride-limits  provides a way to override certain internal compiler limits that are intended to prevent excessive memory usage or compile times for very large, complex compilation units
-m[no-]branches-within-32B-boundaries  align branches and fused branches on 32-byte boundaries for better performance
-q[no-]opt-multiple-gather-scatter-by-shuffles  Enables or disables the optimization for multiple adjacent gather/scatter type vector memory references

Profile Guided Optimization (PGO)
---------------------------------
-prof-dir <dir>  specify directory for profiling output files (*.dyn and *.dpi)
-prof-src-root <dir>  specify project root directory for application source files to enable relative path resolution during profile feedback on sources below that directory
-prof-src-root-cwd  specify the current directory as the project root directory for application source files to enable relative path resolution during profile feedback on sources below that directory
-[no-]prof-src-dir  specify whether directory names of sources should be considered when looking up profile records within the .dpi file
-prof-file <file>  specify file name for profiling summary file
-[no-]prof-data-order  enable/disable(DEFAULT) static data ordering with profiling
-[no-]prof-func-order  enable/disable(DEFAULT) function ordering with profiling
-[no-]prof-func-groups  enable(DEFAULT with PGO)/disable function grouping
-prof-gen[=keyword[,keyword]]  instrument program for profiling. Optional keywords are as follows:
  default - Produces an instrumented object file. This is the same as specifying the -prof-gen option with no keyword
  srcpos - Produces an instrumented object file and information needed for using the code coverage tool
  globdata - Produces an instrumented object file that includes information for global data layout
  threadsafe - Collects PGO data with guards for threaded applications
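A typical three-step PGO cycle using the instrumentation and feedback options in this section (file and directory names, and the exact commands, are illustrative assumptions):

    /* hotpath.c -- illustrative only.  A typical PGO cycle:
     *     icc -O2 -prof-gen -prof-dir ./pgo hotpath.c -o hotpath    (1) instrument
     *     ./hotpath                                                 (2) run a representative workload
     *     icc -O2 -prof-use -prof-dir ./pgo hotpath.c -o hotpath    (3) recompile with feedback
     */
    #include <stdio.h>

    int main(void)
    {
        long sum = 0;
        for (long i = 0; i < 10000000; ++i)   /* stand-in for a representative workload */
            sum += i % 7;
        printf("%ld\n", sum);
        return 0;
    }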
-no-prof-gen  disable profiling instrumentation
-prof-use[=<arg>]  enable use of profiling information during optimization
  weighted - invokes profmerge with -weighted option to scale data based on run durations
  [no]merge - enable(default)/disable the invocation of the profmerge tool
-no-prof-use  disable use of profiling information during optimization
-fnsplit[=n]  enable function splitting (enabled with /Qprof-use for IA-32 Windows)
  n - positive integer indicating the threshold number. The blocks can be placed into a different code segment if their execution probability is less than the specified value, in the range 0 <= n <= 100
  Use -no-fnsplit to disable
-p  compile and link for function profiling with UNIX gprof tool. On IA-32 and Intel(R) 64, -pg is also valid
-f[no-]instrument-functions  determine whether function entry and exit points are instrumented
-prof-hotness-threshold=<val>  set the hotness threshold for function grouping and function ordering. <val> indicates the percentage of functions to be placed in the hot region. This option requires -prof-use and -prof-func-groups or -prof-func-order
-prof-value-profiling=<arg>[,<arg>,...]  limit value profiling
  none - inhibit all types of value profiling
  nodivide - inhibit value profiling of non-compile time constants used in division or remainder operations
  noindcall - inhibit value profiling of function addresses at indirect call sites
-prof-gen-sampling  prepares application executables for hardware profiling (sampling) and causes the compiler to generate source code mapping information
-prof-use-sampling=file[:file:...]  enable use of hardware profiling (sampling) information during optimization. Argument provides a list of one or more profiling data files to apply

Optimization Reports
--------------------
-qopt-report[=n]  generate an optimization report. Default destination is <file>.optrpt. Levels of 0 - 5 are valid. Please see the documentation for additional details of the information provided by phase per level
  0 - disable optimization report output
  2 - DEFAULT when enabled
-qopt-report-file=[stdout | stderr | <file>]  specify the filename or output stream for the generated report
-qopt-report-stdout  specify that the generated report should be directed to stdout
-qopt-report-per-object  specify that the generated report should be directed to a .optrpt file in the output directory (DEFAULT when another destination for the report is not specified)
-qopt-report-phase=<phase>[,<phase>,...]  specify one or more phases that reports are generated against
-qopt-report-routine=<name>[,<name>,...]  restrict the report to routines containing the given name
-qopt-report-filter=<string>  restricts the opt-report to specific files, routines or line number ranges. Refer to the documentation for the specific syntax of the parameter string
-qopt-report-format=[text|vs]  specify the output format to be used for the opt-report as either plain text or a format for use in the Microsoft* Visual Studio IDE
-q[no-]opt-report-embed  When enabled, if an assembly file is being generated, special loop info annotations will be emitted in the assembly file. If an object file/executable is being generated, these will be emitted into the object file/executable for use by the Intel VTune Amplifier application. Automatically enabled when symbolic debug information is enabled
-qopt-report-help  display the optimization phases available for reporting
-qopt-report-names=<keyword>  Specifies whether mangled or unmangled names should appear in the optimization report
  mangled - use mangled names
  unmangled - use unmangled names (DEFAULT)
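A brief sketch of requesting a vectorization report with the options above (the phase keyword, file name, and command are illustrative assumptions):

    /* report_demo.c -- illustrative only.
     *     icc -O2 -qopt-report=2 -qopt-report-phase=vec -c report_demo.c    (assumed command)
     * With these options the vectorization report is written to
     * report_demo.optrpt next to the object file; -qopt-report-stdout would
     * print it to the terminal instead.
     */
    void saxpy(float a, const float *x, float *y, int n)
    {
        for (int i = 0; i < n; ++i)   /* the report states whether this loop was vectorized */
            y[i] += a * x[i];
    }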
-qopt-report-annotate[=<keyword>]  Annotate source files with optimization reports in the specified format
  html - annotate in HTML format
  text - annotate in text format (DEFAULT)
-qopt-report-annotate-position=<keyword>  Specify the site where loop related optimization reports appear in the annotated source for inlined routines
  caller - annotate at caller site
  callee - annotate at callee site
  both - annotate at both caller and callee sites
-tcheck [mode]  enable analysis of threaded applications (requires Intel(R) Thread Checker; cannot be used with compiler alone)
  tci - instruments a program to perform a thread-count-independent analysis
  tcd - instruments a program to perform a thread-count-dependent analysis (DEFAULT when mode is not used)
  api - instruments a program at the api-imports level
-tcollect[=<lib>]  inserts instrumentation probes calling the Intel(R) Trace Collector API. The library -l<lib> is linked in, the default being -lVT (requires Intel(R) Trace Collector)
-tcollect-filter <file>  Enable or disable the instrumentation of specified functions. (requires Intel(R) Trace Collector)