Intel(R) Advisor can now assist with vectorization and show optimization report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.2.174 Build 20170213

Compiler options: -I../../src/extra -I. -I../../src/include -I../../src/include -I/usr/local/include -I../../src/nmath -I/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/include -DHAVE_CONFIG_H -fopenmp -fpic -g -qopt-report=5 -qopt-report-annotate=html -qopt-report-phase=all -fp-model precise -O3 -no-ipo -xHost -c -o radixsort.o

    Report from: Interprocedural optimizations [ipo]

  WHOLE PROGRAM (SAFE) [EITHER METHOD]: false
  WHOLE PROGRAM (SEEN) [TABLE METHOD]: false
  WHOLE PROGRAM (READ) [OBJECT READER METHOD]: false

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000

In the inlining report below:
   "sz" refers to the "size" of the routine. The smaller a routine's size, the more likely it is to be inlined.
   "isz" refers to the "inlined size" of the routine. This is the amount the calling routine will grow if the called routine is inlined into it. The compiler generally limits the amount a routine can grow by having routines inlined into it.
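Many of the per-function reports below end in DEAD STATIC FUNCTION (for example savetl, setNumericRounding, alloc_csort_otmp, flipflop, gsfree). The do_radixsort and csort_pre inline reports show those same helpers marked INLINE at their call sites, and push() is instead replaced by the constant-propagated clone push..0. The sketch below illustrates that pattern under stated assumptions, using hypothetical names (`square`, `sum_squares`) that do not appear in radixsort.c. It has a small static helper, far below `-inline-max-size`, called from a single place. If that call is inlined, nothing in the translation unit still references the out-of-line body, so the compiler may discard it. This is consistent with what the DEAD STATIC FUNCTION entries appear to record, though whether a given compiler does so depends on its own heuristics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: small, static, and called from one place only,
   so an optimizing compiler is free to inline it. */
static int square(int x)
{
    return x * x;
}

/* Hypothetical sole caller. If square() is inlined here, no reference to
   its out-of-line body remains in this translation unit, and the compiler
   may drop that body entirely. */
int sum_squares(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += square(a[i]);
    return total;
}
```

Because `square()` has internal linkage, no other translation unit can call it. That is what lets the compiler discard the body once the last call site has been inlined.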
Begin optimization report for: savetl(SEXP)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (savetl(SEXP)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(99,1)

===========================================================================

Begin optimization report for: push(int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (push(int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(143,1)

===========================================================================

Begin optimization report for: mpush(int, int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (mpush(int, int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(154,1)

===========================================================================

Begin optimization report for: flipflop()

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (flipflop()) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(166,1)

===========================================================================

Begin optimization report for: growstack(int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (growstack(int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(130,1)

===========================================================================

Begin optimization report for: gsfree()

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (gsfree()) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(175,1)

===========================================================================

Begin optimization report for: setRange(int *, int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (setRange(int *, int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(203,1)

===========================================================================

Begin optimization report for: icheck(int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (icheck(int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(241,1)

===========================================================================

Begin optimization report for: alloc_otmp(int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (alloc_otmp(int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(399,1)

===========================================================================

Begin optimization report for: alloc_xtmp(int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (alloc_xtmp(int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(415,1)

===========================================================================

Begin optimization report for: setNumericRounding(int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (setNumericRounding(int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(620,1)

===========================================================================

Begin optimization report for: dtwiddle(void *, int, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (dtwiddle(void *, int, int)) [12/37=32.4%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(632,1)
  -> EXTERN: (634,9) __finite(double)
  -> EXTERN: (636,16) __isnan(double)

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(632,1):remark #34051: REGISTER ALLOCATION : [dtwiddle] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:632

    Hardware registers
        Reserved     : 2[ rsp rip]
        Available    : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  : 6[ rbx rbp r12-r15]
        Assigned     : 7[ rax rdx rcx rsi rdi zmm0-zmm1]

    Routine temporaries
        Total        : 34
            Global   : 9
            Local    : 25
        Regenerable  : 5
        Spilled      : 1

    Routine stack
        Variables    : 0 bytes*
            Reads    : 0 [0.00e+00 ~ 0.0%]
            Writes   : 0 [0.00e+00 ~ 0.0%]
        Spills       : 8 bytes*
            Reads    : 2 [1.00e+00 ~ 2.7%]
            Writes   : 1 [1.00e+00 ~ 2.7%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

===========================================================================

Begin optimization report for: dnan(void *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (dnan(void *, int)) [13/37=35.1%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(648,1)
  -> EXTERN: (650,13) __isnan(double)

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(648,1):remark #34051: REGISTER ALLOCATION : [dnan] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:648

    Hardware registers
        Reserved     : 2[ rsp rip]
        Available    : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  : 6[ rbx rbp r12-r15]
        Assigned     : 5[ rax rdx rsi rdi zmm0]

    Routine temporaries
        Total        : 16
            Global   : 0
            Local    : 16
        Regenerable  : 1
        Spilled      : 0

    Routine stack
        Variables    : 0 bytes*
            Reads    : 0 [0.00e+00 ~ 0.0%]
            Writes   : 0 [0.00e+00 ~ 0.0%]
        Spills       : 0 bytes*
            Reads    : 0 [0.00e+00 ~ 0.0%]
            Writes   : 0 [0.00e+00 ~ 0.0%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.
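The two reports above show only that dtwiddle(void *, int, int) calls __finite and __isnan, and that dnan(void *, int) calls __isnan. The function bodies are not part of this report. Radix sorts over doubles commonly rely on a bit-level "twiddle" that maps each double to an unsigned 64-bit key whose integer order matches numeric order. The following is a minimal sketch of that general technique, assuming IEEE-754 binary64 doubles. Several details are assumptions for illustration: the name `twiddle_key`, the folding of -0.0 into +0.0, and sending every NaN to the largest key. It is not a reconstruction of R's dtwiddle, which takes additional arguments not modelled here.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The bit manipulation below assumes a 64-bit IEEE-754 double. */
static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");

#define SIGN_BIT UINT64_C(0x8000000000000000)

/* Hypothetical helper: map a double to a key such that comparing keys as
   unsigned integers orders the doubles numerically:
   -inf < ... < -0.0 == +0.0 < ... < +inf < NaN. */
static uint64_t twiddle_key(double d)
{
    uint64_t bits;

    if (isnan(d))
        return UINT64_MAX;           /* assumption: all NaNs sort last */
    if (d == 0.0)
        d = 0.0;                     /* fold -0.0 into +0.0 */

    memcpy(&bits, &d, sizeof bits);  /* read the bit pattern without aliasing UB */

    /* Negative values: invert every bit, so a larger magnitude gives a
       smaller key. Non-negative values: set the sign bit, so they all
       sort above every negative value. */
    return (bits & SIGN_BIT) ? ~bits : (bits | SIGN_BIT);
}
```

Sorting the keys with an unsigned-integer radix sort, while carrying the original indices along, then gives the ascending order of the doubles. No finite value or infinity can produce `UINT64_MAX`, so the NaN key stays strictly largest.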
===========================================================================

Begin optimization report for: StrCmp2(SEXP, SEXP)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (StrCmp2(SEXP, SEXP)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(906,1)

===========================================================================

Begin optimization report for: StrCmp(SEXP, SEXP)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (StrCmp(SEXP, SEXP)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(917,1)

===========================================================================

Begin optimization report for: alloc_csort_otmp(int)

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (alloc_csort_otmp(int)) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1118,1)

===========================================================================

Begin optimization report for: do_radixsort(SEXP, SEXP, SEXP, SEXP)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (do_radixsort(SEXP, SEXP, SEXP, SEXP)) [17/37=45.9%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1542,1)
  -> EXTERN: (1557,15) Rf_asLogical(SEXP)
  -> EXTERN: (1558,3) Rf_asLogical(SEXP)
  -> EXTERN: (1564,14) Rf_asLogical(SEXP)
  -> EXTERN: (1571,15) Rf_asLogical(SEXP)
  -> INLINE: (1575,5) setNumericRounding(int) (isz = 9) (sz = 14)
  -> EXTERN: (1579,9) Rf_isVector(SEXP)
  -> EXTERN: (1582,7) Rf_isVector(SEXP)
  -> EXTERN: (1583,6) Rf_error(const char *, ...)
  -> EXTERN: (1583,12) dcgettext(const char *, const char *, int)
  -> EXTERN: (1586,6) Rf_error(const char *, ...)
  -> EXTERN: (1586,12) dcgettext(const char *, const char *, int)
  -> EXTERN: (1589,17) Rf_length(SEXP)
  -> EXTERN: (1590,2) Rf_error(const char *, ...)
  -> EXTERN: (1590,8) dcgettext(const char *, const char *, int)
  -> EXTERN: (1593,6) Rf_error(const char *, ...)
  -> EXTERN: (1593,12) dcgettext(const char *, const char *, int)
  -> EXTERN: (1595,13) Rf_asLogical(SEXP)
  -> EXTERN: (1602,2) Rf_error(const char *, ...)
  -> EXTERN: (1602,8) dcgettext(const char *, const char *, int)
  -> EXTERN: (1618,16) Rf_protect(SEXP)
  -> EXTERN: (1618,16) Rf_allocVector(SEXPTYPE, R_xlen_t)
  -> (1626,9) checkEncodings(SEXP) (isz = 60) (sz = 65)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> INLINE: (1629,5) savetl_init() (isz = 43) (sz = 46)
    -> EXTERN: (68,2) Rf_error(const char *, ...)
    -> EXTERN: (72,23) malloc(size_t)
    -> EXTERN: (74,2) Rf_error(const char *, ...)
    -> EXTERN: (75,27) malloc(size_t)
    -> EXTERN: (77,2) free(void *)
    -> EXTERN: (78,2) Rf_error(const char *, ...)
  -> (1634,8) isorted(int *, int) (isz = 670) (sz = 685)
     [[ Inlining would exceed -inline-max-size value (685>230) <2>]]
  -> (1639,8) dsorted(double *, int) (isz = 737) (sz = 752)
     [[ Inlining would exceed -inline-max-size value (752>230) <2>]]
  -> (1642,8) csorted(SEXP *, int) (isz = 634) (sz = 649)
     [[ Inlining would exceed -inline-max-size value (649>230) <2>]]
  -> EXTERN: (1645,9) Rf_error(const char *, ...)
  -> EXTERN: (1645,9) Rf_type2char(SEXPTYPE)
  -> (1645,9) savetl_end() (isz = 41) (sz = 44)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> (1673,6) isort(int *, int *, int) (isz = 458) (sz = 468)
     [[ Inlining would exceed -inline-max-size value (468>253) <2>]]
  -> (1676,6) dsort(double *, int *, int) (isz = 355) (sz = 365)
     [[ Inlining would exceed -inline-max-size value (365>253) <2>]]
  -> (1680,3) csort_pre(SEXP *, int) (isz = 476) (sz = 484)
     [[ Inlining would exceed -inline-max-size value (484>230) <2>]]
  -> INLINE: (1681,3) alloc_csort_otmp(int) (isz = 18) (sz = 24)
    -> EXTERN: (1121,26) realloc(void *, size_t)
    -> (1123,2) savetl_end() (isz = 41) (sz = 44)
       [[ Inlining inhibited by overrideable criterion <1>]]
    -> EXTERN: (1123,2) Rf_error(const char *, ...)
  -> (1682,3) csort(SEXP *, int *, int) (isz = 473) (sz = 483)
     [[ Inlining would exceed -inline-max-size value (483>253) <2>]]
  -> (1684,3) cgroup(SEXP *, int *, int) (isz = 636) (sz = 645)
     [[ Inlining would exceed -inline-max-size value (645>253) <2>]]
  -> EXTERN: (1687,6) Rf_error(const char *, ...)
  -> (1687,6) savetl_end() (isz = 41) (sz = 44)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> EXTERN: (1699,25) malloc(size_t)
  -> EXTERN: (1701,13) Rf_error(const char *, ...)
  -> (1701,13) savetl_end() (isz = 41) (sz = 44)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> EXTERN: (1704,24) malloc(size_t)
  -> EXTERN: (1706,13) Rf_error(const char *, ...)
  -> (1706,13) savetl_end() (isz = 41) (sz = 44)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> INLINE: (1717,2) flipflop() (isz = 18) (sz = 21)
    -> INLINE: (171,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> (137,2) savetl_end() (isz = 41) (sz = 44)
         [[ Inlining inhibited by overrideable criterion <1>]]
      -> EXTERN: (137,2) Rf_error(const char *, ...)
  -> (1735,3) csort_pre(SEXP *, int) (isz = 476) (sz = 484)
     [[ Inlining would exceed -inline-max-size value (484>230) <2>]]
  -> INLINE: (1736,3) alloc_csort_otmp(int) (isz = 18) (sz = 24)
    -> EXTERN: (1121,26) realloc(void *, size_t)
    -> (1123,2) savetl_end() (isz = 41) (sz = 44)
       [[ Inlining inhibited by overrideable criterion <1>]]
    -> EXTERN: (1123,2) Rf_error(const char *, ...)
  -> EXTERN: (1745,6) Rf_error(const char *, ...)
  -> EXTERN: (1745,6) Rf_type2char(SEXPTYPE)
  -> (1745,6) savetl_end() (isz = 41) (sz = 44)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> EXTERN: (1770,8) __isnan(double)
  -> EXTERN: (1781,25) Rf_error(const char *, ...)
  -> (1781,25) savetl_end() (isz = 41) (sz = 44)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> CP_CLONE (1785,17) push..0(int) (isz = 92) (sz = 98)
     [[ Inlining inhibited by overrideable criterion <1>]]
  -> INDIRECT: (1807,19) f.2972_V$196.0.36
     [[ Callee not marked with inlining pragma <3>]]
  -> INDIRECT: (1834,6) g.2972_V$197.0.36
     [[ Callee not marked with inlining pragma <3>]]
  -> EXTERN: (1846,3) memcpy(void *__restrict__, const void *__restrict__, size_t)
  -> EXTERN: (1852,9) Rf_error(const char *, ...)
  -> INLINE: (1852,9) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (1858,5) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> EXTERN: (1859,5) free(void *)
  -> EXTERN: (1866,23) Rf_install(const char *)
  -> EXTERN: (1867,9) Rf_setAttrib(SEXP, SEXP, SEXP)
  -> EXTERN: (1867,36) Rf_allocVector(SEXPTYPE, R_xlen_t)
  -> EXTERN: (1874,26) Rf_install(const char *)
  -> EXTERN: (1875,9) Rf_setAttrib(SEXP, SEXP, SEXP)
  -> EXTERN: (1875,35) Rf_ScalarInteger(int)
  -> EXTERN: (1877,9) Rf_allocVector(SEXPTYPE, R_xlen_t)
  -> EXTERN: (1877,9) Rf_protect(SEXP)
  -> EXTERN: (1878,9) SET_STRING_ELT(SEXP, R_xlen_t, SEXP)
  -> EXTERN: (1878,32) Rf_mkChar(const char *)
  -> EXTERN: (1879,9) SET_STRING_ELT(SEXP, R_xlen_t, SEXP)
  -> EXTERN: (1879,32) Rf_mkChar(const char *)
  -> EXTERN: (1880,9) Rf_setAttrib(SEXP, SEXP, SEXP)
  -> EXTERN: (1881,9) Rf_unprotect(int)
  -> EXTERN: (1892,13) Rf_protect(SEXP)
  -> EXTERN: (1892,13) Rf_allocVector(SEXPTYPE, R_xlen_t)
  -> EXTERN: (1898,13) Rf_unprotect(int)
  -> INLINE: (1902,5) gsfree() (isz = 26) (sz = 29)
    -> EXTERN: (176,5) free(void *)
    -> EXTERN: (177,5) free(void *)
  -> EXTERN: (1903,5) free(void *)
  -> EXTERN: (1904,5) free(void *)
  -> EXTERN: (1904,17) free(void *)
  -> EXTERN: (1905,5) free(void *)
  -> EXTERN: (1906,5) free(void *)
  -> EXTERN: (1907,5) free(void *)
  -> EXTERN: (1909,5) free(void *)
  -> EXTERN: (1910,5) free(void *)
  -> EXTERN: (1913,5) Rf_unprotect(int)

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1581,5)
   remark #15523: loop was not vectorized: loop control variable narg was found, but loop iteration count cannot be computed before executing the loop
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1591,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1593,6) ]
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1591,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1653,6)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1653,6)
   remark #15389: vectorization support: reference ans[i+10] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1654,3) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 3.667
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 4
   remark #15477: vector cost: 0.370
   remark #15478: estimated potential speedup: 7.180
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1653,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1659,6)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1659,6)
   remark #15389: vectorization support: reference ans[i+10] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1660,3) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 3.667
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 4
   remark #15477: vector cost: 0.370
   remark #15478: estimated potential speedup: 7.180
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1659,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1665,6)
   remark #25408: memset generated
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1665,6)
      remark #15389: vectorization support: reference ans[i+10] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1666,3) ]
      remark #15381: vectorization support: unaligned access used inside loop body
      remark #15305: vectorization support: vector length 8
      remark #15309: vectorization support: normalized vectorization overhead 0.600
      remark #15300: LOOP WAS VECTORIZED
      remark #15451: unmasked unaligned unit stride stores: 1
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 3
      remark #15477: vector cost: 0.620
      remark #15478: estimated potential speedup: 4.000
      remark #15488: --- end vector cost summary ---
      remark #25015: Estimate of max trip count of loop=3
   LOOP END

   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1665,6)
      remark #25015: Estimate of max trip count of loop=24
   LOOP END
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1710,5)
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1749,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1795,17)
         remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
         remark #25015: Estimate of max trip count of loop=7
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1795,17)
         remark #15389: vectorization support: reference xsub[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1796,31) ]
         remark #15389: vectorization support: reference ans[i+10] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1796,56) ]
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15328: vectorization support: irregularly indexed load was emulated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1796,52) ]
         remark #15305: vectorization support: vector length 8
         remark #15309: vectorization support: normalized vectorization overhead 0.246
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15450: unmasked unaligned unit stride loads: 1
         remark #15451: unmasked unaligned unit stride stores: 1
         remark #15462: unmasked indexed (or gather) loads: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 11
         remark #15477: vector cost: 7.120
         remark #15478: estimated potential speedup: 1.520
         remark #15488: --- end vector cost summary ---
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1795,17)
         remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1798,17)
         remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
         remark #25015: Estimate of max trip count of loop=15
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1798,17)
         remark #15388: vectorization support: reference xsub[j] has aligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1799,33) ]
         remark #15389: vectorization support: reference ans[i+10] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1799,60) ]
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15328: vectorization support: irregularly indexed load was emulated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1799,56) ]
         remark #15305: vectorization support: vector length 16
         remark #15309: vectorization support: normalized vectorization overhead 0.124
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15450: unmasked unaligned unit stride loads: 1
         remark #15462: unmasked indexed (or gather) loads: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 11
         remark #15477: vector cost: 7.060
         remark #15478: estimated potential speedup: 1.510
         remark #15488: --- end vector cost summary ---
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1798,17)
         remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1801,17)
         remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
         remark #25015: Estimate of max trip count of loop=15
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1801,17)
         remark #15389: vectorization support: reference xsub[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1802,30) ]
         remark #15389: vectorization support: reference ans[i+10] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1802,54) ]
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15415: vectorization support: irregularly indexed load was generated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1802,50) ]
         remark #15305: vectorization support: vector length 16
         remark #15309: vectorization support: normalized vectorization overhead 0.133
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15450: unmasked unaligned unit stride loads: 1
         remark #15451: unmasked unaligned unit stride stores: 1
         remark #15462: unmasked indexed (or gather) loads: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 11
         remark #15477: vector cost: 6.560
         remark #15478: estimated potential speedup: 1.620
         remark #15488: --- end vector cost summary ---
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1801,17)
         remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1812,7)
         remark #15344: loop was not vectorized: vector dependence prevents vectorization
         remark #15346: vector dependence: assumed ANTI dependence between osub[k] (1818:4) and osub[thisgrpn-1-k] (1820:4)
         remark #15346: vector dependence: assumed FLOW dependence between osub[thisgrpn-1-k] (1820:4) and osub[k] (1818:4)
         remark #25439: unrolled with remainder by 2
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1812,7)
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1825,7)
         remark #25408: memset generated
         remark #15542: loop was not vectorized: inner loop was already vectorized

         LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1825,7)
            remark #15389: vectorization support: reference osub[k] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1825,42) ]
            remark #15381: vectorization support: unaligned access used inside loop body
            remark #15305: vectorization support: vector length 8
            remark #15309: vectorization support: normalized vectorization overhead 0.600
            remark #15300: LOOP WAS VECTORIZED
            remark #15451: unmasked unaligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 3
            remark #15477: vector cost: 0.620
            remark #15478: estimated potential speedup: 4.000
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=3
         LOOP END

         LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1825,7)
            remark #25015: Estimate of max trip count of loop=24
         LOOP END
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1838,7)
         remark #25015: Estimate of max trip count of loop=7
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1838,7)
         remark #15389: vectorization support: reference xsub[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1840,13) ]
         remark #15389: vectorization support: reference newo[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1840,29) ]
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15415: vectorization support: irregularly indexed load was generated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1840,24) ]
         remark #15305: vectorization support: vector length 8
         remark #15309: vectorization support: normalized vectorization overhead 0.271
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15450: unmasked unaligned unit stride loads: 1
         remark #15451: unmasked unaligned unit stride stores: 1
         remark #15462: unmasked indexed (or gather) loads: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 9
         remark #15477: vector cost: 6.000
         remark #15478: estimated potential speedup: 1.460
         remark #15488: --- end vector cost summary ---
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1838,7)
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1842,7)
         remark #25015: Estimate of max trip count of loop=7
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1842,7)
         remark #15389: vectorization support: reference xsub[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1844,13) ]
         remark #15389: vectorization support: reference newo[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1845,13) ]
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15415: vectorization support: irregularly indexed load was generated for the variable , masked, part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1845,8) ]
         remark #15305: vectorization support: vector length 8
         remark #15309: vectorization support: normalized vectorization overhead 0.228
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15450: unmasked unaligned unit stride loads: 1
         remark #15451: unmasked unaligned unit stride stores: 1
         remark #15458: masked indexed (or gather) loads: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 16
         remark #15477: vector cost: 7.120
         remark #15478: estimated potential speedup: 2.120
         remark #15488: --- end vector cost summary ---
      LOOP END

      LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1842,7)
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1852,9)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1852,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1854,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed OUTPUT dependence between (sl__x__+?)->lv_truelength (1855:9) and sl__x__->truelength (1855:9)
   remark #15346: vector dependence: assumed OUTPUT dependence between sl__x__->truelength (1855:9) and (sl__x__+?)->lv_truelength (1855:9)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1854,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1858,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1858,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1870,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between x[i+10] (1871:17) and x[i-1+10] (1871:17)
   remark #25439: unrolled with remainder by 2
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1870,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1887,9)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1887,9)
   remark #15305: vectorization support: vector length 8
   remark #15399: vectorization support: unroll factor set to 2
   remark #15309: vectorization support: normalized vectorization overhead 1.750
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 11
   remark #15477: vector cost: 1.250
   remark #15478: estimated potential speedup: 6.260
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1887,9)
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 3.455
   remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1887,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1894,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between ans[i2+10] (1896:21) and ans[i+10] (1896:21)
   remark #15346: vector dependence: assumed ANTI dependence between ans[i+10] (1896:21) and ans[i2+10] (1896:21)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1894,13)
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1846,3):remark #34014: optimization advice for memcpy: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1846,3):remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1846,3):remark #34026: call to memcpy implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1666,3):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1666,3):remark #34026: call to memset implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1825,42):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1825,42):remark #34026: call to memset implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1542,1):remark #34051: REGISTER ALLOCATION : [do_radixsort] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1542

    Hardware registers
        Reserved     : 2[ rsp rip]
        Available    : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  : 6[ rbx rbp r12-r15]
        Assigned     : 30[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm15]

    Routine temporaries
        Total        : 868
            Global   : 303
            Local    : 565
        Regenerable  : 136
        Spilled      : 38

    Routine stack
        Variables    : 0 bytes*
            Reads    : 0 [0.00e+00 ~ 0.0%]
            Writes   : 0 [0.00e+00 ~ 0.0%]
        Spills       : 256 bytes*
            Reads    : 136 [3.85e+00 ~ 2.3%]
            Writes   : 57 [4.96e+00 ~ 3.0%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

===========================================================================

Begin optimization report for: csort_pre(SEXP *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (csort_pre(SEXP *, int)) [18/37=48.6%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1188,1)
  -> EXTERN: (1197,6) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1203,6) R_BadLongVector(SEXP, const char *, int)
  -> INLINE: (1204,6) savetl(SEXP) (isz = 133) (sz = 138)
    -> EXTERN: (103,17) realloc(void *, size_t)
    -> INLINE: (105,6) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (106,6) Rf_error(const char *, ...)
    -> EXTERN: (109,17) realloc(void *, size_t)
    -> INLINE: (111,6) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (112,6) Rf_error(const char *, ...)
    -> EXTERN: (117,23) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1213,13) realloc(void *, size_t)
  -> EXTERN: (1215,3) Rf_error(const char *, ...)
  -> INLINE: (1215,3) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> EXTERN: (1222,24) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1223,15) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1237,25) realloc(void *, size_t)
  -> EXTERN: (1240,6) Rf_error(const char *, ...)
  -> INLINE: (1240,6) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> EXTERN: (1241,2) memset(void *, int, size_t)
  -> EXTERN: (1244,32) realloc(void *, size_t)
  -> EXTERN: (1248,13) Rf_error(const char *, ...)
-> INLINE: (1248,13) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> (1253,5) cradix_r(SEXP *, int, int) (isz = 335) (sz = 348) [[ Inlining would exceed -inline-max-size value (348>230) <2>]] Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par] LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1204,6) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1204,6) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1204,6) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1204,6) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1215,3) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed OUTPUT dependence between 
(sl__x__+?)->lv_truelength (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed OUTPUT dependence between sl__x__->truelength (89:2) and (sl__x__+?)->lv_truelength (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1215,3) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1240,6) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1240,6) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1248,13) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1248,13) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1254,5) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed OUTPUT 
dependence between (sl__x__+?)->lv_truelength (1255:2) and sl__x__->truelength (1255:2) remark #15346: vector dependence: assumed OUTPUT dependence between sl__x__->truelength (1255:2) and (sl__x__+?)->lv_truelength (1255:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1254,5) LOOP END Non-optimizable loops: LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1193,5) remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification LOOP END Report from: Code generation optimizations [cg] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1241,2):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1241,2):remark #34026: call to memset implemented as a call to optimized library version /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1188,1):remark #34051: REGISTER ALLOCATION : [csort_pre] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1188 Hardware registers Reserved : 2[ rsp rip] Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15] Callee-save : 6[ rbx rbp r12-r15] Assigned : 15[ rax rdx rcx rbx rbp rsi rdi r8-r15] Routine temporaries Total : 221 Global : 112 Local : 109 Regenerable : 37 Spilled : 17 Routine stack Variables : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 [0.00e+00 ~ 0.0%] Spills : 88 bytes* Reads : 22 [1.92e+01 ~ 7.5%] Writes : 16 [1.10e+01 ~ 4.3%] Notes *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this. 
=========================================================================== Begin optimization report for: cradix_r(SEXP *, int, int) Report from: Interprocedural optimizations [ipo] INLINE REPORT: (cradix_r(SEXP *, int, int)) [19/37=51.4%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(974,1) -> INLINE: (982,6) StrCmp(SEXP, SEXP) (isz = 12) (sz = 24) -> EXTERN: (923,12) strcmp(const char *, const char *) -> EXTERN: (996,19) R_BadLongVector(SEXP, const char *, int) -> (1004,2) cradix_r(SEXP *, int, int) (isz = 335) (sz = 348) [[ Callee not marked with inlining pragma <3>]] -> EXTERN: (1015,19) R_BadLongVector(SEXP, const char *, int) -> EXTERN: (1020,5) memcpy(void *__restrict__, const void *__restrict__, size_t) -> EXTERN: (1022,2) memset(void *, int, size_t) -> EXTERN: (1026,2) Rf_error(const char *, ...) -> INLINE: (1026,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> (1033,2) cradix_r(SEXP *, int, int) (isz = 335) (sz = 348) [[ Callee not marked with inlining pragma <3>]] -> (1041,2) cradix_r(SEXP *, int, int) (isz = 335) (sz = 348) [[ Callee not marked with inlining pragma <3>]] Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par] LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(994,5) remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(996,19) ] remark #25439: unrolled with remainder by 14 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(994,5) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1009,5) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between itmp (1012:23) and itmp (1012:23) remark #15346: vector dependence: assumed FLOW dependence between itmp 
(1012:23) and itmp (1012:23) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1009,5) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1013,5) remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1015,19) ] remark #25439: unrolled with remainder by 12 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1013,5) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1029,5) remark #15382: vectorization support: call to function cradix_r(SEXP *, int, int) cannot be vectorized [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1033,2) ] remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1029,5) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1026,2) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1026,2) LOOP END Non-optimizable loops: LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1041,2) remark #15522: loop was not vectorized: loop control flow is too complex. 
Try using canonical loop form from OpenMP specification LOOP END Report from: Code generation optimizations [cg] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1020,5):remark #34014: optimization advice for memcpy: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1020,5):remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1020,5):remark #34026: call to memcpy implemented as a call to optimized library version /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1022,2):remark #34014: optimization advice for memset: increase the destination's alignment to 32 (and use __assume_aligned) to increase the width of stores /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1022,2):remark #34000: call to memset implemented inline with stores with proven (alignment, offset): (1, 0) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(974,1):remark #34051: REGISTER ALLOCATION : [cradix_r] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:974 Hardware registers Reserved : 2[ rsp rip] Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15] Callee-save : 6[ rbx rbp r12-r15] Assigned : 16[ rax rdx rcx rbx rbp rsi rdi r8-r15 zmm0] Routine temporaries Total : 405 Global : 154 Local : 251 Regenerable : 95 Spilled : 10 Routine stack Variables : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 [0.00e+00 ~ 0.0%] Spills : 32 bytes* Reads : 7 [3.02e+00 ~ 0.1%] Writes : 5 [2.18e+00 ~ 0.1%] Notes *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this. 
=========================================================================== Begin optimization report for: push..0(int) Report from: Interprocedural optimizations [ipo] INLINE REPORT: (push..0(int)) [20/37=54.1%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(143,1) CLONED FROM: push(int)(1) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par] LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(147,2) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(147,2) LOOP END Report from: Code generation optimizations [cg] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(143,1):remark #34051: REGISTER ALLOCATION : [push..0] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:143 Hardware registers Reserved : 2[ rsp rip] Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15] Callee-save : 6[ rbx rbp r12-r15] Assigned : 13[ rax rdx rcx rbx rbp rsi rdi r8-r10 r12 r14-r15] Routine temporaries Total : 58 Global : 28 Local : 30 Regenerable : 11 Spilled : 5 Routine stack Variables : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 [0.00e+00 ~ 0.0%] Spills : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 
[0.00e+00 ~ 0.0%] Notes *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this. =========================================================================== Begin optimization report for: savetl_end() Report from: Interprocedural optimizations [ipo] INLINE REPORT: (savetl_end()) [21/37=56.8%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(83,1) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par] LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) LOOP END Report from: Code generation optimizations [cg] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(83,1):remark #34051: REGISTER ALLOCATION : [savetl_end] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:83 Hardware registers Reserved : 2[ rsp rip] Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15] Callee-save : 6[ rbx rbp r12-r15] Assigned : 8[ rax rdx rcx rbx rsi rdi r8-r9] Routine temporaries Total : 28 Global : 20 Local : 8 Regenerable : 2 Spilled : 1 Routine stack Variables : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 [0.00e+00 ~ 0.0%] Spills : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 [0.00e+00 ~ 0.0%] Notes *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this. 
=========================================================================== Begin optimization report for: checkEncodings(SEXP) Report from: Interprocedural optimizations [ipo] INLINE REPORT: (checkEncodings(SEXP)) [22/37=59.5%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(929,1) -> EXTERN: (933,21) Rf_length(SEXP) -> EXTERN: (933,21) Rf_length(SEXP) -> EXTERN: (935,13) Rf_length(SEXP) -> EXTERN: (936,14) Rf_getCharCE(SEXP) -> EXTERN: (938,13) Rf_error(const char *, ...) -> EXTERN: (938,19) dcgettext(const char *, const char *, int) Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par] LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(933,21) remark #15523: loop was not vectorized: loop control variable i was found, but loop iteration count cannot be computed before executing the loop LOOP END Report from: Code generation optimizations [cg] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(929,1):remark #34051: REGISTER ALLOCATION : [checkEncodings] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:929 Hardware registers Reserved : 2[ rsp rip] Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15] Callee-save : 6[ rbx rbp r12-r15] Assigned : 7[ rax rdx rsi rdi r12-r14] Routine temporaries Total : 34 Global : 20 Local : 14 Regenerable : 5 Spilled : 3 Routine stack Variables : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 [0.00e+00 ~ 0.0%] Spills : 8 bytes* Reads : 2 [9.00e-01 ~ 1.5%] Writes : 1 [9.00e-01 ~ 1.5%] Notes *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this. 
=========================================================================== Begin optimization report for: isorted(int *, int) Report from: Interprocedural optimizations [ipo] INLINE REPORT: (isorted(int *, int)) [23/37=62.2%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1276,1) -> INLINE: (1288,6) push(int) (isz = 92) (sz = 98) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> INLINE: (1295,2) push(int) (isz = 92) (sz = 98) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> INLINE (MANUAL): (1298,9) icheck(int) (isz = 11) (sz = 18) -> INLINE (MANUAL): (1298,24) icheck(int) (isz = 11) (sz = 18) -> INLINE (MANUAL): (1300,18) icheck(int) (isz = 11) (sz = 18) -> INLINE (MANUAL): (1300,33) icheck(int) (isz = 11) (sz = 18) -> INLINE: (1304,6) mpush(int, int) (isz = 106) (sz = 114) -> INLINE: (158,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> INLINE (MANUAL): (1313,6) icheck(int) (isz = 11) (sz = 18) -> INLINE (MANUAL): (1313,21) icheck(int) (isz = 11) (sz = 18) -> INLINE: (1320,6) push(int) (isz = 92) (sz = 98) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) 
-> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> INLINE: (1323,5) push(int) (isz = 92) (sz = 98) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par] LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1284,2) remark #25015: Estimate of max trip count of loop=7 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1284,2) remark #15305: vectorization support: vector length 8 remark #15399: vectorization support: unroll factor set to 2 remark #15309: vectorization support: normalized vectorization overhead 1.636 remark #15300: LOOP WAS VECTORIZED remark #15442: entire loop may be executed in remainder remark #15450: unmasked unaligned unit stride loads: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 11 remark #15477: vector cost: 1.370 remark #15478: estimated potential speedup: 5.850 remark #15488: --- end vector cost summary --- LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1284,2) remark #15305: vectorization support: vector length 8 remark #15309: vectorization support: normalized vectorization overhead 3.250 remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1284,2) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1288,6) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) 
remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1288,6) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1295,2) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1295,2) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1300,2) remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1301,6) ] LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1304,6) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1304,6) LOOP END LOOP BEGIN at 
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(159,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1304,6) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between gsngrp[flip] (160:2) and gsngrp[flip] (160:11) remark #15346: vector dependence: assumed FLOW dependence between gsngrp[flip] (160:11) and gsngrp[flip] (160:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(159,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1304,6) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1320,6) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1320,6) LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1323,5) remark #15344: loop was not vectorized: vector dependence prevents vectorization remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2) remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2) remark #25439: unrolled with remainder by 2 LOOP END LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into 
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1323,5) LOOP END Non-optimizable loops: LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1312,5) remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification LOOP END Fusion of IFs performed in isorted at line 243 Report from: Code generation optimizations [cg] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1276,1):remark #34051: REGISTER ALLOCATION : [isorted] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1276 Hardware registers Reserved : 2[ rsp rip] Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15] Callee-save : 6[ rbx rbp r12-r15] Assigned : 22[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm7] Routine temporaries Total : 387 Global : 164 Local : 223 Regenerable : 64 Spilled : 13 Routine stack Variables : 0 bytes* Reads : 0 [0.00e+00 ~ 0.0%] Writes : 0 [0.00e+00 ~ 0.0%] Spills : 64 bytes* Reads : 10 [7.88e-01 ~ 0.4%] Writes : 8 [1.08e+00 ~ 0.5%] Notes *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this. =========================================================================== Begin optimization report for: dsorted(double *, int) Report from: Interprocedural optimizations [ipo] INLINE REPORT: (dsorted(double *, int)) [24/37=64.9%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1332,1) -> INDIRECT: raddr(dnan)(P64) -gpt-> dnan -> INLINE: (1343,11) dnan(void *, int) (isz = 6) (sz = 15) -> EXTERN: (650,13) __isnan(double) -> INLINE: (1346,6) push(int) (isz = 92) (sz = 98) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) 
-> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> INLINE: (1353,2) push(int) (isz = 92) (sz = 98) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle -> INLINE: (1356,12) dtwiddle(void *, int, int) (isz = 35) (sz = 47) -> EXTERN: (634,9) __finite(double) -> EXTERN: (636,16) __isnan(double) -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle -> INLINE: (1357,12) dtwiddle(void *, int, int) (isz = 35) (sz = 47) -> EXTERN: (634,9) __finite(double) -> EXTERN: (636,16) __isnan(double) -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle -> INLINE: (1361,26) dtwiddle(void *, int, int) (isz = 35) (sz = 47) -> EXTERN: (634,9) __finite(double) -> EXTERN: (636,16) __isnan(double) -> INLINE: (1366,6) mpush(int, int) (isz = 106) (sz = 114) -> INLINE: (158,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44) -> EXTERN: (90,5) free(void *) -> EXTERN: (91,5) free(void *) -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle -> INLINE: (1381,9) dtwiddle(void *, int, int) (isz = 35) (sz = 47) -> EXTERN: (634,9) __finite(double) -> EXTERN: (636,16) __isnan(double) -> INLINE: (1389,6) push(int) (isz = 92) (sz = 98) -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70) -> EXTERN: (135,16) realloc(void *, size_t) -> EXTERN: (137,2) Rf_error(const char *, ...) 
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1394,5) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1342,2)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1342,2)
   remark #25084: Preprocess Loopnests: Moving Out Store [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(649,5) ]
   remark #15389: vectorization support: reference x[k] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(649,23) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 1.357
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 114
   remark #15477: vector cost: 3.500
   remark #15478: estimated potential speedup: 22.350
   remark #15482: vectorized math library calls: 1
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1342,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1346,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1346,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1353,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1353,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1361,2)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1363,6) ]
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1366,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1366,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(159,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1366,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between gsngrp[flip] (160:2) and gsngrp[flip] (160:11)
   remark #15346: vector dependence: assumed FLOW dependence between gsngrp[flip] (160:11) and gsngrp[flip] (160:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(159,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1366,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1389,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1389,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1394,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1394,5)
LOOP END

Non-optimizable loops:

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1377,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1332,1):remark #34051: REGISTER ALLOCATION : [dsorted] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1332

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 28[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm13]

    Routine temporaries
        Total : 393
            Global : 169
            Local : 224
        Regenerable : 73
        Spilled : 22

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 136 bytes*
            Reads : 29 [9.04e+00 ~ 3.9%]
            Writes : 20 [6.59e+00 ~ 2.8%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.

===========================================================================

Begin optimization report for: csorted(SEXP *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (csorted(SEXP *, int)) [25/37=67.6%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1403,1)
  -> INLINE: (1416,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1423,2) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1426,9) StrCmp2(SEXP, SEXP) (isz = 14) (sz = 26)
    -> EXTERN: (913,18) strcmp(const char *, const char *)
  -> INLINE: (1428,18) StrCmp2(SEXP, SEXP) (isz = 14) (sz = 26)
    -> EXTERN: (913,18) strcmp(const char *, const char *)
  -> INLINE: (1431,6) mpush(int, int) (isz = 106) (sz = 114)
    -> INLINE: (158,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1442,8) StrCmp2(SEXP, SEXP) (isz = 14) (sz = 26)
    -> EXTERN: (913,18) strcmp(const char *, const char *)
  -> INLINE: (1450,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1454,5) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1412,2)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1412,2)
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 1.947
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 12
   remark #15477: vector cost: 2.370
   remark #15478: estimated potential speedup: 4.560
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1412,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1416,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1416,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1423,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1423,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1428,2)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1429,6) ]
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1431,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1431,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(159,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1431,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between gsngrp[flip] (160:2) and gsngrp[flip] (160:11)
   remark #15346: vector dependence: assumed FLOW dependence between gsngrp[flip] (160:11) and gsngrp[flip] (160:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(159,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1431,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1450,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1450,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1454,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1454,5)
LOOP END

Non-optimizable loops:

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1441,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1403,1):remark #34051: REGISTER ALLOCATION : [csorted] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1403

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 24[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm9]

    Routine temporaries
        Total : 361
            Global : 157
            Local : 204
        Regenerable : 60
        Spilled : 12

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 56 bytes*
            Reads : 8 [8.31e-01 ~ 0.4%]
            Writes : 7 [2.06e+00 ~ 1.1%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.

===========================================================================

Begin optimization report for: isort(int *, int *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (isort(int *, int *, int)) [26/37=70.3%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1460,1)
  -> INLINE: (1471,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1471,15) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> EXTERN: (1473,9) Rf_error(const char *, ...)
  -> INLINE: (1473,9) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE (MANUAL): (1486,24) icheck(int) (isz = 11) (sz = 18)
  -> (1487,9) iinsert(int *, int *, int) (isz = 284) (sz = 293)
     [[ Inlining would exceed -inline-max-size value (293>230) <2>]]
  -> INLINE: (1494,9) setRange(int *, int) (isz = 65) (sz = 74)
  -> EXTERN: (1496,13) Rf_error(const char *, ...)
  -> INLINE: (1496,13) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> (1504,13) icount(int *, int *, int) (isz = 550) (sz = 559)
     [[ Inlining would exceed -inline-max-size value (559>230) <2>]]
  -> (1506,13) iradix(int *, int *, int) (isz = 847) (sz = 857)
     [[ Inlining would exceed -inline-max-size value (857>230) <2>]]

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1468,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between o[i] (1470:7) and x[i] (1469:3)
   remark #15346: vector dependence: assumed ANTI dependence between x[i] (1469:3) and o[i] (1470:7)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1468,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1471,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1471,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1471,15)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1471,15)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1473,9)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1473,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1485,13)
   remark #25422: Invariant Condition at line 243 hoisted out of this loop
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between R_NaInt (243:24) and x[i] (1486:17)
   remark #15346: vector dependence: assumed FLOW dependence between x[i] (1486:17) and R_NaInt (243:24)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1485,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1485,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between R_NaInt (243:24) and x[i] (1486:17)
   remark #15346: vector dependence: assumed FLOW dependence between x[i] (1486:17) and R_NaInt (243:24)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1485,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(209,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1494,9)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(209,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1494,9)
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 0.524
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 11
   remark #15477: vector cost: 2.620
   remark #15478: estimated potential speedup: 3.670
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(209,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1494,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(211,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1494,9)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between xmax (215:2) and xmax (216:6)
   remark #15346: vector dependence: assumed FLOW dependence between xmax (216:6) and xmax (215:2)
   remark #15346: vector dependence: assumed ANTI dependence between xmax (215:2) and xmax (216:6)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(211,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1494,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1496,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1496,13)
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1460,1):remark #34051: REGISTER ALLOCATION : [isort] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1460

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 18[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm3]

    Routine temporaries
        Total : 262
            Global : 114
            Local : 148
        Regenerable : 30
        Spilled : 5

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.
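Two of the isort remarks above describe dependences the compiler assumed rather than proved. In the loop at (1468,6), it could not rule out that x and o overlap. In the loop at (1485,13), it could not rule out that a store to x[i] changes the global R_NaInt read by the inlined icheck. The C sketch below is a hypothetical illustration, not code from radixsort.c: na_value stands in for a global such as R_NaInt, and both functions are invented for the example. It shows the usual source-level answers, restrict-qualified pointers and a local copy of the global. Whether a particular compiler then vectorizes the loop still depends on its cost model.

```c
#include <limits.h>

/* Hypothetical stand-in for a global sentinel such as R_NaInt. */
int na_value = INT_MIN;

/* Alias-prone form: o may overlap x, and a store to o[i] could even
   overwrite na_value (both are int objects), so the compiler has to
   assume dependences between iterations and re-read na_value each time. */
int negate_aliased(const int *x, int *o, int n)
{
    int nas = 0;
    for (int i = 0; i < n; i++) {
        if (x[i] == na_value) {
            o[i] = na_value;   /* keep NA as NA */
            nas++;
        } else {
            o[i] = -x[i];
        }
    }
    return nas;
}

/* Same result, with the aliasing questions answered in the source:
   restrict promises that nothing written through o is also reached
   through x, and the sentinel is read once into a local that no store
   in the loop can modify. */
int negate_restrict(const int *restrict x, int *restrict o, int n)
{
    const int na = na_value;
    int nas = 0;
    for (int i = 0; i < n; i++) {
        int is_na = (x[i] == na);
        o[i] = is_na ? na : -x[i];   /* -x[i] is only evaluated when x[i] != INT_MIN */
        nas += is_na;
    }
    return nas;
}
```

With x = {3, INT_MIN, -5, 0}, both functions fill o with {-3, INT_MIN, 5, 0} and return 1. The two versions differ only in what the compiler is allowed to assume. Recompiling with the same -qopt-report options is how to check whether the assumed-dependence remarks actually go away.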
===========================================================================

Begin optimization report for: iinsert(int *, int *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (iinsert(int *, int *, int)) [27/37=73.0%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(331,1)
  -> INLINE: (351,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (354,5) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(332,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between x[i] (333:13) and o[j+1] (343:6)
   remark #15346: vector dependence: assumed FLOW dependence between o[j+1] (343:6) and x[i] (333:13)

   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(337,6)
      remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(340,3) ]
   LOOP END
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(351,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(351,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(354,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(354,5)
LOOP END

Non-optimizable loops:

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(347,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(331,1):remark #34051: REGISTER ALLOCATION : [iinsert] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:331

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 15[ rax rdx rcx rbx rbp rsi rdi r8-r15]

    Routine temporaries
        Total : 128
            Global : 62
            Local : 66
        Regenerable : 22
        Spilled : 9

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 24 bytes*
            Reads : 3 [3.76e-01 ~ 0.2%]
            Writes : 3 [1.15e+00 ~ 0.5%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.

===========================================================================

Begin optimization report for: icount(int *, int *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (icount(int *, int *, int)) [28/37=75.7%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(254,1)
  -> EXTERN: (262,2) Rf_error(const char *, ...)
  -> INLINE: (262,2) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (276,9) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (287,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (293,9) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> EXTERN: (320,2) memset(void *, int, size_t)

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(262,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(262,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(264,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between counts[*(x+i*4)-xmin] (271:6) and counts[*(x+i*4)-xmin] (271:6)
   remark #15346: vector dependence: assumed ANTI dependence between counts[*(x+i*4)-xmin] (271:6) and counts[*(x+i*4)-xmin] (271:6)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(264,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(276,9)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(276,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(287,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(287,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(293,9)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(293,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(296,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between R_NaInt (299:22) and o[:] (299:2)
   remark #15346: vector dependence: assumed FLOW dependence between o[:] (299:2) and R_NaInt (299:22)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(305,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between o[i] (306:6) and x[*(o+i*4)-1] (306:6)
   remark #15346: vector dependence: assumed ANTI dependence between x[*(o+i*4)-1] (306:6) and o[i] (306:6)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(305,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(315,2)
   remark #15389: vectorization support: reference x[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(317,10) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15335: loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
   remark #15329: vectorization support: irregularly indexed store was emulated for the variable , masked, part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(317,3) ]
   remark #15305: vectorization support: vector length 4
   remark #15309: vectorization support: normalized vectorization overhead 0.024
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15459: masked indexed (or scatter) stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 14
   remark #15477: vector cost: 53.000
   remark #15478: estimated potential speedup: 0.260
   remark #15488: --- end vector cost summary ---
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(315,2)
LOOP END

Non-optimizable loops:

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(280,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(320,2):remark #34026: call to memset implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(254,1):remark #34051: REGISTER ALLOCATION : [icount] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:254

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 15[ rax rdx rcx rbx rbp rsi rdi r8-r15]

    Routine temporaries
        Total : 282
            Global : 135
            Local : 147
        Regenerable : 38
        Spilled : 19

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 104 bytes*
            Reads : 15 [3.95e+00 ~ 1.1%]
            Writes : 15 [4.68e+00 ~ 1.3%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.

===========================================================================

Begin optimization report for: iradix(int *, int *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (iradix(int *, int *, int)) [29/37=78.4%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(432,1)
  -> INLINE (MANUAL): (440,26) icheck(int) (isz = 11) (sz = 18)
  -> INLINE: (468,2) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> EXTERN: (473,6) memset(void *, int, size_t)
  -> INLINE (MANUAL): (493,27) icheck(int) (isz = 11) (sz = 18)
  -> EXTERN: (504,30) realloc(void *, size_t)
  -> EXTERN: (506,13) Rf_error(const char *, ...)
  -> INLINE: (506,13) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (512,5) alloc_otmp(int) (isz = 18) (sz = 24)
    -> EXTERN: (402,20) realloc(void *, size_t)
    -> INLINE: (404,2) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (404,2) Rf_error(const char *, ...)
  -> INLINE: (514,5) alloc_xtmp(int) (isz = 18) (sz = 24)
    -> EXTERN: (418,23) realloc(void *, size_t)
    -> INLINE: (420,2) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (420,2) Rf_error(const char *, ...)
  -> EXTERN: (519,2) Rf_error(const char *, ...)
  -> INLINE: (519,2) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (528,13) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE (MANUAL): (533,42) icheck(int) (isz = 11) (sz = 18)
  -> (535,13) iradix_r(int *, int *, int, int) (isz = 360) (sz = 372) [[ Inlining would exceed -inline-max-size value (372>230) <2>]]

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(436,5)
   remark #25422: Invariant Condition at line 243 hoisted out of this loop
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between radixcounts[0][thisx&255] (442:2) and radixcounts[0][thisx&255] (442:2)
   remark #15346: vector dependence: assumed ANTI dependence between radixcounts[0][thisx&255] (442:2) and radixcounts[0][thisx&255] (442:2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(436,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between radixcounts[0][thisx&255] (442:2) and radixcounts[0][thisx&255] (442:2)
   remark #15346: vector dependence: assumed ANTI dependence between radixcounts[0][thisx&255] (442:2) and radixcounts[0][thisx&255] (442:2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(447,5)
   remark #15389: vectorization support: reference skip[radix] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(451,2) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15335: loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
   remark #15328: vectorization support: irregularly indexed load was emulated for the variable , 64-bit indexed, part of index is nonlinearly computed [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(451,16) ]
   remark #15329: vectorization support: irregularly indexed store was emulated for the variable , masked, 64-bit indexed, part of index is nonlinearly computed [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(454,6) ]
   remark #15305: vectorization support: vector length 2
   remark #15309: vectorization support: normalized vectorization overhead 0.099
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15459: masked indexed (or scatter) stores: 1
   remark #15462: unmasked indexed (or gather) loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 27
   remark #15477: vector cost: 55.500
   remark #15478: estimated potential speedup: 0.460
   remark #15487: type converts: 4
   remark #15488: --- end vector cost summary ---
   remark #25436: completely unrolled by 4
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(458,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(458,39) ]
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(463,6)
   remark #25408: memset generated
   remark #15542: loop was not vectorized: inner loop was already vectorized

   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(463,6)
      remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(464,3) ]
      remark #15381: vectorization support: unaligned access used inside loop body
      remark #15305: vectorization support: vector length 8
      remark #15309: vectorization support: normalized vectorization overhead 0.600
      remark #15300: LOOP WAS VECTORIZED
      remark #15451: unmasked unaligned unit stride stores: 1
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 3
      remark #15477: vector cost: 0.620
      remark #15478: estimated potential speedup: 4.000
      remark #15488: --- end vector cost summary ---
      remark #25015: Estimate of max trip count of loop=3
   LOOP END

   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(463,6)
      remark #25015: Estimate of max trip count of loop=24
   LOOP END
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(466,6)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(466,6)
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(467,3) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 3.667
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 4
   remark #15477: vector cost: 0.370
   remark #15478: estimated potential speedup: 7.180
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(466,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(468,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(468,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(471,5)
   remark #15527: loop was not vectorized: function call to memset(void *, int, size_t) cannot be vectorized [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(473,6) ]
   remark #25015: Estimate of max trip count of loop=8
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(483,5)
   remark #15523: loop was not vectorized: loop control variable i was found, but loop iteration count cannot be computed before executing the loop
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(492,5)
   remark #25422: Invariant Condition at line 243 hoisted out of this loop
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between x[i] (243:24) and o[:] (494:2)
   remark #15346: vector dependence: assumed FLOW dependence between o[:] (494:2) and x[i] (243:24)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(492,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(492,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between x[i] (243:24) and o[:] (494:2)
   remark #15346: vector dependence: assumed FLOW dependence between o[:] (494:2) and x[i] (243:24)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(492,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(506,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(506,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(512,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(512,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(514,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(514,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(517,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(517,47) ]
   remark #25015: Estimate of max trip count of loop=8
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(519,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(519,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(530,13)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(530,13)
   remark #25422: Invariant Condition at line 243 hoisted out of this loop
   remark #15389: vectorization support: reference o[itmp+j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,51) ]
   remark #15389: vectorization support: reference o[itmp+j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,51) ]
   remark #15389: vectorization support: reference radix_xsub[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,25) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15415: vectorization support: irregularly indexed load was generated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,49) ]
   remark #15415: vectorization support: irregularly indexed load was generated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,49) ]
   remark #15415: vectorization support: irregularly indexed load was generated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,49) ]
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 0.140
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15462: unmasked indexed (or gather) loads: 3
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 24
   remark #15477: vector cost: 13.370
   remark #15478: estimated potential speedup: 1.730
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(530,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(530,13)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(530,13)
   remark #15389: vectorization support: reference o[itmp+j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,51) ]
   remark #15389: vectorization support: reference radix_xsub[j] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,25) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15415: vectorization support: irregularly indexed load was generated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,49) ]
   remark #15415: vectorization support: irregularly indexed load was generated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(533,49) ]
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 0.170
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15462: unmasked indexed (or gather) loads: 2
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 22
   remark #15477: vector cost: 11.000
   remark #15478: estimated potential speedup: 1.910
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(530,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(528,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(528,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(542,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between o[i] (543:6) and x[*(o+i*4)-1] (543:6)
   remark #15346: vector dependence: assumed ANTI dependence between x[*(o+i*4)-1] (543:6) and o[i] (543:6)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(542,2)
LOOP END

Non-optimizable loops:

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(523,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(473,6):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(473,6):remark #34026: call to memset implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(464,3):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(464,3):remark #34026: call to memset implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(432,1):remark #34051: REGISTER ALLOCATION : [iradix] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:432

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 25[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm10]

    Routine temporaries
        Total : 540
            Global : 207
            Local : 333
        Regenerable : 57
        Spilled : 20

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 120 bytes*
            Reads : 45 [1.98e+01 ~ 2.4%]
            Writes : 41 [1.53e+01 ~ 1.9%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.

===========================================================================

Begin optimization report for: iradix_r(int *, int *, int, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (iradix_r(int *, int *, int, int)) [30/37=81.1%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(552,1)
  -> (562,2) iinsert(int *, int *, int) (isz = 284) (sz = 293) [[ Inlining would exceed -inline-max-size value (293>230) <2>]]
  -> EXTERN: (584,5) memcpy(void *__restrict__, const void *__restrict__, size_t)
  -> EXTERN: (585,5) memcpy(void *__restrict__, const void *__restrict__, size_t)
  -> EXTERN: (594,2) Rf_error(const char *, ...)
  -> INLINE: (594,2) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (603,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> (605,6) iradix_r(int *, int *, int, int) (isz = 360) (sz = 372) [[ Callee not marked with inlining pragma <3>]]

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(569,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between thiscounts[thisx&255] (571:2) and thiscounts[thisx&255] (571:2)
   remark #15346: vector dependence: assumed ANTI dependence between thiscounts[thisx&255] (571:2) and thiscounts[thisx&255] (571:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(569,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(574,5)
   remark #15523: loop was not vectorized: loop control variable i was found, but loop iteration count cannot be computed before executing the loop
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(578,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between thiscounts[thisx] (580:8) and thiscounts[thisx] (580:8)
   remark #15346: vector dependence: assumed FLOW dependence between thiscounts[thisx] (580:8) and thiscounts[thisx] (580:8)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(588,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(588,47) ]
   remark #25015: Estimate of max trip count of loop=8
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(594,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(594,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(603,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(603,6)
LOOP END

Non-optimizable loops:

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(598,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(584,5):remark #34014: optimization advice for memcpy: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(584,5):remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(584,5):remark #34026: call to memcpy implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(585,5):remark #34014: optimization advice for memcpy: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(585,5):remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(585,5):remark #34026: call to memcpy implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(552,1):remark #34051: REGISTER ALLOCATION : [iradix_r] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:552

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 15[ rax rdx rcx rbx rbp rsi rdi r8-r15]

    Routine temporaries
        Total : 159
            Global : 69
            Local : 90
        Regenerable : 18
        Spilled : 13

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 56 bytes*
            Reads : 7 [2.64e+00 ~ 1.1%]
            Writes : 7 [2.52e+00 ~ 1.0%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.
===========================================================================

Begin optimization report for: dsort(double *, int *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (dsort(double *, int *, int)) [31/37=83.8%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1512,1)
  -> INDIRECT: raddr(dnan)(P64) -gpt-> dnan
  -> INLINE: (1522,7) dnan(void *, int) (isz = 6) (sz = 15)
    -> EXTERN: (650,13) __isnan(double)
  -> INLINE: (1524,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1524,15) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> EXTERN: (1527,2) Rf_error(const char *, ...)
  -> INLINE: (1527,2) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle
  -> INLINE: (1532,37) dtwiddle(void *, int, int) (isz = 35) (sz = 47)
    -> EXTERN: (634,9) __finite(double)
    -> EXTERN: (636,16) __isnan(double)
  -> (1535,2) dinsert(unsigned long long *, int *, int) (isz = 284) (sz = 293) [[ Inlining would exceed -inline-max-size value (293>230) <2>]]
  -> (1537,2) dradix(unsigned char *, int *, int) (isz = 927) (sz = 937) [[ Inlining would exceed -inline-max-size value (937>230) <2>]]

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1531,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between u.ull (635:2) and u.ull (637:2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1521,6)
   remark #25015: Estimate of max trip count of loop=3
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1521,6)
   remark #25084: Preprocess Loopnests: Moving Out Store [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(649,5) ]
   remark #15389: vectorization support: reference x[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(649,23) ]
   remark #15388: vectorization support: reference o[i] has aligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1523,7) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 4
   remark #15399: vectorization support: unroll factor set to 2
   remark #15309: vectorization support: normalized vectorization overhead 0.400
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15455: masked aligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 115
   remark #15477: vector cost: 5.000
   remark #15478: estimated potential speedup: 17.630
   remark #15482: vectorized math library calls: 1
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1521,6)
   remark #15389: vectorization support: reference x[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(649,23) ]
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1523,7) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 2
   remark #15309: vectorization support: normalized vectorization overhead 0.327
   remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1521,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1524,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1524,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1524,15)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1524,15)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1527,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1527,2)
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1512,1):remark #34051: REGISTER ALLOCATION : [dsort] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1512

    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 26[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm11]

    Routine temporaries
        Total : 178
            Global : 84
            Local : 94
        Regenerable : 33
        Spilled : 11

    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 48 bytes*
            Reads : 7 [1.61e+00 ~ 1.9%]
            Writes : 6 [9.97e-01 ~ 1.2%]

    Notes

        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.
===========================================================================
Begin optimization report for: dinsert(unsigned long long *, int *, int)
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (dinsert(unsigned long long *, int *, int)) [32/37=86.5%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(790,1)
  -> INLINE: (812,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (815,5) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(793,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between x[i] (794:2) and x[j+1] (803:6)
   remark #15346: vector dependence: assumed FLOW dependence between x[j+1] (803:6) and x[i] (794:2)
   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(798,6)
      remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(801,3) ]
   LOOP END
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(812,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(812,6)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(815,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(815,5)
LOOP END
Non-optimizable loops:
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(808,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END
Report from: Code generation optimizations [cg]
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(790,1):remark #34051: REGISTER ALLOCATION : [dinsert] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:790
    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 15[ rax rdx rcx rbx rbp rsi rdi r8-r15]
    Routine temporaries
        Total : 128
            Global : 62
            Local : 66
        Regenerable : 22
        Spilled : 9
    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 24 bytes*
            Reads : 3 [3.76e-01 ~ 0.2%]
            Writes : 3 [1.15e+00 ~ 0.5%]
    Notes
        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.
===========================================================================
Begin optimization report for: dradix(unsigned char *, int *, int)
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (dradix(unsigned char *, int *, int)) [33/37=89.2%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(668,1)
  -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle
  -> INLINE: (675,10) dtwiddle(void *, int, int) (isz = 35) (sz = 47)
    -> EXTERN: (634,9) __finite(double)
    -> EXTERN: (636,16) __isnan(double)
  -> INDIRECT: raddr(dnan)(P64) -gpt-> dnan
  -> INLINE: (694,21) dnan(void *, int) (isz = 6) (sz = 15)
    -> EXTERN: (650,13) __isnan(double)
  -> INLINE: (702,2) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> EXTERN: (709,6) memset(void *, int, size_t)
  -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle
  -> INLINE: (723,10) dtwiddle(void *, int, int) (isz = 35) (sz = 47)
    -> EXTERN: (634,9) __finite(double)
    -> EXTERN: (636,16) __isnan(double)
  -> EXTERN: (735,33) realloc(void *, size_t)
  -> EXTERN: (737,13) Rf_error(const char *, ...)
  -> INLINE: (737,13) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (742,5) alloc_otmp(int) (isz = 18) (sz = 24)
    -> EXTERN: (402,20) realloc(void *, size_t)
    -> INLINE: (404,2) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (404,2) Rf_error(const char *, ...)
  -> INLINE: (743,5) alloc_xtmp(int) (isz = 18) (sz = 24)
    -> EXTERN: (418,23) realloc(void *, size_t)
    -> INLINE: (420,2) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (420,2) Rf_error(const char *, ...)
  -> EXTERN: (749,2) Rf_error(const char *, ...)
  -> INLINE: (749,2) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (758,13) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> EXTERN: (761,17) Rf_error(const char *, ...)
  -> INDIRECT: raddr(dtwiddle)(P64) -gpt-> dtwiddle
  -> INLINE: (769,4) dtwiddle(void *, int, int) (isz = 35) (sz = 47)
    -> EXTERN: (634,9) __finite(double)
    -> EXTERN: (636,16) __isnan(double)
  -> (771,6) dradix_r(unsigned char *, int *, int, int) (isz = 354) (sz = 366) [[ Inlining would exceed -inline-max-size value (366>230) <2>]]
  -> INDIRECT: raddr(dnan)(P64) -gpt-> dnan
  -> INLINE: (778,13) dnan(void *, int) (isz = 6) (sz = 15)
    -> EXTERN: (650,13) __isnan(double)
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(674,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between u.d (633:5) and thisx (680:6)
   remark #15346: vector dependence: assumed ANTI dependence between thisx (680:6) and u.d (633:5)
   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(676,2)
      remark #15388: vectorization support: reference &thisx[radix] has aligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(680,44) ]
      remark #15388: vectorization support: reference &thisx[radix] has aligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(680,44) ]
      remark #15335: loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
      remark #15329: vectorization support: irregularly indexed store was emulated for the variable , 64-bit indexed, part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(680,6) ]
      remark #15328: vectorization support: irregularly indexed load was emulated for the variable , 64-bit indexed, part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(680,6) ]
      remark #15305: vectorization support: vector length 2
      remark #15427: loop was completely unrolled
      remark #15399: vectorization support: unroll factor set to 4
      remark #15309: vectorization support: normalized vectorization overhead 0.022
      remark #15450: unmasked unaligned unit stride loads: 2
      remark #15462: unmasked indexed (or gather) loads: 1
      remark #15463: unmasked indexed (or scatter) stores: 1
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 12
      remark #15477: vector cost: 44.500
      remark #15478: estimated potential speedup: 0.260
      remark #15487: type converts: 4
      remark #15488: --- end vector cost summary ---
      remark #25436: completely unrolled by 8
   LOOP END
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(682,5)
   remark #15388: vectorization support: reference &thisx[radix] has aligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(684,30) ]
   remark #15389: vectorization support: reference skip[radix] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(685,2) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15335: loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
   remark #15328: vectorization support: irregularly indexed load was emulated for the variable , 64-bit indexed, part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(685,16) ]
   remark #15329: vectorization support: irregularly indexed store was emulated for the variable , masked, 64-bit indexed, part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(688,6) ]
   remark #15305: vectorization support: vector length 2
   remark #15309: vectorization support: normalized vectorization overhead 0.112
   remark #15450: unmasked unaligned unit stride loads: 2
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15459: masked indexed (or scatter) stores: 1
   remark #15462: unmasked indexed (or gather) loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 28
   remark #15477: vector cost: 53.500
   remark #15478: estimated potential speedup: 0.500
   remark #15487: type converts: 4
   remark #15488: --- end vector cost summary ---
   remark #25436: completely unrolled by 8
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(691,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(691,39) ]
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(697,6)
   remark #25408: memset generated
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(697,6)
      remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(698,3) ]
      remark #15381: vectorization support: unaligned access used inside loop body
      remark #15305: vectorization support: vector length 8
      remark #15309: vectorization support: normalized vectorization overhead 0.600
      remark #15300: LOOP WAS VECTORIZED
      remark #15451: unmasked unaligned unit stride stores: 1
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 3
      remark #15477: vector cost: 0.620
      remark #15478: estimated potential speedup: 4.000
      remark #15488: --- end vector cost summary ---
      remark #25015: Estimate of max trip count of loop=3
   LOOP END
   LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(697,6)
      remark #25015: Estimate of max trip count of loop=24
   LOOP END
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(700,6)
   remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(700,6)
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(701,3) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 3.667
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 4
   remark #15477: vector cost: 0.370
   remark #15478: estimated potential speedup: 7.180
   remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(700,6)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(702,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(702,2)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(705,5)
   remark #15527: loop was not vectorized: function call to memset(void *, int, size_t) cannot be vectorized [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(709,6) ]
   remark #25015: Estimate of max trip count of loop=8
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(714,5)
   remark #15523: loop was not vectorized: loop control variable i was found, but loop iteration count cannot be computed before executing the loop
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(722,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between u.ull (635:2) and u.ull (637:2)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(737,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(737,13)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(742,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(742,5)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(743,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(743,5)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(746,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(747,2) ]
   remark #25015: Estimate of max trip count of loop=8
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(749,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(749,2)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(767,3)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between u.ull (635:2) and u.ull (637:2)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(758,13)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(758,13)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(777,2)
   remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(777,2)
   remark #25084: Preprocess Loopnests: Moving Out Store [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(649,5) ]
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(778,23) ]
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(778,6) ]
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(778,6) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15328: vectorization support: irregularly indexed load was emulated for the variable , part of index is read from memory [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(649,23) ]
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 0.178
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15462: unmasked indexed (or gather) loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 119
   remark #15477: vector cost: 9.120
   remark #15478: estimated potential speedup: 8.770
   remark #15482: vectorized math library calls: 1
   remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(777,2)
LOOP END
Non-optimizable loops:
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(753,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END
Report from: Code generation optimizations [cg]
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(709,6):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(709,6):remark #34026: call to memset implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(698,3):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(698,3):remark #34026: call to memset implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(668,1):remark #34051: REGISTER ALLOCATION : [dradix] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:668
    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 30[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm15]
    Routine temporaries
        Total : 470
            Global : 222
            Local : 248
        Regenerable : 65
        Spilled : 31
    Routine stack
        Variables : 8 bytes*
            Reads : 1 [4.57e+00 ~ 0.4%]
            Writes : 3 [9.67e+00 ~ 0.9%]
        Spills : 208 bytes*
            Reads : 52 [6.19e+01 ~ 6.0%]
            Writes : 31 [3.20e+01 ~ 3.1%]
    Notes
        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.
===========================================================================
Begin optimization report for: dradix_r(unsigned char *, int *, int, int)
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (dradix_r(unsigned char *, int *, int, int)) [34/37=91.9%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(823,1)
  -> (832,2) dinsert(unsigned long long *, int *, int) (isz = 284) (sz = 293) [[ Inlining would exceed -inline-max-size value (293>230) <2>]]
  -> DELETED: (849,2) Rf_error(const char *, ...)
  -> EXTERN: (864,5) memcpy(void *__restrict__, const void *__restrict__, size_t)
  -> EXTERN: (865,5) memcpy(void *__restrict__, const void *__restrict__, size_t)
  -> EXTERN: (875,2) Rf_error(const char *, ...)
  -> INLINE: (875,2) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (884,6) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> (886,6) dradix_r(unsigned char *, int *, int, int) (isz = 354) (sz = 366) [[ Callee not marked with inlining pragma <3>]]
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(838,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between thiscounts[*p] (839:2) and thiscounts[*p] (839:2)
   remark #15346: vector dependence: assumed ANTI dependence between thiscounts[*p] (839:2) and thiscounts[*p] (839:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(838,5)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(843,5)
   remark #15523: loop was not vectorized: loop control variable i was found, but loop iteration count cannot be computed before executing the loop
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(857,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between thiscounts[*(p+radix)] (858:16) and thiscounts[*(p+radix)] (858:16)
   remark #15346: vector dependence: assumed FLOW dependence between thiscounts[*(p+radix)] (858:16) and thiscounts[*(p+radix)] (858:16)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(868,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(869,2) ]
   remark #25015: Estimate of max trip count of loop=8
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(875,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(875,2)
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(884,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(884,6)
LOOP END
Non-optimizable loops:
LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(879,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END
Report from: Code generation optimizations [cg]
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(864,5):remark #34014: optimization advice for memcpy: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(864,5):remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(864,5):remark #34026: call to memcpy implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(865,5):remark #34014: optimization advice for memcpy: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(865,5):remark #34014: optimization advice for memcpy: increase the source's alignment to 16 (and use __assume_aligned) to speed up library implementation
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(865,5):remark #34026: call to memcpy implemented as a call to optimized library version
/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(823,1):remark #34051: REGISTER ALLOCATION : [dradix_r] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:823
    Hardware registers
        Reserved : 2[ rsp rip]
        Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save : 6[ rbx rbp r12-r15]
        Assigned : 15[ rax rdx rcx rbx rbp rsi rdi r8-r15]
    Routine temporaries
        Total : 147
            Global : 70
            Local : 77
        Regenerable : 18
        Spilled : 13
    Routine stack
        Variables : 0 bytes*
            Reads : 0 [0.00e+00 ~ 0.0%]
            Writes : 0 [0.00e+00 ~ 0.0%]
        Spills : 56 bytes*
            Reads : 7 [2.62e+00 ~ 1.1%]
            Writes : 7 [2.51e+00 ~ 1.1%]
    Notes
        *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.
===========================================================================
Begin optimization report for: cgroup(SEXP *, int *, int)
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (cgroup(SEXP *, int *, int)) [35/37=94.6%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1057,1)
  -> EXTERN: (1060,2) Rf_error(const char *, ...)
  -> INLINE: (1060,2) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> EXTERN: (1065,6) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1066,6) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1071,6) R_BadLongVector(SEXP, const char *, int)
  -> INLINE: (1076,6) savetl(SEXP) (isz = 133) (sz = 138)
    -> EXTERN: (103,17) realloc(void *, size_t)
    -> INLINE: (105,6) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (106,6) Rf_error(const char *, ...)
    -> EXTERN: (109,17) realloc(void *, size_t)
    -> INLINE: (111,6) savetl_end() (isz = 41) (sz = 44)
      -> EXTERN: (90,5) free(void *)
      -> EXTERN: (91,5) free(void *)
    -> EXTERN: (112,6) Rf_error(const char *, ...)
    -> EXTERN: (117,23) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1085,13) realloc(void *, size_t)
  -> EXTERN: (1087,3) Rf_error(const char *, ...)
  -> INLINE: (1087,3) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> INLINE: (1099,2) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> EXTERN: (1099,8) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1100,2) R_BadLongVector(SEXP, const char *, int)
  -> EXTERN: (1105,10) R_BadLongVector(SEXP, const char *, int)

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1060,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1060,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1076,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1076,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1076,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1076,6)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1087,3)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed OUTPUT dependence between (sl__x__+?)->lv_truelength (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed OUTPUT dependence between sl__x__->truelength (89:2) and (sl__x__+?)->lv_truelength (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1087,3)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1099,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1099,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1103,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1105,10) ]
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1103,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1111,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed OUTPUT dependence between (sl__x__+?)->lv_truelength (1112:2) and sl__x__->truelength (1112:2)
   remark #15346: vector dependence: assumed OUTPUT dependence between sl__x__->truelength (1112:2) and (sl__x__+?)->lv_truelength (1112:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1111,5)
LOOP END

Non-optimizable loops:

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1063,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1098,5)
   remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1057,1):remark #34051: REGISTER ALLOCATION : [cgroup] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1057

    Hardware registers
        Reserved     :    2[ rsp rip]
        Available    :   39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  :    6[ rbx rbp r12-r15]
        Assigned     :   15[ rax rdx rcx rbx rbp rsi rdi r8-r15]

    Routine temporaries
        Total         :     269
            Global    :     120
            Local     :     149
        Regenerable   :      51
        Spilled       :      21

    Routine stack
        Variables     :       0 bytes*
            Reads     :       0 [0.00e+00 ~ 0.0%]
            Writes    :       0 [0.00e+00 ~ 0.0%]
        Spills        :     120 bytes*
            Reads     :      36 [2.21e+01 ~ 5.9%]
            Writes    :      27 [1.57e+01 ~ 4.2%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

===========================================================================

Begin optimization report for: csort(SEXP *, int *, int)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (csort(SEXP *, int *, int)) [36/37=97.3%] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1137,1)
  -> EXTERN: (1142,54) R_BadLongVector(SEXP, const char *, int)
  -> INLINE: (1153,9) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE: (1153,18) push(int) (isz = 92) (sz = 98)
    -> INLINE: (147,2) growstack(int) (isz = 65) (sz = 70)
      -> EXTERN: (135,16) realloc(void *, size_t)
      -> EXTERN: (137,2) Rf_error(const char *, ...)
      -> INLINE: (137,2) savetl_end() (isz = 41) (sz = 44)
        -> EXTERN: (90,5) free(void *)
        -> EXTERN: (91,5) free(void *)
  -> INLINE (MANUAL): (1162,29) icheck(int) (isz = 11) (sz = 18)
  -> (1163,9) iinsert(int *, int *, int) (isz = 284) (sz = 293)
     [[ Inlining would exceed -inline-max-size value (293>230) <2>]]
  -> INLINE: (1165,2) setRange(int *, int) (isz = 65) (sz = 74)
  -> EXTERN: (1167,6) Rf_error(const char *, ...)
  -> INLINE: (1167,6) savetl_end() (isz = 41) (sz = 44)
    -> EXTERN: (90,5) free(void *)
    -> EXTERN: (91,5) free(void *)
  -> (1173,6) icount(int *, int *, int) (isz = 550) (sz = 559)
     [[ Inlining would exceed -inline-max-size value (559>230) <2>]]
  -> (1175,6) iradix(int *, int *, int) (isz = 847) (sz = 857)
     [[ Inlining would exceed -inline-max-size value (857>230) <2>]]

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1141,5)
   remark #15520: loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1142,54) ]
   remark #25439: unrolled with remainder by 16
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1141,5)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1148,13)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1148,13)
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1149,17) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 3.667
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 4
   remark #15477: vector cost: 0.370
   remark #15478: estimated potential speedup: 7.180
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1148,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1150,9)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed FLOW dependence between o[i] (1152:17) and R_NaInt (1151:13)
   remark #15346: vector dependence: assumed ANTI dependence between R_NaInt (1151:13) and o[i] (1152:17)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1150,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1153,9)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1153,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1153,18)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1153,18)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1158,13)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1158,13)
   remark #15389: vectorization support: reference o[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1159,17) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 3.667
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 4
   remark #15477: vector cost: 0.370
   remark #15478: estimated potential speedup: 7.180
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1158,13)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1161,9)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1161,9)
   remark #25422: Invariant Condition at line 243 hoisted out of this loop
   remark #15389: vectorization support: reference csort_otmp[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1162,36) ]
   remark #15389: vectorization support: reference csort_otmp[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1162,36) ]
   remark #15389: vectorization support: reference csort_otmp[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1162,13) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 0.600
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 16
   remark #15477: vector cost: 2.500
   remark #15478: estimated potential speedup: 5.190
   remark #15488: --- end vector cost summary ---
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1161,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1161,9)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1161,9)
   remark #15389: vectorization support: reference csort_otmp[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1162,36) ]
   remark #15389: vectorization support: reference csort_otmp[i] has unaligned access [ /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1162,13) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 0.545
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15451: unmasked unaligned unit stride stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 15
   remark #15477: vector cost: 2.750
   remark #15478: estimated potential speedup: 4.570
   remark #15488: --- end vector cost summary ---
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1161,9)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(209,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1165,2)
   remark #25015: Estimate of max trip count of loop=7
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(209,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1165,2)
   remark #15305: vectorization support: vector length 8
   remark #15309: vectorization support: normalized vectorization overhead 0.524
   remark #15300: LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 11
   remark #15477: vector cost: 2.620
   remark #15478: estimated potential speedup: 3.670
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(209,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1165,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(211,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1165,2)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between xmax (215:2) and xmax (216:6)
   remark #15346: vector dependence: assumed FLOW dependence between xmax (216:6) and xmax (215:2)
   remark #15346: vector dependence: assumed ANTI dependence between xmax (215:2) and xmax (216:6)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(211,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1165,2)
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1167,6)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #15346: vector dependence: assumed ANTI dependence between savedtl[i] (89:2) and sl__x__->truelength (89:2)
   remark #15346: vector dependence: assumed FLOW dependence between sl__x__->truelength (89:2) and savedtl[i] (89:2)
   remark #25439: unrolled with remainder by 2
LOOP END

LOOP BEGIN at /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(88,5) inlined into /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1167,6)
LOOP END

    Report from: Code generation optimizations [cg]

/home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(1137,1):remark #34051: REGISTER ALLOCATION : [csort] /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c:1137

    Hardware registers
        Reserved     :    2[ rsp rip]
        Available    :   39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  :    6[ rbx rbp r12-r15]
        Assigned     :   23[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm8]

    Routine temporaries
        Total         :     408
            Global    :     162
            Local     :     246
        Regenerable   :      78
        Spilled       :       5

    Routine stack
        Variables     :       0 bytes*
            Reads     :       0 [0.00e+00 ~ 0.0%]
            Writes    :       0 [0.00e+00 ~ 0.0%]
        Spills        :       0 bytes*
            Reads     :       0 [0.00e+00 ~ 0.0%]
            Writes    :       0 [0.00e+00 ~ 0.0%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

===========================================================================

Begin optimization report for: savetl_init()

    Report from: Interprocedural optimizations [ipo]

DEAD STATIC FUNCTION: (savetl_init()) /home/nwknoblauch/Downloads/R-3.4.0/src/main/radixsort.c(66,1)

===========================================================================

    Report from: Interprocedural optimizations [ipo]

INLINING FOOTNOTES:

<1> The compiler's heuristics predict that it is not profitable to inline the call. Add "inline __attribute__((always_inline))" to the declaration of the called function or add "#pragma forceinline" before the call site.

<2> The function is larger than the inliner would normally inline. Use the option -inline-max-size to increase the size of any function that would normally be inlined, add "inline __attribute__((always_inline))" to the declaration of the called function, or add "#pragma forceinline" before the call site.

<3> The compiler's heuristics indicate that the function is not profitable to inline. Override this decision by adding "inline __attribute__((always_inline))" to the declaration of the called function, or add "#pragma forceinline" before the call site.
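Footnote <2> is the reason iinsert, icount and iradix were not inlined into csort: their sizes (293, 559, 857) exceed the -inline-max-size value of 230. Below is a minimal C sketch of the attribute the footnotes recommend. It is applied to a small hypothetical helper; clamp_int and clamp_all are illustrative and not functions from radixsort.c. GCC, Clang and the Intel compiler all accept `__attribute__((always_inline))`. `#pragma forceinline` is Intel-specific and is left out here. Forcing large functions inline grows code size and can raise register pressure, so it is a trade-off to measure rather than a guaranteed gain.

```c
#include <stddef.h>

/* Hypothetical helper. always_inline asks the compiler to inline every
   call, even where its size heuristics would otherwise decline. */
static inline __attribute__((always_inline)) int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical caller: once clamp_int is inlined, the loop body contains
   no function call, which removes one common obstacle to vectorization. */
static void clamp_all(int *x, size_t n, int lo, int hi)
{
    for (size_t i = 0; i < n; i++)
        x[i] = clamp_int(x[i], lo, hi);
}
```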