I am profiling an application with Intel VTune Amplifier 16. The application was compiled with the Intel Fortran compiler from Intel Composer XE-2016 (version 16.0.0).
The profile shows enormous usage of the __intel_ssse3_rep_memcpy and __intel_new_memset functions (26% of the execution time), and I would like to know exactly what these functions do. Can anyone help me?
Such functions can be invoked automatically by the Intel compiler, either as a substitute for the standard memset() and memcpy() functions, or when it recognizes a for() loop that performs equivalent work. The compiler option /Qopt-report:4 (-qopt-report=4 on Linux) may flag where loops are replaced by these functions.
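To illustrate the kind of loop meant here, below is a minimal sketch in C (the application in question is Fortran, where whole-array assignments play the same role). The function names copy_loop and zero_loop are made up for this example. Whether a given compiler actually replaces these loops with library calls depends on its version and options, so treat this as a pattern to look for in the optimization report rather than guaranteed behavior.

```c
#include <stddef.h>

/* Element-by-element copy. Because of restrict, the arrays cannot overlap,
   so this is equivalent to memcpy(dst, src, n * sizeof *dst) and a compiler
   is free to emit a library copy call in its place. */
void copy_loop(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Zero fill. With IEEE 754 doubles, 0.0 is all-bits-zero, so this is
   equivalent to memset(buf, 0, n * sizeof *buf). */
void zero_loop(double *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = 0.0;
}
```

Calling copy_loop(dst, src, 4) on two four-element arrays copies all four values, and zero_loop(dst, 4) then clears them; the result is the same whether or not the compiler substitutes a library routine.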
With ifort, temporaries generated by array section operations may also give rise to fast_memcpy calls.
One reason such an "optimization" (if the copy can't be avoided) might be counter-productive is that the strings aren't big enough for the library functions to come up to speed, and the compiler can't determine that at compile time. Short strings may be handled better by for loops plus alignment and length assertions (not to mention arranging the application to reduce the copying of data in the first place).
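A rough sketch of the "length assertion" idea in portable C: if the trip count is a small compile-time constant, the compiler has the information it needs to expand the copy inline instead of calling a general-purpose routine. The names SHORT_LEN and copy_short are hypothetical, and the value 8 is an arbitrary choice for illustration. Compiler-specific alignment and loop-count directives are deliberately left out, and whether the call is actually avoided should be checked in the optimization report.

```c
#include <stddef.h>

/* Hypothetical fixed length; chosen only for illustration. */
enum { SHORT_LEN = 8 };

/* Copy a short, fixed-length block. The constant bound tells the compiler
   the exact size up front, and restrict rules out overlap. */
void copy_short(double *restrict dst, const double *restrict src)
{
    for (size_t i = 0; i < SHORT_LEN; ++i)
        dst[i] = src[i];
}
```

The trade-off is that the length is baked into the function, so this only suits copies whose size really is fixed and small.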