Re: sse4.2 instructions

westmere · 05-01-2009

If I have existing C/C++ source code, do I need to modify it before compiling with an appropriate compiler to get the benefits of the SSE4.2 instructions, or will the new compilers automagically use the SSE4.2 instructions for string comparisons?

I've read all the white papers and web pages I could find that seemed relevant, but have yet to find a definitive answer. The best I've found is that "All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4." But there is no mention of whether existing software can take advantage of the SSE4.2 instructions without modification.

Thanks in advance.

TimP · 05-01-2009

Current Intel and Sun compilers have an explicit SSE4.2 compile option, but I haven't seen a case showing that code such as that in
http://software.intel.com/en-us/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4/
can be generated by auto-vectorization without explicitly writing the SSE4.2 intrinsics into the source. Writing the intrinsics requires a compiler with up-to-date include files and (for Linux) binutils 2.9.xx.
There is no current mechanism to take advantage of SSE4.2 instructions without recompilation, although there appear to be several research projects on binary translation.
Existing code that uses parallel (packed) move instructions, for example, automatically benefits from the improved support for varying alignments in SSE4.2 (and recent AMD) CPUs.
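
As a rough illustration of what "explicitly writing the SSE4.2 intrinsics" means, here is a minimal sketch (not taken from the linked article) that uses PCMPISTRI through _mm_cmpistri to find the first byte of a string that belongs to a small character set, much like strpbrk. The function names are hypothetical; it assumes GCC or Clang (for the target attribute and __builtin_cpu_supports) and buffers that are zero-padded past their terminator, and it falls back to strpbrk when the CPU does not report SSE4.2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h> /* SSE4.2 intrinsics: _mm_cmpistri, _mm_cmpistrz */

/* Unsigned bytes, "equal any" (is each text byte one of the set bytes?),
 * report the lowest matching position. */
#define FIND_ANY_MODE \
    (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT)

/* ASSUMPTION: `text` and `set` stay readable for 15 bytes past their
 * terminating NUL (e.g. zero-padded arrays), because each step loads a
 * full 16-byte block without checking where the string ends. */
__attribute__((target("sse4.2")))
static int find_any_sse42(const char *text, const char *set)
{
    const __m128i needles = _mm_loadu_si128((const __m128i *)set);
    for (size_t off = 0;; off += 16) {
        const __m128i block = _mm_loadu_si128((const __m128i *)(text + off));
        /* Position of the first text byte found in the set, or 16 if none;
         * bytes after either terminator never count as matches. */
        int idx = _mm_cmpistri(needles, block, FIND_ANY_MODE);
        if (idx < 16)
            return (int)off + idx;
        /* Nonzero when this block holds the text's terminating NUL. */
        if (_mm_cmpistrz(needles, block, FIND_ANY_MODE))
            return -1;
    }
}
#endif

/* Portable reference with the same contract, via strpbrk. */
static int find_any_scalar(const char *text, const char *set)
{
    const char *p = strpbrk(text, set);
    return p ? (int)(p - text) : -1;
}

/* Index of the first byte of `text` that occurs in `set`, or -1 if none.
 * Uses PCMPISTRI only if the CPU reports SSE4.2 at run time. */
static int find_any(const char *text, const char *set)
{
    assert(strlen(set) <= 16); /* the whole set must fit in one register */
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.2"))
        return find_any_sse42(text, set);
#endif
    return find_any_scalar(text, set);
}
```

For example, with char text[64] = "key=value;other:thing" and char set[16] = ";:=", find_any(text, set) returns 3, the position of '='. The point for the question above is that nothing in ordinary strpbrk-style C source tells the compiler to produce this; the intrinsic has to be written by hand.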

SHIH_K_Intel · 05-02-2009

Many string functions in the runtime library can be sped up using SSE4.2 instructions, and some of them can also be sped up using SSE2. Various compilers are exploring the possibility of drop-in replacement of runtime library functions with versions that use newer instruction sets. I'd keep my fingers crossed that this happens in the near future.
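
To show the kind of SSE2 speedup meant here, a minimal strlen-style sketch: compare 16 bytes at a time against zero with PCMPEQB and turn the result into a bitmask with PMOVMSKB. The names are hypothetical, and it assumes GCC or Clang (for __builtin_ctz) and a buffer zero-padded past its terminator. Production library versions typically align their loads so they never touch an unmapped page, rather than relying on padding as this sketch does.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* SSE2 intrinsics; SSE2 is always present on x86-64 */

/* ASSUMPTION: `s` stays readable for 15 bytes past its terminating NUL
 * (e.g. a zero-padded array), since whole 16-byte blocks are loaded. */
static size_t len_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    for (size_t off = 0;; off += 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)(s + off));
        /* PCMPEQB: 0xFF in each lane whose byte is NUL, 0x00 elsewhere. */
        __m128i is_nul = _mm_cmpeq_epi8(block, zero);
        /* PMOVMSKB: bit i of the mask is set when byte i is NUL. */
        unsigned mask = (unsigned)_mm_movemask_epi8(is_nul);
        if (mask != 0)
            return off + (size_t)__builtin_ctz(mask); /* lowest set bit */
    }
}
#endif

/* Hypothetical entry point: SSE2 on x86-64, the C library elsewhere. */
static size_t string_length(const char *s)
{
#if defined(__x86_64__)
    return len_sse2(s);
#else
    return strlen(s);
#endif
}
```

With char buf[64] = "hello", string_length(buf) returns 5, after examining a single 16-byte block.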

westmere · 05-04-2009

Quoting - tim18

I haven't seen a case showing that code such as that in
http://software.intel.com/en-us/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4/
can be generated by auto-vectorization without explicitly writing the SSE4.2 intrinsics into the source.

Thanks for the response, tim18. Any idea whether this kind of auto-vectorization is planned for future compiler releases, or whether gcc -ftree-vectorize -msse4.2 might already do it?

TimP · 05-04-2009

Quoting - westmere

Thanks for the response, tim18. Any idea whether this kind of auto-vectorization is planned for future compiler releases, or whether gcc -ftree-vectorize -msse4.2 might already do it?

I do have an example where ifort -xsse4.2 uses the horizontal dot product instruction, but only in a remainder loop, so it doesn't matter for performance. I'd expect the horizontal dot product to be useful only in limited situations, such as a fixed dot-product length of 4, and it may be that the compiler would optimize that case automatically.
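
For reference, the fixed-length-4 case looks like this with the SSE4.1 DPPS instruction via _mm_dp_ps: one instruction multiplies the four lane pairs and sums them. This is a hedged sketch with hypothetical names, assuming GCC or Clang, and falling back to plain C when SSE4.1 is not reported at run time.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_dp_ps (DPPS) */

/* One DPPS: mask 0xF1 means "multiply all four lane pairs and add them"
 * (high nibble 0xF) and "write the sum to lane 0 only" (low nibble 0x1). */
__attribute__((target("sse4.1")))
static float dot4_sse41(const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
}
#endif

/* Plain C version: what a compiler sees without intrinsics. */
static float dot4_scalar(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Fixed-length-4 dot product; uses DPPS only if SSE4.1 is reported. */
static float dot4(const float *a, const float *b)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1"))
        return dot4_sse41(a, b);
#endif
    return dot4_scalar(a, b);
}
```

dot4 of {1, 2, 3, 4} and {5, 6, 7, 8} gives 70. For longer or variable-length dot products, ordinary vertical multiplies and adds with one horizontal sum at the end are usually the better structure, which is consistent with the limited usefulness described above.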
With g++ or gfortran 4.5, the same examples generate identical code with the -msse4.1 and -msse4.2 options. The gcc/g++/gfortran use of SSE4 code shows some consistent performance gains over SSE3, but in my code samples gcc and the Intel compilers don't use SSE4.1 in the same ways. The exception is _mm_set_ps, where both compilers switch to SSE4.1 code, so it isn't necessary to change the source to use the corresponding SSE4.1 intrinsic. g++ 4.5 auto-vectorizes more effectively than earlier versions of g++.
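
Which instructions the compiler emits for _mm_set_ps can only be seen in its assembly output (for example with gcc -S), not by running code. What the sketch below can show is that the plain _mm_set_ps form and a hand-written chain of the SSE4.1 _mm_insert_ps intrinsic build the same vector, so nothing is lost by leaving the source as _mm_set_ps. Function names are hypothetical; it assumes GCC or Clang on x86-64.

```c
#include <string.h>

#if defined(__x86_64__)
#include <smmintrin.h> /* SSE4.1: _mm_insert_ps (INSERTPS) */

/* Ordinary SSE source: _mm_set_ps lists lanes from high (3) to low (0).
 * Built for an SSE4.1 target, the compiler may pick INSERTPS or other
 * shuffles for this on its own. */
__attribute__((target("sse4.1")))
static __m128 build_with_set(float x, float y, float z, float w)
{
    return _mm_set_ps(w, z, y, x); /* lanes {x, y, z, w} */
}

/* The same vector spelled with the SSE4.1 intrinsic. In the immediate,
 * bits 5:4 pick the destination lane; bits 7:6 (source lane 0) and
 * bits 3:0 (zero mask) are left at 0. */
__attribute__((target("sse4.1")))
static __m128 build_with_insert(float x, float y, float z, float w)
{
    __m128 v = _mm_set_ss(x);                    /* lanes {x, 0, 0, 0} */
    v = _mm_insert_ps(v, _mm_set_ss(y), 1 << 4); /* lane 1 = y */
    v = _mm_insert_ps(v, _mm_set_ss(z), 2 << 4); /* lane 2 = z */
    v = _mm_insert_ps(v, _mm_set_ss(w), 3 << 4); /* lane 3 = w */
    return v;
}
#endif

/* 1 if both forms give lanes {x, y, z, w}, 0 if not, -1 if the check
 * cannot run (not x86-64, or no SSE4.1 at run time). */
static int set_matches_insert(float x, float y, float z, float w)
{
#if defined(__x86_64__)
    if (!__builtin_cpu_supports("sse4.1"))
        return -1;
    const float want[4] = {x, y, z, w};
    float a[4], b[4];
    _mm_storeu_ps(a, build_with_set(x, y, z, w));
    _mm_storeu_ps(b, build_with_insert(x, y, z, w));
    return memcmp(a, want, sizeof want) == 0 &&
           memcmp(b, want, sizeof want) == 0;
#else
    (void)x; (void)y; (void)z; (void)w;
    return -1;
#endif
}
```

Compiling build_with_set with and without the SSE4.1 target and comparing the -S output is the way to observe the instruction choice described above.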
I haven't found any use of SSE4 code by the Sun compilers, but they frequently vectorize effectively for SSE4.2 CPUs using older SSE instructions, even in a few situations where the other compilers don't.
The marketing people usually miss several points: the few situations where the new instructions are beneficial are far outnumbered by those where the old instructions may be better optimized for the new CPUs. And there isn't sufficient incentive to make applications incompatible with older CPUs when the AVX instruction set will offer real gains in a year or two.

westmere · 05-04-2009

Thanks for the help, tim18.

SHIH_K_Intel · 10-15-2009

You might want to check out the alpha code of the Glibc 2.11 string and memory functions. It includes multi-arch support, so the library can be configured and built to recognize which ISA is available. Your existing code that calls Glibc 2.11 string and memory functions will then execute SIMD code on Nehalem-, Penryn- and Merom-based processors.
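
glibc itself picks among its string-function variants at load time through the GNU IFUNC symbol mechanism. As a simplified, portable analogue (not glibc's code), the sketch below chooses between an SSE4.2 strlen and the C library's strlen on the first call and caches the choice in a function pointer. The SSE4.2 variant uses PCMPISTRI in EQUAL_EACH mode against an all-zero operand, so only lanes past the terminator compare true. Names are hypothetical; it assumes GCC or Clang and a zero-padded input buffer.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h> /* SSE4.2 intrinsics: _mm_cmpistri */

/* ASSUMPTION: `s` stays readable for 15 bytes past its terminating NUL
 * (e.g. a zero-padded array), since whole 16-byte blocks are loaded. */
__attribute__((target("sse4.2")))
static size_t len_sse42(const char *s)
{
    /* An all-zero operand is an empty string. In EQUAL_EACH mode a lane
     * compares true only when it lies past the end of both strings, so the
     * first true lane is the NUL position in `block`, or 16 if none. */
    const __m128i empty = _mm_setzero_si128();
    for (size_t off = 0;; off += 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)(s + off));
        int idx = _mm_cmpistri(empty, block,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16)
            return off + (size_t)idx;
    }
}
#endif

/* Generic fallback: the C library's own strlen. */
static size_t len_generic(const char *s)
{
    return strlen(s);
}

typedef size_t (*len_fn)(const char *);

/* Choose an implementation from the CPU feature flags. */
static len_fn resolve_len(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2"))
        return len_sse42;
#endif
    return len_generic;
}

/* Single entry point for callers; the choice is made on the first call
 * and cached (not thread-safe, which is acceptable for a sketch). */
static size_t fast_len(const char *s)
{
    static len_fn impl = NULL;
    if (impl == NULL)
        impl = resolve_len();
    return impl(s);
}
```

Callers only ever see fast_len, which is the property the multi-arch approach gives existing code: the source that calls the function does not change, only the implementation selected behind it.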

You might also be interested in an SSE4.2 example that speeds up a string-to-integer conversion function. One such example is shown in the latest Optimization manual.
http://www.intel.com/products/processor/manuals/index.htm
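
The manual's version is not reproduced here. As a much simpler sketch of the same idea, PCMPISTRI in RANGES mode with the single range '0'..'9' can find where a run of leading digits ends, 16 bytes per instruction, after which plain C does the arithmetic. This is not the manual's algorithm; the names are hypothetical, and it assumes GCC or Clang, a zero-padded buffer, an unsigned decimal input with no sign handling, and no overflow checking.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h> /* SSE4.2 intrinsics: _mm_cmpistri */

/* RANGES marks bytes inside '0'..'9'; NEGATIVE_POLARITY then flips every
 * lane, so the first set lane is the first non-digit or the first lane
 * past the terminator. A result of 16 means all 16 bytes were digits. */
#define DIGIT_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_RANGES | \
                    _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT)

/* ASSUMPTION: `s` stays readable for 15 bytes past its terminating NUL. */
__attribute__((target("sse4.2")))
static size_t digit_span_sse42(const char *s)
{
    /* One range pair, '0'..'9'; the zero that follows ends the operand. */
    const __m128i digits = _mm_setr_epi8('0', '9', 0, 0, 0, 0, 0, 0,
                                         0, 0, 0, 0, 0, 0, 0, 0);
    for (size_t off = 0;; off += 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)(s + off));
        int idx = _mm_cmpistri(digits, block, DIGIT_MODE);
        if (idx < 16)
            return off + (size_t)idx;
    }
}
#endif

/* Scalar fallback: count leading ASCII digits. */
static size_t digit_span_scalar(const char *s)
{
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        n++;
    return n;
}

/* Length of the leading digit run; PCMPISTRI only if SSE4.2 is reported. */
static size_t digit_span(const char *s)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.2"))
        return digit_span_sse42(s);
#endif
    return digit_span_scalar(s);
}

/* Hypothetical simplified parser: unsigned decimal prefix only, no sign,
 * no whitespace skipping, no overflow check. */
static unsigned long long parse_decimal(const char *s)
{
    size_t n = digit_span(s);
    unsigned long long value = 0;
    for (size_t i = 0; i < n; i++)
        value = value * 10 + (unsigned long long)(s[i] - '0');
    return value;
}
```

With char buf[64] = "12345abc", parse_decimal(buf) returns 12345. Only the scan for the end of the number is vectorized here; the manual's example goes further than this sketch.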

SHIH_K_Intel · 11-02-2009

Latest news from the Glibc front.
http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html