If I have existing C/C++ source code, do I need to modify it before compiling with an appropriate compiler to get the benefits of the SSE4.2 instructions, or will the new compilers automagically use the SSE4.2 instructions for string comparisons?
I've read all the white papers and web pages I could find that seemed relevant, but have yet to find a definitive answer. The best I've found is that "All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4." But there is no mention of whether existing software can take advantage of the SSE4.2 instructions without modification.
Thanks in advance.
7 Replies
Current Intel and Sun compilers have an explicit SSE4.2 compile option, but I haven't seen a case to show that code such as in
http://software.intel.com/en-us/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4/
might be generated by auto-vectorization without explicitly writing the SSE4.2 intrinsics into the code. This requires a compiler with an up-to-date include file and (for Linux) binutils 2.9.xx.
There is no current mechanism to take advantage of SSE4.2 instructions without recompilation, although there appear to be several research projects on binary translation.
Existing code that uses parallel move instructions, for example, automatically takes advantage of the improved support for varying alignments in SSE4.2 (and recent AMD) CPUs.
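For illustration, here is a minimal sketch of what "explicitly writing the SSE4.2 intrinsics" looks like, in the spirit of the character-set scanning from the schema-validation article linked above (this is not that article's code). The helper name `find_any_sse42` is hypothetical: it returns the index of the first byte of a buffer that matches any character in a small set, using `_mm_cmpestri` in "equal any" mode. It assumes GCC or Clang on x86; the `target("sse4.2")` function attribute stands in for compiling the whole file with `-msse4.2`.

```c
#include <assert.h>
#include <nmmintrin.h> /* SSE4.2 intrinsics: _mm_cmpestri and the _SIDD_* flags */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first byte in s[0..n) that equals any of
   the (at most 16) characters in set, or n if there is none. */
__attribute__((target("sse4.2")))
static size_t find_any_sse42(const char *s, size_t n, const char *set)
{
    char setbuf[16] = {0};
    size_t setlen = strlen(set);
    if (setlen > 16)
        setlen = 16; /* PCMPESTRI looks at no more than 16 set bytes */
    memcpy(setbuf, set, setlen);
    const __m128i needles = _mm_loadu_si128((const __m128i *)setbuf);

    for (size_t i = 0; i < n; i += 16) {
        size_t len = (n - i < 16) ? n - i : 16;
        char tail[16] = {0};
        const char *p = s + i;
        if (len < 16) { /* copy the tail so we never read past s[n-1] */
            memcpy(tail, p, len);
            p = tail;
        }
        const __m128i chunk = _mm_loadu_si128((const __m128i *)p);
        /* Equal-any: test each valid byte of chunk for membership in needles;
           returns the lowest matching index, or 16 if nothing matched. */
        int idx = _mm_cmpestri(needles, (int)setlen, chunk, (int)len,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16)
            return i + (size_t)idx;
    }
    return n;
}
```

For example, `find_any_sse42("tag<name>", 9, "<>&")` returns 3. Nothing here is generated by the auto-vectorizer; the PCMPESTRI instruction appears only because the intrinsic is written out in the source.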
Quoting - westmere
If I have existing C/C++ source code, do I need to modify it before compiling with an appropriate compiler to get the benefits of the SSE4.2 instructions, or will the new compilers automagically use the SSE4.2 instructions for string comparisons?
I've read all the white papers and web pages I could find that seemed relevant, but have yet to find a definitive answer. The best I've found is that "All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4." But there is no mention of whether existing software can take advantage of the SSE4.2 instructions without modification.
Thanks in advance.
Many string functions in the runtime library can be sped up using SSE4.2 instructions, and some of them can also be sped up using SSE2. Various compiler teams are exploring drop-in replacement of runtime library functions with versions that use newer instruction sets. I would keep my fingers crossed that it will happen in the near future.
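As a rough illustration of the SSE2 case, here is a minimal memchr-style sketch: compare 16 bytes against the target byte at once, turn the comparison into a bit mask, and take the lowest set bit. The name `find_byte_sse2` is hypothetical and this is not any runtime library's actual implementation; it assumes GCC or Clang (for `__builtin_ctz`) on x86, where SSE2 is part of the x86-64 baseline.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical memchr-like helper: index of the first byte equal to c in
   s[0..n), or n if it is absent. */
static size_t find_byte_sse2(const unsigned char *s, size_t n, unsigned char c)
{
    const __m128i needle = _mm_set1_epi8((char)c); /* c in all 16 lanes */
    size_t i = 0;
    /* Whole 16-byte blocks: compare all lanes at once. */
    for (; i + 16 <= n; i += 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)(s + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(block, needle));
        if (mask != 0) /* lowest set bit marks the first matching byte */
            return i + (size_t)__builtin_ctz((unsigned)mask);
    }
    /* Scalar tail for the last n % 16 bytes. */
    for (; i < n; i++)
        if (s[i] == c)
            return i;
    return n;
}
```

The scalar tail avoids reading past the end of the buffer; production library versions typically use aligned loads and other tricks instead, which this sketch does not attempt.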
Quoting - tim18
I haven't seen a case to show that code such as in
http://software.intel.com/en-us/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4/
might be generated by auto-vectorization without explicitly writing the SSE4.2 intrinsics into the code.
Quoting - westmere
Thanks for the response tim18. Any idea if auto-vectorization is planned for future compiler releases, or if gcc -ftree-vectorize -msse4.2 might already do this?
The same examples, built with g++ or gfortran 4.5, generate identical code with the SSE4.1 or SSE4.2 options. While gcc/g++/gfortran use of SSE4 code shows some consistent performance gains over SSE3, gcc and the Intel compilers don't use SSE4.1 in the same ways in my code samples, with the exception of _mm_set_ps, where both compilers switch to SSE4.1 code (so it's not necessary to change the source to the corresponding SSE4.1 intrinsic). g++ 4.5 has more effective auto-vectorization than previous versions of g++.
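A small sketch of the _mm_set_ps point: the source keeps the plain SSE intrinsic, and the instruction sequence is left to the compiler and its target flags. The function name `pack4` is hypothetical. Building it with `-msse2` and again with `-msse4.1` and comparing the `gcc -S` output is one way to see what the compiler chose; whether SSE4.1 instructions such as `insertps` appear depends on the compiler version and the surrounding code, so treat that as something to check rather than a guarantee.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_set_ps, _mm_storeu_ps */

/* Build a vector from four scalars. _mm_set_ps lists lanes from the highest
   down, so lane 0 (the lowest) receives the LAST argument. */
static __m128 pack4(float a, float b, float c, float d)
{
    return _mm_set_ps(d, c, b, a); /* lanes become {a, b, c, d} */
}
```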
I haven't found any use of SSE4 code by the Sun compilers, but they frequently vectorize effectively for SSE4.2 CPUs using SSE instructions, even in a few situations where the others don't.
The marketing people usually miss several points: the few situations where new instructions are beneficial are far outnumbered by those where the old instructions may be better optimized for the new CPUs. There isn't sufficient incentive to make applications incompatible with older CPUs when the AVX instruction set will offer real gains in a year or two.
Thanks for the help tim18.
Quoting - westmere
If I have existing C/C++ source code, do I need to modify it before compiling with an appropriate compiler to get the benefits of the SSE4.2 instructions, or will the new compilers automagically use the SSE4.2 instructions for string comparisons?
I've read all the white papers and web pages I could find that seemed relevant, but have yet to find a definitive answer. The best I've found is that "All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4." But there is no mention of whether existing software can take advantage of the SSE4.2 instructions without modification.
Thanks in advance.
You might want to check out the alpha code of the Glibc 2.11 string and memory functions. It includes multi-arch support, so the library can be configured and built to recognize which ISA is available, and your existing code that calls Glibc 2.11 string and memory functions will execute using SIMD code on Nehalem-, Penryn- and Merom-based processors.
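The multi-arch idea can be sketched in user code: compile a baseline version and an SSE4.2 version of the same routine, then pick one from the CPU's reported features, so callers never change. This is a minimal sketch, not glibc's mechanism (glibc resolves its string functions once at load time via GNU indirect functions, IFUNC). The function names are hypothetical, and it assumes GCC or Clang for the `target` attribute and `__builtin_cpu_supports`. The SSE4.2 path is a bounded string comparison that finds the first mismatching byte with `_mm_cmpestri`.

```c
#include <assert.h>
#include <nmmintrin.h> /* SSE4.2 intrinsics: _mm_cmpestri, _SIDD_* flags */
#include <stddef.h>
#include <string.h>

/* Portable fallback: index of the first differing byte, or n if equal. */
static size_t mismatch_scalar(const char *a, const char *b, size_t n)
{
    size_t i = 0;
    while (i < n && a[i] == b[i])
        i++;
    return i;
}

/* SSE4.2 variant; the target attribute lets this one function use SSE4.2
   while the rest of the file is built for the baseline ISA. */
__attribute__((target("sse4.2")))
static size_t mismatch_sse42(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i += 16) {
        size_t len = (n - i < 16) ? n - i : 16;
        char ta[16] = {0}, tb[16] = {0};
        memcpy(ta, a + i, len); /* local copies: loads never leave the buffers */
        memcpy(tb, b + i, len);
        __m128i va = _mm_loadu_si128((const __m128i *)ta);
        __m128i vb = _mm_loadu_si128((const __m128i *)tb);
        /* Equal-each with negated result: lowest index where the bytes
           differ, or 16 if the first len bytes all match. */
        int idx = _mm_cmpestri(va, (int)len, vb, (int)len,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                               _SIDD_NEGATIVE_POLARITY);
        if (idx < 16)
            return i + (size_t)idx;
    }
    return n;
}

/* Dispatcher: chooses an implementation from the CPU's reported features.
   A library would resolve this once at load time instead of per call. */
static size_t mismatch(const char *a, const char *b, size_t n)
{
    if (__builtin_cpu_supports("sse4.2"))
        return mismatch_sse42(a, b, n);
    return mismatch_scalar(a, b, n);
}
```

Callers only ever use `mismatch`, which is the point of the multi-arch approach: the same binary runs everywhere and uses SSE4.2 only where the CPU reports it.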
You might also be interested in an SSE4.2 example that speeds up a string-to-integer conversion function; one such example is shown in the latest Optimization manual.
http://www.intel.com/products/processor/manuals/index.htm
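A much simpler sketch in the same direction (not the Optimization manual's algorithm) uses SSE4.2 only to find the length of the leading digit run, via `_mm_cmpestri` in "ranges" mode with the single range '0'..'9', and then accumulates the digits with ordinary scalar code. The name `parse_uint_sse42` is hypothetical, overflow is not checked, and it assumes GCC or Clang on x86 for the `target` attribute.

```c
#include <assert.h>
#include <nmmintrin.h> /* SSE4.2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: parse the leading decimal digits of s[0..n) into
   *value and return how many digits were consumed (no overflow check). */
__attribute__((target("sse4.2")))
static size_t parse_uint_sse42(const char *s, size_t n, uint64_t *value)
{
    static const char range[16] = "09"; /* one range: '0'..'9', rest unused */
    const __m128i digits = _mm_loadu_si128((const __m128i *)range);
    uint64_t v = 0;
    size_t i = 0;
    while (i < n) {
        size_t len = (n - i < 16) ? n - i : 16;
        char chunk[16] = {0};
        memcpy(chunk, s + i, len); /* never read past s[n-1] */
        __m128i text = _mm_loadu_si128((const __m128i *)chunk);
        /* Ranges mode, negated: index of the first byte that is NOT a digit
           (len or 16 when every valid byte is a digit). */
        int idx = _mm_cmpestri(digits, 2, text, (int)len,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES |
                               _SIDD_NEGATIVE_POLARITY);
        size_t run = ((size_t)idx < len) ? (size_t)idx : len;
        for (size_t k = 0; k < run; k++) /* scalar accumulation of the run */
            v = v * 10 + (uint64_t)(chunk[k] - '0');
        i += run;
        if (run < len) /* stopped at a non-digit */
            break;
    }
    *value = v;
    return i;
}
```

For example, parsing `"12345abc"` consumes 5 characters and yields 12345. The manual's version goes further and vectorizes the conversion itself; this sketch only shows where PCMPESTRI fits.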
Quoting - Shih Kuo (Intel)
You might want to check out the alpha code of the Glibc 2.11 string and memory functions. It includes multi-arch support, so the library can be configured and built to recognize which ISA is available, and your existing code that calls Glibc 2.11 string and memory functions will execute using SIMD code on Nehalem-, Penryn- and Merom-based processors.
You might also be interested in an SSE4.2 example that speeds up a string-to-integer conversion function; one such example is shown in the latest Optimization manual.
http://www.intel.com/products/processor/manuals/index.htm
Latest news from the Glibc front.
http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html