Solved: compatibility with GCC 5.x and later built-in functions

William_S_8 · ‎03-14-2017

GCC 5.x and later support built-in functions for overflow checking [1] that are unsupported by Intel C/C++ in versions up to and including the recently released 2017.0.2.174 / 20170213. This creates issues when the system GCC compilers are newer than 4.9.x and users would like to use GCC 5 or later to allow for AVX-512 support in their code.

An example of this issue might include the following error building m4 from source with GCC 5.4.0:

m4-1.4.18/lib/xalloc.h:107: undefined reference to `__builtin_mul_overflow'

Is there a better general workaround than setting -no-gcc or forcing __GNUC__, __GNUC_MINOR__, and __GNUC_PATCHLEVEL__ to an older version of GCC?

[1] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
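For readers who want a small test case rather than a full m4 build, the following is a minimal sketch of the kind of check that triggers the error: an allocation-size multiply guarded by `__builtin_mul_overflow`. The helper name `checked_size_mul`, the macro `HAVE_MUL_OVERFLOW_BUILTIN`, and the choice to treat any compiler defining `__INTEL_COMPILER` as lacking the builtin are assumptions made for illustration; they are not taken from m4 or from Intel documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the Intel compiler may report __GNUC__ >= 5 (inherited from the
   system GCC) without providing the builtin, so it is excluded explicitly. */
#if defined(__GNUC__) && (__GNUC__ >= 5) && !defined(__INTEL_COMPILER)
#  define HAVE_MUL_OVERFLOW_BUILTIN 1
#else
#  define HAVE_MUL_OVERFLOW_BUILTIN 0
#endif

/* Hypothetical helper: stores the (possibly wrapped) product n * s in *out.
   Returns 1 if the multiplication overflowed size_t, 0 otherwise. */
static int checked_size_mul(size_t n, size_t s, size_t *out)
{
    assert(out != NULL);
#if HAVE_MUL_OVERFLOW_BUILTIN
    /* GCC 5+ path: the builtin returns true on overflow. */
    return __builtin_mul_overflow(n, s, out) ? 1 : 0;
#else
    /* Fallback path: unsigned wraparound is well defined, so compute the
       product unconditionally and detect overflow with a division test. */
    *out = n * s;
    return (s != 0 && n > SIZE_MAX / s) ? 1 : 0;
#endif
}
```

Calling `checked_size_mul(count, elem_size, &bytes)` before an allocation and refusing to allocate when it returns 1 mirrors the pattern in the error message above, while still linking on compilers that take the fallback path.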

SergeyKostrov · ‎03-15-2017

>>...This creates issues when the system gcc compilers are newer than 4.9.x and users would like to use GCC 5 or later to allow
>>for AVX 512 support in their code.

I simply would like you to know that GCC versions 4.9.x support the AVX-512 ISA:

...
-mavx512cd Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512CD built-in functions and code generation
-mavx512er Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512ER built-in functions and code generation
-mavx512f Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F built-in functions and code generation
-mavx512pf Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX512PF built-in functions and code generation
...

If that issue is related to GCC versions 5.x, it is better to put future upgrades on hold until there is a response from the Intel C++ compiler team. Also, take into account that it always takes some time to add a new feature or to fix some problem(s).
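One way to confirm that a given compiler and flag combination really enables AVX-512F is to test the predefined macro `__AVX512F__`, which GCC defines when `-mavx512f` (or a flag that implies it) is in effect. The sketch below is only a compile-time check; the function name is hypothetical, and it says nothing about whether the CPU running the binary supports the instructions.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this translation unit was compiled with
   AVX-512F code generation enabled (e.g. gcc -mavx512f), 0 otherwise. */
static int avx512f_enabled_at_compile_time(void)
{
#if defined(__AVX512F__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints the result so it can be compared across
   compilers and flag sets. */
static void report_avx512f(void)
{
    printf("AVX-512F enabled at compile time: %s\n",
           avx512f_enabled_at_compile_time() ? "yes" : "no");
}
```

Building the same file once with `-mavx512f` and once without, then calling `report_avx512f()` from `main`, shows whether the flag is accepted and honored by the compiler under test.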

Judith_W_Intel · ‎03-17-2017

Thank you for reporting the missing overflow builtins.

I have submitted a bug report on this problem in our internal bugs database (DPD200419083).

nemequ · ‎03-23-2017

If you'd like something to tide you over until this is fixed, I've been working on a project which may help called "portable snippets". Specifically, the safe-math module provides an API for overflow-safe math similar to __builtin_*_overflow which works on most compilers. If you define PSNIP_SAFE_EMULATE_NATIVE or PSNIP_BUILTIN_EMULATE_NATIVE prior to including safe-math.h it will also define __builtin_*_overflow on compilers which don't support it, including ICC, or you can just use the psnip_safe_* functions directly.

Sorry if this seems spammy; I don't usually like to link to my own projects like this, but in this case I thought it might help :/
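To illustrate what an emulation of the overflow builtins has to do on a compiler that lacks them, here is a minimal sketch for signed `int` addition. It is not the portable-snippets API or implementation: the function name `emulated_sadd_overflow` is hypothetical, and unlike the real builtin it leaves the result untouched on overflow instead of storing the wrapped value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical emulation of a signed-add overflow check.
   Returns 1 if a + b would overflow int (and does not write *res),
   otherwise stores a + b in *res and returns 0. */
static int emulated_sadd_overflow(int a, int b, int *res)
{
    assert(res != NULL);
    /* Signed overflow is undefined behavior in C, so the range test must
       happen before the addition, using only operations that cannot overflow. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return 1;
    }
    *res = a + b;
    return 0;
}
```

The pre-check is the important design point: testing the sum after computing it would already have invoked undefined behavior, which is the reason dedicated builtins or a vetted library are preferable to ad hoc checks.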

William_S_8 · ‎04-28-2017

Thanks Evan, it's useful.

Judith, there is also support ticket 02772245.

Jeffrey_H_Intel · ‎04-28-2017

William: I have serious doubts that AVX-512 code generation will improve the performance of m4 relative to AVX2...

Evan: 100% of the projects you have created that I know about are high-quality and useful to a nontrivial number of people. Please do not hesitate to shamelessly self-promote.

Judith: Hopefully the Intel Support ticket (i.e. 02772245) I created for William will be linked to the DPD ticket you created. I made a note to that effect already. If there are any issues, please send me an email.

Judith_W_Intel · ‎05-04-2017

Yes, they are linked. I now see Q02772245 in the Customer Support ID field for the bug.

thanks

Judy

SergeyKostrov · ‎05-08-2017

>>...I have serious doubts that AVX-512 code generation will improve the performance of m4 relative to AVX2

As far as I know, m4 is a macro processor; that is, it is not an HPC application. My question is: are there any performance problems with m4 at the moment that would justify a change from AVX2 to AVX-512?

William_S_8 · ‎05-09-2017

Jeff:

One of two situations is possible given the initial post:

1. I am obsessive about build system performance and want to settle complaints about build speed on KNL systems and achieve a reasonable percentage of peak with autoconf. It'll be the basis of the most useful Gordon Bell submission ever.
2. I am working with a user code that was written by someone who always uses the newest gcc, uses a number of gcc builtins, isn't open source, is barely available to me, and does benefit from improved vectorization. Looking around for a common, easy-to-build, open source code that shows how the built-ins are used turned up GNU m4 as a potentially useful test case for someone evaluating builtin functionality on a system with GCC 5 or later, without having to produce a detailed test case or extract examples from the user code. Thus, the initial post mentions m4 as an example of code using the builtins rather than as the exact codebase that led to the post.

Luckily, in the second scenario, Evan's portable-snippets allow for progress.

SergeyKostrov · ‎05-09-2017

>>...build system performance and want to settle complaints about build speed on KNL systems and achieve a reasonable
>>percentage of peak with autoconf.

That is an unusual application of a KNL system: you're using it to build (compile) software instead of using it for HPC processing. Take into account that a 3rd Generation 2.8 GHz Ivy Bridge system using 4 hardware threads performs almost the same as a 1.3 GHz Knights Landing system using only 8 of its 64 hardware threads. This means you should consider using as many threads as possible on the KNL system instead of trying to rebuild some software to use the AVX-512 ISA. I would suggest looking at how many threads are actually in use on the KNL system (on Linux, with the htop utility) when investigating complaints from a customer.

compatibility with GCC 5.x and later built-in functions