
Intel C/C++ compilers complete adoption of LLVM

James_Reinders
Employee

The benefits of adopting LLVM are numerous. In this post, I offer advice for upgrading from our classic compilers to our LLVM-based compilers. We are committed to making the transition as seamless as possible while delivering real benefits for developers who use the Intel compilers.

Benefits of adopting LLVM


The LLVM open source project is a collection of modular and reusable compiler and toolchain technologies supporting multiple processor architectures and programming languages. The Clang open source project provides a C/C++ frontend for LLVM that supports the latest language standards. LLVM, including Clang, is maintained by a large and very active development community.

There are many benefits to adopting LLVM, but let’s start with faster build times. Clang is fast, and we can all appreciate that! We measured a 14% reduction in build times when using the Intel C/C++ compiler included in the Intel oneAPI 2021.3 toolkits. In addition to reducing build times, adopting Clang has allowed us to contribute to, and benefit from, community efforts to support the latest C++ language standards.

Intel has a long history of contributing to and supporting open source projects, including a decade of contributions to LLVM. Our active collaborations today include optimization report additions, expanded floating-point model support, and enhanced vectorization. Intel contributes to LLVM projects directly, and we also maintain a staging area (the Intel project for LLVM technology) for SYCL support.

The Intel C/C++ compilers can be expected to deliver higher performance than the base Clang/LLVM compilers on Intel architecture. Going forward, the default Intel C/C++ compilers are the versions (icx) that have adopted the LLVM open source infrastructure. We continue our strong history of contributing to the Clang and LLVM projects, including optimizations for both. Not all of our optimization techniques get upstreamed, sometimes because they are too new and sometimes because they are very specific to Intel architecture. This is to be expected and is consistent with other compilers that have adopted LLVM.

 

With the latest Intel C/C++ compilers, released with the Intel oneAPI toolkits version 2021.3, we made a series of performance measurements. Consistent with our objective to be the leading C/C++ compiler for Intel architecture, our measurements show the Intel C/C++ compilers besting other options. We also beat ourselves: the new LLVM-based Intel C/C++ compiler matches or exceeds the Intel C/C++ classic compiler. It’s time to upgrade the compiler you use! I share one example here, and more of our measurements are included at the end of this blog.
Intel C/C++ compilers have a history of offering leadership performance. While the classic Intel C/C++ compiler shows an 18% advantage over gcc here, the LLVM-based Intel C/C++ compiler shows a 41% advantage.

To support Intel's evolving platforms, we are focusing new feature and hardware support on our LLVM-based compilers, where we have added highly optimized support for GPUs and FPGAs alongside our continuing commitment to industry-leading CPU optimizations. Our LLVM-based compilers are where we will support SYCL, C++20, OpenMP 5.1, and OpenMP GPU target devices.

We encourage users to take advantage of the faster build times, higher levels of optimization, and new capabilities by moving now to our LLVM-based C/C++ compilers. Intel is committed to LLVM for the long term, both to help drive ongoing innovation and to continue our relentless pursuit of industry-leading optimizations.

What happened to the Parallel Studio XE compilers?

 

In 2007, we renamed our tools “Parallel Studio” to emphasize our support for parallelism. At that time, the world was changing as parallel programming was destined to be ubiquitous in the form of multicore processors. It started with dual-core processors supplanting single core processors. Today, core counts are in the dozens and still on an upward trend.

 

Just as parallel programming for homogeneous systems has become ubiquitous, we see parallel programming for heterogeneous systems on a similar path to ubiquity. Unlike multicore parallelism, heterogeneous programming will span compute capabilities from multiple vendors. This threatens to fragment programming unless we all come together to support open, multivendor approaches in compilers, libraries, frameworks, and all tooling for software developers.

We named this next generation of our popular tools to emphasize the oneAPI open approach to heterogeneous parallelism. They remain the same product-quality tools the industry has relied upon for decades, extended to support heterogeneous programming by embracing the oneAPI specification and the SYCL standard. Download and start using the tools right away, at no cost! Community support is available at the Intel Community Forums, and Intel continues to offer Priority Support for submitting questions, problems, and other technical support issues.

C/C++ is ready now

We recommend that all new projects start with the LLVM-based Intel C/C++ compilers, and that all existing projects make a plan to migrate to the new compiler this year. At some point in the future, the classic C/C++ compilers will enter “Legacy Product Support” mode, signaling the end of regular updates to the classic compiler base, and they will no longer appear in oneAPI toolkits.


The new LLVM-based Intel C/C++ compiler has reached parity with the classic version, and it offers the best optimization technology we have. We suggest that all users try the new C/C++ compiler now, enjoy the benefits, and provide feedback.

There is an excellent guide for converting from the classic C/C++ compiler to the LLVM-based compilers. The first thing you’ll notice is that the compiler has a different name (icx), which allows you to have both the classic and the new compilers installed and to choose between them. Many users have already made the switch to rely solely on the LLVM-based Intel C/C++ compilers for their products going forward. The latest release notes offer more details on known issues and limitations (release notes for the classic C/C++ compilers are also available). Check out our "Talk to Experts" webinars for opportunities to hear from experts live or via on-demand viewing of previously recorded sessions.

LLVM-based Intel Fortran compiler is a work in progress

Intel Fortran has long been known for extensive standards support and superior performance. That tradition will continue with an LLVM-based Intel Fortran compiler once we complete our beta program. We appreciate feedback.

The LLVM-based Fortran compiler beta offers extensive support of Fortran, while some functionality remains a work in progress. You can review the status of specific features to see if they are ready for you: a release-by-release status for individual features is available in our Fortran and OpenMP feature status table for the LLVM-based Fortran compiler. Fortran compiler release notes are available for both the classic and beta (LLVM-based) compilers.

I’ll post a blog later this year with an update on our adoption of LLVM for Fortran.

Excellent New Chapter for Intel Compilers

The Intel C/C++ and Fortran compiler products have a rich history that started with UNIX System V compilers in the early 1990s, added compiler technology from Multiflow in the mid-1990s, and grew in the 2000s with the fabled DEC/Compaq Fortran team plus the OpenMP and parallelism expertise of Kuck and Associates Inc. (KAI). As the Intel compilers enter their fourth decade, they continue their journey with LLVM compiler technology. Users of Intel compilers will continue to see strong standards support, reliable code optimization, and a dedication to supporting your needs, all with the added mission of leading the way in supporting heterogeneous programming.

We continue to be committed to making Intel C/C++ and Fortran compilers important and useful tools in your quest to build world changing applications.

Learn More – "Talk to the Compiler Experts" Webinars

We offer live interactive sessions hosted by experts in Intel compiler technologies. These "Talk to Experts" sessions are great to attend live because you can ask questions and get them answered on the spot. After each live session, the recording is available on demand (many prior sessions are available now!), and our community forums are a great place to ask questions whenever you have them.

Check the "Talk to Experts" sign-up page to register for detailed information on joining sessions and to get notifications if anything changes.

Two key sessions I recommend are:

Get the Latest Intel Compilers, Now, for Free – Download Now

Users of the Intel compilers can now enjoy the best of both worlds, combining Intel’s decades of expertise in optimization for Intel architecture and OpenMP, with LLVM. 

Download today from the oneAPI toolkit website.

 

Please post comments with your thoughts, feedback, and suggestions on community.intel.com James-Reinders-Blog.

 

More Benchmarks and Configuration Details

Together, these benchmarks show that we've reached the sought-after tipping point where the LLVM-based compiler is fully ready to take on the role of the preferred compiler for all our users.

Faster Compile Times

The SPEC CPU 2017 benchmark package contains industry-standardized, CPU intensive suites for measuring and comparing compute intensive performance, stressing a system's processor, memory subsystem and compiler. More information on the SPEC benchmarks can be found at: https://www.spec.org.

Configuration: Testing by Intel as of Jun 10, 2021. Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, 16G x2 DDR4 2666. Red Hat Enterprise Linux release 8.0 (Ootpa), 4.18.0-80.el8.x86_64. Software: Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000. Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604. Compiler switches: Intel(R) 64 Compiler Classic: -O2 -xCORE-AVX512. Intel(R) oneAPI DPC++/C++ Compiler: -O2 -xCORE-AVX512.

Optimized Performance

SPECrate 2017 (Estimated)

Configuration: Testing by Intel as of Jun 10, 2021. Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz, 2 socket, Hyper-Threading on, Turbo on, 32G x16 DDR4 3200 (1DPC). Red Hat Enterprise Linux release 8.2 (Ootpa), 4.18.0-193.el8.x86_64. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604. Intel(R) C++ Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000, GCC 11.1, Clang/LLVM 12.0.0. SPECint®_rate_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -O3 -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-mem-layout-trans=4 -qopt-multiple-gather-scatter-by-shuffles. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -mprefer-vector-width=128. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto. qkmalloc used for the Intel compilers. jemalloc 5.0.1 used for GCC and LLVM. SPECfp®_rate_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -Ofast -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-prefetch -ffinite-math-only -qopt-multiple-gather-scatter-by-shuffles -qopt-mem-layout-trans=4. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -fno-associative-math -funroll-loops -flto. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto.

 

SPECspeed 2017 (Estimated)

Configuration: Testing by Intel as of Jun 10, 2021. Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz, 2 socket, Hyper-Threading on, Turbo on, 32G x16 DDR4 3200 (1DPC). Red Hat Enterprise Linux release 8.2 (Ootpa), 4.18.0-193.el8.x86_64. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604. Intel(R) C++ Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000, GCC 11.1, Clang/LLVM 12.0.0. SPECint®_speed_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -O3 -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4 -fiopenmp. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-mem-layout-trans=4 -qopt-multiple-gather-scatter-by-shuffles -qopenmp. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -fopenmp. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -fopenmp=libomp. jemalloc 5.0.1 used for the Intel compilers, GCC, and LLVM. SPECfp®_speed_base_2017 compiler switches: Intel(R) oneAPI DPC++/C++ Compiler: -xCORE-AVX512 -Ofast -ffast-math -flto -mfpmath=sse -funroll-loops -qopt-mem-layout-trans=4 -fiopenmp. Intel(R) C++ Intel(R) 64 Compiler Classic: -xCORE-AVX512 -ipo -O3 -no-prec-div -qopt-prefetch -ffinite-math-only -qopt-multiple-gather-scatter-by-shuffles -qopenmp. GCC: -march=skylake-avx512 -mfpmath=sse -Ofast -fno-associative-math -funroll-loops -flto -fopenmp. LLVM: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -fopenmp=libomp. jemalloc 5.0.1 used for the Intel compilers, GCC, and LLVM.

 

CoreMark-Pro on Intel® Core i7-8700K Processor

CoreMark-Pro aims to test the entire processor, with comprehensive support for multicore technology, a combination of integer and floating-point workloads, and data sets for utilizing larger memory subsystems. For more information on CoreMark-Pro from the Embedded Microprocessor Benchmark Consortium (EEMBC), see https://www.eembc.org/coremark-pro/.

In these benchmark results, the two Intel compilers are close, but the numbers show that we have a little more work to do for the LLVM-based compiler to beat the classic compiler. I hope you'll agree that this is close enough, given the other outstanding results from our LLVM-based compilers.

Testing by Intel as of Jun 10, 2021. Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, 16G x2 DDR4 2666. Software: Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210604_000000, GCC 11.1, Clang/LLVM 12.0.0. Red Hat Enterprise Linux release 8.0 (Ootpa), 4.18.0-80.el8.x86_64. Compiler switches: Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.1 Build 20201112_000000: icc -xCORE-AVX2 -mtune=skylake -ipo -O3 -no-prec-div -qopt-prefetch. GCC 11.1: gcc -march=native -mfpmath=sse -Ofast -funroll-loops -flto. LLVM 12.0.0: clang -Ofast -funroll-loops -flto -static -mfpmath=sse -march=native.

 

CoreMark-Pro on Intel® Atom C3850 Processor

In these benchmark results, the two Intel compilers are close, but the numbers show that we have a little more work to do for the LLVM-based compiler to beat the classic compiler in one case. I hope you'll agree that this is close enough, given the other outstanding results from our LLVM-based compilers.

Configuration:  Testing by Intel as of  Jun 10, 2021 - Intel(R) Atom(TM) CPU C3850 @ 2.10GHz, 16G x2 DDR4 2400. Software: Intel(R) C Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.1 Build 20201112_000000, GCC 11.1, Clang/LLVM 12.0.0.  Red Hat Enterprise Linux release 8.0 (Ootpa), 4.18.0-80.el8.x86_64. Compiler switches:  Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.1 Build 20201112_000000: icc -xATOM_SSE4.2 -mtune=goldmont -ipo -O3 -no-prec-div -qopt-prefetch.  GCC 11.1: gcc -march=native -mfpmath=sse -Ofast -funroll-loops -flto.  LLVM 12.0.0: clang -Ofast -funroll-loops -flto -static -mfpmath=sse -march=native.

 

LORE: Loop Repository for Evaluation of Compilers Benchmarks

LORE tests C-language for-loop nests extracted from popular benchmarks, libraries, and real applications. The loops cover a variety of properties that the compiler community can use to evaluate loop optimization. Sixty-five benchmarks and workloads were tested. For more information, see https://www.vectorization.computer

Configuration: Testing by Intel as of Jun 9, 2021. Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz, 2 socket, 28 cores, HT enabled, Turbo enabled, 384GB RAM. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607, Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607. Ubuntu 18.04.1 with GCC 10.2.0. Compiler switches: Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607: ICC OPT: OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w". ICC OPT512: OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w -qopt-zmm-usage=high". Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607: ICX OPT: OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w". ICX OPT512: OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -xHost -w -mprefer-vector-width=512". ICX OPTm: OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -march=skylake-avx512 -w". ICX OPT512m: OPT="-Ofast -qopt-prefetch -unroll-aggressive -restrict -march=skylake-avx512 -w -mprefer-vector-width=512".

 

RAJA Performance Suite (RAJAPerf)

The RAJA Performance Suite is designed to explore performance of loop-based computational kernels found in HPC applications. Learn more about RAJA Performance Suite at https://github.com/LLNL/RAJAPerf.

You might note that this demanding benchmark shows parity with our classic compiler, not an improvement. That is still a solid and impressive result. I did not hesitate to include it, because I am sharing the benchmarks we ran to ensure we reached the point where the new LLVM-based version is fully worthy of our recommendation.

Configuration: Testing by Intel as of Jun 9, 2021. Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz, 2 socket, 28 cores, HT enabled, Turbo enabled, 384GB RAM. Software: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607, Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607. Ubuntu 18.04.1 with GCC 10.2.0. Compiler switches: Intel(R) C++ Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210607: ICC OPT: OPT="-Ofast -ansi-alias -xCORE-AVX512". ICC OPT512: OPT="-Ofast -ansi-alias -xCORE-AVX512 -qopt-zmm-usage=high". setenv KMP_AFFINITY compact,granularity=fine. Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2021.2.0 Build 20210607: ICX OPT: OPT="-Ofast -ansi-alias -xCORE-AVX512". ICX OPT512: OPT="-Ofast -ansi-alias -xCORE-AVX512 -qopt-zmm-usage=high". setenv KMP_AFFINITY compact,granularity=fine.

 

Performance varies by use, configuration, and other factors. Learn more at www.intel.com/PerformanceIndex.  Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software, or service activation.

5 Comments
James_Reinders
Employee

Good question. @FredLau 

We did not use -j in the script/makes.

We simply timed the compilation of each benchmark in the suite using the exact options I listed in my blog, and then added up the compile times.

Based on what we saw in our testing, I’m comfortable that compilations should in general be faster for everyone. That’s an educated guess, not a promise; of course, your mileage will vary. I’m very interested in what you and others see for your builds.

 

It appears you already knew this - I offer this to help everyone reading this:

There are two ways that a build can use multicore/multiprocessor parallelism (and they can both be used together):

-j can be used in the script (make) to compile each file in parallel, because each compile is a separate job.

With -ipo on, the compiler can also use threading to process each object before calling the linker.

We didn't use either for our simple measurements because we were simply interested in whether the compiler itself was faster.

dvjohnston
Beginner

Thanks for the update. I have a few questions on what this means for collaboration with the LLVM project:

 

Will Intel be contributing its changes back to the LLVM project when possible?

Does Intel plan on helping with the implementation of missing C++20 features in the LLVM codebase?

Additionally, does this project plan on keeping ABI compatibility with libc++?

 

Thanks in advance!

James_Reinders
Employee

Good questions -

 

Q: Will Intel be contributing its changes back to the LLVM project when possible?

Yes, we have been doing that actively and will continue to do so.  It has been going very well.  Everyone who contributes to open source knows that not all your ideas get accepted, and we are no exception. We value the interaction that occurs when we contribute because it helps make sure our contributions are as valuable as possible to the whole community even if that means it takes a little longer. We are pleased at the contributions we have been able to make, and look forward to making many more.

 

Q: Does Intel plan on helping with the implementation of missing C++20 features in the LLVM codebase?

We haven’t contributed in the C++20 standard feature space yet, but we intend to do so. We have been reviewing where we think we can help, and we will engage with the Clang community to figure out the best fit for our efforts. We certainly expect to contribute C++20 (and future C++) features to Clang.

 

Q: Additionally, does this project plan on keeping ABI compatibility with libc++?

The community has maintained, and continues to maintain, ABI compatibility with libc++. The Intel compiler benefits from that since it is based upon Clang/LLVM, and we have no plans to intentionally break that compatibility! In fact, I'm quite certain we would be upset if we did.

 

Anushka
Beginner

Thanks for sharing this useful information. I'm glad I found your content.

_avi_
Employee

Thanks