AVAILABLE NOW! Intel® Software Development Tools 2024.2

Devorah_H_Intel · 06-26-2024

Software Developer Tools 2024.2

Intel® oneAPI Base Toolkit

Take your application's efficiency to the next level with the Intel® oneAPI DPC++/C++ Compiler's enhanced SYCL* Graph capabilities, which now feature pause/resume support for better control and graph profiling for performance tuning. Additionally, the Intel® oneAPI DPC++/C++ Compiler delivers more SYCL performance on Windows with the default context enabled. With the latest release, the kernel compiler introduces SPIR-V support and OpenCL* query support, allowing greater flexibility and optimization in your compute kernels.
Enhance your debugging experience in Microsoft* Visual Studio* and VS Code* with the new Lane Variable Watch Window in the Intel® Distribution for GDB*, which lets you monitor and analyze variables more efficiently for quicker problem resolution and more stable applications.
Strengthen the security of your applications with expanded Control-flow Enforcement Technology (CET) support in the Intel® Distribution for GDB*, which now includes Shadow Stack capabilities so you can efficiently debug applications and improve the reliability of your software.
Use Intel® VTune™ Profiler to gain insight into sub-optimal Intel® oneAPI Collective Communications Library (oneCCL) communication by measuring the time spent in oneCCL calls and identifying the most active oneCCL communication tasks in your application.
Intel® Distribution for Python* added the following features:

The Data Parallel Control Library offers improved productivity with new sorting and summing functions along with updated documentation and bug fixes.
The Data Parallel Extension for NumPy increases productivity with the addition of a new family of cumulative functions and improved linear algebra functions.

Intel® oneAPI DPC++ Library (oneDPL) adds new C++ Standard Template Library (STL) copy_if and inclusive_scan algorithm extensions that help developers write parallel programs for multiarchitecture devices. The performance of many existing algorithms is also improved on GPUs from Intel and other vendors.
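For readers less familiar with the two algorithms named above, here is a minimal sketch of their basic semantics. It uses the sequential C++17 standard-library overloads so it builds with any conforming compiler; it does not use oneDPL itself, and the announcement does not describe what the new extensions add beyond these basics. The helper positive_prefix_sums is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper: keep the positive values, then compute their running
// (inclusive) prefix sum. Sequential standard-library overloads are used so the
// sketch builds anywhere.
std::vector<int> positive_prefix_sums(const std::vector<int>& input) {
    std::vector<int> kept;
    kept.reserve(input.size());
    // copy_if: a stable filter that preserves the relative order of kept elements.
    std::copy_if(input.begin(), input.end(), std::back_inserter(kept),
                 [](int v) { return v > 0; });

    std::vector<int> sums(kept.size());
    // inclusive_scan: sums[i] = kept[0] + kept[1] + ... + kept[i].
    std::inclusive_scan(kept.begin(), kept.end(), sums.begin());
    assert(sums.size() == kept.size());
    return sums;
}
```

For example, an input of {3, -1, 4, 0, 5} keeps {3, 4, 5} and yields {3, 7, 12}. In oneDPL, algorithms with these names accept an execution policy as the first argument, which is what lets the same call target a CPU or GPU device.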
Apps run faster on 5th Gen Intel® Xeon® Processors thanks to optimized thread synchronization in Intel® oneAPI Threading Building Blocks (oneTBB), which reduces startup latency.
Apps run faster with improved data movement in oneTBB parallel_reduce, which avoids extra copying.
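The announcement does not say how the data movement was improved. As a rough illustration of why copying matters in a functional-form reduction (an identity value, a body that reduces a subrange, and a join that combines partial results), the hypothetical reduce_split below splits an index range recursively with std::async and passes partial results by move. It is plain standard C++, not oneTBB code, and its thread-per-split strategy is far cruder than a work-stealing scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical functional-form reduction: body(lo, hi, init) reduces the
// index subrange [lo, hi) starting from init, and join(a, b) combines two
// partial results, left before right. Standard C++ only; not oneTBB code.
template <typename Value, typename Body, typename Join>
Value reduce_split(std::size_t lo, std::size_t hi, const Value& identity,
                   const Body& body, const Join& join, std::size_t grain = 1024) {
    assert(lo <= hi && grain > 0);
    if (hi - lo <= grain) {
        // Leaf: each subrange starts from its own copy of the identity.
        return body(lo, hi, Value(identity));
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    // Reduce the left half on a new thread and the right half on this one.
    auto left = std::async(std::launch::async, [&] {
        return reduce_split(lo, mid, identity, body, join, grain);
    });
    Value right = reduce_split(mid, hi, identity, body, join, grain);
    // Both partial results reach join as rvalues, so a heavy Value such as
    // a vector is moved rather than copied.
    return join(left.get(), std::move(right));
}

// Example use: gather the even elements of data in their original order.
std::vector<int> collect_evens(const std::vector<int>& data) {
    auto body = [&data](std::size_t lo, std::size_t hi, std::vector<int> acc) {
        for (std::size_t i = lo; i < hi; ++i)
            if (data[i] % 2 == 0) acc.push_back(data[i]);
        return acc;  // by-value parameter: returned by move, not copied
    };
    auto join = [](std::vector<int> a, std::vector<int> b) {
        // Reuse a's storage and move b's elements onto its end.
        a.insert(a.end(), std::make_move_iterator(b.begin()),
                 std::make_move_iterator(b.end()));
        return a;
    };
    // A tiny grain size so that even small inputs are actually split.
    return reduce_split(std::size_t{0}, data.size(), std::vector<int>{}, body, join,
                        std::size_t{4});
}
```

With heavy partial results such as vectors, passing them by value and moving them lets join reuse the left vector's storage instead of duplicating it; with scalar results the distinction disappears.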
Intel® oneAPI Math Kernel Library (oneMKL) 2024.2 improves the performance of 2D and 3D real and complex FFTs on the Intel® Data Center GPU Max Series.
To extend sparsity support across Intel® oneAPI Data Analytics Library (oneDAL) algorithms, this release adds DPC++ sparse gemm and gemv primitives and sparse-data support in the logloss function primitive.
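As background for the sparse primitives mentioned above, the sketch below shows a sparse gemv over a matrix stored in CSR (compressed sparse row) format and a mean binary log-loss computed from it, so the per-sample work scales with the number of nonzeros rather than with the full row length. The CsrMatrix type and both functions are hypothetical plain C++; they do not reflect oneDAL's API or its DPC++ implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR (compressed sparse row) matrix: only nonzeros are stored.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets; row i is [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // the stored nonzero values
};

// Sparse gemv: y = A * x, visiting only the stored nonzeros of A.
std::vector<double> csr_gemv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.cols && a.row_ptr.size() == a.rows + 1);
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            y[i] += a.values[k] * x[a.col_idx[k]];
    return y;
}

// Mean binary log-loss for labels in {0, 1}, with logits z = X * w computed
// by the sparse gemv above.
double logloss(const CsrMatrix& x, const std::vector<double>& w,
               const std::vector<int>& labels) {
    assert(labels.size() == x.rows && x.rows > 0);
    const std::vector<double> z = csr_gemv(x, w);
    double total = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        // Per-sample loss is softplus(m) = log(1 + exp(m)), with m = -z for
        // label 1 and m = z for label 0; the branch keeps exp() from overflowing.
        const double m = labels[i] == 1 ? -z[i] : z[i];
        total += m > 0 ? m + std::log1p(std::exp(-m)) : std::log1p(std::exp(m));
    }
    return total / static_cast<double>(z.size());
}
```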
Intel® oneAPI Collective Communications Library (oneCCL) introduces several enhancements that improve the utilization of system resources such as memory and I/O for better performance.
Intel® oneAPI Deep Neural Network Library (oneDNN) 2024.2 introduces:

Enhanced Performance for Next-Generation Client Platforms: Experience faster and more efficient processing with broad production-quality optimizations that maximize the performance potential of upcoming AI-enhanced Intel client processors.
Optimized Performance for Next-Generation Server Platforms: Future-proof your systems with enhanced production-quality optimizations that deliver top-tier performance on upcoming Intel® Xeon® Scalable processors.
Improved Large Language Model Performance: Boost the efficiency of your AI workloads with support for int8 and int4 weight decompression in matmul, which accelerates large language models that use compressed weights.
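To make weight decompression concrete, the sketch below stores a weight matrix as unsigned 4-bit values packed two per byte and restores each weight as scale * (q - zero_point) just before it is multiplied. The packing order (low nibble first), the single per-tensor scale and zero point, and the function names are assumptions chosen for illustration; they are not oneDNN's API or memory layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical int4 decompression: element i lives in byte i / 2, in the low
// nibble for even i and the high nibble for odd i.
float dequant_int4(const std::vector<std::uint8_t>& packed, std::size_t i,
                   float scale, int zero_point) {
    assert(i / 2 < packed.size());
    const std::uint8_t byte = packed[i / 2];
    const int q = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);  // 0..15
    return scale * static_cast<float>(q - zero_point);
}

// y = W * x, where W (rows x cols, row-major) stays packed as int4 and each
// weight is decompressed on the fly. A single scale and zero point keep the
// sketch short; practical schemes commonly keep one per row or per group.
std::vector<float> matvec_int4(const std::vector<std::uint8_t>& packed,
                               std::size_t rows, std::size_t cols,
                               float scale, int zero_point,
                               const std::vector<float>& x) {
    assert(x.size() == cols && packed.size() * 2 >= rows * cols);
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            y[r] += dequant_int4(packed, r * cols + c, scale, zero_point) * x[c];
    return y;
}
```

Because the weights stay packed, the matrix occupies roughly an eighth of its float32 size (4 bits per weight instead of 32), which is a key reason compressed weights help memory-bound LLM inference.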

Intel® Integrated Performance Primitives added the following features:

Improved compression ratio and throughput with a new optimization patch for zlib 1.3.1
Accelerated image processing for select color conversion functions using Intel® AVX-512

Intel® Integrated Performance Primitives Cryptography added the following features:

Enhanced data protection for the post-quantum era with a new Intel-optimized implementation of the LMS (Leighton-Micali Signature) post-quantum algorithm
Improved AES-GCM performance on 5th Gen Intel® Xeon® Scalable Processors and Intel® Core™ Ultra processors, with a new code sample that simplifies integration

Save time validating that migrated SYCL code is equivalent to the original code: the Intel® DPC++ Compatibility Tool can automatically compare kernel run logs and report differences.
Migrate to SYCL more easily with the Intel® DPC++ Compatibility Tool, which now migrates 126 more commonly used CUDA APIs.

Intel® HPC Toolkit

Our latest OpenMP* enhancements include support for omp_target_memset() and omp_target_memset_async(), which let developers efficiently initialize large blocks of memory on target devices, reducing overhead and accelerating parallel computing tasks. Additionally, the compiler emits detailed remarks about OpenMP loop collapsing when the -qopt-report option is used, giving you insight into your loop transformations so you can make informed decisions when tuning your application's performance.
Stay at the forefront of parallel programming with our ongoing conformance enhancements for the latest OpenMP standards, including 5.x and the forthcoming 6.0. With this Intel® Fortran Compiler release, you can specify the OpenMP 5.1 THREAD_LIMIT clause on TEAMS and TARGET constructs to better manage thread usage, and use OpenMP 5.2 enhancements such as COPYPRIVATE and NOWAIT clauses on the directive that begins a construct, as well as the updated LINEAR clause syntax for more precise control. With OpenMP TR12 support, the LOOP directive can now be applied to DO CONCURRENT loops, paving the way for more powerful loop optimizations.
The Intel® Fortran Compiler adds the -fstrict-overflow and /Qstrict-overflow[-] options, which instruct the compiler to optimize under the assumption that integer operations will not overflow. For applications that rely on integer overflow behavior, the -fnostrict-overflow option ensures correct functionality.
New OpenMP runtime library extensions provide a robust set of memory management functions, including precise host pointer registration, targeted memory allocation, and device-specific optimizations, to help you push the performance and efficiency of your high-performance computing applications.
Developers using the Intel® MPI Library can now achieve faster application performance through optimizations for GPU-aware broadcasts, RMA peer-to-peer device-initiated communication, intranode thread splits, and InfiniBand* tuning for 5th Gen Intel® Xeon® Scalable Processors.
On machines with multiple Network Interface Cards (NICs), developers using the Intel® MPI Library can now gain finer control over application performance by pinning specific threads to individual NICs.

Download the toolkit here.