Learn A-to-Z of the SYCL Framework for Data Parallelism
SYCL* is an open, industry standards-based framework for implementing parallel programs efficiently. Its multi-vendor and multi-architecture support makes it easy to incorporate data parallelism into applications across heterogeneous hardware. In his ‘30 Days of SYCL Programming’ article series, Joel John Joseph covers a range of SYCL concepts and their practical use cases to help you build your parallel programming skills. The tutorial series takes you from SYCL basics to advanced topics, with code implementations, a look at SYCL's advantages over other parallel programming models, and critical comparisons with CUDA*.
Major SYCL Concepts
The article series covers the following crucial topics about parallel programming with SYCL:
- SYCL devices, device selectors, queues, and kernels
- Buffer model and code anatomy
- Unified Shared Memory (USM) and the concept of subgroups
- Buffers and accessors
- Task scheduling, data dependencies, and graphs in SYCL
- Local memory and atomics in SYCL
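To make the first few topics concrete, here is a minimal sketch of how a queue, buffers, accessors, and a kernel typically fit together in SYCL 2020. It is not taken from the article series. The helper name `vector_add` and the `SKETCH_HAS_SYCL` guard are assumptions made for illustration; the guard only lets the same file build with a plain C++17 compiler by falling back to a serial loop when no SYCL header is found.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: use SYCL only when its header is available; otherwise
// fall back to a serial host loop so the sketch still builds.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define SKETCH_HAS_SYCL 1
#else
#define SKETCH_HAS_SYCL 0
#endif

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    if (a.empty()) return out;  // avoid creating zero-sized buffers

#if SKETCH_HAS_SYCL
    // Queue bound to whichever device the default selector picks.
    sycl::queue q{sycl::default_selector_v};
    {
        // Buffers wrap the host memory for the duration of this scope.
        sycl::buffer<float, 1> buf_a{a.data(), sycl::range<1>{a.size()}};
        sycl::buffer<float, 1> buf_b{b.data(), sycl::range<1>{b.size()}};
        sycl::buffer<float, 1> buf_out{out.data(), sycl::range<1>{out.size()}};

        q.submit([&](sycl::handler& h) {
            // Accessors declare how the kernel uses each buffer.
            sycl::accessor in_a{buf_a, h, sycl::read_only};
            sycl::accessor in_b{buf_b, h, sycl::read_only};
            sycl::accessor res{buf_out, h, sycl::write_only, sycl::no_init};

            // Kernel: one work-item per element.
            h.parallel_for(sycl::range<1>{a.size()}, [=](sycl::id<1> i) {
                res[i] = in_a[i] + in_b[i];
            });
        });
    }  // buffers go out of scope here: the runtime waits and copies results into `out`
#else
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
#endif
    return out;
}
```

Calling `vector_add({1.0f, 2.0f}, {3.0f, 4.0f})` returns a vector holding the element-wise sums. With a SYCL compiler (for example, one invoked with `-fsycl`), the addition runs as a kernel on the selected device; the accessor modes are what the runtime uses to work out data dependencies between submitted tasks.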
Practical Applications of SYCL
The articles walk you through some real-world use cases of SYCL programming, such as:
- Image processing and ray tracing
- Scientific computations, including numerical solvers, Monte Carlo simulations, and sparse matrix operations
- Accelerating machine learning algorithms and graph algorithms
- Quantum computing, data visualization, and Virtual Reality (VR) applications
- High Performance Computing (HPC) applications such as Computational Fluid Dynamics (CFD) simulations
- Enhancing Edge AI applications such as smart surveillance and efficient health monitoring
Check out the complete series to explore the above and several other SYCL topics in detail.
CUDA to SYCL Code Migration
The major advantages of SYCL, such as interoperability, scalability, and support for multi-vendor heterogeneous architectures, give you more freedom to choose an execution platform than proprietary, vendor-locked CUDA solutions. In the day 28 article of the series, Joel compares the performance of CUDA and SYCL, backed by some benchmarking results.
The day 27 article explains how to port your CUDA code to SYCL manually through a simple code illustration. You do not have to do this by hand, though: the Intel® DPC++ Compatibility Tool and its open-source counterpart, SYCLomatic, are two automated tools that can perform the migration for you. They automatically migrate the majority (90%-95%)^ of the CUDA source code to C++ with SYCL. You only need to refine the tools’ output (if required) for functional correctness.
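As a rough idea of what a manual port involves, the sketch below rewrites a small SAXPY kernel using SYCL Unified Shared Memory. It is not the code from the day 27 article and not the output of either migration tool. The CUDA snippet in the comment, the helper name `saxpy`, and the `SKETCH_HAS_SYCL` fallback guard are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: use SYCL only when its header is available; otherwise
// fall back to a serial host loop so the sketch still builds.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define SKETCH_HAS_SYCL 1
#else
#define SKETCH_HAS_SYCL 0
#endif

// Hypothetical CUDA original, shown for comparison only:
//   __global__ void saxpy(int n, float a, const float* x, float* y) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) y[i] = a * x[i] + y[i];
//   }

// Hypothetical SYCL port: computes y = a * x + y in place.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    if (n == 0) return;

#if SKETCH_HAS_SYCL
    sycl::queue q;  // explicit queue takes the place of CUDA's implicit device/stream

    // Roughly corresponds to cudaMallocManaged.
    float* dx = sycl::malloc_shared<float>(n, q);
    float* dy = sycl::malloc_shared<float>(n, q);
    assert(dx != nullptr && dy != nullptr);  // shared USM may be unsupported on some devices
    std::copy(x.begin(), x.end(), dx);
    std::copy(y.begin(), y.end(), dy);

    // The range sets the exact work-item count, so the CUDA bounds check is not needed.
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        dy[i] = a * dx[i] + dy[i];
    }).wait();  // roughly corresponds to cudaDeviceSynchronize

    std::copy(dy, dy + n, y.begin());
    sycl::free(dx, q);  // roughly corresponds to cudaFree
    sycl::free(dy, q);
#else
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
#endif
}
```

The main changes are mechanical: the `__global__` function becomes a lambda passed to `parallel_for`, the index computed from block and thread IDs becomes the `sycl::id` argument, and memory management goes through a queue. A real port would usually also revisit work-group sizes and error handling, which this sketch leaves out.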
Utilize Accelerated SYCL Kernels with Intel® oneAPI DPC++ Library (oneDPL)
The Intel® oneAPI DPC++ Library (oneDPL), an extension of the C++ Standard Template Library (STL), empowers your C++ application with accelerated SYCL kernels across CPUs, GPUs, and FPGAs. It extends parallel computing libraries such as Parallel STL (PSTL) and Boost.Compute*. It also eases code migration from CUDA to SYCL by integrating with the Intel DPC++ Compatibility Tool. Check out the day 7 and day 8 articles of the series, which elaborate on the oneDPL Extension APIs for cross-architecture parallel programming.
Analyze Performance of SYCL Applications with Intel® VTune™ Profiler
The Intel® VTune™ Profiler tool helps you analyze, fine-tune, and maximize your application's performance. It assists with tasks including, but not limited to, analyzing hotspots, detecting code anomalies, measuring memory consumption and cache misses, and detecting performance issues in I/O-intensive applications. It also recommends ways to fix performance bottlenecks. The day 6 article of the series describes how to boost a SYCL application's performance with Intel VTune Profiler.
What’s Next?
We encourage you to read through the 30 Days of SYCL series and exploit the parallel programming potential of the SYCL framework. Explore the resources in the following section to dive deeper into Intel’s tools and libraries that help you achieve data parallelism with SYCL.
Also, check out other AI, HPC, and Rendering tools in Intel’s oneAPI-powered software portfolio.
Additional Resources
- Migrate from CUDA to C++ with SYCL Portal
- Easy CUDA to SYCL Migration
- Add Multi-platform Parallelism to C++ Workloads with SYCL
- oneDPL Empowers Your C++ Application for Cross-Device Parallel Programming with SYCL
- 8 Ways to Analyze, Tune, and Maximize Application Performance with Intel VTune Profiler
- Intel oneAPI Base Toolkit – Delivering the Core Tools and Libraries for Performant, Multiarchitecture Development
About the Author of '30 Days of SYCL'
Joel John Joseph is an Intel Student Ambassador. He is a data analyst, a tech enthusiast, and an Augmented Reality developer pursuing a Master of Computer Applications (MCA) degree from Christ University, Bangalore (India).
^Intel estimates as of March 2023. Based on measurements on a set of 85 HPC benchmarks and samples, with examples like Rodinia, SHOC, and PENNANT. Results may vary.