Monte Carlo Simulation Code: Migrating from Multi-GPU CUDA* to C++ with SYCL*

Nikita_Shiledarbaxi · 10-23-2023

Author: Nikita Sanjay Shiledarbaxi

Easy Code Migration with SYCLomatic and Intel® DPC++ Compatibility Tool

The SYCL* programming framework provides an open, standard, multiarchitecture, and multi-vendor alternative to proprietary, vendor-locked CUDA* solutions. This blog discusses a Monte Carlo Multi-GPU code sample that migrates NVIDIA*’s CUDA code sample to SYCL using the SYCLomatic tool. A similar migration can also be performed with the Intel® DPC++ Compatibility Tool from the Intel® oneAPI Base Toolkit (Base Kit). The sample estimates a fair call price [1] for a given set of European call options using Monte Carlo simulation and compares it with the price computed analytically using the Black-Scholes formula.

Migrating your CUDA code to SYCL lets you take advantage of data-parallel programming, code reusability, and performance portability. The Intel DPC++ Compatibility Tool and its open source counterpart SYCLomatic tool automatically migrate your CUDA code to C++ with SYCL.

Call price predictions help active traders realize substantial gains if the stock price rises. They also help individual banks and companies in the quantitative finance sector mitigate the risk of fluctuations in currency value. Being able to buy

Before diving deeper into the code sample, let us look at the migration tools and Black-Scholes formula.

Sneak Peek at SYCLomatic and Intel® DPC++ Compatibility Tool

SYCLomatic and Intel DPC++ Compatibility Tool are one-time migration utilities for migrating existing CUDA code to SYCL. They migrate the majority of the code sections, including CUDA language kernels and library API calls, thereby reducing the time and effort required in manual code migration.

They perform the overall migration process in 5 comprehensive steps, as depicted in Fig. 1 below:

Fig.1: CUDA to SYCL code migration workflow

Black-Scholes Formula: An Overview

The Black-Scholes formula is the closed-form solution of the Black-Scholes differential equation and is used to estimate the theoretical price of a European option. Considering the impact of time and other risk factors in a financial market, it computes the call price (C) as:
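In its standard textbook form, the formula reads as shown below. The symbols are the conventional ones and are an assumption here, not necessarily the article's own notation: S is the current stock price, X the strike price, T the time to expiration in years, r the risk-free interest rate, σ the volatility, and N(·) the standard normal cumulative distribution function.

```latex
C = S\,N(d_1) - X\,e^{-rT}\,N(d_2), \qquad
d_1 = \frac{\ln(S/X) + \left(r + \tfrac{1}{2}\sigma^{2}\right)T}{\sigma\sqrt{T}}, \qquad
d_2 = d_1 - \sigma\sqrt{T}
```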

Let us now go into the details of the code sample.

CUDA Source Code At A Glance

The CUDA source code follows a sequence of steps in each iteration to compute the call price using the Monte Carlo method:

Generate a random number based on a probability distribution.
Use the random number, other factors of volatility, and the expiration date of the stock to calculate the stock price on that date.
Compute the call price using the stock price at expiration.

The Monte Carlo method first generates many random samples (normally distributed in this sample, as the curand_normal() call in the kernel shows) and computes an intermediate result for each. Averaging those results gives an estimated call price that should be consistent with the one calculated analytically using the Black-Scholes formula. Each such random scenario is assigned to a single GPU thread. Thousands of such GPU threads (controlled by a single CPU thread) then execute in parallel, with significant savings in overall execution time.
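The three steps and the averaging described above can be sketched on the CPU in standard C++. This is a minimal sketch, not the sample's GPU code: the `CallOption` struct, the function names, and the price model (geometric Brownian motion with a discounted average payoff) are assumptions chosen to match the usual textbook setup.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical container for one European call option (names are illustrative).
struct CallOption {
    double spot;        // current stock price
    double strike;      // strike price
    double years;       // time to expiration, in years
    double rate;        // risk-free interest rate
    double volatility;  // annualized volatility
};

// Standard normal cumulative distribution function, via std::erfc.
double normalCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Closed-form Black-Scholes price of a European call.
double blackScholesCall(const CallOption& o) {
    const double sqrtT = std::sqrt(o.years);
    const double d1 = (std::log(o.spot / o.strike) +
                       (o.rate + 0.5 * o.volatility * o.volatility) * o.years) /
                      (o.volatility * sqrtT);
    const double d2 = d1 - o.volatility * sqrtT;
    return o.spot * normalCdf(d1) -
           o.strike * std::exp(-o.rate * o.years) * normalCdf(d2);
}

// Monte Carlo estimate of the same price: one loop iteration per random scenario.
double monteCarloCall(const CallOption& o, int pathCount, unsigned seed) {
    assert(pathCount > 0);
    std::mt19937_64 engine(seed);
    std::normal_distribution<double> gaussian(0.0, 1.0);
    const double drift = (o.rate - 0.5 * o.volatility * o.volatility) * o.years;
    const double diffusion = o.volatility * std::sqrt(o.years);

    double payoffSum = 0.0;
    for (int i = 0; i < pathCount; ++i) {
        // Step 1: draw a random number from the chosen distribution.
        const double z = gaussian(engine);
        // Step 2: simulate the stock price on the expiration date.
        const double priceAtExpiry = o.spot * std::exp(drift + diffusion * z);
        // Step 3: the call pays the excess over the strike, or nothing.
        payoffSum += std::fmax(priceAtExpiry - o.strike, 0.0);
    }
    // Average the payoffs and discount them back to today.
    return std::exp(-o.rate * o.years) * payoffSum / pathCount;
}
```

For a hypothetical option such as `CallOption{100.0, 105.0, 1.0, 0.05, 0.2}`, `monteCarloCall(option, 262144, 42u)` and `blackScholesCall(option)` should agree to within the Monte Carlo sampling error, which shrinks as the number of paths grows.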

CUDA to SYCL Migration

The code sample demonstrates how the SYCLomatic tool automatically migrates the CUDA Random Number Generator (cuRAND) feature. The migration tool translates the calls to cuRAND function APIs to equivalent Intel® oneAPI Math Kernel Library (oneMKL) random number generation (RNG) domain function API calls.

MonteCarloOneBlockPerOption() is the main computation kernel. It computes the integral over all the random scenarios using a single thread block per call option. Notice how the random number generation step in the CUDA code uses the curand_normal() function while accumulating the call value and confidence sums of the option:

for (int i = iSum; i < pathN; i += SUM_N) {
    real r = curand_normal(&localState);
    real callValue = endCallValue(S, X, r, MuByT, VBySqrtT);
    sumCall.Expected += callValue;
    sumCall.Confidence += callValue * callValue;
}
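The stride of SUM_N in the loop above suggests that the paths of one option are shared out among cooperating threads, each of which keeps a partial sum that is combined afterwards. The sketch below emulates that pattern sequentially on the CPU. It assumes that iSum is a thread's index within its block and that SUM_N is the number of cooperating threads; the function name `stridedSum` and the parameter `threadCount` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Emulates how `threadCount` cooperating threads could split the paths of one
// option with a strided loop and then combine their partial sums.
double stridedSum(const std::vector<double>& callValues, int threadCount) {
    assert(threadCount > 0);
    const int pathN = static_cast<int>(callValues.size());
    std::vector<double> partial(threadCount, 0.0);

    // One outer iteration per emulated thread (iSum plays the role of the thread index).
    for (int iSum = 0; iSum < threadCount; ++iSum) {
        // Same access pattern as the kernel loop: every threadCount-th path.
        for (int i = iSum; i < pathN; i += threadCount) {
            partial[iSum] += callValues[i];
        }
    }

    // Combine the per-thread partial sums into the total for the option.
    double total = 0.0;
    for (double p : partial) {
        total += p;
    }
    return total;
}
```

Every path index is visited exactly once, so the combined result equals a plain sum over all paths regardless of the thread count.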

SYCLomatic replaces the call to CUDA’s curand_normal() with a call to oneMKL’s generate() function:

for (int i = iSum; i < pathN; i += SUM_N) {
    real r = localState.generate<oneapi::mkl::rng::device::gaussian<float>, 1>();
    real callValue = endCallValue(S, X, r, MuByT, VBySqrtT);
    sumCall.Expected += callValue;
    sumCall.Confidence += callValue * callValue;
}

Check out the article Random Number Generation with cuRAND and oneMKL.
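Both versions of the loop accumulate a running sum of call values (Expected) and a running sum of their squares (Confidence). One plausible way to turn those two sums into a price and a 95% confidence half-width is sketched below. This is not taken from the sample's host code: the `OptionSums` and `OptionEstimate` structs, the `finalize` function, and the use of the normal quantile 1.96 are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical mirror of the accumulator used in the loops above.
struct OptionSums {
    double Expected;    // running sum of call values
    double Confidence;  // running sum of squared call values
};

// Hypothetical result type: discounted price and 95% confidence half-width.
struct OptionEstimate {
    double price;
    double halfWidth95;
};

OptionEstimate finalize(const OptionSums& sums, int pathN, double rate, double years) {
    assert(pathN > 1);
    const double n = static_cast<double>(pathN);
    const double discount = std::exp(-rate * years);
    const double mean = sums.Expected / n;
    // Unbiased sample variance recovered from the sum and the sum of squares.
    const double variance = (sums.Confidence - n * mean * mean) / (n - 1.0);
    // Guard against a tiny negative value caused by floating-point rounding.
    const double stdDev = std::sqrt(std::fmax(variance, 0.0));
    return {discount * mean, discount * 1.96 * stdDev / std::sqrt(n)};
}
```

Keeping only two running sums per option means no individual path result has to be stored, which is why the kernel loop can update its accumulators in place.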

Steps for Migrating CUDA Source Code to SYCL

To migrate the code, clone the CUDA code samples’ GitHub repository and switch to the Monte Carlo Multi-GPU sample directory. Then, generate a JSON-formatted compilation database using the intercept-build [2] script provided by the migration tool. Pass the JSON file to the SYCLomatic tool, which will then perform the automatic migration and write the migrated code to an output directory.

Check out the GitHub Repository and sample code readme file for detailed steps, from setting up environment variables to building and running the code sample on Linux* and Windows* operating systems.

Example SYCL-Migrated Output

The code sample has two versions in the form of two different folders in the GitHub repository:

01_dpct_output contains the output of the SYCLomatic tool with some unmigrated code that needs to be handled manually for functional correctness.
02_sycl_migrated contains SYCL code manually migrated from CUDA code.

Here’s an example output of 02_sycl_migrated:

./a.out Starting...
MonteCarloMultiGPU
==================
Parallelization method = streamed
Problem scaling = weak
Number of GPUs = 1
Total number of options = 12
Number of paths = 262144
main(): generating input data...
main(): starting 1 host threads...
main(): GPU statistics streamed
GPU Device #0: Intel(R) UHD Graphics P630 [0x3e96]
Options : 12
Simulation paths: 262144
Total time (ms.): 3.139000
Note: This is elapsed time for all to compute.
Options per sec.: 3822.873601
main(): comparing Monte Carlo and Black-Scholes results...
Shutting down...
Test Summary...
L1 norm : 6.504269E-04
Average reserve: 2.790815

What’s Next?

We encourage you to try the Monte Carlo Multi-GPU code sample today and see how Intel’s automated migration tools can help you with easy yet efficient CUDA to SYCL migration. Learn more about SYCLomatic and Intel® DPC++ Compatibility Tool and harness the power of parallel programming with SYCL!

Also, explore other AI, HPC, and Rendering tools in Intel’s oneAPI-powered software portfolio.

Additional Resources

Get The Software

You can get the Intel DPC++ Compatibility Tool included as a part of the Intel oneAPI Base Toolkit. The SYCLomatic project is available on GitHub.

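[1] A "call option" is a contract between a buyer and a seller of an asset (such as a stock) that grants its buyer the right, but not the obligation, to buy the asset at a fixed price (known as the ‘strike price’). A European call option can be exercised only on its expiration date. The ‘call price’ estimated in this sample is the fair price of the option contract itself.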

[2] The migration tool provides an intercept-build script that keeps track of all the build process details. It writes the compilation options, macro definitions, and include paths to a JSON-formatted compilation database. The database gives the migration tool the exact build settings and makes dependencies easier for it to resolve. The intercept-build utility is Clang-based and operates on make and CMake build environments.