Improving the oneMKL MRG32k3a Random Number Generator for Financial Monte Carlo Applications

Vladimir_P_1234567890 · 04-29-2026

By Andrey Fedorov, Ivan Martianov and Vladimir Polin, Intel Corporation

The Importance of Random Numbers

Pseudorandom number generation is a foundational component of modern computational systems, underpinning a broad spectrum of applications including cryptographic primitives, Monte Carlo-based simulation, and high-performance scientific computing. Deterministic pseudorandom sequences with well-characterized statistical properties are used to initialize and drive predictive and stochastic models across diverse domains, from predictive maintenance and quantitative finance risk evaluation to emergency response planning for earthquakes and tsunamis.

The Intel® oneAPI Math Kernel Library (oneMKL) is a high-performance numerical computing library that provides extensively optimized implementations of fundamental mathematical operations. It is designed to efficiently support computationally demanding workloads, including dense and sparse linear algebra, fast Fourier transforms, and pseudorandom number generation. As a core component of the Intel® oneAPI Base Toolkit, oneMKL contributes to a unified programming framework that enables the development of scalable, high-performance applications across heterogeneous accelerated computing architectures and platforms.

MRG32k3a Random Number Generator in Detail

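MRG32k3a is a combined Multiple Recursive Generator that merges two third-order MRGs to produce high-quality random numbers with a long period of approximately 2^191. With the parameters from L'Ecuyer (1999), the algorithm maintains two separate recurrence relations of the form

$x_n = (1403580\,x_{n-2} - 810728\,x_{n-3}) \bmod m_1$ and $y_n = (527612\,y_{n-1} - 1370589\,y_{n-3}) \bmod m_2$,

using the prime moduli $m_1 = 2^{32} - 209$ and $m_2 = 2^{32} - 22853$. The components are combined through $z_n = (x_n - y_n) \bmod m_1$, which is then scaled into the unit interval to produce the final uniform output. Combining the two components mitigates their individual weaknesses while preserving the mathematical guarantees of linear recursive generators.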
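A minimal scalar sketch of the two recurrences and their combination is shown below, using the parameters published in the L'Ecuyer (1999) paper listed under Additional Resources. The class name, the seeding convention (all six state words set to the same value), and the output normalization by m1 + 1 are assumptions made for illustration; the optimized oneMKL implementation is vectorized and may differ in these details.

```python
# Parameters from L'Ecuyer (1999). Everything else here is illustrative.
M1 = 2**32 - 209      # modulus of the first component
M2 = 2**32 - 22853    # modulus of the second component
A12, A13 = 1403580, -810728     # multipliers for x[n-2], x[n-3]
A21, A23 = 527612, -1370589     # multipliers for y[n-1], y[n-3]


class MRG32k3a:
    """Scalar reference sketch of the combined recurrence (not the oneMKL code)."""

    def __init__(self, seed=12345):
        # Assumption: seed all six state words with one value in (0, m2),
        # which keeps both component states valid and non-zero.
        if not 0 < seed < M2:
            raise ValueError("seed must satisfy 0 < seed < m2")
        self.x = [seed, seed, seed]  # x[n-3], x[n-2], x[n-1]
        self.y = [seed, seed, seed]  # y[n-3], y[n-2], y[n-1]

    def next_uniform(self):
        # Two multiplications and one modular reduction per component.
        x_new = (A12 * self.x[1] + A13 * self.x[0]) % M1
        y_new = (A21 * self.y[2] + A23 * self.y[0]) % M2
        self.x = [self.x[1], self.x[2], x_new]
        self.y = [self.y[1], self.y[2], y_new]
        # Combine the components; map 0 to m1 so the output stays inside (0, 1).
        z = (x_new - y_new) % M1
        if z == 0:
            z = M1
        return z / (M1 + 1)


gen = MRG32k3a(seed=12345)
sample = [gen.next_uniform() for _ in range(5)]
```

Python integers do not overflow, so the products can be reduced with a plain `%`. Fixed-width implementations typically need 64-bit integers or carefully arranged double-precision arithmetic to keep the intermediate products exact.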

The generator has strong theoretical foundations: it offers excellent equidistribution properties in up to 45 dimensions and good spectral characteristics when projected into multiple dimensions. MRG32k3a demonstrates robust statistical performance, passing comprehensive test suites including the entire BigCrush battery of TestU01, which only a few generators achieve. Its linear structure makes its mathematical behavior predictable and keeps serial correlation very low.

MRG32k3a requires four multiplications and two modular reductions for each output, and its regular arithmetic patterns work well with vectorization techniques. The modular arithmetic is computationally expensive, but it follows predictable patterns that can be optimized with SIMD instructions on modern processors. The generator's mathematical design also supports parallel processing across multiple execution contexts.
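One common way to exploit this linear design for parallel work is skip-ahead: because each component is a linear recurrence, advancing the state by n steps is equivalent to multiplying it by the n-th power of a 3x3 transition matrix modulo the component's modulus. The sketch below illustrates the idea in plain Python. All function names are hypothetical, and this is not a description of how oneMKL implements its own stream-splitting facilities.

```python
M1 = 2**32 - 209
M2 = 2**32 - 22853
# One-step transition matrices acting on the state (s[n-3], s[n-2], s[n-1]).
A1 = [[0, 1, 0], [0, 0, 1], [-810728, 1403580, 0]]
A2 = [[0, 1, 0], [0, 0, 1], [-1370589, 0, 527612]]


def mat_mul(a, b, m):
    """3x3 matrix product modulo m."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) % m for j in range(3)]
            for i in range(3)]


def mat_vec(a, v, m):
    """3x3 matrix times 3-vector modulo m."""
    return [sum(a[i][k] * v[k] for k in range(3)) % m for i in range(3)]


def mat_pow(a, n, m):
    """a**n modulo m by binary exponentiation, O(log n) matrix products."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    base = [[v % m for v in row] for row in a]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base, m)
        base = mat_mul(base, base, m)
        n >>= 1
    return result


def skip_ahead(state, a, m, n):
    """State of one component after n steps, without generating them."""
    return mat_vec(mat_pow(a, n, m), state, m)


def make_substreams(x0, y0, count, stride):
    """Starting states of `count` substreams spaced `stride` steps apart."""
    j1, j2 = mat_pow(A1, stride, M1), mat_pow(A2, stride, M2)
    x, y, starts = list(x0), list(y0), []
    for _ in range(count):
        starts.append((x, y))
        x, y = mat_vec(j1, x, M1), mat_vec(j2, y, M2)
    return starts
```

For example, `make_substreams([12345] * 3, [12345] * 3, 4, 2**64)` would give four workers starting states that are 2^64 draws apart, so their sequences do not overlap as long as each worker draws fewer than 2^64 values.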

Given the widespread adoption of MRG32k3a in the financial services industry, we assess its performance using a Monte Carlo-based financial benchmark focused on the pricing of American options (“American Monte Carlo”, AMC).

Monte Carlo Method to Price American Options

Monte Carlo simulation is a broadly adopted computational methodology that relies on repeated random sampling to analyze the statistical properties of complex systems, including the evaluation of investment strategy outcomes. In options pricing, the valuation of American-style options is particularly challenging because there is no fixed exercise time; such options may be exercised at any point up to and including the expiration date. To address this flexibility, the AMC approach employs the Longstaff-Schwartz algorithm, also known as the Least Squares Monte Carlo (LSMC) method. This algorithm estimates the optimal exercise strategy through a backward induction procedure over simulated price paths. The workflow can be summarized as follows:

Path Simulation: Discrete asset price paths are generated across all time steps using Monte Carlo simulation. At each path block, random numbers are produced and consumed immediately using an L1‑feed strategy, in which random values are generated into the L1 cache and applied directly to path generation.
Maturity Valuation: At the terminal time step, the payoff at maturity is computed for each simulated path based on the option’s contractual payoff function.
Continuation Value Estimation (Payoff Decision): Moving backward in time, the continuation value is estimated via least‑squares regression against a set of basis functions representing the option’s market state variables. The regression approximates the expected discounted future payoff, which is then compared against the immediate exercise value to determine the optimal exercise decision at each time step.
Final Valuation: After completing the backward induction, the option price is obtained by averaging the discounted payoffs resulting from the optimal exercise decisions across all simulated paths.
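The four steps above can be sketched in a few dozen lines of plain Python. This is a simplified illustration under stated assumptions, not the benchmark's code: it prices an American put, assumes geometric Brownian motion for the asset, uses a quadratic polynomial basis (1, x, x^2) fitted on in-the-money paths, and draws normals from Python's `random` module as a stand-in for an MRG32k3a-based generator. The function names and all parameter values are hypothetical, and the blocked L1-feed generation is not modeled.

```python
import math
import random


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def lsmc_american_put(s0, strike, rate, sigma, maturity, steps, paths, seed=1):
    rng = random.Random(seed)  # stand-in generator, not MRG32k3a
    dt = maturity / steps
    disc = math.exp(-rate * dt)
    drift = (rate - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)

    # 1. Path simulation: prices[i][t] is path i at time (t + 1) * dt.
    prices = []
    for _ in range(paths):
        s, path = s0, []
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        prices.append(path)

    # 2. Maturity valuation: payoff of the put at the final time step.
    cash = [max(strike - p[-1], 0.0) for p in prices]

    # 3. Continuation value estimation, moving backward in time.
    for t in range(steps - 2, -1, -1):
        cash = [c * disc for c in cash]  # discount future cash flows to step t
        itm = [i for i in range(paths) if prices[i][t] < strike]
        if len(itm) < 3:
            continue
        # Normal equations for the basis (1, x, x^2) with x = S / strike.
        ata = [[0.0] * 3 for _ in range(3)]
        atb = [0.0] * 3
        for i in itm:
            x = prices[i][t] / strike
            basis = (1.0, x, x * x)
            for r in range(3):
                atb[r] += basis[r] * cash[i]
                for c in range(3):
                    ata[r][c] += basis[r] * basis[c]
        beta = solve3(ata, atb)
        for i in itm:
            x = prices[i][t] / strike
            continuation = beta[0] + beta[1] * x + beta[2] * x * x
            exercise = strike - prices[i][t]
            if exercise > continuation:
                cash[i] = exercise  # exercising now beats waiting

    # 4. Final valuation: discount from the first exercise date and average.
    return disc * sum(cash) / paths


price = lsmc_american_put(36.0, 40.0, 0.06, 0.2, 1.0, steps=25, paths=4000)
```

Restricting the regression to in-the-money paths follows the original Longstaff-Schwartz paper, since only those paths face a real exercise decision. In this sketch almost all random numbers are consumed in step 1, which is why generator throughput matters for the path simulation phase.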

The source code for the recipe is available in the https://github.com/intel/Financial-Services-Workload-Samples/tree/main/MonteCarloAmericanOptions repository. The benchmark was compiled using the command line from the Tuning Guide for HPC Applications.

Performance Gain from Using the Latest 2026.0.0 Version of oneMKL

The oneMKL 2025.3.0 release introduces a substantially optimized implementation of the MRG32k3a random number generator, delivering improved performance for applications that already rely on this engine, without requiring any code changes. To quantify these advancements, we examine how performance for the AMC benchmark has evolved between versions.

All experiments were conducted on a system with Intel® Xeon® 6980P processors. Because the benchmark employs an L1‑feed strategy, in which random values are generated and consumed immediately, and because oneMKL random number engines do not provide multithreaded implementations, the evaluation focuses on single‑thread performance. This configuration is sufficient to expose improvements arising from architectural optimizations, algorithmic refinements, and enhanced vectorization in the latest oneMKL release.

The following chart presents the performance improvement of the AMC benchmark as a function of the number of simulated paths. The results demonstrate a clear improvement in throughput with oneMKL 2026.0.0. Specifically, users can achieve up to a 1.62x performance increase for the full AMC benchmark compared with oneMKL 2025.0.0.

Figure 1. American Monte Carlo scalability results.

Use oneMKL to Speed Up Your Code

oneMKL provides a unified programming model through its C, C++, Fortran, and SYCL APIs, allowing applications to run efficiently across diverse hardware architectures.

Although the present study focuses on financial workloads, the techniques and insights presented here generalize to any application in which random number generation lies on the critical performance path. Domains such as scientific simulation, stochastic modeling, and machine learning can similarly benefit from these optimizations.

Addressing performance challenges like these is routine work for the Intel compiler and oneMKL development teams. To improve Intel software for our customers, we continually look for more efficient algorithms and optimization techniques.

Additional Resources

Intel® oneAPI Math Kernel Library
P. L'Ecuyer, "Good Parameters and Implementations for Combined Multiple Recursive Random Number Generators," Operations Research, 47(1), 1999, pp. 159–164. https://pubsonline.informs.org/doi/10.1287/opre.47.1.159
F. A. Longstaff and E. S. Schwartz, "Valuing American Options by Simulation: A Simple Least-Squares Approach," Review of Financial Studies, 14(1), 2001, pp. 113–147.

Notices & Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

*Other names and brands may be claimed as the property of others.

System Configurations:

1-node, 2x Intel® Xeon® 6980P processor on AvenueCity platform with 1536 GB (24 slots/ 64GB/ 8800) total DDR5 memory, ucode 0x10003d0, HT on, Turbo on, Ubuntu 24.04 LTS, 6.8.0-86-generic, 1x Micron_7450_MTFDKBG960TFR 894.3 GB; Intel® oneAPI Math Kernel Library 2026.0.0; Intel® oneAPI Math Kernel Library 2025.0.0. Test by Intel Corporation as of 04/27/26.