Hello, I am seeking help with an issue in a C++ application that uses the Intel Math Kernel Library (MKL) for random number generation and OpenMP for parallel processing. I develop on Windows with Microsoft Visual Studio. The application behaves as expected when it runs with the maximum number of OpenMP threads (8) or with a single thread. With any other thread count, for example 4, it fails at runtime.
Specifically, any other thread count produces an unhandled exception, "Access violation reading location", together with an Intel MKL error: "Parameter 2 was incorrect on entry to vsRngGaussian".
The issue does not occur on Linux, which suggests platform-specific behavior. The application uses #pragma omp threadprivate so that each thread manages its own MKL VSLStreamStatePtr, and it allocates the random number arrays dynamically.
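For reference, the OpenMP specification only guarantees that threadprivate values persist between two consecutive parallel regions when neither region is nested inside another explicit parallel region, both regions use the same number of threads, the same thread affinity policy applies, and dynamic adjustment of the team size (the dyn-var ICV) is false at entry to both. A minimal sketch of that rule follows. The pointer tp_state is a hypothetical stand-in for the per-thread VSLStreamStatePtr and buffers, and the thread count is fixed before the region that initialises the copies. If the initialisation region and the compute region ran with different thread counts, the values would be unspecified, which is one possible way for a thread to reach vsRngGaussian with an uninitialised stream. That is a hypothesis, not a confirmed diagnosis.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the per-thread VSL stream / buffer pointers.
static int* tp_state = nullptr;
#pragma omp threadprivate(tp_state)

static int tp_storage[256];  // backing storage, one slot per thread

// Initialise every thread's threadprivate copy in one region, then check in
// a second region that each thread still sees its copy. The thread count is
// set BEFORE the first region and dynamic adjustment is disabled, so both
// regions run with the same number of threads, as the spec requires.
bool all_threads_see_state(int nthreads) {
    assert(nthreads > 0 && nthreads <= 256);
#ifdef _OPENMP
    omp_set_dynamic(0);
    omp_set_num_threads(nthreads);
#endif

    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        tp_state = &tp_storage[tid];  // "initialisation" region
    }

    int team = 0, ready = 0;
    #pragma omp parallel reduction(+:team, ready)
    {
        team += 1;                          // count threads in this team
        if (tp_state != nullptr) ready += 1;  // "use" region: state present?
    }
    return ready == team;
}
```

Without OpenMP the pragmas are ignored and the function degenerates to a single thread, so it still returns true.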
In addition, the Intel MKL Link Line Advisor recommends linking libiomp5md.lib, but that library appears to be missing from my installation, which might affect performance or contribute to the runtime errors.
I am seeking guidance on:
- Any known compatibility issues between Intel MKL, OpenMP, and Windows that could lead to the described behavior.
- Specific configuration or environment settings required for stable operation of MKL and OpenMP on Windows, especially when the thread count is changed.
Additional note: I have successfully used OpenMP on Windows with other codebases, where changing the thread count caused no problems. The problem only occurs when MKL is used for random number generation, which leads me to believe it is tied to the interplay between MKL and OpenMP in this particular context.
Could you please provide insights or direct me to relevant documentation that might help resolve these issues?
>> This issue is not observed in a Linux environment, indicating a possible platform-specific behavior.
<< There is no platform-specific behavior in the RNG implementation.
Regarding “Access violation reading location”: it would make sense to give us a reproducer of this problem so we can investigate the behavior.
>> The problem specifically occurs when integrating MKL for random number generation, which leads me to believe the issue might be closely tied to the MKL and OpenMP interplay in this context.
There are no known interoperability problems between OpenMP and MKL; at least, we are not aware of any.
First of all, thank you for your reply; it has helped narrow down the potential sources of the issue.
The code I am using is this:
//
//----- C++11 random number generation when not using OpenMP -------------
//
#ifndef _OPENMP
#include <random> // C++11 random number generators
#include <functional>
/* some web references
   https://www.cplusplus.com/reference/random/
   https://stackoverflow.com/questions/14023880/c11-random-numbers-and-stdbind-interact-in-unexpected-way/14023935
   https://stackoverflow.com/questions/20671573/c11-stdgenerate-and-stduniform-real-distribution-called-two-times-gives-st
*/
// declare generator and output distributions
std::default_random_engine rng;
std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
std::normal_distribution<float> normal(0.0f, 1.0f);
auto next_uniform = std::bind(std::ref(uniform), std::ref(rng));
auto next_normal = std::bind(std::ref(normal), std::ref(rng));
void rng_initialisation() {
    rng.seed(1234);
    uniform.reset();
    normal.reset();
}
void rng_termination() {
}
//------- MKL/VSL random number generation when using OpenMP -----------
#else
#include <mkl.h>
#include <mkl_vsl.h>
#include <memory.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h> // malloc/free are used below, before the shared includes
/* each OpenMP thread has its own VSL RNG and storage */
#define NRV 16384 // number of random variables
VSLStreamStatePtr stream;
float* uniforms, * normals;
int uniforms_count, normals_count;
#pragma omp threadprivate(stream, uniforms, uniforms_count, \
                          normals, normals_count)
//
// RNG routines
//
void rng_initialisation() {
    int tid = omp_get_thread_num();
    int status = vslNewStream(&stream, VSL_BRNG_MRG32K3A, 1337);
    if (status != VSL_STATUS_OK || stream == NULL) {
        printf("Stream initialization failed with status: %d\n", status);
        return;
    }
    long long skip = ((long long)(tid + 1)) << 48;
    status = vslSkipAheadStream(stream, skip);
    if (status != VSL_STATUS_OK) {
        printf("vslSkipAheadStream failed with status: %d\n", status);
        return;
    }
    uniforms = (float*)malloc(NRV * sizeof(float));
    normals = (float*)malloc(NRV * sizeof(float));
    if (uniforms == NULL || normals == NULL) {
        printf("Memory allocation failed.\n");
        return;
    }
    uniforms_count = 0; // this means there are no random
    normals_count = 0;  // numbers in the arrays currently
}
void rng_termination() {
    vslDeleteStream(&stream);
    free(uniforms);
    free(normals);
}
float next_uniform() {
    if (uniforms_count == 0) {
        vsRngUniform(VSL_RNG_METHOD_UNIFORM_STD,
                     stream, NRV, uniforms, 0.0f, 1.0f);
        uniforms_count = NRV; // refill the uniform buffer, not the normal one
    }
    return uniforms[--uniforms_count];
}
inline float next_normal() {
    if (normals_count == 0) {
        vsRngGaussian(VSL_RNG_METHOD_GAUSSIAN_BOXMULLER2,
                      stream, NRV, normals, 0.0f, 1.0f);
        normals_count = NRV;
    }
    return normals[--normals_count];
}
#endif
//
// other header files needed for both versions
//
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
//
// main code
//
int main(int argc, char** argv)
{
    float T = 1.0f, X0 = 1.0f, mu = 0.05f, sigma = 0.2f, dt;
    double sum1 = 0.0, sum2 = 0.0;
    int M = 200;      /* number of timesteps */
    int N = 19600000; /* total number of MC samples */
    dt = T / ((float)M);
    // initialise generator, with separate storage for each
    // thread when compiled for OpenMP
    #pragma omp parallel
    rng_initialisation();
#ifdef _OPENMP
    double wtime = omp_get_wtime();
    omp_set_num_threads(8);
#endif
    #pragma omp parallel for default(none) shared(T,X0,mu,sigma,dt,M,N) \
        reduction(+:sum1,sum2)
    for (int n = 0; n < N; n++) {
        float X = X0;
        for (int m = 0; m < M; m++) {
            float delW = sqrtf(dt) * next_normal();
            X = X + X * (mu * dt + sigma * delW);
        }
        sum1 += X;
        sum2 += X * X;
    }
    printf("Exact solution E[X_T] = %g\n", X0 * exp(mu * T));
    printf("Monte Carlo estimate = %g +/- %g \n", sum1 / N,
           3.0 * sqrt((sum2 / N - (sum1 / N) * (sum1 / N)) / N));
    printf("\nReminder: Monte Carlo estimate has discretisation bias\n\n");
    float RNGs = ((float)N) * ((float)M);
    printf("Random Nums generated = %g\n", RNGs);
#ifdef _OPENMP
    wtime = omp_get_wtime() - wtime;
    printf("threads = %d\n", omp_get_max_threads());
    printf("execution time = %10.4g\n", wtime);
    printf("RNG/s = %10.4g\n\n", RNGs / wtime);
#endif
    // delete generator and storage
    #pragma omp parallel
    rng_termination();
}
With omp_set_num_threads(8) the code works fine, uses all the threads, and finishes in the expected time. However, when I change the number of threads, to 4 for instance, I get the “Access violation reading location” error. The debugger points to this part of the code:
inline float next_normal() {
    if (normals_count == 0) {
        vsRngGaussian(VSL_RNG_METHOD_GAUSSIAN_BOXMULLER2,
                      stream, NRV, normals, 0.0f, 1.0f);
        normals_count = NRV;
    }
    return normals[--normals_count];
}
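One way to make the simulation independent of threadprivate data surviving between parallel regions is to create each thread's generator as a local variable inside the same parallel region that consumes it. A sketch of that pattern follows. To keep it self-contained it swaps the MKL MRG32k3a stream and vsRngGaussian for std::mt19937 and std::normal_distribution, and it gives each thread a distinct seed instead of using vslSkipAheadStream. That is a simplification that does not guarantee non-overlapping streams. The function name gbm_mean and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sketch: Euler scheme for dX = mu*X*dt + sigma*X*dW, averaged over N paths.
// Each thread builds its own generator inside the ONE parallel region that
// also consumes it, so no RNG state has to persist between regions.
// std::mt19937 / std::normal_distribution stand in for the MKL VSL stream.
double gbm_mean(int nthreads, int N, int M) {
    assert(nthreads > 0 && N > 0 && M > 0);
    const float T = 1.0f, X0 = 1.0f, mu = 0.05f, sigma = 0.2f;
    const float dt = T / static_cast<float>(M);
    double sum = 0.0;

    #pragma omp parallel num_threads(nthreads) reduction(+:sum)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // Simplification: distinct seeds per thread instead of skip-ahead;
        // this does not guarantee non-overlapping random streams.
        std::mt19937 rng(1337u + 7919u * static_cast<unsigned>(tid));
        std::normal_distribution<float> normal(0.0f, 1.0f);

        #pragma omp for
        for (int n = 0; n < N; n++) {
            float X = X0;
            for (int m = 0; m < M; m++) {
                float delW = std::sqrt(dt) * normal(rng);
                X = X + X * (mu * dt + sigma * delW);
            }
            sum += X;  // per-thread partial sum, combined by the reduction
        }
    }
    return sum / N;
}
```

Because each generator lives only as long as the region that uses it, changing nthreads only changes how the N paths are divided among threads; the estimate should stay close to exp(mu*T) for any thread count.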
Could the issue be related to a missing library, specifically libiomp5md.lib? The Intel MKL Link Line Advisor recommended its use, yet it appears to be missing from the installed package.
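A quick way to see which OpenMP level a Visual Studio project is actually compiled against is the standard _OPENMP macro, which holds the release date (yyyymm) of the OpenMP specification the compiler supports. MSVC's /openmp reports 200203 (OpenMP 2.0), and compilers with newer OpenMP support report a later date. A sketch follows; the function names are illustrative, and the macro only identifies the compiler's OpenMP level, not which runtime DLL (for example vcomp*.dll or libiomp5md.dll) ends up being loaded.

```cpp
#include <cassert>
#include <cstdio>

// Returns the OpenMP specification date reported through the standard
// _OPENMP macro (yyyymm), or 0 when OpenMP is not enabled for this file.
long openmp_spec_date() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Prints the value with a short interpretation.
void report_openmp_build() {
    const long date = openmp_spec_date();
    if (date == 0)
        std::printf("OpenMP is not enabled for this translation unit\n");
    else if (date == 200203)
        std::printf("_OPENMP = %ld (OpenMP 2.0, e.g. MSVC /openmp)\n", date);
    else
        std::printf("_OPENMP = %ld\n", date);
}
```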
OK, we will take a look at this example.
In the meantime, two notes:
1. Regarding libiomp5md.lib: the standalone version of oneMKL contains the libiomp* DLLs and .lib files by default. You can get this package from the oneMKL product page: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html
2. The forum allows anyone to attach files. Attaching the source is often more convenient than pasting the whole code into the post.
--Gennady
Hi,
The reproducer works with different numbers of threads (2, 3, or 4) and does not show any runtime error.
Could you please let me know the following:
- Which processor you are using
- Which command you use to compile the code
Hi,
Could you please provide the details requested above?
Hi Mahan,
Sorry, I was too busy to reply right away. Thank you for your help and quick response. Here are the details you requested:
1. Processor: I am using an 11th Gen Intel(R) Core(TM) i3-1125G4 @ 2.00GHz, 1997 MHz, with 4 cores and 8 logical processors.
2. Command Used While Compiling: I compile the code using the "Build and Run" feature in Microsoft Visual Studio. This feature automates the compilation process, so I do not use a specific command line instruction manually.
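The processor listed above has 8 logical processors. The default team size is implementation-defined, but when OMP_NUM_THREADS is not set it is commonly the number of logical processors, so the unsized initialisation region in the posted code has probably been running with 8 threads, the same count that omp_set_num_threads(8) later requests. A sketch for checking both numbers on a given machine; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Team size of an unsized "#pragma omp parallel", i.e. what the
// initialisation region gets before any omp_set_num_threads call.
int default_team_size() {
    int team = 0;
    #pragma omp parallel reduction(+:team)
    team += 1;  // each thread contributes 1
    return team;
}

// Number of processors visible to the OpenMP runtime (1 without OpenMP).
int logical_processors() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}
```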
Hi,
Please make sure the following properties are correctly set for the project and the .cpp source file.
- Intel oneAPI 2024 compiler
- C++17 Standard
- oneMKL with LP64
- OpenMP
Please see the attached screenshots for reference.
Thank you so much! I adjusted those settings and the code worked with all threads.