I am a newbie learning OpenMP.
I am facing a strange problem. My code, shown below, runs correctly on my 2-core laptop, but it crashes on my 24-core workstation, and the operating system (Windows Vista) reports error code 0xc0000005.
Would anyone like to help me? Thank you!
do {
    #pragma omp parallel for private(v)
    for (int i = 0; i < 3000; i++) {
        double* vLast = new double[nvG]; // nvG = 10000
        double delta = 0.30;
        for (int m = 0; m < nvG; m++) {
            vLast[m] = v[i][m];                    // keep the old value
            v[i][m] = /* ... updated value ... */;
        }
        delete[] vLast;
    }
} while (!convergenced);
Hi!
At first glance the code you've posted looks pretty OK to me. It would help to find the problem if you could strip your program down to the above loop with a little bit of skeleton around it, so that there is compilable code that still shows the problem. It would also help to know how you compiled the code (compiler version, switches used, optimization levels, etc.).
One little remark on the loop: you should separate the parallel region ("parallel") from the work-sharing construct ("for") by using this structure for your code:
#pragma omp parallel
{
    do {
        #pragma omp for
        for (...) { ... }
    } while (...);
}
With the new code, the parallel region is created only once (which is a rather expensive operation). In your original code, the parallel region is created and torn down a gazillion times until convergence is reached. But that's an optimization you should perform after you've found the bug that triggered your post.
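A minimal sketch of what that restructuring might look like, assuming a shared convergence flag and a hypothetical compute_max_error() helper standing in for the real update and convergence test (those names are placeholders, not from the original code):

// Hypothetical helper standing in for the real convergence test.
double compute_max_error(double** v, int rows, int cols);

void solve(double** v, int rows, int cols, double tol)
{
    bool converged = false;                  // shared by the whole team

    #pragma omp parallel                     // the region is created only once
    {
        do {
            #pragma omp for                  // iterations are shared among threads
            for (int i = 0; i < rows; ++i) {
                // ... update row v[i][0..cols-1] ...
            }                                // implicit barrier at the end of the for

            #pragma omp single               // one thread evaluates convergence
            {
                converged = (compute_max_error(v, rows, cols) < tol);
            }                                // implicit barrier: all threads see the flag
        } while (!converged);
    }
}

The single construct and its implicit barrier make every thread test the same value of converged, so the whole team leaves the do-while together.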
Cheers,
-michael
If this example is a true sample of your problem, one would have to be concerned that strange code could result in "a strange problem."
Hi tim18,
You assume that v is declared as an array. If it is a pointer, only the pointer would be privatized, while the memory it references would still be shared. Without access to the declaration of v we can't be sure about that. One other thing: if v really were an array that gets privatized, it would consume 3000*10000 elements of "something" per thread, which would already cause a stack overflow with only a few threads. Wouldn't it?
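To make the distinction concrete, here is a small sketch (not from the original post; the 3000 and 10000 sizes are the ones mentioned in this thread, and update() is just a placeholder):

static double update(double x) { return 0.5 * x; }  // placeholder update

void array_case()
{
    static double v[3000][10000];               // a true 2-D array, roughly 229 MB
    // private(v) would give every thread its own full copy on its stack,
    // which would overflow a default thread stack with only a few threads:
    //   #pragma omp parallel for private(v)    // likely stack overflow
    #pragma omp parallel for                    // so the array is shared instead
    for (int m = 0; m < 3000; ++m)
        for (int n = 0; n < 10000; ++n)
            v[m][n] = update(v[m][n]);
}

void pointer_case(double** v)                   // v points to heap-allocated rows
{
    // firstprivate(v) copies only the pointer; the heap memory behind it stays
    // shared. private(v) would leave the thread-private pointer uninitialized,
    // and the heap data would not be copied in either case.
    #pragma omp parallel for firstprivate(v)
    for (int m = 0; m < 3000; ++m)
        for (int n = 0; n < 10000; ++n)
            v[m][n] = update(v[m][n]);
}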
Cheers,
-michael
Thank you all for your reply.
I think I should post a more detailed code segment of my problem.
v is a two-dimensional dynamic array, which is allocated before the parallel region.
double** v = new double*[3000];
for (int m = 0; m < 3000; m++) {
    v[m] = new double[nvG];                    // nvG = 10000
    for (int n = 0; n < nvG; n++) {
        v[m][n] = /* ... initial value ... */;
    }
}
const int maxIterations = 500;
double maxError = 0.0;
int iterationCount = 0;
do
{
    ++iterationCount;
    maxError = 0.0;
    #pragma omp parallel for
    for (int m = 0; m < 3000; m++) {
        double* vLast = new double[nvG];
        double delta = 0.30;
        for (int n = 0; n < nvG; n++) {
            // keep v[m][n]
            vLast[n] = v[m][n];
            // call a function to update v[m][n]
            ...........
            // update v[m][n]
            v[m][n] = /* ... updated value ... */;
        }
        double tm = ComputeError(vLast, v[m]);
        if (tm > maxError)
            maxError = tm;
        delete[] vLast;
    }
}
while (maxError > TOL && iterationCount < maxIterations);

for (int m = 0; m < 3000; m++)
    delete[] v[m];
delete[] v;
============================================================
I wonder whether vLast is shared or private? If it is shared, should it be allocated in a critical region? Something like:
double* vLast = NULL;
#pragma omp parallel for private(vLast)
for (int m = 0; m < 3000; m++) {
    #pragma omp critical
    vLast = new double[nvG];
    .............................................................
Furthermore, I have tested that when I use the following directive:
#pragma omp parallel for num_threads(24)
on my laptop, the program runs correctly, so I am really confused about the problem.
Hi!
So v is a pointer that you make private. As private only creates an uninitialized thread-private incarnation of v, your pointer to your data structure is gone when the thread starts executing. You should use firstprivate instead to pass along the value of the pointer from outside the parallel region. That makes your data structure accessible to all threads through a private copy of the pointer to the data. I guess that this is what you want.
For vLast, I don't know what exactly you need.
If you need a private array vLast[0..N] for each thread, then your code is not quite right. As vLast is shared by default, each thread allocates an array and concurrently overwrites the same vLast variable. You should declare vLast as private (as you did in your last code snippet) or hoist the declaration of the vLast pointer into the parallel region. Then you don't need the critical anymore, as calls to the new operator should be thread-safe.
If you need only a single instance of the vLast[0..N] array that is shared amongst the workers, you'd better allocate it once before entering the parallel region and pass only a firstprivate pointer into it (as with v).
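A minimal sketch of the first option (a private vLast hoisted into the parallel region, with v left shared), reusing the 3000-row / nvG-column sizes from the earlier post and leaving the max-error bookkeeping aside (it is discussed further down the thread):

void sweep(double** v, int rows, int nvG)       // v is shared by default
{
    #pragma omp parallel
    {
        double* vLast = new double[nvG];        // one private scratch row per thread
        #pragma omp for
        for (int m = 0; m < rows; ++m) {
            for (int n = 0; n < nvG; ++n) {
                vLast[n] = v[m][n];             // keep the old value
                // ... update v[m][n] ...
            }
        }
        delete[] vLast;                         // freed once per thread
    }
}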
Cheers,
-michael
Many thanks for your help.
I am not clear about dynamic memory.
Some books say that variables allocated with new (or malloc) are shared by default. Is that correct? I am confused about this. For example, in the following code segment, what should the sharing attributes of the variables a and b be?
double* a = new double [100]; // a is shared ? private ?
#pragma omp parallel for
{
double* b = new double [100]; // b is shared ? private ?
}
In fact, what I need is that v is shared and vLast is private.
Thank you !
In your example, a is shared because it is declared outside the parallel for region and not listed in a private clause. b is private because any variable declared within the parallel for region is implicitly private.
Just to make the example more complete:
double *a = new double[10];
double *b = new double[10];
double *c = new double[10];
#pragma omp parallel private(b) firstprivate(c)
{
double *d = new double[10];
}
We get:
- a is shared, and so is the memory behind it.
- b is private, but the pointer is dangling, as the privatized b is left uninitialized.
- c is made private and the pointer to the array is passed along. The memory pointed to is shared amongst all threads.
- d is automatically private by scoping rules. As each thread executes the new operator, each thread has private memory referenced by d.
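A small runnable variant of the example (the printing, the critical around it, and num_threads(4) are additions for illustration) that makes these cases visible:

#include <cstdio>
#include <omp.h>

int main()
{
    double *a = new double[10];
    double *b = new double[10];
    double *c = new double[10];

    #pragma omp parallel private(b) firstprivate(c) num_threads(4)
    {
        double *d = new double[10];
        // a and c print the same address in every thread (shared memory and a
        // copied pointer, respectively); d prints a different address per thread.
        // b is left uninitialized inside the region, so it is not printed here.
        #pragma omp critical
        printf("thread %d: a=%p c=%p d=%p\n",
               omp_get_thread_num(), (void*)a, (void*)c, (void*)d);
        delete[] d;
    }

    delete[] a;
    delete[] b;   // the original pointers are untouched outside the region
    delete[] c;
    return 0;
}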
-michael
According to your explanation, I am sure that my code should be correct. But why does it crash on the 24-core platform? My program runs correctly on my 2-core laptop even when the number of threads is set to 24 to simulate the 24-core workstation.
The hardware/software information of the workstation is: 4 Intel Xeon 7400 processors (24 cores in total); 128 GB RAM; 3 TB HDD; Windows Server 2008 Enterprise; Intel Parallel Studio 1.0.
We may be looking at the problem the wrong way. Perhaps it is an OS problem, rather than a code problem.
What OS do you have on your laptop? Code 0xc0000005 in Vista means access violation. Try using Run as administrator.
Hi,
I have checked your code on my machine with different thread counts and did not receive any error. Did you try debugging your code with a debugger to see where the error comes from? Maybe that can give an indication of where the problem actually is.
Cheers,
-michael
I think Michael is on the right track. Are you declaring v private? The first code example shows a private v, but the second example doesn't specify. For an array of that size and the purposes you're putting it to, the pointer should be declared and initialized outside the parallel region (as you're doing) and then shared across the threads; that is, don't declare it private.
It's also unfortunate that this code has to allocate and destroy vLast so many times. It shouldn't cause a fault, but it must add a lot of overhead to the algorithm to repeatedly allocate and free those arrays. Maybe you could do something like this:
double *vLast = (double *)0;
#pragma omp parallel firstprivate(vLast)
{
    #pragma omp for
    for (int m = 0; m < 3000; m++) {
        if (!vLast) vLast = new double[nvG];
        // All the loop over n and the error computations
    }
    if (vLast) delete[] vLast;
}
I think this should work. Declaring and initializing vLast outside the parallel region and then declaring it firstprivate means the copies available initially should be initialized to NULL. The cost of an if in the middle loop should be a lot less than the repeated allocate/deallocate of the original example. Separating the parallel from the for provides a place where the use of the vLast contents is complete but the private pointers still exist, so the dynamic memory can be cleanly released once per thread per outer-loop iteration (500 times in the second example?). The parallel region construct might be pulled outside the outer loop, but that would take more OpenMP hair to properly handle the outer loop's exit condition. That should be the best-performing code, since it eliminates all those extraneous memory allocation operations.
Du-oh! I was having trouble finding an exit strategy, and when I found it, I didn't realize it makes the whole thing simpler; it came to me as I was bumbling about the house that this is all you probably need:
#pragma omp parallel
{
    double *vLast = new double[nvG];
    #pragma omp for
    for (int m = 0; m < 3000; m++) {
        for (int n = 0; n < nvG; n++) {
            .........
        }
    }
    delete[] vLast;
}
Inside the scope of the parallel block, each thread gets a private copy of the pointer vLast, to which it attaches a dynamically allocated array. The threads divide the range of m, and each has a private vLast array to reuse over the interval(s) it gets from the omp for.
Oh, but there is another wrinkle I hadn't considered earlier:
if (tm > maxError)
    maxError = tm;
Oops. tm is declared inside the parallel section, so it is private, but maxError needs to be shared in order to find the largest error over the range of m. Unfortunately, although max is an associative operation, it is not supported as an operator in OpenMP's reduction clause for C/C++, so you'll need to do a manual reduction. It might look something like this:
#pragma omp parallel
{
    double *vLast = new double[nvG];
    #pragma omp for
    for (int m = 0; m < 3000; m++) {
        for (int n = 0; n < nvG; n++) {
            .........
        }
        double tm = ComputeError(vLast, v[m]);
        #pragma omp critical(reduce_maxError)
        {
            if (tm > maxError)
                maxError = tm;
        }
    }
    delete[] vLast;
}
Because maxError is shared, access to it from multiple threads needs to be guarded. This is a serial region within an otherwise parallel loop so you could reduce the overhead even more by doing a real reduction:
#pragma omp parallel
{
    double *vLast = new double[nvG];
    double localMaxError = 0.0;
    #pragma omp for
    for (int m = 0; m < 3000; m++) {
        for (int n = 0; n < nvG; n++) {
            .........
        }
        double tm = ComputeError(vLast, v[m]);
        if (tm > localMaxError)
            localMaxError = tm;
    }
    #pragma omp critical(reduce_maxError)
    {
        if (localMaxError > maxError)
            maxError = localMaxError;
    }
    delete[] vLast;
}
Each thread in the team keeps a localMaxError that collects the error over the range of m handled by that thread, and then, through the named critical section (the name gives this critical section its own lock, distinct from any other critical constructs), the per-thread partial results are reduced into the shared maxError.
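As a side note, compilers that implement OpenMP 3.1 or later also accept max as a reduction operator for C/C++, which lets the runtime handle the per-thread partial maxima and the final combine. A minimal sketch, assuming the same v, nvG and ComputeError as above:

double maxError = 0.0;
#pragma omp parallel reduction(max:maxError)
{
    double *vLast = new double[nvG];
    #pragma omp for
    for (int m = 0; m < 3000; ++m) {
        // ... update row m, using vLast as scratch ...
        double tm = ComputeError(vLast, v[m]);
        if (tm > maxError)            // each thread updates its private copy
            maxError = tm;
    }
    delete[] vLast;
}                                     // the copies are combined with max here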