a/ In the function with the main loop that I'm attempting to thread there are a number of stack-based variables. My understanding is that these variables default to OMP shared. The variables need to be OMP private in the spawned threads to avoid race conditions. I've tried using the private clause on the omp parallel for directive
#pragma omp for private(xVar)
but I get an error from ICC
(0): internal error: 0_12032
which is on the cryptic side.
b/ I thought that I might be able to use the parallel construct before the private variables were declared to generate individual copies of the OMP private variables for each thread:
#pragma omp parallel
{
    var definitions
    ...
    parallel for
    {
        loop
    }
} // end of parallel region
This would be inefficient in that any initialisation involving computation would be repeated in each thread, rather than being done once and copied to each thread's OMP private instance, but I could wear this to get it going. The parallel section is entered, 8 threads are created (the number of cores on my box) and some initialisation is done, but then the program crashes.
Is this an OK way to get OMP thread private variable instances?
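For what it's worth, a minimal sketch of that layout (illustrative names, not the original code): variables declared inside the parallel region live on each thread's own stack, so they are automatically private, and the inner construct should be an orphaned omp for, which binds to the existing team rather than spawning new threads:

```cpp
// Sketch of the b/ layout: per-thread variables declared inside the
// parallel region; the inner "omp for" slices the loop across the
// team created by the enclosing "parallel".
double sum_squares(int n)
{
    double total = 0.0;

    #pragma omp parallel reduction(+:total)
    {
        double scratch = 0.0;  // declared inside the region: one copy per thread

        #pragma omp for        // note: "for", not "parallel for" -- no nesting
        for (int i = 0; i < n; ++i) {
            scratch = static_cast<double>(i) * i;
            total  += scratch;
        }
    } // end of parallel region

    return total;
}
```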
c/ A library used by the code, Blitz, isn't thread safe. It's possible that this is interfering with the approach in b/. Getting Blitz running thread safe requires POSIX threads, and Windows doesn't support native POSIX threads. There is an open-source Windows pthread implementation at http://sourceware.org/pthreads-win32/. Getting Blitz thread safe under ICC on Windows using that implementation would no doubt be an interesting little project, but not one I want to take on at the moment. Does anyone have any experience with this? I noticed that the Parallel Composer install includes a pthread.h file used by the tachyon example, which provides emulation of a few POSIX threads bits and pieces. I could go down this route by looking for the POSIX dependencies in Blitz and implementing emulations, but it's outside the scope of what I'm trying to do.
Regards
David
a) It would help to know the compiler version you are using; if you still see the internal error, please try with the latest version.
I hope it's a compile-time error and not a runtime one?
It would be useful to see the structure of the routine/loop you are parallelizing. If possible, provide a test case.
b) The default is shared. I don't think that would work with the private clause, as I believe private variables do not persist across parallel regions. If you use the threadprivate directive for the stack variables needed by each thread, the variable will persist into the for(...) region; otherwise it will hold a garbage value in the second parallel region, I think. Yes, the computation will be done for each thread, but I suppose the extra computations will just amount to num_threads - 1?
Also, I think you can give the threadprivate directive a try and see if the program still crashes. You specify it after the declaration and before first use, i.e. before the first parallel region. The runtime crash may possibly be due to the variables not retaining their values?
#pragma omp threadprivate(a, x)
That is not to say that a) should not work fine.
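To address the concern about repeating the initialisation in every thread, threadprivate can be combined with copyin, which broadcasts the master thread's value into each thread's copy on entry to the region. A sketch, noting that threadprivate only applies to file-scope or static variables, and that the names here are made up:

```cpp
// threadprivate requires file-scope or static storage; "state" is a
// hypothetical stand-in for the expensively-initialised variable.
static double state = 0.0;
#pragma omp threadprivate(state)

double run(int n)
{
    state = 42.0;   // initialise once, in the master thread only

    double total = 0.0;
    // copyin(state) copies the master's value into every thread's
    // threadprivate copy, so the initialisation is not repeated per thread.
    #pragma omp parallel for copyin(state) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += state;

    return total;
}
```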
An internal error is a compiler bug, so if you can submit a test case that shows it with the current compiler release, that would be valuable.
Jim Dempsey
omp parallel
{
    vars
    omp for
    for loop
    {
    }
} // end of parallel region
My thought was that private copies of the vars following the initial parallel pragma would be allocated per thread when the initial parallel region was entered. These threads would in turn be assigned iterations of the parallelised loop, so the threads would execute the parallelised loop iterations already prepared with private copies of the variables concerned.
But if OMP regards these as separate regions I guess this wouldn't work.
Thanks for the help
David
Sorry, I thought you were defining in one region and using in another, which is not required.
I do not see an internal compiler error.
$ icc -V
Intel C Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100806 Package ID: l_cproc_p_11.1.073
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
$ uname -a
Linux dpd22 2.6.18-52.el5 #1 SMP Wed Sep 26 15:26:44 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
$ cat tst.cpp
#include <iostream>
#define N 10000
int main()
{
    int a[N];
    int x, y;
    int sum = 0;
    int i;
    for (i = 0; i < N; ++i)
        a[i] = i;
#pragma omp parallel for private(x, y) reduction(+:sum)
    for (i = 0; i < N; ++i) {
        sum = sum + a[i];
        x += i;
        y += i*i;
    }
    std::cout << "Hi sum = " << sum << std::endl;
    return 0;
}
$ icc -openmp tst.cpp
$ ./a.out
Hi sum = 49995000
It would be nice if you can share your test case.
omp parallel
{
    [n threads running here]
    vars
    omp for
    for loop
    {
        [same n threads running different slices of for loop]
    }
    [same n threads running here]
} // end of parallel region
[same thread as first thread running above]
There is only one parallel region.
If you should change "omp for" to "omp parallel for", and if nested regions are enabled, then the inner loop will create n teams of m threads, each team slicing the entire array.
IOW, as to whether these are separate parallel regions: it depends on whether the keyword "parallel" is contained in the omp statement.
Jim Dempsey
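The point about a single region can be checked mechanically: since the orphaned omp for binds to the enclosing team, every loop iteration is executed exactly once in total. A sketch with an illustrative function name:

```cpp
// One parallel region: the orphaned "omp for" does not create threads,
// it distributes iterations across the team already running the region,
// so each iteration of the loop runs exactly once.
int count_iterations(int n)
{
    int done = 0;
    #pragma omp parallel reduction(+:done)
    {
        // [n threads running here]
        #pragma omp for
        for (int i = 0; i < n; ++i)
            ++done;   // [same n threads, different slices of the loop]
        // [same n threads running here]
    }
    return done;
}
```

With "omp parallel for" on the inner loop and nesting enabled, the count would instead be n_outer_threads times the trip count.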
omp parallel
{
    [n threads running here]
    vars
    omp for
    for loop
    {
        [same n threads running different slices of for loop]
    }
    [same n threads running here]
} // end of parallel region
[same thread as first thread running above]
There is only one parallel region.
If you should change "omp for" to "omp parallel for", and if nested regions are enabled, then the inner loop will create n teams of m threads, each team slicing the entire array.
That's what I said "...if nested regions enabled..."
Jim
#pragma omp threadprivate(...)

#pragma omp parallel
{
    ---- definitions ----
}

#pragma omp parallel for
for (...)
{
    ---- use and compute ----
}

In the above case there is no nested parallelism, so no n*m threads will be created; it will be just n threads. Also, I don't think any barrier is needed.
Though I'm not sure about any extra overheads or performance impact, please let me know whether the above solution is also right.
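Written out, that two-region pattern might look like the sketch below (hypothetical names). Note that relying on a threadprivate value surviving from the first region into the second assumes the team does not change between regions:

```cpp
// Two regions sharing per-thread state via threadprivate: region one
// initialises each thread's copy, region two uses it. "counter" is a
// made-up stand-in for the real per-thread definitions.
static int counter = 0;
#pragma omp threadprivate(counter)

int two_regions(int n)
{
    #pragma omp parallel
    {
        counter = 0;          // ---- definitions ---- (once per thread)
    }

    int total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        ++counter;            // ---- use and compute ---- (value persisted)
        total += 1;
    }
    return total;
}
```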
*** however ***
It is the programmer's responsibility to account for runtime differences that may occur outside the program test environment.
In particular:
Do all the threads run through the first region and then the second region?
If not, do the same threads run through both regions?
The test environment may not be using nested parallel regions (you always see the same threads passing through both regions), whereas the production environment may be using nested parallel regions, and the threads passing through the first region are not the same threads (or the same number of threads) as those passing through the second region(s).
This should not scare you away from using thread-local storage. It should only caution you to pay attention to the details.
Jim Dempsey
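That "same team in both regions" assumption can be asserted cheaply at runtime. A sketch, where the helper name is made up and the serial stub only exists so the code also builds without OpenMP:

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_num_threads() { return 1; }  // serial-build stub
#endif

// Record the team size seen by the first region and check that the
// second region gets the same team size -- a cheap guard for the
// assumption that threadprivate data carries over between regions.
int checked_team_size()
{
    int first = 0, second = 0;

    #pragma omp parallel
    {
        #pragma omp master
        first = omp_get_num_threads();
    }
    #pragma omp parallel
    {
        #pragma omp master
        second = omp_get_num_threads();
    }

    assert(first == second && "team size changed between regions");
    return first;
}
```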