I recently ran a test in a simple OpenMP-based application, and OpenMP couldn't create more than 64 threads.
Here is the code of the test:
#include <stdio.h>
#include <omp.h>
int main( void )
{
    omp_set_num_threads( 1024 );
    #pragma omp parallel num_threads( 1024 )
    {
        // Report the team size once; 'single' avoids a data race on a shared flag
        #pragma omp single
        printf( "Number of threads created: %d\n", omp_get_num_threads() );
        // Busy loop so that every thread in the team has some work to do
        for( int i = 0; i < 16777216; i++ )
        {
            double dA = ( 2 * 4 * 8 * 16 );
        }
    }
    printf( "Done\n" );
    return 0;
}
How can I create as many OpenMP threads as possible? For example, more than 32,768?
Best regards,
Sergey
I'd like to provide some additional information and a new question (please see 3.):
1. I also set the environment variable 'OMP_NUM_THREADS' to 1024, and it doesn't change the limit.
A call to the 'omp_get_max_threads' OpenMP function, like:
...
printf( "Max Number of threads: %d\n", omp_get_max_threads() );
...
returns 64.
2. Only 58 threads are reported as exited. So, for some reason, 6 threads are lost! Here is
the Visual Studio 2005 output:
...
The thread 'Win32 Thread' (0x204) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x1b0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc14) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x910) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x814) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x874) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x9b0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x980) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc4) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbc0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xeec) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x8f4) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x30c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc30) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xca0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb44) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x778) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x9f8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa24) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x748) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xf70) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc88) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x678) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb94) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc5c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa70) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc84) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa98) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc74) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x4e0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xaa8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa44) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbe8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa8c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x63c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xca4) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xcc0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x518) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc7c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x98c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb34) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa60) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x96c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xf90) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbfc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbe0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xe14) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x5cc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xcf8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xf10) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x34c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb78) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xdc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb1c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb98) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc94) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x6fc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x6d8) has exited with code 0 (0x0).
...
3. I wonder if OpenMP version 2.0 (March 2002) has some limitations and doesn't allow creating more than 64 threads?
Hi Sergey,
corrected example
And its output for Composer XE 2011 update 9
Which compiler did you use?
I wonder if OpenMP version 2.0 (March 2002) has some limitations and doesn't allow to create more than 64 threads?
--Vladimir
>>Which compiler did you use?
The test was done with Visual Studio 2005.
>>...It is up to implementation.
Did Microsoft's implementation set some limits?
Best regards,
Sergey
The OpenMP specification does not set any limits on the number of threads, except for what the interface to the runtime routines accepts as input values. So, you should be safe in that respect.
However, the implementation is free to have internal limits (e.g. 64 threads max). I do not know if the MS implementation of OpenMP actually enforces a limit internally. You did not write about the machine you're working on. Is the machine a WSM-EX box with more than 64 cores (including the Hyper-Threading cores)? If yes, the limit might come from the fact that Windows processor groups are limited to 64 cores and that you need a new API to distribute threads across different processor groups. Alas, this has to be done from the OpenMP runtime, and it might be the case that the MS implementation limits the thread number to the size of the processor group.
If you use the Intel OpenMP runtime, you should not see any restrictions on the number of threads that you can create.
Cheers,
-michael
Did Microsoft's implementation set some limits?
It is better to ask the Visual Studio team.
But using VS2010, I got the same 64 threads.
--Vladimir
Unfortunately, this is a "hack" and I'll provide technical details later.
Best regards,
Sergey
C/C++ code with an OpenMP directive:
...
#pragma omp parallel num_threads( 1024 ) // 64 is the default value and will be used instead
{
...
is compiled to several initialization calls in assembly language:
...
72881482 call _vcomp::min
...
(1) 7288148C push edx // The number of Win32 threads to create ( 64 ) is in the EDX register
7288148D mov ecx,dword ptr [ebp-4]
72881490 call _vcomp::PerThreadData::SetNextNumThreads (...) // Initializes some internal structures but Win32 threads are still not created
72881495 mov esp,ebp
72881497 pop ebp
...
004B72D6 call @ILT+11795( __vcomp_fork ) (...) // Creates Win32 threads and starts processing
...
At (1), the EDX register is already set to the maximum number of threads, which is 64. In the debugger,
I changed the value of the EDX register to 1,024 ( 0x400 ). Then a call to the internal OpenMP function '__vcomp_fork'
creates 1,024 Win32 threads and starts the processing.
Here is a screenshot of the Windows Task Manager:
Note: 1,025 = 1 ( Win32 parent process ) + 1,024 ( Win32 threads created by OpenMP API)
Here is some information on an OpenMP DLL loaded by the test application:
...
Loaded 'C:\WINDOWS\WinSxS\x86_Microsoft.VC80.DebugOpenMP_1fc8b3b9a1e18e3b_8.0.50727.4053_x-ww_3f6e27c4\vcompd.dll', Symbols loaded (...).
...
Quoting Vladimir Polin (Intel)
Yes, that test was done on a computer with one CPU, and the purpose of the test is simple: stress testing
of the OpenMP library and evaluation of the memory requirements for an OpenMP application with a number of
threads greater than 1,024.
My current result is as follows: Microsoft's implementation of the OpenMP library v2.0 doesn't allow creating more
than 1,977 threads. The application crashes when trying to create the 1,978th thread.
The OpenMP library 'vcompd.dll' throws a '0xC0000005' Access Violation exception.
Best regards,
Sergey
Could you try to execute three tests with 8,192, 16,384 and 32,768 OpenMP threads using the test case I've submitted?
Could you report how much memory is allocated (Mem Usage + VM Size, please see the Task Manager)
for an application compiled in the Release configuration?
Thanks in advance.
Best regards,
Sergey
Vladimir,
It is "presumptuous" of MS, or any vendor for that matter, to assume that all OpenMP threads within a user application are compute-only threads, and therefore to assume that requesting more threads than hardware threads causes oversubscription, and as a consequence to take it upon itself to reduce the number of threads requested.
A programmer may have valid reasons for specifying more threads than available hardware threads. One example is when you expect one or more of your OpenMP threads to spend most of their time waiting for I/O completion (including waiting for a timer). In such situations, not permitting the programmer to "oversubscribe" results in the application's compute-bound threads being "undersubscribed".
Jim Dempsey
Tasking, nesting, regions
...Assuming the number of threads is set to the number of logical processors + 2
int nThreads = ...; // number of logical processors + 2
...
#pragma omp parallel
{
    // here with the number of threads set to the number of logical processors + 2
    if(omp_get_thread_num() == 0)
    {
        doReads(); // uses queue
    }
    else if(omp_get_thread_num() == 1)
    {
        doWrites(); // uses queue
    }
    else
    {
        omp_set_num_threads(nThreads-2);
        // next region using the number of logical processors
        doWork(); // using # logical processors, reading doReads queue, writing doWrites queue
    }
}
As to how you would oversubscribe, this would be an implementation issue.
Jim Dempsey
Jim Dempsey
A similar approach is used in Windows CE. It creates some number of low-priority Win32 threads, and
they wait for data from high-priority Win32 threads serving hardware interrupts.
Best regards,
Sergey
In your OpenMP application, you are free to create your own additional threads (e.g. _beginthread, ...).
However, you may experience some not-so-obvious issues when attempting to use OpenMP synchronization features between the OpenMP threads and the non-OpenMP threads. For example, OpenMP has a mutex lock as well as critical sections and atomic statements, which may or may not work properly across thread domains (OpenMP and non-OpenMP). The documentation is written from the perspective that all threads are OpenMP threads.
Jim Dempsey