Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Couldn't create more than 64 OpenMP threads in a test application

SergeyKostrov
Valued Contributor II
3,224 Views
Hi everybody,

I recently ran a test in a simple OpenMP-based application, and OpenMP couldn't create more than 64 threads.

Here is the code of the test:

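#include <stdio.h>
#include <omp.h>

int main( void )
{
    int iShowNumOfThreads = 1;    // shared flag: print the team size only once

    omp_set_num_threads( 1024 );

    #pragma omp parallel num_threads( 1024 )
    {
        // Unsynchronized test-and-clear: more than one thread may still print.
        if( iShowNumOfThreads == 1 )
        {
            iShowNumOfThreads = 0;
            printf( "Number of threads created: %d\n", omp_get_num_threads() );
        }

        // Busy loop that keeps every thread alive for a while.
        for( int i = 0; i < 16777216; i++ )
        {
            double dA = ( 2 * 4 * 8 * 16 );
        }
    }

    printf( "Done\n" );
    return 0;
}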

How could I create as many OpenMP threads as possible? For example, more than 32,768?

Best regards,
Sergey

0 Kudos
35 Replies
SergeyKostrov
Valued Contributor II
2,329 Views
Here is a screenshot for review:

0 Kudos
SergeyKostrov
Valued Contributor II
2,329 Views

I'd like to provide some additional information and a new question ( please see 3. ):

1. I also set the environment variable 'OMP_NUM_THREADS' to 1024, and it doesn't lift the limit.
A call to the OpenMP function 'omp_get_max_threads', like:
...
printf( "Max Number of threads: %d\n", omp_get_max_threads() );
...
returns 64.

2. Only 58 threads are reported as exited. So, for some reason, 6 threads are lost! Here is
the Visual Studio 2005 output:
...
The thread 'Win32 Thread' (0x204) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x1b0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc14) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x910) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x814) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x874) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x9b0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x980) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc4) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbc0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xeec) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x8f4) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x30c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc30) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xca0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb44) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x778) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x9f8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa24) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x748) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xf70) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc88) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x678) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb94) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc5c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa70) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc84) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa98) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc74) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x4e0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xaa8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa44) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbe8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa8c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x63c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xca4) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xcc0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x518) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc7c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x98c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb34) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xa60) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x96c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xf90) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbfc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xbe0) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xe14) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x5cc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xcf8) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xf10) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x34c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb78) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xdc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb1c) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xb98) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0xc94) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x6fc) has exited with code 0 (0x0).
The thread 'Win32 Thread' (0x6d8) has exited with code 0 (0x0).
...

3. I wonder if OpenMP version 2.0 ( March 2002 ) has some limitations and doesn't allow creating more than 64 threads?
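
The gap between the requested and the actual team size can be measured directly rather than read off the debugger output. Below is a minimal sketch, assuming any OpenMP 2.0 or later compiler; the function names `max_team_size` and `delivered_team_size` are made up for illustration and are not part of OpenMP. The fallback branches only keep the code compilable when OpenMP is switched off.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound the runtime reports for the next parallel region that has no
   num_threads clause. Returns 1 when compiled without OpenMP support. */
int max_team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Request `requested` threads (must be >= 1) for one parallel region and
   return the size of the team the runtime actually delivered. */
int delivered_team_size(int requested)
{
    int delivered = 1;

#ifdef _OPENMP
    #pragma omp parallel num_threads(requested)
    {
        /* Exactly one thread records the team size. */
        #pragma omp single
        delivered = omp_get_num_threads();
    }
#else
    (void)requested;   /* serial build: the team is always one thread */
#endif
    return delivered;
}
```

Printing `delivered_team_size( 1024 )` next to `max_team_size()` would show whether a runtime silently caps the request, which is what the 64-thread result above suggests for this particular runtime.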

0 Kudos
Vladimir_P_1234567890
2,329 Views

Hi Sergey,

Here is a corrected example:

[cpp]
#include <stdio.h>
#include <omp.h>

int main( void )
{
    omp_set_num_threads( 1024 );

    #pragma omp parallel num_threads( 1024 )
    {
        if( omp_get_thread_num() == 0 )
        {
            printf( "Number of threads created: %d\n", omp_get_num_threads() );
        }

        for( int i = 0; i < 16777216; i++ )
        {
            double dA = ( 2 * 4 * 8 * 16 );
        }
    }

    printf( "Done\n" );
    return 0;
}
[/cpp]


And its output for Composer XE 2011 update 9:

[bash]
omp_test>omp_test.exe
Number of threads created: 1024
Done
[/bash]

Which compiler did you use?

update:

>>I wonder if OpenMP version 2.0 ( March 2002 ) has some limitations and doesn't allow creating more than 64 threads?

The specification does not set any limit. It is up to the implementation.

--Vladimir
0 Kudos
SergeyKostrov
Valued Contributor II
2,329 Views
Thank you for the feedback.

>>Which compiler did you use?

The test was done with Visual Studio 2005.

>>...It is up to implementation.

Did Microsoft's implementation set some limits?

Best regards,
Sergey
0 Kudos
Michael_K_Intel2
Employee
2,329 Views
Dear Sergey,

The OpenMP specification does not set any limits on the number of threads, except for what the interface to the runtime routines accepts as input values. So, you should be safe on that side.

However, an implementation is free to have internal limits (e.g. 64 threads max). I do not know if the MS implementation of OpenMP actually enforces a limit internally. You did not write about the machine you're working on. Is the machine a WSM-EX box with more than 64 cores (including the Hyper-Threading cores)? If yes, the limit might come from the fact that a Windows processor group is limited to 64 logical processors, and that you need a newer API to distribute threads across different processor groups. Alas, this has to be done from within the OpenMP runtime, and it might be the case that the MS implementation limits the thread count to the size of a processor group.

If you use the Intel OpenMP runtime you should not see any restrictions on the number of threads that you can create.

Cheers,
-michael
0 Kudos
Vladimir_P_1234567890
2,329 Views
The Intel OpenMP RTL also has a limit: 32,768 threads. But it is hard for me to imagine who needs all these threads on one machine.
--Vladimir
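
Newer OpenMP runtimes expose their own cap through `omp_get_thread_limit()`. A rough sketch follows, assuming a compiler that sets `_OPENMP` to 200805 (OpenMP 3.0) or later; the wrapper name `runtime_thread_limit` and the -1 "unknown" convention are illustrative only. OpenMP 2.0 runtimes do not provide this routine, so the sketch falls back to "unknown" there, and the value returned depends on the runtime and its environment settings.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return the runtime's own upper bound on the number of OpenMP threads, or
   -1 when it cannot be queried. omp_get_thread_limit() was added in
   OpenMP 3.0 (_OPENMP == 200805), so older runtimes do not have it. */
int runtime_thread_limit(void)
{
#if defined(_OPENMP) && _OPENMP >= 200805
    return omp_get_thread_limit();
#else
    return -1;   /* OpenMP 2.0 or no OpenMP: no standard way to ask */
#endif
}
```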
0 Kudos
Vladimir_P_1234567890
2,329 Views
>>...It is up to implementation.
>>
>>Did Microsoft's implementation set some limits?

It is better to ask the Visual Studio team.

But using VS2010 I got the same 64 threads.

--Vladimir

0 Kudos
SergeyKostrov
Valued Contributor II
2,329 Views
Thank you, guys! I also confirm that this is a Microsoft limitation. However, I managed to create 1,024 OpenMP threads.
Unfortunately, this is a "hack" and I'll provide technical details later.

Best regards,
Sergey
0 Kudos
SergeyKostrov
Valued Contributor II
2,329 Views

C/C++ code with an OpenMP directive:
...
#pragma omp parallel num_threads( 1024 ) // 64 is a default value and will be used instead
{
...

is compiled to several initialization calls in assembler language:

...
72881482 call _vcomp::min (...) // Here some verification is done
...
(1)7288148C push edx // The number of Win32 threads to create ( 64 ) is in the EDX register
7288148D mov ecx,dword ptr [ebp-4]
72881490 call _vcomp::PerThreadData::SetNextNumThreads (...) // Initializes some internal structures but Win32 threads are still not created
72881495 mov esp,ebp
72881497 pop ebp
...
004B72D6 call @ILT+11795( __vcomp_fork ) (...) // Creates Win32 threads and starts processing
...

At (1) the EDX register is already set to the maximum number of threads, which is 64. In the debugger
I changed the value of the EDX register to 1,024 ( 0x400 ). Then a call to the internal OpenMP function '__vcomp_fork'
creates 1,024 Win32 threads and starts the processing.

Here is a screenshot of the Windows Task Manager:



Note: 1,025 = 1 ( Win32 parent process ) + 1,024 ( Win32 threads created by OpenMP API)

Here is some information on an OpenMP DLL loaded by the test application:
...
Loaded 'C:\WINDOWS\WinSxS\x86_Microsoft.VC80.DebugOpenMP_1fc8b3b9a1e18e3b_8.0.50727.4053_x-ww_3f6e27c4\vcompd.dll', Symbols loaded (...).
...

0 Kudos
Vladimir_P_1234567890
2,329 Views
Hi Sergey,
Overriding the thread-count setting is not a big deal. The big deal is working with these 1024 threads in OpenMP constructs :)
You need to know the internal implementation to find out whether this number of threads is supported internally. And of course it is officially unsupported.
For example, you can take either the pi or the Fibonacci example to see whether it still works for 1023 threads in this case. And looking at the Task Manager, do I understand correctly that this 1024-thread application is executed on 1 thread (CPU field)?
In other words, if your application crashes in the OpenMP runtime, you can't come and say "I've hacked your library but it does not work" :)
--Vladimir
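
The pi check mentioned above could look like the following. This is a minimal sketch, not the sample shipped with any compiler: it assumes the usual midpoint-rule integration of 4 / (1 + x^2) over [0, 1], and the name `estimate_pi` and its parameters are chosen for illustration. Because the loop uses a real worksharing construct with a reduction, a wrong result or a crash at a given thread count would point at the runtime rather than at the test.

```c
/* Estimate pi as the integral of 4 / (1 + x^2) over [0, 1] using the
   midpoint rule with `steps` intervals, split across `threads` threads.
   Without OpenMP the pragma is ignored and the loop runs serially. */
double estimate_pi(long steps, int threads)
{
    const double width = 1.0 / (double)steps;
    double sum = 0.0;
    long i;

    (void)threads;   /* only used by the pragma below */

    #pragma omp parallel for reduction(+:sum) num_threads(threads)
    for (i = 0; i < steps; i++) {
        double x = ((double)i + 0.5) * width;   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * width;
}
```

Calling `estimate_pi( 1000000, 1023 )` and comparing the result with the single-threaded value is one way to see whether an oversized team still produces correct answers.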
0 Kudos
SergeyKostrov
Valued Contributor II
2,329 Views
>>...And looking into task manager do I understand correct that this 1024 thread application is executed in 1 thread (CPU field)?...

Yes, that test was done on a computer with one CPU, and the purpose of the test is simple: stress testing
the OpenMP library and evaluating the memory requirements of an OpenMP application with a number of
threads greater than 1,024.

My current result is as follows: Microsoft's implementation of the OpenMP library v2.0 doesn't allow creating more
than 1,977 threads. The application crashes when trying to create the 1,978th thread.



The OpenMP library 'vcompd.dll' throws a '0xC0000005' Access Violation exception.

Best regards,
Sergey
0 Kudos
SergeyKostrov
Valued Contributor II
2,329 Views
Hi Vladimir,

Quoting Vladimir Polin (Intel)
Intel OpenMP RTL also does have a limitation -32768 threads...

Could you try to execute three tests with 8,192, 16,384 and 32,768 OpenMP threads using the test case I've submitted?

Could you report how much memory is allocated ( Mem Usage + VM Size, please see the Task Manager )
for an application compiled in the Release configuration?

Thanks in advance.

Best regards,
Sergey
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,329 Views

Vladimir,

It is "presumptuous" of MS, or any vendor for that matter, to assume that all OpenMP threads within a user application are compute-only threads, to conclude from that assumption that requesting more threads than hardware threads causes oversubscription, and as a consequence to take it upon itself to reduce the number of threads requested.

A programmer may have valid reasons for specifying more threads than available hardware threads. One example is when you expect that one or more of your OpenMP threads will be preponderantly waiting for I/O completion (including waiting for a timer). In such situations, not permitting the programmer to "oversubscribe" leaves the application's compute-bound threads "undersubscribed".

Jim Dempsey

0 Kudos
Vladimir_P_1234567890
2,329 Views
hi Jim,
as I wrote before, the OpenMP specification does not set lower or upper limits on the thread count, and every implementation will support as many threads as it wants.
It is up to customers to pick the runtime library that best fits their needs. But I believe that if an implementation offered a maximum of 2 threads, nobody would use it.
BTW, how can I/O jobs be implemented in the OpenMP case, via tasking?
--Vladimir
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,329 Views
>>BTW, how can I/O jobs be implemented in OpenMP case, via tasking?

Tasking, nesting, regions

...Assuming the number of threads is set to the number of Logical Processors + 2

int nThreads = ...; // number of Logical Processors + 2
...
#pragma omp parallel
{
    // here with number of threads set to number of Logical Processors + 2
    if(omp_get_thread_num() == 0)
    {
        doReads(); // uses queue
    }
    else if(omp_get_thread_num() == 1)
    {
        doWrites(); // uses queue
    }
    else
    {
        // size of the next (nested) region started by this thread
        omp_set_num_threads(omp_get_num_threads() - 2);
        // next region using number of Logical Processors
        doWork(); // using # Logical Processors, reading doReads queue, writing doWrites queue
    }
}

As to how you would oversubscribe, this would be an implementation issue.

Jim Dempsey
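
The role split in the outline above can be tried in isolation. What follows is a rough, compilable sketch under stated assumptions: the struct `role_counts` and the function `run_roles` are invented for illustration, and the reader, writer and worker bodies are placeholders that only count themselves instead of doing I/O or compute work. The caller is assumed to pass the number of logical processors plus 2 as `team`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* How many threads of one team took each role. */
typedef struct {
    int team_size;
    int readers;
    int writers;
    int workers;
} role_counts;

/* Start one parallel region of `team` threads (must be >= 1). Thread 0
   plays the reader, thread 1 the writer, every other thread a worker. */
role_counts run_roles(int team)
{
    int size = 1, readers = 0, writers = 0, workers = 0;
    role_counts result;

    (void)team;   /* only used by the pragma below */

    #pragma omp parallel num_threads(team) reduction(+:readers, writers, workers)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
        #pragma omp single
        size = omp_get_num_threads();   /* team actually delivered */
#endif
        if (id == 0)
            readers++;    /* would block on input here */
        else if (id == 1)
            writers++;    /* would block on output here */
        else
            workers++;    /* compute-bound work goes here */
    }

    result.team_size = size;
    result.readers = readers;
    result.writers = writers;
    result.workers = workers;
    return result;
}
```

If the runtime caps the team below the requested size, `workers` comes out smaller than the number of logical processors, which is the "undersubscribed" situation described earlier in the thread.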
0 Kudos
SergeyKostrov
Valued Contributor II
2,329 Views
...threads will be preponderantly waiting for I/O completion (including waiting for timer)...

Jim Dempsey


A similar approach is used in Windows CE. It creates some number of low priority Win32 threads and
they wait for data from high priority Win32 threads serving hardware interrupts.

Best regards,
Sergey

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,329 Views
Sergey,

In your OpenMP application you are free to create your own additional threads (e.g. _beginthread, ...).
However, you may experience some not-so-obvious issues when attempting to use OpenMP synchronization features between the OpenMP threads and the non-OpenMP threads. For example, OpenMP has a mutex lock as well as critical sections and atomic statements, which may or may not work properly across thread domains (OpenMP and non-OpenMP). The documentation is written from the perspective that all threads are OpenMP threads.

Jim Dempsey
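
For reference, this is what two of those synchronization features look like when every participating thread belongs to the OpenMP team, which is the only case the sketch covers. It is a minimal illustration with made-up names (`counters`, `count_with_sync`); OpenMP locks are left out, and nothing here says how these constructs behave when a thread created with _beginthread takes part.

```c
/* Final values of two shared counters. */
typedef struct {
    long via_atomic;
    long via_critical;
} counters;

/* Increment two shared counters `n` times from an OpenMP team, one update
   guarded by 'atomic' and the other by 'critical'. Without OpenMP the
   pragmas are ignored and the loop runs serially. */
counters count_with_sync(long n)
{
    long a = 0, b = 0;
    long i;
    counters result;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        a++;              /* single protected memory update */

        #pragma omp critical
        {
            b++;          /* one thread at a time runs this block */
        }
    }

    result.via_atomic = a;
    result.via_critical = b;
    return result;
}
```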
0 Kudos
021184
Beginner
2,329 Views
I have one computer with 24 cores, and my Fortran program only uses one of them. Could someone tell me how I can make my Fortran program use all 24 cores and reduce the calculation time?
I have the Intel Fortran compiler 2011 on Red Hat Linux.
Thanks!
0 Kudos
Vladimir_P_1234567890
2,329 Views
Sure you can.

I suggest starting from the ISN page,

or searching the internet for "fortran openmp example".
--Vladimir
0 Kudos
021184
Beginner
2,068 Views
What is OpenMP? Is OpenMP a program to do that?
Thanks, Vladimir, I will review it.
--Mel
0 Kudos