Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Couldn't create more than 64 OpenMP threads in a test application

SergeyKostrov
Valued Contributor II
4,781 Views
Hi everybody,

I recently ran a test in a simple OpenMP-based application, and OpenMP couldn't create more than 64 threads.

Here is the code of the test:

#include <stdio.h>
#include <omp.h>

int main( void )
{
    omp_set_num_threads( 1024 );

    #pragma omp parallel num_threads( 1024 )
    {
        // Only the master thread reports the team size, so it is printed once and without a race
        #pragma omp master
        {
            printf( "Number of threads created: %d\n", omp_get_num_threads() );
        }

        // Dummy workload to keep every thread busy
        for( int i = 0; i < 16777216; i++ )
        {
            double dA = ( 2 * 4 * 8 * 16 );
            ( void )dA;
        }
    }

    printf( "Done\n" );
    return 0;
}

How can I create as many OpenMP threads as possible? For example, more than 32,768?

Best regards,
Sergey

0 Kudos
35 Replies
jimdempseyatthecove
Honored Contributor III
1,412 Views
OpenMP is a syntax you can layer onto your C/C++/FORTRAN programs.

For C/C++ the syntax takes the form of #pragma omp... lines that you insert into your program and that can be enabled or disabled via compiler switches. In FORTRAN the syntax is introduced as compiler directives written as comments, which can likewise be enabled or disabled using compiler switches.

for(int i = 0; i < N; ++i)
{
...
}

Becomes:

#pragma omp parallel for
for(int i = 0; i < N; ++i)
{
...
}

With the for loop being the same as without #pragma.
In FORTRAN

!$OMP PARALLEL DO
DO I=1,N
...
END DO


Should N be large enough, the iteration space will be partitioned by the number of threads available on your system (24 in your case). Each partition will run in parallel.

Note, some loops may require special consideration to prevent multiple threads from updating the same location at the same time.

Please look at the sample code in the documentation. Using OpenMP is relatively easy... but there are a few programming considerations you need to follow if you want correctness and performance.

Start with simple improvements to your code and then get more aggressive as you gain experience.

Jim Dempsey
0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
...
In your OpenMP application you are free to create your own additional threads (e.g. _beginthread, ...)
However, you may experience some not-so-obvious issues when attempting to use OpenMP synchronization features between the OpenMP threads and the non-OpenMP threads. For example, OpenMP has a mutex lock as well as critical sections and atomic statements which may or may not work properly across thread domains (OpenMP and non-OpenMP)...

Using the '_beginthread' function is another option to consider. Thank you.

Best regards,
Sergey
0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
...
Please look at the sample code in the documentation. Using OpenMP is relatively easy... but there are a few programming considerations you need to follow if you want correctness and performance.

Start with simple improvements to your code and then get more aggressive as you gain experience.

Jim Dempsey


Hi Jim,
Thank you for the feedback; I really appreciate it. There is only one problem at the moment, that
is, a lack of time. I can't work 24 hours a day... :)
Best regards,
Sergey

0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
...
However, the implementation is free to have internal limits (e.g. 64 threads max). I do not know if the MS implementation of OpenMP actually enforces a limit internally...

Hi Michael,

I tried to find explanation(s) on the MSDN website and I found a very interesting statement at:

http://msdn.microsoft.com/en-us/library/d8wkzt26(v=vs.80).aspx

...
The omp_set_num_threads function sets the default number of threads to use for subsequent parallel
regions that do not specify a num_threads clause.
...

When I removed the 'num_threads' clause nothing changed, and my test still couldn't create more
than 64 threads. So, I'll try to contact Microsoft and let's see what they say.

Best regards,
Sergey
0 Kudos
jimdempseyatthecove
Honored Contributor III
1,412 Views
Sergey,

>> I'll try to contact Microsoft and let's see what they say.

If you are compiling with the Intel toolchain the OpenMP library will be that provided by Intel. IOW any thread limitation will be imposed by the Intel code.

If you are compiling with the MS toolchain the OpenMP library will be that provided by MS.

Who you contact will depend on whose toolchain you use.

Note, in an earlier post you showed:

...
72881482 call _vcomp::min (...) // Here some verification is done
...
7288148C push edx // A number of Win32 threads to create ( 64 ) is in EDX register
7288148D mov ecx,dword ptr [ebp-4]
72881490 call _vcomp::PerThreadData::SetNextNumThreads (...) // Initializes some internal structures but Win32 threads are still not created
72881495 mov esp,ebp
72881497 pop ebp
...
004B72D6 call @ILT+11795( __vcomp_fork ) (...) // Creates Win32 threads and starts processing
...

You might consider hooking that library function or replacing it (assuming you do not get a satisfactory workaround from MS).

Coercion sometimes works.

In the above dump, look at where the args to _vcomp::min came from. If one comes from an environment variable (or from the lack of one), then use that environment variable. If not, then create a static object, loaded early in your image, whose constructor makes an appropriate adjustment to the arg that is restricting your desired thread count.

These additions will not be portable, so enclose them in an appropriate conditional compile section, perhaps including a #pragma message("Hack to bypass MS restriction on upper thread count")

Jim Dempsey
0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
Sergey,

>> I'll try to contact Microsoft and let's see what they say.

If you are compiling with the Intel toolchain the OpenMP library will be that provided by Intel. IOW any thread limitation will be imposed by the Intel code.

If you are compiling with the MS toolchain the OpenMP library will be that provided by MS.
...


I've submitted feedback / a question on MSDN and I hope that somebody from Microsoft will explain that
limitation in the 'vcomp.dll' / 'vcompd.dll' DLLs.

Best regards,
Sergey

0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
Thanks, Jim.

Quoting jimdempseyatthecove
...assuming you do not get a satisfactory work around from MS)...


Here is an update from Microsoft:
...
We are rerouting this issue to the appropriate group within the Visual Studio Product Team for triage and
resolution. These specialized experts will follow-up with your issue.
...

0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
Here is a response:

...
The internal limit on the number of threads is indeed 64, and was directed by the limit of the number of
virtual processes available on a Windows PC back a few years. The situation has improved with the 64-bit
versions of Windows 7 (see http://windows.microsoft.com/en-US/windows7/products/system-requirements).
We will fix our internal OpenMP limits in a future release. Thanks for reporting this issue.
...

Unfortunately, it is not clear in what release it will be fixed. There are many versions of Visual Studio at the moment.
0 Kudos
TimP
Honored Contributor III
1,412 Views
Thanks for pushing this, even though it's only of academic interest to some of us. I was somewhat surprised that you got a response at all. As Intel has been producing Westmere-EX platforms with 80 logical processors for some time, and promotes OEMs designing platforms with more, there has been some dismay at the 64 thread per partition limit in Windows.
0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
Quoting TimP (Intel)
Thanks for pushing this, even though it's only of academic interest to some of us.

[SergeyK] There is significant practical interest for me because OpenMP is being considered for a
project with a high number of threads and strict portability requirements. Microsoft's
AMP technology is not being considered. Microsoft tries to "kill" any technology with the key
word 'Open'. OpenGL is another example.

I was somewhat surprised that you got a response at all.

[SergeyK] I'm a little bit disappointed with Microsoft's response because it is absolutely not clear
in what release it will be fixed. That is, in some version(s) of Visual Studio or a
Windows OS.
0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views

That limitation is related to a maximum number of wait objects on Windows platforms.

There is a definition in 'winnt.h' header file:

...
#define MAXIMUM_WAIT_OBJECTS 64 // Maximum number of wait objects
...

0 Kudos
paul_oxy
Beginner
1,412 Views

Hi Sergey,

The Intel OpenMP library allows up to 32,768 threads to be created in a parallel region.

Did you follow the link I provided? Please take a look. As soon as I applied a "hack" in the VS Debugger, the Microsoft OpenMP library ( vcompd.dll ) was able to create more than 1,024 threads.

Also, where did you see a limitation for '...OS calls to WaitAll and WaitOne...'? What Win32 API functions are you talking about? Could you give me exact names, please?

Best regards,

Paul Gregorie



0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
Hi Paul,

Quoting paul_oxy
...
Did you follow the link I provided? Please take a look. As soon as I applied a "hack" in the VS Debugger, the Microsoft OpenMP library ( vcompd.dll ) was able to create more than 1,024 threads.

Also, where did you see a limitation for '...OS calls to WaitAll and WaitOne...'? What Win32 API functions are you talking about? Could you give me exact names, please?
...


Actually, these are my comments from a discussion with another software developer on the Parallel Computing General forum.

Best regards,
Sergey

0 Kudos
Abhishek81
Novice
1,412 Views
Thanks, Sergey, for the information. I am studying the posts.
0 Kudos
SergeyKostrov
Valued Contributor II
1,412 Views
As you can see some number of spaces are deleted in posts and sometimes it looks like: >>...mycomments onParallel Computing... It happened during upgrade from the old ISN web-site to the new IDZ web-site but texts are still readable.
0 Kudos