A Fortran project that reproduces the problem (OMP errors) is attached. The output is as follows:
...
Matrix multiplication test
Enter No of ROWS / COLUMNS in A and B matricies ( integer ):
Recommended values: 1024, 2048, 4096, 8192, 16384, 32768, 65536, etc
32
Dimensions of matrices:
No of rows N = 32
No of columns N = 32
Initializing...
Done...
Calculating...
OMP: Error #136: Cannot create thread.
OMP: System error #8: Not enough storage is available to process this command.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Notes:
- Win32 Release configuration needs to be used
- Options:
Fortran:
Optimization -> Parallelization = Yes ( /Qparallel )
Libraries -> Use Intel Math Kernel Library = Parallel ( /Qmkl:parallel )
Linker:
System ->
Heap Commit = 268435456
Heap Reserve = 268435456
Stack Commit = 268435456
Stack Reserve = 268435456
268435456 = 256MB
- In total 1GB is reserved and ~1GB is still available for processing
Sergey,
On my system, Core i7 2600K (4 core, 8 HW threads)
256MB x 8 stacks = 2GB, + 256MB for heap + ?? for code w/ libs, which is .gt. 2GB in Win32
Code runs as x64
Try setting environment variable OMP_THREAD_LIMIT=4 (works here)
Jim Dempsey
I think you completely missed Jim's point. Each thread has to have its own stack. From inspection of the .vfproj file you have /O3 set in conjunction with /Qparallel, so Matmul will involve multiple threads. Typically the OMP subsystem creates one thread per virtual processor core. How many such cores do you have? At 256 MiB a thread it isn't going to take too many to fill the usable address space in Win32.
Note that the commit figure is a subset of the reserve figure - you don't add them.
Sergey,
Sequence of operations and comments.
Your program starts and runs up to the READ(*,*) N
At this point 1 thread is running (use Task Manager to confirm this for yourself).
At this time the CODE + HEAP(used) + STACK(used) == 5,936K (on my system)
However, the Task Manager is NOT telling you the complete situation. The Task Manager tells you the amount of page file space reserved for your application. This amount is determined dynamically as your application runs and touches (writes or reads) the memory.
Due to your project settings of Heap Reserve Size 268435456 and Stack Reserve Size 268435456, the virtual address space consumed (out of the 2GB total for a 32-bit Win32 process) is approximately: CODE(~5MB) + HEAP(256MB) + STACK(256MB) = ~517MB.
You specified the parallel version of MKL, so on your first call to MKL, MKL will create an OpenMP thread pool of 8 threads (7 additional threads). Each of these carves 256MB of virtual address space out of your remaining 2GB - 0.517GB, say 1.5GB of remaining virtual memory for your 32-bit Win32 process. With 256MB of stack per thread, you can add an additional 5 or possibly 6 threads, but you cannot add an additional 7 threads without consuming more virtual memory than your process has remaining.
The fix for this is to reduce your Stack Reserve Size and Stack Commit Size to a reasonable working size, say 4MB. This will require you to program in such a manner that no thread exceeds 4MB of stack. You do this by having your large allocations come off the shared heap as opposed to the thread's stack.
Jim Dempsey
I think you have simply run out of thread stack space due to the way you've set the linker options. The only one you should even consider using is stack reserve space. If I remove all your heap and stack settings, the program runs fine on IA32 with 8 threads up to a size of 8192. After that I get "insufficient virtual memory". Trying to work around this with heap reserve/commit sizes is counterproductive - I have yet to see an application where those settings are useful. Stack commit is also inappropriate.
Please also keep in mind that for OpenMP you may need to set the environment variable OMP_STACKSIZE to set the per-thread stack size.
>>Did you reproduce the problem?
Yes, I reproduced the problem.
Also, made the problem go away with setting maximum OpenMP threads to 4 (3 + main thread)
FortTestApp Property Pages | Configuration Properties | Debugging | Environment | OMP_THREAD_LIMIT=4
(enter the text OMP_THREAD_LIMIT=4 into the edit box)
Also, with NO thread limit .AND. with the Stack Reserve Size and Stack Commit Size settings removed (0), that is, using the defaults (I think 4MB), the program also runs.
How many ways do we have to tell you to change your stack size specification or reduce your thread count? 32-bit Windows applications have only 2GB of virtual address space regardless of physical memory (there is a BOOT.INI option to extend this to 3GB). 64-bit Windows has the smaller of ~1TB or the page file maximum.
Jim Dempsey
Sergey, could you explain what you expect to happen, given the context you've described?
I don't see internal errors - I just see the OMP runtime complaining that the underlying operating system has (predictably, given your system and compile options) run out of a resource, followed by some secondary errors.
Why do you think this is some sort of "internal error"?
Here's a shorter "reproducer" that doesn't use mkl. I specify the number of threads because otherwise the default on my system wouldn't trigger address space exhaustion.
[fortran]!$OMP PARALLEL NUM_THREADS(8)
!$OMP END PARALLEL
END
[/fortran]
[plain]>ifort /Od /Qopenmp TooMuchStack.f90 /link /stack:268435456,268435456 && TooMuchStack.exe
Intel(R) Visual Fortran Compiler XE for applications running on IA-32, Version 13.1.0.149 Build 20130118
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation. All rights reserved.
-out:TooMuchStack.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
/stack:268435456,268435456
TooMuchStack.obj
OMP: Error #136: Cannot create thread.
OMP: System error #8: Not enough storage is available to process this command.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.[/plain]
Here's another "reproducer" that doesn't even involve OMP, or Fortran. You can see that the OMP runtime is just passing on results and messages from the operating system.
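[cpp]#include <windows.h>
#include <stdio.h>

/* Thread body. The threads are created suspended, so this never runs;
   only the stack reservation made at creation time matters here. */
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
    (void) lpParameter;
    Sleep(100000);
    return 0;
}

int main(void)
{
    int i;
    HANDLE thread_handle;
    DWORD thread_id;
    DWORD last_error;
    CHAR *msg;

    /* dwStackSize is 0, so each thread takes its stack size from the
       executable header, i.e. from the /stack linker option. */
    for (i = 0; i < 8; ++i) {
        thread_handle = CreateThread(NULL, 0, ThreadProc, NULL,
                                     CREATE_SUSPENDED, &thread_id);
        if (thread_handle == NULL) {
            last_error = GetLastError();
            /* Ask the system for the message text of the error code. */
            FormatMessageA(
                FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
                0, last_error, LANG_SYSTEM_DEFAULT, (LPSTR) &msg, 0, NULL);
            printf("Thread creation failed.\nSystem error %lu: %s\n",
                   last_error, msg);
            LocalFree(msg);
            return 1;
        }
    }
    return 0;
}
[/cpp]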
[plain]>cl TooMuchStack.c /link /stack:268435456,268435456 && TooMuchStack.exe
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
TooMuchStack.c
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation. All rights reserved.
/out:TooMuchStack.exe
/stack:268435456,268435456
TooMuchStack.obj
Thread creation failed.
System error 8: Not enough storage is available to process this command.[/plain]
Refer to the docs on thread stack size that I looked up yesterday before posting. Note that the actual stack size created is MAX(reserve,commit) and that CreateThread only lets you specify one of commit or reserve; the other is taken from the executable defaults. (Some quick debugging shows that OMP_STACKSIZE influences the reserve size, which makes sense.) Because you've specified executable defaults of 256 MB for both commit and reserve, there's no way that a Win32 subsystem program can then create a thread with a stack smaller than that size.
Sergey,
Look at your first post in this thread. I will put an edited clip of your post here:
Linker:
System ->
Heap Commit = 268435456 \
Heap Reserve = 268435456 \- (for the process (all threads))
Stack Commit = 268435456 \
Stack Reserve = 268435456 \- (Per thread)
268435456 = 256MB
- In total 1GB is reserved and ~1GB is still available for processing
** at the point in your program where 1 thread is running **
This is before MATMUL which uses MKL and attempts to launch (on your system) 7 more threads.
1 thread ~1GB available
2 threads ~768MB available
3 threads ~512MB available
4 threads ~256MB available
5 threads ~0MB available (may crash here)
6 threads -256MB available (will crash here)
7 threads -512MB available
8 threads -768MB available
MKL, on your system, will (unless the thread count is limited) attempt to create 7 more threads. Your options require each additional thread to be given 256MB of stack space. Your application runs out of virtual memory before all of the requested additional threads have been created.
There is no reason for your project settings to specify a stack this large. If, on the other hand, there is an insurmountable reason for having a stack of this size, then on 32-bit you will have to reduce the number of threads your application uses. In other words, you must either reduce the stack size .OR. reduce the thread count.
Jim Dempsey
Sergey, I can't reproduce any hangs.
>>Do you really think that 8 threads are needed to calculate the product of two 2x2 matricies?
No. However, the way your project is configured, you have requested MKL, and for MKL to use a thread pool of size = 8 (in other words, to add 7 additional threads). When you do NOT specify a thread limit, the default is to instantiate a thread pool with number of software threads == number of hardware threads (your system has 8 hardware threads).
Depending on the implementation, OpenMP may create an additional watchdog thread, and may also create an additional thread for buffered writes (though I cannot say what stack size it may choose for these potential additional threads). It is not productive for you to shirk your responsibility of managing the resources available to you.
Have you coded your program in such a manner as to require such a large stack space?
Assume, for some reason known to you, that you have chosen to make local arrays stack based (as opposed to heap based):
[fortran]PROGRAM foo
  REAL :: ARRAY(67108864) ! ~256MB (force to be on stack)
  !$OMP PARALLEL DO
  DO I=1,67108864
    ARRAY(I) = I
  END DO
  !$OMP END PARALLEL DO
END PROGRAM foo
[/fortran]
In the above program, only the main thread requires 256MB of stack. The remaining threads are using a reference to ARRAY (pointer of sorts to the array). The remaining threads (in the above example) could function with 2MB of stack.
When the project configuration is set to make ARRAY SAVE or a heap array, the main thread of the above example can also get by with a smaller stack.
Jim Dempsey
Sergey, I want you to remove all of the values for heap and stack under the Linker properties except for stack reserve. Then try again.
Don't mix KMP_STACKSIZE or OMP_STACKSIZE with the Windows module definition file (def file) values /STACKSIZE and /HEAPSIZE, and please review MSDN. If some Intel software developer decided to use a /STACKSIZE value as the input value for the stack size in the Win32 API function CreateThread, this is wrong; KMP_STACKSIZE or OMP_STACKSIZE have to be used instead. The default values for the Intel OpenMP library are as follows:
KMP_STACKSIZE - 32-bit platforms: 2MB
KMP_STACKSIZE - 64-bit platforms: 4MB
There is no reason to use a /STACKSIZE value from a Windows module definition file (def file) as the stack size for a thread on the Windows platform.
CreateThread only has one argument that is used to specify either the reserve size or the initial commit size. The documentation for CreateThread and thread stack size selection explains that whichever one is specified in the API call, the other is taken from the executable defaults - i.e. from the linker settings.
Put a breakpoint on the kernel32!CreateThread entry point and check for yourself - the OMP runtime does request a stack reservation that is based off OMP_STACKSIZE and friends. But see my previous post - because the actual stacksize is MAX(reserve, commit) and because you have specified such a large commit (Why? What's preventing you from relying on the operating system's automatic stack page commit system?) the environment variable becomes irrelevant. You are getting exactly what you've asked for.
(Here (13.1.0), with my three-line example, the OMP runtime's process exit routine fails to complete. I think this is because during cleanup the runtime queries the state of the thread that failed to be created (or a thread that should have been created after the thread that failed). This fails, the thread generates the secondary error (the GetExitCodeThread error), following which the process exit routine is called... again. Hello recursion. Early on in the second pass through that routine the runtime attempts to acquire a lock that it already holds, and perhaps a deadlock results. This might be further complicated because part of that cleanup occurs during DLL_PROCESS_DETACH. But note your Fortran program is well and truly hosed by this point - the fundamental problem is that you are asking the system to do the impossible. Controlled process termination from this sort of edge case scenario is always going to be a challenge.)