I am attempting to parallelize a serial code that contains several functions and subroutines. All of them were written to work only through their arguments (PURE). The intention is to share a large REAL*4 array, anywhere between 200 MB and 4 GB in size, across 4 to 20 threads for parallel execution. The current serial build is compiled with heap arrays, so it has no issues.
Coming to the issue: when I run the executable compiled with /Qopenmp, I get the error forrtl: severe (170): Program Exception - stack overflow. In debug mode, on entering the first subroutine within the parallel region, the debugger breaks with this message:
Unhandled exception at 0x00007FF60CEC0C18 in test.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x000000B23C113000).
Exception thrown at 0x00007FF60CEC0C18 in test.exe: 0xC0000005: Access violation writing location 0x000000B23C110000.
If there is a handler for this exception, the program may be safely continued.
The compiler options used are: /nologo /debug:full /Od /warn:all /traceback /check:bounds /check:stack /libs:dll /threads /dbglibs /Qopenmp
Am I missing anything here?
Out of curiosity, I tried OMP_SET_NUM_THREADS(1), and it still throws the same error.
>>share a big real*4 array sized any where between 200MB to 4GB across 4~20 threads for parallel execution
Shared arrays are passed by reference. If the array exists prior to the parallel region, there should be no issue with sharing it.
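A minimal sketch of that pattern, assuming a hypothetical module shared_demo with a routine scale_shared (neither is from the original code): the array is allocated on the heap before the parallel region and is only referenced inside it, so every thread works on the same memory and no per-thread copy is made.

```fortran
module shared_demo
  implicit none
contains
  ! Scales an array in parallel. The caller allocates the array before
  ! the parallel region; inside it the array is SHARED, so all threads
  ! update the same memory through the dummy argument (no copy is made
  ! for a contiguous actual argument).
  subroutine scale_shared(a, factor)
    real, intent(inout) :: a(:)
    real, intent(in)    :: factor
    integer :: i
    !$omp parallel do default(none) shared(a, factor) private(i)
    do i = 1, size(a)
      a(i) = a(i) * factor
    end do
    !$omp end parallel do
  end subroutine scale_shared
end module shared_demo
```

Without /Qopenmp the directives are treated as comments and the routine runs serially with the same result.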
Note: if the shared array is used in an array expression that requires a temporary to be created, you may run into stack capacity issues. When this is the case, it can be mitigated by:
- adding the option /heap-arrays
- changing very large automatic arrays into allocatable arrays
- changing the offending array expression(s) into explicit DO loop(s)
- increasing the stack size of the additional threads that OpenMP creates (this does not affect the main thread):
  OMP_STACKSIZE=nnnn[B|K|M|G|T] (unit defaults to K)
  KMP_STACKSIZE=nnnn[B|K|M|G|T] (unit defaults to K)
Don't go overboard with stack size: every thread OpenMP creates gets a stack of that size.
While /heap-arrays may solve your problem, it may (will) also introduce additional overhead. You will tend to get the best performance with a combination of the other approaches (at the expense of a little more programming).
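As an illustration of the second mitigation (automatic array to allocatable), here is a sketch using a hypothetical routine smooth with a scratch array tmp; the routine and its three-point average are assumptions for the example, not code from this thread. The old automatic declaration is kept as a comment so the change is easy to see.

```fortran
module work_arrays
  implicit none
contains
  ! Before: "real :: tmp(n)" is an automatic array. Without /heap-arrays
  ! it is typically placed on the stack of the calling thread, so a large
  ! n can overflow that thread's stack.
  ! After: the allocatable local lives on the heap and is deallocated
  ! automatically when the routine returns.
  subroutine smooth(n, x, y)
    integer, intent(in)  :: n
    real,    intent(in)  :: x(n)
    real,    intent(out) :: y(n)
    real, allocatable    :: tmp(:)   ! was: real :: tmp(n)
    integer :: i, ierr

    if (n < 1) return
    allocate(tmp(n), stat=ierr)
    if (ierr /= 0) error stop "smooth: allocation of tmp failed"

    tmp = x                          ! same shape, no reallocation
    y(1) = tmp(1)
    y(n) = tmp(n)
    do i = 2, n - 1                  ! three-point running average
      y(i) = (tmp(i-1) + tmp(i) + tmp(i+1)) / 3.0
    end do
  end subroutine smooth
end module work_arrays
```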
Jim Dempsey
Hello Jim,
I am running a Fortran program using Visual Studio 2019 on a Windows 10 PC. I can successfully build both debug and release executables. However, when I run the dataset, both executables report either a stack overflow or an access violation. When I run in debug mode, I get notifications of unhandled exceptions:
Unhandled exception at 0x00007FF757662F57 in myprog.exe: 0xC0000005: Access violation writing location 0x000000A8BEDFF000.
I read in some forums that one solution would be to increase the stack size and use heap arrays, and I have done that by adding 1 MB (1048576 bytes). I have tried other solutions found on the internet, but I am unable to find a stable solution.
Another suggested solution was to install oneAPI Base Toolkit, which I've done.
I should mention that I know little about these issues, so any help will be appreciated.
Bern
I allocated the big array and declared it SHARED on entering the parallel region. Even the private variables are allocated before entering the parallel region.
I will try /heap-arrays and larger stack sizes.
I'm not sure what you mean by offending array expressions. Could you explain a little?
Fortran permits array expressions such as:
ArrayOut = sqrt(ArrayIn1**2 + ArrayIn2**2 + ArrayIn3**2)
This can create 3 or 4 array temporaries, and without /heap-arrays the compiler will place these on the stack. When the temporaries are quite large, you will experience a stack overflow. The statement can be replaced with a small loop that eliminates the array temporaries:
DO I = 1, SIZE(ArrayOut)
   ArrayOut(I) = sqrt(ArrayIn1(I)**2 + ArrayIn2(I)**2 + ArrayIn3(I)**2)
END DO
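A self-contained sketch of the loop form, wrapped in a hypothetical module procedure named magnitude (the name and the OpenMP directive are additions for illustration). Each element is computed directly from the inputs, so no intermediate arrays have to be materialized on the stack or the heap.

```fortran
module magnitude_mod
  implicit none
contains
  ! Explicit-loop replacement for
  !   ArrayOut = sqrt(ArrayIn1**2 + ArrayIn2**2 + ArrayIn3**2)
  ! No array temporaries are needed. The OpenMP directive is ignored
  ! when compiled without /Qopenmp.
  subroutine magnitude(ArrayIn1, ArrayIn2, ArrayIn3, ArrayOut)
    real, intent(in)  :: ArrayIn1(:), ArrayIn2(:), ArrayIn3(:)
    real, intent(out) :: ArrayOut(:)
    integer :: I
    !$omp parallel do
    do I = 1, size(ArrayOut)
      ArrayOut(I) = sqrt(ArrayIn1(I)**2 + ArrayIn2(I)**2 + ArrayIn3(I)**2)
    end do
    !$omp end parallel do
  end subroutine magnitude
end module magnitude_mod
```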
Jim Dempsey
Jim, it worked for me. All I had missed was setting OMP_STACKSIZE at run time. Many thanks to you. I'm not using any offending array expressions; in fact, I verified that no array temporaries are generated by switching on the run-time warnings.
It worked wonders for smaller arrays, in the 200 MB range, even with 12 threads. To minimize overhead, I left the scheduling at its default. The speed-up was almost equal to the number of threads involved. But for the larger 4 GB array, I again got a stack overflow error, even with only 2 threads. Is there any way to set the number of threads dynamically, based on the available stack size per thread, on Windows x64?
I would not reduce the number of threads based on array sizes. Generally, larger data benefits from more threads.
If you can isolate the excessive stack consumption to a specific subroutine, and you want stack allocation for small/medium array sizes but heap allocation for large/huge array sizes, then I suggest you adapt your code somewhat like this:
subroutine foo(N, A, B)
   implicit none
   integer, intent(in) :: N
   real :: A(N), B(N)                    ! dummies/no allocation
   integer, parameter :: useHeap = 100000 ! hypothetical threshold (elements); tune it
   if (N > useHeap) then
      call foo_heap(N, A, B)
   else
      call foo_stack(N, A, B)
   endif
contains
   subroutine foo_heap(N, A, B)
      implicit none
      integer, intent(in) :: N
      real :: A(N), B(N)                 ! dummies/no allocation
      real, allocatable :: work(:)       ! heap
      allocate(work(N))
      call foo_either(N, A, B, work)
   end subroutine foo_heap
   subroutine foo_stack(N, A, B)
      implicit none
      integer, intent(in) :: N
      real :: A(N), B(N)                 ! dummies/no allocation
      real :: work(N)                    ! automatic array: stack
      call foo_either(N, A, B, work)
   end subroutine foo_stack
   subroutine foo_either(N, A, B, work)
      implicit none
      integer, intent(in) :: N
      real :: A(N), B(N), work(N)        ! dummies/no allocation
      ! ... code here
   end subroutine foo_either
end subroutine foo
FWIW, the original intent of /heap-arrays:n (n being a size threshold in KB) was to do the equivalent of the above; however, it has been reported here that this threshold feature doesn't work well.
Jim Dempsey
Alternative:
subroutine foo(N, A, B)
   implicit none
   integer, intent(in) :: N
   real :: A(N), B(N)                    ! dummies/no allocation
   real, SAVE, allocatable :: work(:)
   !$omp threadprivate(work)
   if (allocated(work)) then
      if (N > size(work)) then
         deallocate(work)
         allocate(work(N))
      endif
   else
      allocate(work(N))
   endif
   ! ... code here, using work(1:N)
end subroutine foo
The choice depends on whether you need to reclaim the heap memory upon return from foo.
**** CAUTION
Use work(1:N) instead of work alone.
Should you call foo with N less than size(work), then referencing WORK alone covers more data than intended. Also, when WORK appears on the left-hand side of an array assignment (WORK = ...), it may be reallocated to the shape of the right-hand side, which defeats your optimization efforts.
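The difference can be seen in a small sketch, assuming a hypothetical routine copy_into_scratch and assuming Fortran 2003 reallocate-on-assignment semantics are in effect (the default for gfortran and current Intel Fortran compilers): assigning to the section work(1:n) keeps the scratch array's size, while assigning to the whole allocatable work reallocates it whenever the shapes differ.

```fortran
module scratch_demo
  implicit none
contains
  ! Copies a into the reusable scratch array work.
  !   use_section = .true.  -> work(1:n) = a : work keeps its size
  !   use_section = .false. -> work = a      : work is reallocated to
  !                            size(a) if the shapes differ
  subroutine copy_into_scratch(work, a, use_section)
    real, allocatable, intent(inout) :: work(:)
    real, intent(in)    :: a(:)
    logical, intent(in) :: use_section
    integer :: n
    n = size(a)
    if (.not. allocated(work)) allocate(work(n))
    if (n > size(work)) then          ! grow only, as in foo above
      deallocate(work)
      allocate(work(n))
    end if
    if (use_section) then
      work(1:n) = a
    else
      work = a
    end if
  end subroutine copy_into_scratch
end module scratch_demo
```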
Jim Dempsey
Without having your program for analysis it is difficult to provide anything other than a best guess...
>>Unhandled exception at 0x00007FF757662F57
The program virtual address (code) above is located at ~140.7TB (terabyte). This is either:
a) an invalid user program address, or
b) a Windows O/S system address
>>Access violation writing location 0x000000A8BEDFF000
The data location in decimal is: 724,756,852,736 or 724.7 GB (gigabyte)
This may indicate either:
a) an errant address was used, or
b) a valid address was used (humongous allocation) .AND. the system page file size was exceeded.
Situation b) can be perplexing given that an allocation will succeed (virtual address taken) but the error only occurs later when the actual page file page is allocated upon first touch of the data .AND. (the page file has been exhausted .OR. a program limit has been reached).
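The hexadecimal-to-decimal conversions above can be checked with a small helper; hex_to_int64 is a hypothetical name, and the sketch accepts only uppercase hex digits without a "0x" prefix.

```fortran
module addr_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Converts a hexadecimal address string (digits 0-9 and A-F, no "0x")
  ! to a 64-bit integer.
  function hex_to_int64(hex) result(val)
    character(len=*), intent(in) :: hex
    integer(int64) :: val
    character(len=*), parameter :: digits = '0123456789ABCDEF'
    integer :: i, d
    val = 0_int64
    do i = 1, len_trim(hex)
      d = index(digits, hex(i:i)) - 1   ! -1 if not a hex digit
      if (d < 0) error stop "hex_to_int64: invalid hex digit"
      val = val * 16_int64 + int(d, int64)
    end do
  end function hex_to_int64
end module addr_demo
```

For example, hex_to_int64('A8BEDFF000') gives 724756852736, the data address quoted above.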
Does the size of the data seem appropriate?
Jim Dempsey
Thank you, Jim!
The Fortran code I am running is a terrestrial ecosystem model, which is used by many colleagues. To me, it is a question of system addresses. A data allocation of 724.7 GB is not expected. I am including a screenshot of where the problem occurs. It seems to me that the solution lies in either a Visual Studio 2019 or a Windows O/S setting.
The error occurs on entry to TGRAZ, where it allocates the local arrays. I suspect this is exceeding the available stack space.
I suggest that you add IMPLICIT NONE, then declare the (local) arrays with the proper type, *** but make them allocatable.
Keep old source lines (DIMENSION...) as comments, then insert the ALLOCATE statements as necessary.
This will accomplish a few things:
1) It requires you to declare the types used by the procedure (avoiding typing errors, whether made by you or by an earlier developer).
2) It delays any run-time error (due to oversized allocations) until the ALLOCATE statement, so that you can a) see where the error occurs, and b) insert diagnostic code (for example, without IMPLICIT NONE, a mistyped array dimension variable will be implicitly declared, but undefined).
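A sketch of what such a conversion might look like. The routine grazing_like, the array FLUX and its dimensions are hypothetical stand-ins, not the actual TGRAZ code; the old DIMENSION line is kept as a comment, and the ALLOCATE uses stat= and errmsg= so an oversized request is reported with its dimensions instead of crashing on entry.

```fortran
module alloc_locals_demo
  implicit none
contains
  ! Hypothetical stand-in for a routine whose local arrays were declared
  ! with DIMENSION (automatic, on the stack) and are now allocatable.
  subroutine grazing_like(nlayers, npools, total)
    integer, intent(in)  :: nlayers, npools
    real,    intent(out) :: total
    ! DIMENSION FLUX(NLAYERS, NPOOLS)   <- old declaration kept as comment
    real, allocatable :: flux(:,:)
    integer :: ierr
    character(len=256) :: msg

    ! Diagnostic: catch undefined or nonsensical dimensions early.
    if (nlayers < 0 .or. npools < 0) then
      write(*,*) 'grazing_like: bad dimensions', nlayers, npools
      error stop
    end if
    allocate(flux(nlayers, npools), stat=ierr, errmsg=msg)
    if (ierr /= 0) then
      write(*,*) 'grazing_like: cannot allocate FLUX', nlayers, npools, trim(msg)
      error stop
    end if

    flux = 1.0
    total = sum(flux)
    ! flux is deallocated automatically on return
  end subroutine grazing_like
end module alloc_locals_demo
```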
Jim Dempsey
Thank you, Jim. I will give it a try and let you know the results.
Bern