Stack overflow in a recursive subroutine - Page 2

krishnabc · ‎08-09-2011

Hello,

I encountered a stack overflow in a recursive sorting subroutine for three 1-D arrays; two interger(4) and one real(8) arrays. The size of the arrays was 122,230,324. I expect even larger arrays for some other problems. I had used /heap-arrays:0.

Any suggestions would be greatly appreciated.

Thank you.
Krishna

jimdempseyatthecove · ‎03-06-2012

One more thing.

How much RAM does your system have?

Windows by default sets the page file size == physical RAM size.
i.e. if you are on a 4GB system, you may need to change the properties on the page file to permit it to grow.

If you noted in my response with successfully running your program, the page file did NOT expand at the point in the program of the allocation. The expansion of the page file occured during the "first touch" writes of the initialization data (RANDOM...). Testing for allocation failure at allocation point only catches exhaustion of address space (not the same as committed page file space). Thereforewhen a"first touch" to a page in virtual memory that is not mapped to the page file, and when the subsequent allocation of page from page file exceeds capacity of page file, you will receive an error writing (or reading if reading before initialization).

Jim Dempsey

SergeyKostrov · ‎03-06-2012

Hi Krishna, Consider an abstract case:

- There arethree arrays of~128MB size:

Two'Integers' arrays
One'Double-Precision' array

- Let's assume thatvalues in'Integer' arrays satisfy tothe following condition:

Values randomly generated,positive and in some range from 0 to N, where N < 65,536

1.In order to achieve as better as possible performance when sorting 'Integer' arraysPegionhole
algorithm could be used. Performance gains are significant (!) becausePegionhole sorts an array of
integers in two passes. For differentarrays with sizes from 64MB to 256MB performance improvements
on a 32-bit Windowsplatform were from 50x to 120x.

Advantages of Pegionhole:

- it is iterative
- significantly faster, compared to pureQuick or Heap sorting algorithms

Disadvantages of Pegionhole:

- needs a temporary buffer of some size ( in above assumedcase 65,536 integers)
- values must be in some limitedrange
- if values arenot positive they could be "normalized" first, and "denormalized" as soon
as sorting done

2. Since sizes ofall arrays are greater than 64MB thenHeapSort algorithm willsort'Double-Precision'
arrayfaster then pureQuickSort.

I've just completed a series oftests and all my statementsbased on real results. A "breaking" point
betweenHeapSortand pureQuickSort algorithms is 64MB for an array size.For a long time I've
considered QuickSort algorithm asfastest, compared to HeapSort, and I was surprised to seea different result.

Best regards,
Sergey

SergeyKostrov · ‎03-07-2012

This is a follow up.

Relative performance of Quick and Heap sorting algorithms for different sizes of array and it is based on
results of tests I just completed.

Array
size QuickSort HeapSort

16MB 2.4x faster 1.0x
32MB 1.5x faster 1.0x
64MB 1.0x 1.2x faster
128MB 1.0x 2.2x faster
160MB 1.0x 2.8x faster
192MB 1.0x 3.2x faster

I hope that information will be useful for software developers workingon projects with large data sets.

krishnabc · ‎03-07-2012

Dear Jim Dempsey,

Yes, you are right. The test program (sorting) ran to completion for the given size of arrays (122230324) in 64-bit platform. However, the same code fails in the actual program for about the same size of arrays. My computer configuration is similar to yours one.

Windows 7 x64 (16GB RAM)

Intel Core i7-2600 CPU @ 3.40 GHz

Microsoft Visual Studio 2010 Ultimate

Version 10.0.40219.1 SP1Rel

Microsoft .NET Framework

Version 4.0.30319 SP1Rel

Intel Visual Fortran Package ID: w_fcompxe_2011.9.300

I tried to run the program (my actual one with above codes) for several times, it failed at the sorting routine for almost all the times. It passes over the sorting routine a few times (2-3 times) successfully. However, I didn't get theexpectedresults (expected results means, either convergence or divergence of my solver). In all other cases, when my problem size is smaller than this, I have no problem with my solver and the results are correct (verified with analytical solutions and we have been using it for quite some years). Hence, I really need to investigate the cause carefully.

Thank you.

krishnabc · ‎03-07-2012

I monitored the the memory usage in Task manager for the attached code (post #3) and for array size (122230324), the commit, working set, and private all were in line with your results.

For the actual problem, where it usually fails for about the same size of arrays, the memory usage are:

before stepping over the call to SORT:

Commit (KB) = 3,154,996

Working set (KB) = 3,106,844

Private (KB) = 3,105,284

after stepping over the call to SORT:

Commit (KB) = 3,155,552

Working set (KB) = 3,107,400

Private (KB) = 1,924,908

The larger memory usage in the actual code could be due to other portion of the codes. I think page file space shouldn't be a problem because it is same as RAM (16GB), total virtual memory is 32 GB, available virtual memory is 27.3 GB.

Thank you.

Krishna

krishnabc · ‎03-07-2012

No, the errors posted in #10 are actual errors in the output window (no truncation at all). The only part that I have modified is the name of the program when I pasted it here.

The mocked-up test program didn't fail, but I am encoutering the said errors on the same code in my actual program. The difference in mocked up program and the actual program is the input values of arrays. Here, they are randomly generated. In actual code, they are generated from stiffness properties in finite elements.

In the same post (post #) another error message occured using all the dummy arrays declared with assumed shape (:) throughout the program. With such declarations, some redundant arguments are taken out. For example,

(1) Routine SORT: The first argument Nzua is taken out. It is obtained using UBOUND(Irr,1)inside routine.

(2) Routine SETSORT: The first argument Nzua is taken out.

(3) Routine INSORT: The last argument Nzua is taken out.

Thank you.

krishnabc · ‎03-07-2012

Dear Sergey Kostrov,

Thanks for your test results for comparing various sorting algorithms. I'll take a look whether swapping to Heap sort can resolve my problem. At this moment, my major concern is not the speed but to able the solve the large problems correctly. Speed will come later.

Thanks.

Krishna

SergeyKostrov · ‎03-07-2012

Quoting krishnabc

...At this moment, my major concern is not the speed but to able the solve the large problems correctly. Speed will come later...

This is what I would do as well. Good luck in investigation!

Let me know if you need a C++template based codes for the Heap sorting algorithm. I could release the
codes because originallythey were from a public domain.I've done somemodification before integrating
into a project I'm currently working on.

Best regards,
Sergey

krishnabc · ‎03-08-2012

Now, with all the dummy arrays declared in assumed shape format, the following is the exception message in output window when it breaks at recursive routine:

First-chance exception at 0x000000013f2885b8 in Program.exe: 0xC00000FD: Stack overflow.

'Program.exe': Loaded 'C:\Program Files (x86)\Intel\Composer XE 2011 SP1\redist\intel64\compiler\1033\ifcore_msg.dll', Binary was not built with debug information.

First-chance exception at 0x772e3560 in Program.exe: 0xC0000005: Access violation writing location 0x0000000000040ff4.

Unhandled exception at 0x772e3560 in Program.exe: 0xC0000005: Access violation writing location 0x0000000000040ff4.

jimdempseyatthecove · ‎03-08-2012

>>The difference in mocked up program and the actual program is the input values of arrays

In post #6 mecej writes:

"quicksort involves O ( lg N) recursion levels and O(N lg N) comparisons on average. Without safeguards, a naive implementation can see a degradation of performance to O(N) recursion levels and O(N2) comparisons"

It is possible you are seeing something nearO(N) recursion levels.

It should be easy enough to replace the SORT subroutine and see what happens. Get the code to work correctlyfirst, then work faster second.

Jim Dempsey

jimdempseyatthecove · ‎03-08-2012

>>I think page file space shouldn't be a problem because it is same as RAM (16GB), total virtual memory is 32 GB, available virtual memory is 27.3 GB

Verify.

Start the System Monitor, add Page file to charts. Run program to failure. See where chart maxed out.

Jim Dempsey

jimdempseyatthecove · ‎03-08-2012

>>I tried to run the program (my actual one with above codes) for several times, it failed at the sorting routine for almost all the times. It passes over the sorting routine a few times (2-3 times) successfully. However, I didn't get theexpectedresults (expected results means, either convergence or divergence of my solver). In all other cases, when my problem size is smaller than this, I have no problem with my solver and the results are correct.

Convergence/divergence problems are often related to how you determine convergence. Some techniques are more accurate than others. And other techniques are not reliable. The accumulation of roundoff errors is usually the culprit but there are others such as divide by 0/near-0.

Are NaN's produced?

Jim Dempsey

SergeyKostrov · ‎03-08-2012

Quoting krishnabc

...First-chance exception at 0x000000013f2885b8 in Program.exe: 0xC00000FD: Stack overflow...

Could you increasethe StackReserve and Commit valuesin the Project settings?

krishnabc · ‎03-08-2012

>>Convergence/divergence problems are often related to how you determine convergence. Some techniques are more accurate than others. And other techniques are not reliable. The accumulation of roundoff errors is usually the culprit but there are others such as divide by 0/near-0.

Agree. I have several algorithms(6/7 methods)to solve the linear system. The same problem can be solved with the method which doesn't use sorting routine, but it fails with all the methods that use sorting routine.

>>Are NaN's produced?

No. Whatever residual norm is produced in the first iteration (which seems in the reasonable range), remains unchanged/unaffected with increasing iteration.

SergeyKostrov · ‎03-09-2012

Quoting krishnabc

...The same problem can be solved with the method which doesn't use sorting routine, but it fails with all
the methods that use sorting routine...

Krishna,

1. Could you try to comment a call to the sorting routing in a method that uses it? Even if results of the
computationswill be wrong it would be interesting to see ifthe methodworks without a crash.

2.Do you have a separate test for the sorting routing? You need to verify that it doesn't havesome
internal problems.

Best regards,
Sergey

jimdempseyatthecove · ‎03-09-2012

>>The same problem can be solved with the method which doesn't use sorting routine, but it fails with all the methods that use sorting routine.

It may be time for you to take a detour and to try to isolate the cause of the problem. Yes, this is more work, but current diagnostic attempts have not produced a solution.

You have a situation where using SORT causes a problem where non-SORT methods do not. Let's isolate the SORT which should be relatively easy to do.

Replace the SORT with a subroutine that

writes data to file
runs seperate process to sort file
reads sorted data from file

I know that this is not what you want to run, but it may help you isolate the problem.

You know from running your sketch code with random data that this did not get Stack Overflow.
The above, with modification to your sketch code to read data from file and write to file, will verify if the problem is data dependent. IOW does the sort fail in the sketch code depending on values to be sorted?

a) If the sort fails at some time during the run then the problem is a data dependency with the sort code.
b) If the solver program runs to completion, then there may be a resource issue (or other unknown problem)
c) If the solver program fails, then you may have a data dependency issue with respect to convergence .OR. other programming error (uninitialized data, unallocated data, corruption of data).

Jim Dempsey