
Stack Overflow

I'm porting an app to TBB. Before I started, everything worked well, but now I'm having problems with stack overflow. How do I go about tracking this down? (The call stack window in VS2008 shows a list of calls, but warns me that "Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll".)


Quoting - dr_eck
(The call stack window in VS2008 shows a list of calls, but warns me that "Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll".)

You can use a symbol server to get symbols for such libraries.

Alexey_K_Intel3
Employee

Right, search for "symbol server" on MSDN and you will find out how to trace the stack through ntdll and other such libraries.
If you find that your application is running out of stack on a TBB worker thread, you can then specify the required stack size as a parameter to the constructor of TBB's task_scheduler_init object.
Dmitry_Vyukov
Valued Contributor I

Quoting - dr_eck

I'm porting an app to TBB. Before I started, everything worked well, but now I'm having problems with stack overflow. How do I go about tracking this down? (The call stack window in VS2008 shows a list of calls, but warns me that "Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll".)

Does the stack overflow happen on a TBB worker thread?

Is there any of your code or TBB code on the stack besides the ntdll code? If so, which code?

Are you using a blocking wait for children, or continuations? A blocking wait for children can indeed cause a stack overflow.


Quoting - Dmitriy V'jukov

Does the stack overflow happen on a TBB worker thread?

Is there any of your code or TBB code on the stack besides the ntdll code? If so, which code?

Are you using a blocking wait for children, or continuations? A blocking wait for children can indeed cause a stack overflow.

I appreciate your suggestions, but I'm way out of my depth here. I was hoping that TBB would enable me, a high-level scientific programmer, to multithread my program. I couldn't figure out how to run SymSrv, so I downloaded all of the symbols for WinXP and installed them. Here's the new stack trace.

ntdll.dll!_KiFastSystemCallRet@0()
ntdll.dll!_NtWaitForSingleObject@12() + 0xc bytes
kernel32.dll!_WaitForSingleObjectEx@12() + 0x8b bytes
[Frames below may be incorrect and/or missing, no symbols loaded for mscorwks.dll]
ntdll.dll!_ZwClearEvent@4() + 0xc bytes
kernel32.dll!_ResetEvent@4() + 0xe bytes
ntdll.dll!_NtSetEvent@8() + 0xc bytes
kernel32.dll!_RaiseException@16() + 0x52 bytes
[Managed to Native Transition]
>Rat.exe!TracePacket::operator()(tbb::blocked_range2d& r = {...}) Line 35 + 0x6f bytes  C++
Rat.exe!tbb::internal::start_for<tbb::blocked_range2d,TracePacket,tbb::simple_partitioner>::execute() Line 98  C++

Other than mscorwks.dll, it looks as if I got all the symbols I need. Now to the questions:

>Is the stack overflow happening on a TBB worker thread?

VS2008's debugger shows 10 threads: Main Thread, Worker Thread, <No Name>, 3 threads called "Win32", _RtlpTimerThread@4, 2 threads called _RtlpWorkerThread@4, and WmipEventPump@4.

I believe the Worker Thread is there because I run the calculations with a BackgroundWorker. Both this and the thread called <No Name> are associated with my source code. When the crash happens, both are pointing to lines in my function object that is being called by the parallel_for, so as far as I know, the stack overflow is happening on a TBB thread.

>Is there your code or TBB code in the stack besides the ntdll code? If yes, what code?

Here's my functor:

void operator()( const blocked_range2d& r ) const
{
    for(int i=r.rows().begin(); i<r.rows().end(); ++i) {
        for(int j=r.cols().begin(); j<r.cols().end(); ++j) {
            Pnt3 pp(p + j*DI*du - i*DI*dv); // ray origin
            Vec3 v( !Vec3(pos, pp) ); // ray direction
            Ray ray(pp, v, ray_epsilon, (yon/(v*cam->aim)), Spectrum() );
            cam->tracer->NewRay();
            cam->film[j+cam->pixW*i] = cam->tracer->Radiance(ray);
        }
    }
}

>Are you using Blocking Wait for children or continuations?

I don't think so. It's just a parallel_for.

I should probably also mention that the "Managed to Native Transition" is in the stack because I wrote the GUI in .Net. Yes, I have learned my lesson and will never do that again. In fact, I'll be porting to MFC as soon as my books arrive (unless someone tells me that ICC11 supports managed code).

Edit: What happened to my code? The inserter garbled it, so I have repasted it.


Quoting - Alexey_K_Intel3

Right, search for "symbol server" on MSDN and you will find out how to trace the stack through ntdll and other such libraries.
If you find that your application is running out of stack on a TBB worker thread, you can then specify the required stack size as a parameter to the constructor of TBB's task_scheduler_init object.

Thanks for the pointer on symbol server. I couldn't figure out how to make it work from the MS documentation, but at least I was able to find a link to download most of the required symbols.

As for setting the amount of stack, I was able to find that in the docs, but I need a little more help. What's the default stack size? (The default parameter is 0, which is probably not enough stack for anything.) The last time I had to worry about stack size was on a 16-bit OS, so I have no clue as to what is appropriate now.

Alexey_K_Intel3
Employee

Quoting - dr_eck
Here's my functor:

void operator()( const blocked_range2d& r ) const
{
    for(int i=r.rows().begin(); i<r.rows().end(); ++i) {
        for(int j=r.cols().begin(); j<r.cols().end(); ++j) {
            Pnt3 pp(p + j*DI*du - i*DI*dv); // ray origin
            Vec3 v( !Vec3(pos, pp) ); // ray direction
            Ray ray(pp, v, ray_epsilon, (yon/(v*cam->aim)), Spectrum() );
            cam->tracer->NewRay();
            cam->film[j+cam->pixW*i] = cam->tracer->Radiance(ray);
        }
    }
}

Could you estimate how big the objects created on the stack are?

By default, TBB sets the stack size for worker threads to 2 MB on 32-bit Windows and 4 MB on 64-bit Windows. I suggest you run a series of experiments, doubling the stack size for the TBB workers each time, until either everything works or the size approaches some unreasonably large value. The latter would indicate that either the problem is not a stack overflow (but possibly stack corruption) or the required stack is so big that its use should be reduced anyway. I would probably stop at 128 MB or 256 MB.

By the way, try initializing TBB to run serially (by explicitly passing 1 as the first parameter to the task_scheduler_init constructor). In this case, TBB does not create any worker threads and uses only the calling thread to run parallel_for. Does the stack issue exist in this case as well?
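
The doubling experiment described above could be scripted along these lines. This is only a sketch: `stack_size_candidates` is a made-up helper name, and the scheduler calls appear only in comments, relying on the `task_scheduler_init(number_of_threads, thread_stack_size)` constructor mentioned earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: candidate worker stack sizes in bytes,
// start, 2*start, 4*start, ... for every value not exceeding limit.
std::vector<std::size_t> stack_size_candidates(std::size_t start, std::size_t limit)
{
    assert(start > 0);  // doubling zero would never make progress
    std::vector<std::size_t> sizes;
    for (std::size_t s = start; s <= limit; s *= 2) {
        sizes.push_back(s);
        if (s > limit / 2) break;  // the next doubling would pass the limit
    }
    return sizes;
}

// Each run of the experiment would construct the scheduler once with one
// of these values before any parallel_for is called, for example:
//   tbb::task_scheduler_init init(tbb::task_scheduler_init::automatic, size);
// The serial check would instead use:
//   tbb::task_scheduler_init init(1);
```

Starting from 2 MB with a 256 MB limit, this yields eight sizes to try, which keeps the experiment short.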

Alexey_K_Intel3
Employee

The other question is, how do you call the TBB code from the managed code? You might need to employ some marshalling there.
Andrey_Marochko
New Contributor III

The stack dump doesn't look like a typical stack overflow. With a stack overflow you normally either get endlessly repeating sequences of the same calls, or your process silently vanishes without a trace. Did you get any other output (in the Output pane, from the debugger, or from the C or .Net runtimes) that said it was a stack overflow?

To my eye the problem looks like the result of memory corruption, which in particular may be a consequence of a data race. Are all the methods being invoked from the parallel_for body thread-safe? E.g. cam->tracer->NewRay(), cam->tracer->Radiance, the Ray class constructor (if it uses any global data)?
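
To illustrate what "thread-safe" would mean for a method like NewRay(), here is a minimal sketch. It assumes, purely for illustration, that such a method only bumps a shared ray counter; `RayCounter` and `count_rays_concurrently` are hypothetical names, not the poster's classes, and the sketch uses standard C++11 threads and atomics rather than TBB so that it stays self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tracer that counts rays.
class RayCounter {
public:
    // An atomic increment is safe to call from several threads at once.
    // A plain "++count_" on an ordinary integer would be a data race.
    void NewRay() { count_.fetch_add(1, std::memory_order_relaxed); }
    long long Count() const { return count_.load(); }
private:
    std::atomic<long long> count_{0};
};

// Calls NewRay() from several threads and returns the final count.
long long count_rays_concurrently(unsigned threads, int rays_per_thread)
{
    assert(threads > 0);
    RayCounter counter;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, rays_per_thread] {
            for (int i = 0; i < rays_per_thread; ++i) counter.NewRay();
        });
    }
    for (auto& th : pool) th.join();
    return counter.Count();
}
```

With the atomic counter the total always equals threads times rays_per_thread. Any shared state that the parallel_for body touches without this kind of protection is a candidate for the corruption described above.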

Dmitry_Vyukov
Valued Contributor I

Quoting - dr_eck

void operator()( const blocked_range2d& r ) const
{
    for(int i=r.rows().begin(); i<r.rows().end(); ++i) {
        for(int j=r.cols().begin(); j<r.cols().end(); ++j) {
            Pnt3 pp(p + j*DI*du - i*DI*dv); // ray origin
            Vec3 v( !Vec3(pos, pp) ); // ray direction
            Ray ray(pp, v, ray_epsilon, (yon/(v*cam->aim)), Spectrum() );
            cam->tracer->NewRay();
            cam->film[j+cam->pixW*i] = cam->tracer->Radiance(ray);
        }
    }
}

Also check whether you are using the _alloca() function. It can eat up stack in two ways:

(1) you simply allocate too much memory with _alloca()

(2) _alloca() sometimes doesn't return memory at function end, so if you allocate memory with _alloca() in a function called from a loop, it can allocate additional memory on every iteration without giving it back
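
If _alloca() does turn out to be involved, one possible way to rule it out is to move the scratch memory to the heap and reuse it across iterations. The sketch below only illustrates that pattern: `process_item` and `process_all` are hypothetical names, and the per-item work is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-iteration work that needs n bytes of scratch space.
// With _alloca(n) here, those bytes would come out of the thread's stack.
std::size_t process_item(std::size_t n, std::vector<unsigned char>& scratch)
{
    scratch.assign(n, 0);  // heap storage, reused between calls
    for (std::size_t k = 0; k < n; ++k)
        scratch[k] = static_cast<unsigned char>(k);
    assert(scratch.size() == n);
    return scratch.size();
}

// Runs the work for every requested size with a single shared buffer,
// so stack use stays constant no matter how many iterations run.
std::size_t process_all(const std::vector<std::size_t>& sizes)
{
    std::vector<unsigned char> scratch;  // one buffer for the whole loop
    std::size_t total = 0;
    for (std::size_t n : sizes)
        total += process_item(n, scratch);
    return total;
}
```

In a parallel_for body the buffer would have to be local to each invocation of the body rather than shared between threads.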

Dmitry_Vyukov
Valued Contributor I

Quoting - dr_eck

void operator()( const blocked_range2d& r ) const
{
    for(int i=r.rows().begin(); i<r.rows().end(); ++i) {
        for(int j=r.cols().begin(); j<r.cols().end(); ++j) {
            Pnt3 pp(p + j*DI*du - i*DI*dv); // ray origin
            Vec3 v( !Vec3(pos, pp) ); // ray direction
            Ray ray(pp, v, ray_epsilon, (yon/(v*cam->aim)), Spectrum() );
            cam->tracer->NewRay();
            cam->film[j+cam->pixW*i] = cam->tracer->Radiance(ray);
        }
    }
}

You can try the following function to track down the problem:

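[cpp]#include <windows.h>  // GetCurrentThreadId, OutputDebugStringA
#include <intrin.h>   // __readfsdword
#include <stdio.h>    // sprintf
#include <vector>

// Reports how much stack the current thread has consumed so far.
// 32-bit x86 only: FS:[4] holds the stack base in the thread information block.
__declspec(noinline) void output_stack_consumption(char const* name)
{
    unsigned long stack_base = __readfsdword(4);
    std::vector<char> buf (1024);
    sprintf(&buf[0], "thread #%lu, [%s], stack consumption: %lu\n", GetCurrentThreadId(), name, stack_base - (unsigned long)&buf);
    OutputDebugStringA(&buf[0]);
}
[/cpp]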

Here is a usage example:

[cpp]void recursion_test(int x)
{
    output_stack_consumption("recursion_test");
    char buf [100];
    if (x)
        recursion_test(x - 1);
}

int main()
{
    output_stack_consumption("main");
    recursion_test(10);
    output_stack_consumption("main");
}
[/cpp]

It outputs (to the 'Output' window of MSVC):

thread #2976, [main], stack consumption: 416
thread #2976, [recursion_test], stack consumption: 752
thread #2976, [recursion_test], stack consumption: 1088
thread #2976, [recursion_test], stack consumption: 1424
thread #2976, [recursion_test], stack consumption: 1760
thread #2976, [recursion_test], stack consumption: 2096
thread #2976, [recursion_test], stack consumption: 2432
thread #2976, [recursion_test], stack consumption: 2768
thread #2976, [recursion_test], stack consumption: 3104
thread #2976, [recursion_test], stack consumption: 3440
thread #2976, [recursion_test], stack consumption: 3776
thread #2976, [recursion_test], stack consumption: 4112
thread #2976, [main], stack consumption: 416
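
The function above reads the stack base through __readfsdword(4), which ties it to 32-bit x86 builds with MSVC. A rough portable sketch of the same idea is to remember the address of a local variable near the thread's entry point and measure the distance from it later. All names below (`mark_stack_base`, `stack_used_since_mark`, `deepest_use`) are hypothetical, and the numbers are approximate because compilers are free to lay out frames as they like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference address near the top of the current thread's stack (0 = not set).
thread_local std::uintptr_t g_stack_mark = 0;

// Call once near the thread's entry point.
void mark_stack_base()
{
    volatile char probe = 0;
    g_stack_mark = reinterpret_cast<std::uintptr_t>(&probe);
}

// Approximate number of bytes between the mark and the current frame.
std::size_t stack_used_since_mark()
{
    assert(g_stack_mark != 0);  // mark_stack_base() must run first
    volatile char probe = 0;
    const std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&probe);
    // Take the absolute difference so the growth direction does not matter.
    return static_cast<std::size_t>(
        g_stack_mark > here ? g_stack_mark - here : here - g_stack_mark);
}

// Recurses depth times with a 256-byte local per frame and returns the
// largest stack distance observed along the way.
std::size_t deepest_use(int depth)
{
    volatile char pad[256];
    pad[0] = 1;
    std::size_t deepest = stack_used_since_mark();
    if (depth > 0) {
        const std::size_t below = deepest_use(depth - 1);
        if (below > deepest) deepest = below;
    }
    pad[255] = 1;  // touch the buffer after the call so the frame stays live
    return deepest;
}
```

Calling mark_stack_base() at the start of the parallel_for body and stack_used_since_mark() inside the deepest functions it reaches would give a similar per-thread picture to the output shown above.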
