Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Does a tbb::task have its own context?

Oliver_K_
Beginner

Does a tbb::task have its own context, e.g. its own stack like coroutines or fibers?

I am trying to evaluate whether it is possible to write code similar to goroutines with TBB, i.e. each task has its own stack, can be suspended in its current stack frame and can be resumed later (perhaps while waiting for some external event).
 

27 Replies
Oliver_K_
Beginner

Dmitry Vyukov wrote:

This will take an eternity in the C/C++ world. What if you use TBB and a proprietary driver to access a proprietary DB, and the next release of TBB switches to segmented stacks and the DB driver starts crashing? There is not much you can do. The situation is pretty much unfixable for the C/C++ world as a whole. That's not to say that it's impossible to switch a particular isolated project to segmented stacks.

Do you have code demonstrating that using segmented stacks will crash an application?

I'm wondering because I have a small test app that recursively allocates an array on the stack. With a fixed-size stack (with a guard page at its bottom) I get a segmentation fault after a few iterations. When the app uses a segmented stack with an initial stack size of 4 kB, it never crashes (it finishes all 500 iterations).

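#include <iostream>

int count = 500;  // number of recursion levels

// noinline keeps the write to buf from being optimized away
void access( char *buf) __attribute__ ((noinline));
void access( char *buf)
{
    buf[0] = '\0';
}

// each level of recursion puts another 4 kB array on the stack
void bar( int i)
{
    char buf[4 * 1024];

    if ( i > 0)
    {
        access( buf);
        std::cout << i << ". iteration" << std::endl;
        bar( i - 1);
    }
}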

 

bar is called from the special context.
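
For readers who want to reproduce the fixed-size-stack half of this test, here is a minimal sketch, assuming a POSIX system that provides <ucontext.h>. The names run_on_private_stack, entry and kDepth are made up for illustration; this is not the harness used for the numbers above, and it does not use split stacks (those would need GCC's -fsplit-stack and a context library that cooperates with it).

```cpp
#include <ucontext.h>
#include <cassert>
#include <cstddef>
#include <vector>

static ucontext_t main_ctx, work_ctx;   // saved register sets + stack pointers
static int frames_visited = 0;
static const int kDepth = 16;           // hypothetical recursion depth

__attribute__((noinline)) static void touch(char* buf) { buf[0] = '\0'; }

// Same shape as bar() above: every level puts a 4 kB array on the stack.
static void recurse(int i) {
    char buf[4 * 1024];
    if (i > 0) {
        touch(buf);
        ++frames_visited;
        recurse(i - 1);
    }
}

// Entry point of the second context. When it returns, uc_link takes over
// and execution continues in main_ctx.
static void entry() { recurse(kDepth); }

// Runs recurse() on a caller-sized, fixed-size stack and reports how many
// levels were reached. Too small a stack overflows; there is no guard page.
int run_on_private_stack(std::size_t stack_bytes) {
    std::vector<char> stack(stack_bytes);
    frames_visited = 0;
    int rc = getcontext(&work_ctx);
    assert(rc == 0);
    work_ctx.uc_stack.ss_sp = stack.data();
    work_ctx.uc_stack.ss_size = stack.size();
    work_ctx.uc_link = &main_ctx;
    makecontext(&work_ctx, entry, 0);
    rc = swapcontext(&main_ctx, &work_ctx);  // jump onto the private stack
    assert(rc == 0);
    return frames_visited;
}
```

Calling run_on_private_stack(256 * 1024) leaves plenty of room for 16 levels of 4 kB each. Shrinking the stack below the total frame size is how the fixed-size variant ends up in a segmentation fault (or silent corruption, since this sketch has no guard page).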

 

Oliver_K_
Beginner

Dmitry Vyukov wrote:

> TBB for Go… :-)

I've actually done it. Several times.

Basically you can implement a "user-space" scheduler on top of goroutines. Create NumCPU goroutines, each with a light-weight work-stealing deque. Then distribute work between the goroutines manually.

It's also wrap-able in a nice way (thanks to closures):

A, B, C := make([]float64, N), make([]float64, N), make([]float64, N)
parallel.For(func(i int) {
    A[i] = B[i] * C[i]
})

[sorry the code is highlighted as C++, IDZ does not support Go]

Is it really equivalent? In the previous posts I was told that TBB does not support preserving the CPU registers and the stack pointer, i.e. you can't jump out of tbb::task::execute() and leave the stack frame intact. Local data (for instance loop counters) is not preserved, even if an asynchronous completion task is used. Or is my understanding of TBB wrong?

Dmitry_Vyukov
Valued Contributor I

Oliver K. wrote:

Quote:

Dmitry Vyukov wrote:

1. A segment switch has a noticeable cost, and if it happens inside an inner loop, it can have a very negative effect on performance. You can easily get a 10x performance hit if you are unlucky. I guess that Intel processors have some hardware support for stacks (manipulating the SP register and accessing memory through it, e.g. PUSH/POP/CALL/RET), and that this support assumes that the stack is contiguous. So changing SP "on the fly" causes a performance hit.

 

Context jumping on x86_64 (Q6700) with the SysV/ELF ABI:

- fixed-size stack: the jump takes 11 ns (storing/restoring CPU registers + stack pointer)

- segmented stack: the jump takes 75 ns (of course some additional functions have to be called)

 

How have you measured? I would expect a plain function call to take <11ns.

 

Unfortunately, in C/C++, since a program can have references by address to things on the stack, moving is a nonstarter.

Exactly!

Dmitry_Vyukov
Valued Contributor I

Oliver K. wrote:

Quote:

Dmitry Vyukov wrote:

This will take an eternity in the C/C++ world. What if you use TBB and a proprietary driver to access a proprietary DB, and the next release of TBB switches to segmented stacks and the DB driver starts crashing? There is not much you can do. The situation is pretty much unfixable for the C/C++ world as a whole. That's not to say that it's impossible to switch a particular isolated project to segmented stacks.

 

Do you have code demonstrating that using segmented stacks will crash an application?

I'm wondering because I have a small test app that recursively allocates an array on the stack. With a fixed-size stack (with a guard page at its bottom) I get a segmentation fault after a few iterations. When the app uses a segmented stack with an initial stack size of 4 kB, it never crashes (it finishes all 500 iterations).

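#include <iostream>

int count = 500;  // number of recursion levels

// noinline keeps the write to buf from being optimized away
void access( char *buf) __attribute__ ((noinline));
void access( char *buf)
{
    buf[0] = '\0';
}

// each level of recursion puts another 4 kB array on the stack
void bar( int i)
{
    char buf[4 * 1024];

    if ( i > 0)
    {
        access( buf);
        std::cout << i << ". iteration" << std::endl;
        bar( i - 1);
    }
}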

 

bar is called from the special context.

 

 

Yes, see #12. It's all based on our experience deploying a similar stack-perturbing technology in a huge C/C++ code base.

I meant crashes caused by the large amount of code out there that simply assumes the stack is contiguous, rather than crashes caused by stack overflows.

 

 

Dmitry_Vyukov
Valued Contributor I

Oliver K. wrote:

Quote:

Dmitry Vyukov wrote:

> TBB for Go… :-)

I've actually done it. Several times.

Basically you can implement a "user-space" scheduler on top of goroutines. Create NumCPU goroutines, each with a light-weight work-stealing deque. Then distribute work between the goroutines manually.

It's also wrap-able in a nice way (thanks to closures):

A, B, C := make([]float64, N), make([]float64, N), make([]float64, N)
parallel.For(func(i int) {
    A[i] = B[i] * C[i]
})

[sorry the code is highlighted as C++, IDZ does not support Go]

 

Is it really equivalent? In the previous posts I was told that TBB does not support preserving the CPU registers and the stack pointer, i.e. you can't jump out of tbb::task::execute() and leave the stack frame intact. Local data (for instance loop counters) is not preserved, even if an asynchronous completion task is used. Or is my understanding of TBB wrong?

 

It is equivalent with respect to scheduling order. Raf asked about characteristics of Go scheduler. I've said it's possible to mimic TBB scheduling order in Go.

It may be not equivalent with respect to other characteristics.

> you can't jump out tbb::task::execute() the leave the stack frame intact

I believe this is true.
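
To make the consequence concrete: if a task cannot be suspended in the middle of its stack frame, any state that must survive a "pause" has to live in the task object rather than in local variables. Below is a minimal sketch of that pattern in plain C++. CountingJob, step and run_in_slices are hypothetical names for illustration and are not part of the TBB API; the sketch also assumes a positive budget.

```cpp
#include <cassert>

// A job that can be "suspended" only by returning from step().
// Everything that must survive the return is a member, not a local.
struct CountingJob {
    int  i = 0;       // loop counter, kept in the object instead of on the stack
    int  limit;
    long sum = 0;

    explicit CountingJob(int n) : limit(n) {}

    // Runs at most `budget` iterations and then returns, giving up the
    // stack frame. Returns true once the whole loop has finished.
    bool step(int budget) {
        for (; i < limit && budget > 0; ++i, --budget) {
            sum += i;
        }
        return i == limit;
    }
};

// Drives the job in slices. Between two step() calls a scheduler would be
// free to run other work on the same thread.
long run_in_slices(int n, int budget) {
    assert(budget > 0);
    CountingJob job(n);
    while (!job.step(budget)) {
        // the job is "suspended" here; its state lives in `job`
    }
    return job.sum;
}
```

A goroutine or fiber would simply keep i and sum as locals and block in place. The price of doing without a private stack is this manual conversion into a state machine.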

 

Oliver_K_
Beginner

Dmitry Vyukov wrote:

Yes, see #12. It's all based on our experience deploying a similar stack-perturbing technology in a huge C/C++ code base.

I meant crashes caused by the large amount of code out there that simply assumes the stack is contiguous, rather than crashes caused by stack overflows.

Dmitry, can you please provide a code example that crashes when GCC's split stacks are used? (Your post is the first one I've read that says segmented stacks will cause segmentation faults.)

Dmitry Vyukov wrote:

Software that assumes that stacks are contiguous and that the stack pointer moves in one direction. This includes profiling and debugging libraries (stack unwinding), garbage collectors/leak detectors (which usually assume that a stack can be described by a pointer to its beginning and a size) and various systems/hacky software (e.g. code that measures a library call's stack consumption by subtracting two stack pointers).

As far as I remember it was Vim that compares 2 stack pointers to determine direction in which stack grows. Don't ask me why a text editor needs this.

Until now I assumed that GCC's split stacks grow downwards on x86. Is this not true?

Doesn't the ABI (or the architecture) determine in which direction the stack grows? For instance, the ARM architecture allows the stack to grow toward higher or lower addresses (depending on the mode), but the AAPCS ABI requires the stack to grow toward lower addresses.

Dmitry_Vyukov
Valued Contributor I

Oliver K. wrote:

Quote:

Dmitry Vyukov wrote:

Yes, see #12. It's all based on our experience deploying a similar stack-perturbing technology in a huge C/C++ code base.

I meant crashes caused by the large amount of code out there that simply assumes the stack is contiguous, rather than crashes caused by stack overflows.

Dmitry, can you please provide a code example that crashes when GCC's split stacks are used? (Your post is the first one I've read that says segmented stacks will cause segmentation faults.)

I have not been collecting all examples, but here is one: Oilpan (a garbage collector for C++) assumes that a thread's stack is a single contiguous region of memory. With AddressSanitizer's detect_stack_use_after_return option, stacks become fragmented, and this blows up Oilpan:

https://code.google.com/p/chromium/issues/detail?id=339813

Here is another example:

http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/sanitizer_common/sanitizer_stacktrace.cc?view=markup

If you look at the StackTrace::FastUnwindStack function, you will notice that it can't possibly work correctly with segmented stacks.
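
To illustrate the kind of assumption involved, here is a simplified model of a frame-pointer unwinder that describes the stack as a single [bottom, top) range. This is a sketch written for this thread, not the compiler-rt code: Frame, walk_frames and the two demo functions are hypothetical, and the frames are synthetic objects rather than real stack frames.

```cpp
#include <cstddef>
#include <cstdint>

// Synthetic stand-in for a saved frame-pointer record.
struct Frame {
    const Frame* caller;   // link to the caller's frame, nullptr at the root
};

// Follows the caller links, but only while each frame lies inside the one
// contiguous range [stack_bottom, stack_top) and at a higher address than
// the previous frame. Returns the number of frames it managed to visit.
std::size_t walk_frames(const Frame* frame, std::uintptr_t stack_bottom,
                        std::uintptr_t stack_top) {
    std::size_t depth = 0;
    std::uintptr_t prev = 0;
    while (frame != nullptr) {
        const auto addr = reinterpret_cast<std::uintptr_t>(frame);
        if (addr < stack_bottom || addr >= stack_top) break;  // outside "the" stack
        if (addr <= prev) break;                              // wrong direction
        ++depth;
        prev = addr;
        frame = frame->caller;
    }
    return depth;
}

// All three frames sit in one contiguous block: the walk sees all of them.
std::size_t demo_contiguous() {
    Frame seg[3] = {{&seg[1]}, {&seg[2]}, {nullptr}};
    return walk_frames(&seg[0], reinterpret_cast<std::uintptr_t>(&seg[0]),
                       reinterpret_cast<std::uintptr_t>(&seg[0] + 3));
}

// Four frames spread over two blocks ("segments"): the walk stops at the
// segment boundary because the bounds describe only the first block.
std::size_t demo_split() {
    Frame upper[2] = {{&upper[1]}, {nullptr}};
    Frame lower[2] = {{&lower[1]}, {&upper[0]}};
    return walk_frames(&lower[0], reinterpret_cast<std::uintptr_t>(&lower[0]),
                       reinterpret_cast<std::uintptr_t>(&lower[0] + 2));
}
```

In this model the split case silently reports two frames out of four. A real unwinder that trusts such bounds would likewise produce a truncated or wrong trace once the call chain crosses into another stack segment.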

Oliver K. wrote:

Quote:

Dmitry Vyukov wrote:

Software that assumes that stacks are contiguous and that the stack pointer moves in one direction. This includes profiling and debugging libraries (stack unwinding), garbage collectors/leak detectors (which usually assume that a stack can be described by a pointer to its beginning and a size) and various systems/hacky software (e.g. code that measures a library call's stack consumption by subtracting two stack pointers).

As far as I remember it was Vim that compares 2 stack pointers to determine direction in which stack grows. Don't ask me why a text editor needs this.

 

Until now I assumed that GCC's split stacks grow downwards on x86. Is this not true?

Doesn't the ABI (or the architecture) determine in which direction the stack grows? For instance, the ARM architecture allows the stack to grow toward higher or lower addresses (depending on the mode), but the AAPCS ABI requires the stack to grow toward lower addresses.

This is true.

I meant the following situation: if your program directly or indirectly depends on any library that simply compares two stack pointers to determine the direction of stack growth, your program will misbehave with segmented stacks, because the library can draw the wrong conclusion about the growth direction if the two stack pointers happen to be in different stack segments.
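
A minimal sketch of that heuristic, to show where it goes wrong. Growth, infer_growth, callee_address and probe_growth are names invented for this illustration, and the sketch assumes GCC or Clang for the noinline attribute. The comparison is split out into infer_growth so that it can be fed arbitrary addresses.

```cpp
#include <cstdint>

enum class Growth { Down, Up };

// The whole heuristic: if the callee's local sits at a lower address than
// the caller's local, conclude that the stack grows downwards.
Growth infer_growth(std::uintptr_t caller_local, std::uintptr_t callee_local) {
    return callee_local < caller_local ? Growth::Down : Growth::Up;
}

// Returns the address of one of its own locals as an integer.
__attribute__((noinline)) std::uintptr_t callee_address() {
    volatile char marker = 0;
    return reinterpret_cast<std::uintptr_t>(&marker);
}

// What such a library typically does at start-up.
Growth probe_growth() {
    volatile char marker = 0;
    const auto here = reinterpret_cast<std::uintptr_t>(&marker);
    return infer_growth(here, callee_address());
}
```

On a contiguous, downward-growing stack the two addresses might look like 0x7000 and 0x6ff0, and infer_growth reports Down. If the call to callee_address() happens to trigger a new segment that was allocated at, say, 0x9000, the same comparison reports Up even though every segment still grows downwards.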

 
