
jaredsf · 03-30-2015

Hi,

We are occasionally experiencing a crash using tbb::parallel_pipeline that I'm hoping someone can help me narrow down. Any help, or suggestions for additional areas to check, would be greatly appreciated.

#0 0x00007f74b6311425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f74b6314b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f74b6c0cb05 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f74b6c0ac76 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f74b6c0aca3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f74b6c0b77f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007f74ba018672 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f7478c18100, parent=..., child=<optimized out>)
at ../../src/tbb/custom_scheduler.h:455
#7 0x00007f74ba014356 in tbb::internal::arena::process (this=0x7f5a1adf0080, s=...) at ../../src/tbb/arena.cpp:106
#8 0x00007f74ba013a7b in tbb::internal::market::process (this=0x7f74b2cf1b00, j=...) at ../../src/tbb/market.cpp:479
#9 0x00007f74ba00fa0f in tbb::internal::rml::private_worker::run (this=0x7f74af81af00) at ../../src/tbb/private_server.cpp:283
#10 0x00007f74ba00fc09 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:240
#11 0x00007f74b8426e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f74b63ceccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x0000000000000000 in ?? ()

The crash is happening when trying to invoke a pure virtual function execute() on a task pointer:

custom_scheduler.h:455:
t_next = t->execute();

We run this pipeline with 4 outstanding tokens and 4 filters. The first and last filters are very fast, the second filter is the slowest, and the third filter takes about 1/5th as long as the second.

tbb::parallel_pipeline(4,
    tbb::make_filter<void, long>(tbb::filter::serial_in_order,
        [&] (tbb::flow_control& fc) -> long
        {...})
    & tbb::make_filter<long, long>(tbb::filter::parallel,
        [&] (long& offset) -> long
        {...})
    & tbb::make_filter<long, long>(tbb::filter::parallel,
        [&] (long& r) -> long
        {...})
    & tbb::make_filter<long, void>(tbb::filter::serial_out_of_order,
        [&] (long& count)
        {...})
);

About 5 million elements are generated by the first pipeline stage. We've noticed that each time the crash happens, there are always exactly 4 elements left in the pipeline: 1 waiting to execute on the 3rd stage and 3 waiting to enter the 4th stage. We are of course critically interrogating our own filter code, but this recurring pattern of 4 remaining elements leads us to suspect the pipeline.

We are running tbb 4.2. We have not seen this on 4.3, but our testing on 4.3 so far is not extensive enough to conclude that the crash cannot happen there.

Below are two additional stacktraces for non-idle tbb threads at this time:

This thread appears to have just finished a task:

#0  0x00007f74b63caee9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f74ba00f888 in futex_wakeup_one (futex=0x7f74af81a1ac) at ../../include/tbb/machine/linux_common.h:77
#2  V (this=0x7f74af81a1ac) at ../../src/tbb/semaphore.h:225
#3  notify (this=0x7f74af81a1a0) at ../../src/rml/include/../server/thread_monitor.h:250
#4  wake_or_launch (this=0x7f74af81a180) at ../../src/tbb/private_server.cpp:322
#5  tbb::internal::rml::private_server::wake_some (this=<optimized out>, additional_slack=<optimized out>, additional_slack@entry=0) at ../../src/tbb/private_server.cpp:401
#6  0x00007f74ba00fb88 in propagate_chain_reaction (this=<optimized out>) at ../../src/tbb/private_server.cpp:174
#7  tbb::internal::rml::private_worker::run (this=0x7f74af81ac80) at ../../src/tbb/private_server.cpp:291
#8  0x00007f74ba00fc09 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:240
#9  0x00007f74b8426e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f74b63ceccd in clone () from /lib/x86_64-linux-gnu/libc.so.6

This is the thread calling tbb::parallel_pipeline:

#0  0x00007f74ba00b7f9 in tbb::internal::stage_task::execute (this=0x7f73e88189c0) at ../../src/tbb/pipeline.cpp:363
#1  0x00007f74ba018672 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f5a1addc100, parent=..., child=<optimized out>)
    at ../../src/tbb/custom_scheduler.h:455
#2  0x00007f74ba016ad0 in tbb::internal::generic_scheduler::local_spawn_root_and_wait (this=0x7f5a1addc100, first=..., next=@0x7f5a1add6838: 0x7f5a1add69c0) at ../../src/tbb/scheduler.cpp:668
#3  0x00007f74ba00c621 in spawn_root_and_wait (root=...) at ../../include/tbb/task.h:705
#4  tbb::pipeline::run (this=this@entry=0x7f591efde000, max_number_of_live_tokens=max_number_of_live_tokens@entry=4, context=...) at ../../src/tbb/pipeline.cpp:666
#5  0x0000000000ba48a9 in parallel_pipeline (context=..., filter_chain=..., max_number_of_live_tokens=4) at /opt/sfdev-6.28/include/tbb/pipeline.h:654
#6  parallel_pipeline (filter_chain=..., max_number_of_live_tokens=4) at /opt/sfdev-6.28/include/tbb/pipeline.h:660

Thanks!

Jared

jiri · 03-31-2015

Is the parallel pipeline the only TBB thing running? In my work with TBB so far, the "pure virtual function" error has tended to result from mismanagement of tasks: either executing a destroyed task or (much more commonly) spawning a single task twice, which is in fact the same as the first case, since spawn destroys the task (unless it is recycled).
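
To illustrate the double-spawn case, here is a minimal sketch of a debugging guard that flags a second hand-off of the same object. SpawnGuard, mark_spawned and double_spawn_detected are hypothetical names made up for this example; none of this is TBB API, and it only applies to code that spawns its own tasks.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical debugging aid (NOT part of TBB): remembers whether the object
// it is embedded in has already been handed to a scheduler.
class SpawnGuard {
public:
    // Returns true on the first call and false on every later call.
    // exchange() makes the check-and-set a single atomic step, so two
    // threads racing to spawn the same object cannot both see "first".
    bool mark_spawned() {
        return !spawned_.exchange(true, std::memory_order_acq_rel);
    }

private:
    std::atomic<bool> spawned_{false};
};

// Simulates handing the same object to a scheduler twice and reports
// whether the guard caught the second hand-off.
bool double_spawn_detected() {
    SpawnGuard guard;
    bool first_ok = guard.mark_spawned();   // legitimate first spawn
    bool second_ok = guard.mark_spawned();  // the bug: same object again
    assert(first_ok);
    return !second_ok;
}
```

In a real program the check would sit immediately before the spawn call, so that a second spawn fails loudly at the call site instead of surfacing later as a pure virtual call inside the scheduler.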

jaredsf · 03-31-2015

@jiri, We use several tbb data structures (e.g. concurrent_queue, atomic), and parallel_pipeline in a couple of places. This was the only parallel_pipeline in use anywhere near the time of the crash. You mentioned common causes stemming from executing a destroyed task, but is that something that I can cause by simply using parallel_pipeline?

jiri · 04-01-2015

The pipeline is responsible for correct management of the tasks it uses, so if you only run the parallel_pipeline function, you should not be able to make this happen. The concurrent data structures and atomics are independent of the task scheduling mechanism, so they should not be the cause either.

RafSchietekat · 04-02-2015

What is the evidence of a task waiting to execute on the 3rd stage? It's a parallel stage, so the task should be able to go directly from 2nd to 3rd stage, from my understanding of the current implementation. The reason for that is the attempt to strike the iron when it's hot (in cache), by having threads only "abandon" a task in a stage queue if that stage is currently busy with something else (which only applies to serial stages).

I would agree that TBB is responsible for properly managing tasks from its own prepackaged algorithms. But just to be sure, is there any explicit use of tasks anywhere in the program (perhaps enqueued), or can we definitely rule that out?

jaredsf · 04-02-2015

I added counters to each of my stages and was lucky enough to reproduce the crash, which allowed me to see that stage 2 was executed the expected number of times, but stage 3 was one short.
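
For anyone wanting to do the same, per-stage counters of this kind can be sketched roughly as follows. This is not the actual instrumentation from our code: StageCounters, hit, first_short_stage and simulate are hypothetical names, and simulate is a single-threaded stand-in for the pipeline so the example does not depend on TBB. The counters are atomic because the real filter bodies run on several worker threads.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical instrumentation: one counter per pipeline stage.
constexpr std::size_t kStages = 4;

struct StageCounters {
    // Value-initialized, so every counter starts at zero.
    std::array<std::atomic<long>, kStages> executed{};

    // Call once at the top of each filter body.
    void hit(std::size_t stage) {
        executed[stage].fetch_add(1, std::memory_order_relaxed);
    }

    // Index of the first stage that ran fewer times than the stage before
    // it, or kStages if every stage saw the same number of items.
    std::size_t first_short_stage() const {
        for (std::size_t i = 1; i < kStages; ++i) {
            if (executed[i].load() < executed[i - 1].load()) return i;
        }
        return kStages;
    }
};

// Stand-in for a pipeline run: every item visits stages 0..kStages-1 in
// order. lost_item (use -1 for none) vanishes before stage index 2, which
// mimics an element that never reaches the third stage.
void simulate(StageCounters& counters, long items, long lost_item) {
    for (long item = 0; item < items; ++item) {
        for (std::size_t stage = 0; stage < kStages; ++stage) {
            if (item == lost_item && stage == 2) break;
            counters.hit(stage);
        }
    }
}
```

Comparing the counters after a crash (from the core file) or at the end of a run shows which stage boundary lost an element. Relaxed ordering is enough here because the counters are only read once the workers have stopped.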

Definitely no use of tasks anywhere in the program. We only use parallel_pipeline in a couple places, and a parallel_for, and that's all.

I'm still digging into the core file, but lack of familiarity with the tbb internals is making it difficult to know what to look at to see what might be happening.

RafSchietekat · 04-02-2015

Very strange. I don't see any relevant differences between the 4.2 and 4.3 pipeline code (I might be off by some updates), but that's all I looked at this time. I would still recommend also testing with 4.3 update 4 (the latest stable release), since a problem reported against the current release is more relevant and more likely to get attention.

custom_scheduler calling pure virtual function task::execute()