Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

parallel_sort out of bounds

Shant_H_
Beginner
309 Views

I'm using tbb::parallel_sort to sort a vector of about 300,000 elements. This operation happens quite often across several machines and every now and then I get a segfault. I have a vector<Row*> vr and provide my own comparator RowLessThan to parallel_sort to deal with the Row*. I call parallel_sort(std::begin(vr), std::end(vr), RowLessThan()).

I have tracked down segfault to my comparator being called with one of its arguments called with std::end(vr). Since std::end(vt) points past the end of the vector and it should not be dereferenced. Is it possible that the parallel_sort implementation  when partitioning/pivoting compare values with the end iterator?

Linux GCC 4.6.3 TBB 4.2

Thanks!

Here's a back trace:

The segfault happens when I deference lhs which is a giant address in this case 0x3039366131613562.

#9  <signal handler called>
#10 project::Match::RowLessThan::operator() (this=0x7fd2b1a26980, lhs=@0x2a6b1430: 0x3039366131613562, rhs=@0x2a6b12e0: 0x2aae6a50) at
#11 0x0000000000931fc2 in __unguarded_partition<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*> >, project::Row*, project::Match::RowLessThan> (__comp=...,
    __pivot=@0x2a6b12e0: 0x2aae6a50, __last=..., __first=...) at /usr/include/c++/4.6/bits/stl_algo.h:2233
#12 __unguarded_partition_pivot<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*> >, project::Match::RowLessThan> (__comp=..., __last=..., __first=...)
        at /usr/include/c++/4.6/bits/stl_algo.h:2265
#13 std::__introsort_loop<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, long, project::Match::RowLessThan> (__first=...,
            __last=..., __depth_limit=13, __comp=...) at /usr/include/c++/4.6/bits/stl_algo.h:2306
#14 0x0000000000932043 in std::__introsort_loop<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, long, project::Match::RowLessThan> (__first=..., __last=..., __depth_limit=14, __comp=...) at /usr/include/c++/4.6/bits/stl_algo.h:2307
#15 0x0000000000932043 in std::__introsort_loop<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, long, project::Match::RowLessThan> (__first=..., __last=..., __depth_limit=15, __comp=...) at /usr/include/c++/4.6/bits/stl_algo.h:2307
#16 0x000000000093336c in sort<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*> >, project::Match::RowLessThan> (__comp=..., __last=..., __first=...) at /usr/include/c++/4.6/bits/stl_algo.h:5445
#17 operator() (range=..., this=<optimized out>) at
#18 run_body (r=..., this=<optimized out>) at
#19 tbb::interface6::internal::partition_type_base<tbb::interface6::internal::auto_partition_type>::execute<tbb::interface6::internal::start_for<tbb::internal::quick_sort_range<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, project::Match::RowLessThan>, tbb::internal::quick_sort_body<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, project::Match::RowLessThan>, tbb::auto_partitioner const>, tbb::internal::quick_sort_range<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, project::Match::RowLessThan> > (this=0x3d4bd68, start=..., range=...) at
#20 0x0000000000933539 in tbb::interface6::internal::start_for<tbb::internal::quick_sort_range<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, project::Match::RowLessThan>, tbb::internal::quick_sort_body<__gnu_cxx::__normal_iterator<project::Row**, std::vector<project::Row*, std::allocator<project::Row*> > >, project::Match::RowLessThan>, tbb::auto_partitioner const>::execute (this=<optimized out>) at
#21 0x00007fd2dda8e29a in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x3a47b80, parent=..., child=<optimized out>) at
#22 0x00007fd2dda89f76 in tbb::internal::arena::process (this=0x3b3c280, s=...) at
#23 0x00007fd2dda8966b in tbb::internal::market::process (this=0x5927900, j=...) at
#24 0x00007fd2dda8528f in tbb::internal::rml::private_worker::run (this=0x1bbb5100) at
#25 0x00007fd2dda854b9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at
#26 0x00007fd2de6f9e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#27 0x00007fd2dc90eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6

0 Kudos
2 Replies
Shant_H_
Beginner
309 Views

Ok problem solved, of course my comparator wasn't doing a proper strict weak ordering.

0 Kudos
RafSchietekat
Valued Contributor III
309 Views

parallel_sort() should not read past the end, as long as the comparison function always returns false for identical arguments (irreflexivity is one of the required properties for the comparison function). Use a debug build to enable the assert statements and get more information from the stack trace.

(Added 2013-10-21) And by "should not" I mean "won't". Note that it is certainly possible to sort with a comparison function that returns even an undefined result for equivalent arguments, but irreflexivity is irrevocably part of the contract (if somebody can reference evidence for a strict connection between irreflexivity and performance, or about whether a ternary result would or wouldn't be even better, pray tell), so you cannot fault TBB for relying on it (even if the performance gained by not using a more robust implementation is questionable). Also note that I used "identical" rather than "equivalent" above because it is a weaker requirement, strictly speaking.

(Added 2013-10-22) I suppose that the preceding "problem solved" posting was delayed in moderation, because I only just noticed it...

0 Kudos
Reply