Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Crash when using DYLD_INSERT_LIBRARIES=libtbbmalloc_proxy.dylib

Luke_W_

I am trying to run a Qt application on Mac OS X with libtbbmalloc_proxy. Occasionally the application runs correctly, but most of the time it crashes on startup with the following stack trace:

#0	0x00000001041f789b in bool rml::internal::isLargeObject<(rml::internal::MemoryOrigin)1>(void*) ()
#1	0x00000001041f7382 in __TBB_malloc_safer_msize ()
#2	0x00007fff88f017c8 in free ()
#3	0x00007fff96d6cca9 in NXMapInsert ()
#4	0x00007fff96d83051 in __sel_registerName(char const*, int, int) ()
#5	0x00007fff96d6b5cf in _read_images ()
#6	0x00007fff96d6a447 in map_images_nolock ()
#7	0x00007fff96d69ec9 in map_images ()
#8	0x00007fff5fc049dd in dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*)) ()
#9	0x00007fff5fc0e763 in ImageLoader::link(ImageLoader::LinkContext const&, bool, bool, bool, ImageLoader::RPathChain const&) ()
#10	0x00007fff5fc04be4 in dyld::link(ImageLoader*, bool, bool, ImageLoader::RPathChain const&) ()
#11	0x00007fff5fc0c168 in dlopen ()
#12	0x00007fff8bd45857 in dlopen ()
#13	0x0000000102b730af in QLibraryPrivate::load_sys() ()
#14	0x0000000102b6d8b2 in QLibraryPrivate::load() ()
#15	0x0000000102b6dea7 in QLibraryPrivate::loadPlugin() ()
#16	0x0000000102b68c38 in QFactoryLoader::instance(int) const ()
#17	0x000000010237b42d in QPlatformIntegrationFactory::create(QString const&, QStringList const&, int&, char**, QString const&) ()
#18	0x0000000102387172 in QGuiApplicationPrivate::createPlatformIntegration() ()
#19	0x000000010238808b in QGuiApplicationPrivate::createEventDispatcher() ()
#20	0x0000000102b77c81 in QCoreApplication::init() ()
#21	0x0000000102b77bf7 in QCoreApplication::QCoreApplication(QCoreApplicationPrivate&) ()
#22	0x000000010238554e in QGuiApplication::QGuiApplication(QGuiApplicationPrivate&) ()
#23	0x0000000101cc505e in QApplication::QApplication(int&, char**, int) ()

I am using tbb43_20141023oss_src.tgz, but I have seen the same problem with tbb43_20150316oss_src.tgz as well.

 

RafSchietekat

For a more complete picture, can you confirm that the program always runs correctly without the proxy (and, if so, whether it also fails when it is linked with the proxy library at build time)? Does it also happen with the debug version, and does that give more information, such as a line number and some relevant variable values?

That said, I am suspicious of the implementation of safer_dereference(): it catches a memory access violation on Windows, but Unix-like operating systems don't throw an exception and instead signal a bus error, so what if you free memory early on a memory page, and the proxy goes on to check ((LargeObjectHdr*)object - 1)->backRefIdx?

Alexandr_K_Intel1

Luke, in addition to Raf's suggestions, could you try it with libtbbmalloc_proxy_debug.dylib? Could you also report the application's output with the TBB_VERSION=1 environment variable set, to rule out a version mix-up?

Raf, the problem with safer_dereference is definitely possible. Until now, we have simply been lucky.

Luke_W_

Yes, the program always runs fine without using tbbmalloc_proxy. I get exactly the same stack trace if I use -ltbbmalloc_proxy instead of the DYLD_INSERT_LIBRARIES method.

With TBB_VERSION=1, I get this output:

TBBmalloc: VERSION		4.3
TBBmalloc: INTERFACE VERSION	8004
TBBmalloc: BUILD_DATE		Thu 16 Apr 2015 20:40:56 UTC
TBBmalloc: BUILD_HOST		ravel (i386)
TBBmalloc: BUILD_OS		Mac OS X version 10.10.3
TBBmalloc: BUILD_KERNEL	Darwin Kernel Version 14.3.0: Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64
TBBmalloc: BUILD_CLANG	Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
TBBmalloc: BUILD_XCODE	Xcode 6.3
TBBmalloc: BUILD_TARGET	intel64 on cc4.2.1_os10.10.3
TBBmalloc: BUILD_COMMAND	clang++ -g -O0 -DTBB_USE_DEBUG -DUSE_PTHREAD -m64 -mrtm -fPIC -D__TBB_BUILD=1 -Wall -Wno-non-virtual-dtor -Wno-dangling-else -I../../src -I../../src/rml/include -I../../include -I.
TBBmalloc: TBB_USE_DEBUG	1
TBBmalloc: TBB_USE_ASSERT	1
TBBmalloc: DO_ITT_NOTIFY	undefined
TBBmalloc: huge pages	not requested

And using the debug version I get much more information:

  • impl_malloc_usable_size called with ptr=0x106f00000
  • ...calls __TBB_malloc_safer_msize
  • ...calls isRecognized
  • ...calls isLargeObject
  • ...calls safer_dereference with ptr=0x106effff8
  • ...which throws EXC_BAD_ACCESS when trying to access *ptr
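
The two addresses in this trace differ by 8 bytes and sit on either side of a page boundary, which is what makes the read dangerous. A minimal sketch of that arithmetic follows; the 4 KiB page size and the helper names are assumptions for illustration, not anything taken from the TBB sources.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages. The names below are hypothetical helpers.
constexpr std::uint64_t kPageSize = 4096;

// Start address of the page that contains `addr`.
constexpr std::uint64_t pageBase(std::uint64_t addr) {
    return addr & ~(kPageSize - 1);
}

// True when a probe `bytesBefore` bytes below `object` lands on a
// different page than `object` itself. That neighbouring page may
// not be mapped at all, so reading from it can fault.
constexpr bool probeLeavesPage(std::uint64_t object, std::uint64_t bytesBefore) {
    return pageBase(object - bytesBefore) != pageBase(object);
}
```

With the values from the trace, 0x106f00000 is page-aligned, so the probe at 0x106effff8 belongs to the page starting at 0x106eff000, which the allocator never necessarily owned.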

RafSchietekat

OK, that confirms it. However, "throws EXC_BAD_ACCESS" makes it look like this is a language-level exception, whereas I think that it's a Mach (microkernel) exception, which would then get expressed as a signal. Just out of curiosity, could you show the original diagnostic, instead?

I haven't done any Mach programming since before it arrived on Mac OS, though. And it would probably be better to handle this as a signal rather than the original Mach exception, so that the fix is more broadly applicable, even though signals are scary too. Inquiring in advance using mincore() could confirm that there's light at the end of the tunnel, but would probably be too expensive as a solution (kernel-call expensive?).

(Added) What happens if you replace safer_dereference() in src/tbbmalloc/frontend.cpp with the following provisional code:

static inline BackRefIdx safer_dereference (const BackRefIdx *ptr)
{
    BackRefIdx id;
#if _MSC_VER
    __try {
        id = *ptr;
    } __except( GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION?
                EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH ) {
        id = BackRefIdx();
    }
#else
    typedef void (*sig_t) (int);
    const sig_t old = signal(SIGBUS, SIG_IGN); // yes, I know, but it's just to diagnose the problem
    id = *ptr;
    (void) signal(SIGBUS, old); // hopelessly naive
#endif
    return id;
}

(Added) Unfortunately, signal support seems woefully inadequate. The signal will probably be delivered on the current thread, but the only thread-specific setting I found is pthread_sigmask(2), and that only seems to block the signal (so on unblocking the pending signal still gets delivered, I suppose, correct me if I'm mistaken), plus it's also an expensive-looking system call. Maybe the Mach approach really is the only possible one, but how complicated (only Mac OS X benefits, if it's at all possible) and expensive (with viability at stake) would that be?

Luke_W_

To summarise, I did try with the change you proposed, and (at least in the Xcode debugger) my program still stopped on the *ptr line. FWIW, even when my program does manage to run past startup, the scenario I was profiling ran slower with tbbmalloc_proxy enabled. Thus, I doubt it would benefit my project even if this bug were fixed. Is there a performance benchmark somewhere that compares the native Mac OS X allocator with the TBB allocator? The only reason I'm trying to make it work is that tbbmalloc_proxy makes our application an order of magnitude faster on Windows, but now I'm thinking it might not have the same effect on Mac.

RafSchietekat

Sorry, it's probably SIGSEGV rather than SIGBUS. Could you try that, instead?

Maybe it's best to admit defeat and just use a global handler, instead. It would then have to be documented clearly enough that the application programmer is always aware of the fact, and it might inhibit interoperability with other libraries or with anything the application programmer might want to do with this signal, but at least it's a plausible solution. Unless somebody else knows a transparent way of dealing with this, after all?

Of course, now we also have the report that the program runs slower with the proxy. Or is that just while profiling? If it's only with the diagnostic change I suggested, then that would not be an issue if a global handler were used, because it would only be set once, at the beginning.

Luke_W_

With SIGSEGV instead of SIGBUS my debugger still stops with EXC_BAD_ACCESS on *ptr.

When I said "profiling", I meant that I was running a standard build and using a stopwatch to time a certain procedure. That was before I first started this forum thread, and thus before I tried the modified code. Normally the procedure took about 2m24s, and with tbbmalloc_proxy it took about 3m. Obviously it isn't a rigorous benchmark, but it clearly demonstrates that on Mac, we don't get the 6x speedup we witness with the Windows version.

I think it would be useful for you to have a graph of benchmarks vs standard compilers, if only for marketing purposes, since the performance gain we see on Windows is very impressive.

RafSchietekat

Could it be that the debugger is just set to stop on Mach exceptions, before they are handled, i.e., ignored by the program after or during transformation into a signal? What happens without the debugger? Sorry to have to ask, but I'm grasping at straws now, and you weren't explicit about it.

That performance thing is very interesting, of course. What happens if you don't use the scalable allocator at all (don't use proxy, hide libtbbmalloc(_debug).dylib from application): does performance improve (then Apple's native allocator is very impressive indeed) or worsen (then it's another problem with the proxy)?

Luke_W_

Yes, I think at this point it is better if you do some experimentation of your own, as I don't know enough at the OS level to be helpful.

RafSchietekat

My first question was whether, with SIGSEGV disabled, your program runs as it should, either without stopping on Mach exceptions (if you've seen a change by changing settings), or (short-circuit), without using the debugger (to be absolutely sure)?

For the second question, perhaps somebody could reproduce that, perhaps myself.

Luke_W_

Sorry, but I cannot invest time in looking at this bug any more right now, especially since we see no performance gain. I would suggest that a TBB developer try to reproduce this bug and add a new unit test for it. I would also suggest running actual performance benchmarks on Mac of the native allocator and TBB, and deciding whether it is worthwhile continuing development of the Mac version.

RafSchietekat

Could you at least answer my question whether it was just the debugger that stopped at the Mach exception or whether the program was still failing even when ignoring SIGSEGV? That's kind of important, and obviously this problem is not easily reproducible.

Alexandr_K_Intel1

Thank you for the bug report; we were able to reproduce the situation. As for allocator performance, this is not an area with a single winner. A performance advantage over other allocators on some benchmarks or applications doesn't mean the same will hold for another application. The best way to compare allocators is to try them; luckily, the interfaces are standard.
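
As a starting point for such a comparison, a small timing harness along these lines could be built once and run twice: once as is, and once with DYLD_INSERT_LIBRARIES=libtbbmalloc_proxy.dylib set. The workload below (block sizes, counts, and the function name) is made up for illustration and is no substitute for timing the real application.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical workload: for each round, allocate `count` blocks of
// varying size through malloc, touch each one, then free them all.
// Returns the elapsed wall-clock time in seconds.
double timeAllocationChurn(std::size_t count, std::size_t rounds) {
    using Clock = std::chrono::steady_clock;
    std::vector<void*> blocks(count, nullptr);
    const auto start = Clock::now();
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < count; ++i) {
            const std::size_t size = 16 + (i * 37 + r) % 1024;
            blocks[i] = std::malloc(size);
            if (blocks[i] != nullptr) {
                // Write one byte so the block is really touched.
                static_cast<volatile char*>(blocks[i])[0] = 1;
            }
        }
        for (void* block : blocks) {
            std::free(block);
        }
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}
```

Since the proxy replaces malloc and free at load time, the same binary exercises whichever allocator is active, so the two runs are directly comparable for this one synthetic pattern.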

Vladimir_P_1234567890

Hello,

We have published the tbb44_20150928oss development release with a few improvements in this regard. Could you check whether the problem has been fixed in your case?

https://www.threadingbuildingblocks.org/download#development-releases

--Vladimir

Vladimir_P_1234567890

Right, we have added some heuristics to reduce the likelihood of such signals.

--Vladimir 

Demeter__Zoltan

Hello there,

We have experienced a similar crash with a similar call stack in a Qt app using TBB on Mac.

If the proxy dylib is there, the app "might" crash during startup. If we delete the proxy dylib, the app is stable. We used TBB 2018 Update 5 (the released binaries from the website).

Maybe you can fix this in a future version.
