Running Inspector with tcmalloc

Marquess__Paul · 04-30-2020

We have a multi-threaded server-based application that uses tcmalloc under the hood. We currently have an issue where our servers are dumping core under very heavy load, and we suspect buffer overruns are the culprit.

Is there any way we can run Inspector and tcmalloc at the same time and get useful data? I'm assuming that Inspector wants to use its own version of malloc, so what I need isn't going to work.

Anyone tried this or had experience of running Inspector with a non-standard malloc?

Michael_T · 04-30-2020

Inspector uses dynamic binary instrumentation and doesn't require the program to be recompiled. I'm not familiar with the tcmalloc library, but if it provides its own implementation of the malloc()/free() functions with the same semantics, it will work 'as is' with Inspector. When such an application runs under Inspector, Inspector should detect those functions by symbol name and insert the appropriate instrumentation to intercept their arguments and return values.

If the API is different or has some unique behavior, then the library or the caller can insert special annotations that Inspector recognizes at runtime. Here are details and some examples: https://software.intel.com/en-us/inspector-user-guide-linux-apis-for-custom-memory-allocation

Marquess__Paul · 04-30-2020

Thanks for getting back so promptly Michael.

From an application perspective we just use vanilla malloc/free in C & new/delete in C++. We don't use any of the low-level features of tcmalloc. Under the hood, tcmalloc (like a lot of the more modern malloc implementations) handles the low-level memory allocations from the OS (via sbrk/mmap) by asking for large chunks of memory, which it then manages. Each application call to malloc/new then returns a pointer to an offset inside one of those larger chunks of tcmalloc-owned memory.
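To make the model Paul describes concrete, here is a minimal C sketch of a toy bump-pointer arena: one large chunk is requested from the OS with mmap(), and every allocation is simply an offset inside it. This is a deliberate simplification under stated assumptions; tcmalloc's real design (size classes, thread caches, spans) is far more involved, and `arena_t`, `arena_init`, `arena_alloc` and `arena_destroy` are hypothetical names.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical toy allocator: one large chunk obtained from the OS,
 * handed out to callers as small pieces at increasing offsets. */
typedef struct {
    unsigned char *base;  /* start of the chunk returned by mmap() */
    size_t size;          /* total chunk size in bytes */
    size_t used;          /* offset of the next free byte */
} arena_t;

/* Ask the OS for one large anonymous chunk. Returns 0 on success. */
int arena_init(arena_t *a, size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Return a 16-byte-aligned pointer to an offset inside the chunk,
 * or NULL when the chunk is exhausted (no second chunk is fetched). */
void *arena_alloc(arena_t *a, size_t n) {
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Give the whole chunk back to the OS at once. */
void arena_destroy(arena_t *a) {
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

Two consecutive 10-byte requests come back 16 bytes apart inside the same chunk, so from the OS's point of view both blocks live in one valid, mapped region.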

I assume, then, that if Inspector can cope with that type of model, it intercepts malloc/new before the "real" allocator gets control & remembers the address & length of every allocation.
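The bookkeeping Paul is assuming can be sketched in C as a table of (address, length) records filled in by wrappers around malloc()/free(). This only illustrates the idea under stated assumptions: it is not how Inspector is implemented, the fixed-size table and linear scan are chosen for brevity, and it is not thread-safe. `traced_malloc`, `traced_free` and `access_is_valid` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BLOCKS 1024

/* Hypothetical record kept for every intercepted allocation. */
typedef struct {
    uintptr_t addr;
    size_t len;
    int live;
} block_t;

static block_t blocks[MAX_BLOCKS];
static size_t nblocks;

/* Record a block after the real allocator has returned it. */
void track_alloc(void *p, size_t len) {
    if (p != NULL && nblocks < MAX_BLOCKS)
        blocks[nblocks++] = (block_t){(uintptr_t)p, len, 1};
}

/* Mark a block dead before the real free() releases it. */
void track_free(void *p) {
    for (size_t i = 0; i < nblocks; i++)
        if (blocks[i].live && blocks[i].addr == (uintptr_t)p) {
            blocks[i].live = 0;
            return;
        }
}

/* Returns 1 if the n-byte access at p lies entirely inside one live block. */
int access_is_valid(const void *p, size_t n) {
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < nblocks; i++) {
        const block_t *b = &blocks[i];
        if (b->live && a >= b->addr && n <= b->len &&
            a - b->addr <= b->len - n)
            return 1;
    }
    return 0;
}

/* Hypothetical wrappers standing in for intercepted malloc()/free(). */
void *traced_malloc(size_t len) {
    void *p = malloc(len);
    track_alloc(p, len);
    return p;
}

void traced_free(void *p) {
    track_free(p);
    free(p);
}
```

With a 10-byte block, an access to byte 9 is accepted while an access to byte 10, or a 4-byte access starting at byte 8, is rejected, which is exactly the boundary information needed to spot an overrun.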

The part I'm unclear about is whether Inspector can then detect the classic problem of user code walking off the end of its allocation and overwriting an area of memory that malloc has allocated to something else.
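The problem Paul describes can be reproduced without any real allocator. In this hypothetical C sketch, a static `pool` stands in for one allocator-owned chunk, and two 16-byte blocks sit back to back inside it, as a sub-allocator might place them. Writing past the end of the first block silently changes the second; nothing faults, because the whole pool is valid memory. That is why this kind of bug needs a tool that knows the individual block boundaries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pool standing in for one allocator-owned chunk. */
static unsigned char pool[64];

/* Two 16-byte "allocations" placed back to back inside the pool. */
static unsigned char *const block_a = pool;
static unsigned char *const block_b = pool + 16;

/* Fills block_a with len bytes of 'x' without checking its 16-byte size
 * (the classic off-the-end bug) and returns block_b's first byte afterwards.
 * len is clamped to the pool size so the demo itself stays well-defined. */
unsigned char overrun_demo(size_t len) {
    if (len > sizeof pool)
        len = sizeof pool;
    memset(block_b, 'B', 16);   /* block_b holds its own data */
    memset(block_a, 'x', len);  /* len > 16 spills into block_b */
    return block_b[0];
}
```

`overrun_demo(16)` leaves block_b intact, while `overrun_demo(17)` overwrites its first byte with no error of any kind.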

One final question - are there stats available for the cost of running Inspector, especially under very heavy load. I'm thinking of both the cpu & memory cost for the extra bookkeeping Inspector does. I've had a look around the documentation but haven't spotted anything yet.

Marquess__Paul · 05-01-2020

Found the answer to my second question, about the performance hit, on the Inspector XE page here. So I now have more of an idea of the cost of running the tool.

Looking at that same page, I see a reference to Pointer Checker, which is billed as a compiler-based tool to check for out-of-bounds memory accesses. That looks like exactly what we need, but it requires the Intel C/C++ compiler, so it's not an option in the short term.

Also, the Inspector User Guide for Linux doesn't mention out-of-bounds checks in the Problem Type Reference section.

Does that mean Inspector on its own cannot do out-of-bounds checking?

Michael_T · 05-04-2020

Hi Paul,

Great that you found that useful article! :) I would also add that analysis performance depends heavily on the application itself. For example, an application that frequently allocates and deallocates tiny objects will slow Inspector down more than one that rarely allocates memory blocks. Similarly, applications that frequently read/write non-aligned memory locations will trigger more synchronization in Inspector. Inspector has to keep its own shadow memory of valid/initialized bytes to detect incorrect accesses in the application, and that shadow memory has to stay consistent across the entire process.
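As a rough illustration of the shadow memory idea, here is a toy C sketch that keeps one shadow byte per heap byte and classifies each access as valid, an invalid access, or a read of uninitialized memory. The state encoding, `HEAP_SIZE` and all function names are assumptions made for this example, not Inspector's actual design. It also hints at why allocation-heavy code is expensive to analyze: every allocation and free rewrites a range of shadow state, and in a multi-threaded process those updates would also need synchronization, which this single-threaded sketch omits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-byte shadow states (not Inspector's real encoding). */
enum { SH_INVALID = 0, SH_ALLOCATED = 1, SH_INITIALIZED = 2 };

/* Result codes for a checked access. */
enum { ACCESS_OK = 0, INVALID_ACCESS = 1, UNINIT_READ = 2 };

#define HEAP_SIZE 256
static unsigned char shadow[HEAP_SIZE];  /* one shadow byte per heap byte */

/* Allocation hooks; the caller guarantees off + len <= HEAP_SIZE. */
void shadow_on_alloc(size_t off, size_t len) {
    memset(shadow + off, SH_ALLOCATED, len);
}

void shadow_on_free(size_t off, size_t len) {
    memset(shadow + off, SH_INVALID, len);
}

/* Check a write of len bytes at heap offset off; mark bytes initialized. */
int shadow_write(size_t off, size_t len) {
    if (off > HEAP_SIZE || len > HEAP_SIZE - off)
        return INVALID_ACCESS;
    for (size_t i = off; i < off + len; i++)
        if (shadow[i] == SH_INVALID)
            return INVALID_ACCESS;
    memset(shadow + off, SH_INITIALIZED, len);
    return ACCESS_OK;
}

/* Check a read of len bytes at heap offset off. */
int shadow_read(size_t off, size_t len) {
    if (off > HEAP_SIZE || len > HEAP_SIZE - off)
        return INVALID_ACCESS;
    int result = ACCESS_OK;
    for (size_t i = off; i < off + len; i++) {
        if (shadow[i] == SH_INVALID)
            return INVALID_ACCESS;
        if (shadow[i] == SH_ALLOCATED)
            result = UNINIT_READ;
    }
    return result;
}
```

After a 10-byte block is allocated at offset 0, a 4-byte write at offset 8 is reported as invalid because bytes 10 and 11 belong to no allocation, which is the same class of problem Inspector reports as an invalid memory access.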

There are many tools for memory correctness verification, each with its own pros and cons. Compiler or C runtime checks are very useful, especially for data allocated on the stack: stack data layout is highly compiler dependent and has many nuances on different platforms, so compilers naturally have more control there.

Out-of-bounds errors are called ‘Invalid Memory Access’ in Inspector. When the application tries to read from or write to an invalid region, Inspector should report it.

Marquess__Paul · 05-04-2020

Thanks Michael.

When I run Inspector with our application, I get the error "Error: Internal error. Please contact Intel customer support team." and Inspector crashes. I only have the "Detect invalid memory accesses" option selected in "Detect Memory Problems".

Note - we have the free version of Inspector.

Michael_T · 05-05-2020

That sounds bad. Could you please compress and share your result folder (r001mi*) after the crash? The log files there might give us some clue as to what went wrong. If you have a small reproducer, that would also be useful.

Marquess__Paul · 05-05-2020

Data enclosed

Michael_T · 05-05-2020

Unfortunately, there is a crash in Inspector’s analysis code. We looked at a similar issue previously but were unable to reproduce it on our side. I suspect there is some sort of data race that triggers an invalid memory access in the tool when the number of active threads is large enough. Decreasing the number of threads might avoid the crash, but I doubt that is a reasonable solution.

If you are allowed to share your application binaries (sources are not needed), please contact me by e-mail (michael.tutin@intel.com) and we can try to reproduce and investigate the crash locally. Otherwise, the only option is to wait until a similar issue is triggered somewhere in our tests.

Michael_T · 05-06-2020

Meanwhile, I looked at the diagnostics that were reported just before the crash and noticed an incorrect call to the mmap() function there. If mmap() is called with the MAP_FIXED flag, it maps new memory pages at exactly the given address, replacing any existing mappings in that range. Usually this flag is used to map a memory region onto pre-reserved space, but Inspector didn’t capture whether this region was reserved. Perhaps an incorrect (or stale) address is being passed. This call can potentially corrupt any data structures, including Inspector’s own, if the mmap’ed region overlaps them. Could you please check whether this memory mapping works as expected?
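The MAP_FIXED behavior described above can be checked with a small Linux C sketch. The first function shows that mapping a fresh anonymous page over an existing one with MAP_FIXED discards the old contents without any error; the second shows the intended pattern of reserving address space with PROT_NONE and then using MAP_FIXED only inside that reservation. Both use private anonymous mappings for simplicity, whereas Paul's application maps shared memory for IPC, and the function names are hypothetical.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Writes a marker into a page, maps a fresh anonymous page over it with
 * MAP_FIXED, and returns what the marker location now holds (-1 on error). */
int map_fixed_replaces_contents(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 42;  /* existing data, e.g. another component's structure */
    void *q = mmap(p, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (q == MAP_FAILED) {
        munmap(p, page);
        return -1;
    }
    int value = p[0];  /* the replacement page is zero-filled */
    munmap(p, page);
    return value;
}

/* Reserves npages with PROT_NONE, then maps one read/write page at
 * window_page strictly inside the reservation. Returns 0 on success. */
int map_inside_reservation(size_t npages, size_t window_page) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (window_page >= npages)
        return -1;  /* target would fall outside the reservation */
    unsigned char *base = mmap(NULL, npages * page, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    unsigned char *win = mmap(base + window_page * page, page,
                              PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    int rc = (win == MAP_FAILED) ? -1 : 0;
    if (rc == 0)
        win[0] = 1;  /* the window is now writable */
    munmap(base, npages * page);  /* releases reservation and window together */
    return rc;
}
```

The first function returns 0 rather than 42: the kernel accepted the overlapping MAP_FIXED request and silently threw the old page away, which is why a stale address passed with this flag can corrupt unrelated data.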

Marquess__Paul · 05-06-2020

Thanks Michael,

Yes, the mmap is working as expected. Our application uses a shared memory region for IPC.

I'm still checking to see if I can share the binaries.

thanks

Paul

