Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

tbbmalloc proxy failing for Windows 10 x64 Insider Preview build 14388

Tucker__Mark
Beginner
789 Views

I'm running the latest Windows 10 x64 Insider Preview build (14388). It looks like tbbmalloc is having trouble finding the original function pointers for _msize and _aligned_msize in ucrtbase.dll because the known_bytecodes don't match. This was resulting in the _putenv function breaking (because it relies on _msize for a block of memory allocated before hooking).

I was able to get things working again by adding the following entries to the known_bytecodes table to WIN64:

"E90B000000**********************4883EC284885C90F",       //release msize() 10.0.14388.0 win64
"48895C2408574883EC20",   //release _aligned_msize() win64

Is this an issue that you are aware of? Were you planning on waiting until the next release ships before updating those byte codes (since there's no way of knowing if they might change again)?

0 Kudos
9 Replies
Alexey-Kukanov
Employee
789 Views

Hello Mark,

the problem should have been fixed in Intel TBB 4.4 Update 4 released this spring. If you use an earlier version, please update.

0 Kudos
Tucker__Mark
Beginner
789 Views

You are right that I was not up to date. I updated to 4.4U5, and I see a few new sets of opcodes, and an change to the InsertTrampoline functions. Unfortunately with the current Windows Insider Preview build (up to 14393 now, released just a few days ago) my app exits as soon as I start it with an assertion about a failure in CheckOpcodes.

First, for these latest builds there is still a missing opcode for free() ("C7442410000000008B442410E9"). Also in the version of ucrtbase.dll that ships with this Windows build, _msize starts with a JMP instruction, and _aligned_msize does not. The TBB hooking code is written to expect _aligned_msize to start with a JMP and _msize to not. Wouldn't it make more sense for the JMP detection to always run before checking the opcodes, rather than requiring a special opcodes value to invoke it (i.e. why insist that TBB know ahead of time which function implementations start with a JMP and which don't)?

0 Kudos
Alexey-Kukanov
Employee
789 Views

Thanks for the experiments and useful information. 

Since Windows Insider builds are a moving target, adding new opcodes for it now seems pretty useless. I hope you are OK to patch TBB locally and rebuild for now. Once a public release is out, we will see what needs to be changed.

You are right that support for jumps could be more generic. Old versions of CRT did not have jumps in the entry points of interest, and we expected ucrtbase.dll to have stable ABI and change rarely, because it should be backward compatible. So we did what seemed simplest at that time. But the information you provided certainly tells that this implementation needs to be reconsidered; we will do it in a later update.

0 Kudos
Francois_P_Intel
Employee
789 Views

I see the same error on Build 1607, the final build of Windows Anniversary.

Francois Piednoel, Intel.

0 Kudos
Tucker__Mark
Beginner
789 Views

It seems like my patches to the older version of TBB I'm using are holding up with the release build of Windows, so from my perspective there is no rush on fixing this. It would be great if you were able to come up with a more robust approach to hooking these functions. But in case it helps anyone else, here is the complete list of additions to the opcodes table that I'm using:

//release msize() ucrtbase.dll 10.0.14388.0 win64
"E90B000000**********************4883EC284885C90F",
//release _aligned_msize() ucrtbase.dll 10.0.14388.0 win64
"48895C2408574883EC20",
//release free() ucrtbase.dll 10.0.14388.0 win64
"C7442410000000008B442410E9",

For anyone using 4.4 Update 5, you'll also need to stop using the special jump instruction test used for hooking _aligned_msize, but other than that I think these additions should still work. I also don't have updated values for 32 bit Windows since I only needed the 64 bit values.

Mark

0 Kudos
Alexey-Kukanov
Employee
789 Views

Let me update on the status of the issue:

  • In TBB 2017, the failing assertion was removed; instead, when tbbmalloc_proxy detects a replacement problem it will not redirect memory routines to the TBB allocator, so that an application might get slower but will keep working.
  • In TBB 2017 Update 1, we fixed the replacement to work correctly for Windows 10 Anniversary Update, including proper handling of jumps.

I hope this not only fixes the issue but also prevents similar ones in the future.

0 Kudos
Tucker__Mark
Beginner
789 Views

Yes, thank you! I meant to come back and update this post as well. Just a couple of days ago I updated our proxying code based on the code in TBB 2017. On the downside, the reason I was forced to do this was that a new Windows Preview build was just released, and once again proxying is broken for that new build. But at least now everything still runs (just more slowly, without tbbmalloc). Once the Preview build makes it into general release I can do the work to figure out what new byte codes need to be added. But it turns a crisis into a simple "todo" item. So a big thanks for that!

I saw a comment in the code about using "hot patch prologues" the bytecodes table, which I assume means using some external file to load in additional byte codes so future changes to that table won't require a recompile... Is that the idea?

0 Kudos
Alexey-Kukanov
Employee
789 Views

Hi Mark,

no, that's different. When working on the patch, I have found out that the function prologues in ucrtbase are written in such a way that there is a space at the beginning of the functions reserved with noops, exactly to insert a jump. Using these "hot patch prologues" we can make our replacement code more universal and also more robust for future updates of ucrtbase: if we see a pattern that matches a known hot patch prologue, we insert a jump to tbbmalloc routine right there, and use the address after the prologue as the original function address for fallback handling.

Your idea of using an external file for opcode patterns is intriguing at glance, but seems somewhat insecure, to be honest. It would add a place of direct end-user input which would be used essentially for hacking key DLLs in that process. I'm afraid it would open a can of worms. So for now I prefer to stick to the conservative approach of only replacing known bytecodes, even though it might not work out of the box for new versions of MS binaries.

0 Kudos
Tucker__Mark
Beginner
789 Views

Okay, thanks for the clarification. I also had the thought that the external file for bytecodes might be somehow "unsafe", but I couldn't think of how it could be used maliciously (without also altering the source code that uses those bytecodes). But I'm far from an expert on malicious software.

0 Kudos
Reply