First, thanks for the new release, looking forward to using some of the new functionality.
I downloaded the precompiled release for Windows, and everything works much as expected, except that my MSVC 2015-compiled app now has a dependency on TbbMalloc.dll. I've verified this with depends.exe.
I've traced this back to my use of:
If you reduce the covariance_batch example to just that line and run depends.exe on the resulting exe, you can see the dependency. The Naïve_Bayes_dense_batch example also shows this.
Unfortunately I currently have to run as sequential without tbb (linking with daal_core.lib;daal_sequential.lib) as I suspect tbb might clash with our threading library.
Is there any way I can avoid the dependency, or is it intended?
It appears to happen when you use any code that uses:
Seems to fit the pattern OK at the moment. I'm currently building the source, so let me know if you want me to try anything.
Yes, the TBB memory allocator is used in both the serial and parallel versions of the library. We will analyze how to address this dependency for the serial case in future releases. Does this dependency prevent you from using the library in your environment?
It doesn't necessarily stop us. It's unfortunate as it means deploying that dll to all developers and updating our install scripts, but that's just work.
I'm technically more concerned about dragging another memory manager into our process. We tend to manage memory fairly aggressively, as we have a few algorithms that take up huge amounts of RAM. Having a separate memory manager in there might lead to contention between the two for a limited resource, which might not end well for either of them.
Do you have any details about what tbbmalloc actually does? I'm guessing fast concurrent heap management, but does it do anything like pre-allocating large heaps? We run x64, so it's not too important if we lose a bit of address space, but we would be in trouble if, say, it locked in large chunks of RAM.
The TBB memory allocator may request up to 4M buffers via VirtualAlloc and then allocates and re-uses chunks of memory from these buffers.
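To make the "grab a big buffer, then hand out and re-use chunks" idea concrete, here is a minimal sketch of a fixed-size chunk pool. It is only an illustration of the general technique, not tbbmalloc's actual implementation: the `ChunkPool` class and its members are hypothetical names, and it uses `std::malloc` rather than VirtualAlloc so that it stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical illustration only: NOT tbbmalloc's real implementation.
// Reserves one large buffer up front and hands out fixed-size chunks
// from it, recycling freed chunks through a free list.
class ChunkPool {
public:
    ChunkPool(std::size_t chunkSize, std::size_t chunkCount)
        : chunkSize_(chunkSize),
          buffer_(static_cast<char*>(std::malloc(chunkSize * chunkCount))) {
        assert(buffer_ != nullptr && chunkSize > 0);
        // Initially every chunk in the buffer is free.
        for (std::size_t i = 0; i < chunkCount; ++i)
            freeList_.push_back(buffer_ + i * chunkSize_);
    }
    ~ChunkPool() { std::free(buffer_); }
    ChunkPool(const ChunkPool&) = delete;
    ChunkPool& operator=(const ChunkPool&) = delete;

    // Returns a chunk, or nullptr when the pool is exhausted.
    void* allocate() {
        if (freeList_.empty()) return nullptr;
        void* p = freeList_.back();
        freeList_.pop_back();
        return p;
    }

    // Returns a chunk to the pool; the big buffer itself stays reserved.
    void deallocate(void* p) { freeList_.push_back(static_cast<char*>(p)); }

    std::size_t freeChunks() const { return freeList_.size(); }

private:
    std::size_t chunkSize_;
    char* buffer_;
    std::vector<char*> freeList_;
};
```

The point relevant to this thread is in `deallocate`: freeing a chunk makes it available for re-use but does not give the underlying buffer back to the OS, so the process footprint reflects the buffers reserved rather than the chunks currently in use.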
BTW, you can try using the TBB memory allocator in your own code and switch to it, so you keep a single memory manager, in case you find it useful :)
Intel TBB developer
Ha, yeah didn't consider using it for everything else - might be an option :)
Looks like I have the go ahead from here to include it so I'll just go with it.
Thanks for your help
To add to Vladimir's reply: to maximize data locality and get better performance, we allocate the buffers that store intermediate computation results inside the algorithms. Typically, the buffer size depends on the algorithm and/or the problem dimensions and can be either fixed or, say, a multiple of the number of features in the input dataset. A few algorithms, however, such as LogitBoost, may require internal buffers whose size is a multiple of the number of feature vectors.
Please let us know if this helps address your questions.
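As a rough back-of-the-envelope sketch of why that distinction matters for memory use, the helper below compares a buffer sized by the number of features with one sized by the number of feature vectors. `scratchBytes` is a hypothetical helper written for this illustration, not a DAAL function, and the multiplier of 1 and the dataset dimensions in the usage note are assumptions picked only to show the scale.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of DAAL: bytes needed for an internal
// buffer holding `multiple` elements per counted item (per feature, or
// per feature vector), each element being `elemSize` bytes.
inline std::size_t scratchBytes(std::size_t count,
                                std::size_t multiple,
                                std::size_t elemSize) {
    assert(elemSize > 0);
    return count * multiple * elemSize;
}
```

For an assumed dataset of 1,000,000 feature vectors with 100 features each, `scratchBytes(100, 1, sizeof(double))` is a few hundred bytes, whereas `scratchBytes(1000000, 1, sizeof(double))` is in the megabytes, so algorithms in the second category are the ones to watch when RAM is tight.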
Yeah, although, given the tbb prefix, I'm guessing that it's optimal when going multi-threaded, and we're not currently (although we do call separate DAAL functions concurrently, notably covariance).
We currently use the MS Concurrency Runtime. I don't suppose you have any experience of anyone mixing TBB and ConcRT? And yes, I know we could make everything TBB, but I don't think I'd get away with that ;)
Since you use sequential DAAL, you don't mix ConcRT and TBB. The TBB allocator uses TBB atomics and might use some containers, but it does not use the TBB scheduler, so there is no TBB thread pool.
BTW, to experiment with sticking to one (TBB) allocator, you can try automatic memory manager replacement: https://software.intel.com/en-us/node/506098.
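Short of replacing the memory manager process-wide, another way to experiment is to route your own containers through a single allocator alias, so that switching allocators is a one-line change. The sketch below is a minimal, assumption-laden illustration: `CountingAllocator`, `AppAllocator`, `AppVector` and `bytesRequested` are hypothetical names, and the counting allocator is only a stand-in so the example builds without TBB. With TBB available, `tbb::scalable_allocator<T>` from `<tbb/scalable_allocator.h>` could be substituted in the alias instead.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Running total of bytes requested through the stand-in allocator.
inline std::size_t& bytesRequested() {
    static std::size_t total = 0;
    return total;
}

// Hypothetical stand-in allocator: forwards to operator new/delete and
// records how many bytes were requested.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        bytesRequested() += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        assert(p != nullptr);
        ::operator delete(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Single point of change: swap the right-hand side to try another
// allocator (for example tbb::scalable_allocator<T>, if TBB is available).
template <typename T>
using AppAllocator = CountingAllocator<T>;

template <typename T>
using AppVector = std::vector<T, AppAllocator<T>>;
```

Code that declares its buffers as `AppVector<double>` rather than `std::vector<double>` then picks up whichever allocator the alias names, which makes it easy to compare behaviour before committing to one memory manager.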
Understood, tbbmalloc is separate to tbb. I was really asking if you knew of any potential problems with mixing tbb (if we used multi-threaded DAAL) and MS Concurrency Runtime. I understand completely if you have no experience of this, as why would you!
Ah, I see: threaded DAAL with your app threaded by ConcRT. There should not be a correctness problem, but there might be a performance impact if you run threaded DAAL from within the ConcRT thread pool.