As for the amount of system memory used, you usually do not need allocator support: you can read the VmSize value from /proc/self/status. See this example: https://github.com/intel/tbb/blob/tbb_2019/src/test/harness_memory.h#L57-L97
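For reference, a minimal sketch of that approach, assuming Linux and a mounted /proc. The function name `read_vm_size_bytes` is made up for illustration and is not taken from the TBB harness:

```cpp
#include <cstdio>

// Hypothetical helper (not the TBB harness code): returns the process's
// virtual memory size in bytes, or 0 if it cannot be determined.
// Linux-only: parses the "VmSize:" line of /proc/self/status.
unsigned long long read_vm_size_bytes() {
    std::FILE* f = std::fopen("/proc/self/status", "r");
    if (!f) return 0;  // no /proc (non-Linux system or restricted sandbox)

    char line[256];
    unsigned long long kb = 0;
    while (std::fgets(line, sizeof line, f)) {
        // The line looks like "VmSize:    123456 kB"; other lines fail to match.
        if (std::sscanf(line, "VmSize: %llu kB", &kb) == 1) break;
    }
    std::fclose(f);
    return kb * 1024ULL;  // the kernel reports the value in kB
}
```

Note that VmSize covers the whole address space of the process (code, stacks, mapped files, every allocator), so it is only an upper bound on what any single allocator holds.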
As for the amount of user-allocated memory, that is an interesting problem. It might not be so easy to calculate in a concurrent environment without additional overhead, especially when threads are dynamically created and destroyed.
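To illustrate where the overhead comes from, here is a naive sketch that counts user-requested bytes with one shared atomic. The names `counted_malloc`, `counted_free`, and `user_bytes_in_use` are hypothetical; this is not how tbbmalloc works and not an API it provides:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper: a single process-wide counter of bytes requested by the user.
static std::atomic<std::size_t> g_user_bytes{0};

void* counted_malloc(std::size_t size) {
    void* p = std::malloc(size);
    // Every thread updates the same cache line here, which is the extra cost.
    if (p) g_user_bytes.fetch_add(size, std::memory_order_relaxed);
    return p;
}

// The caller must pass the size that was originally requested.
void counted_free(void* p, std::size_t size) {
    if (!p) return;
    std::free(p);
    g_user_bytes.fetch_sub(size, std::memory_order_relaxed);
}

std::size_t user_bytes_in_use() {
    return g_user_bytes.load(std::memory_order_relaxed);
}
```

The shared counter is simple but contended on every allocation. Per-thread counters would avoid that contention, but then the totals of exiting threads have to be folded into some global state, and memory freed by a thread other than the one that allocated it has to be accounted for as well.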
Reading VmSize is something we're already doing, but it doesn't accurately reflect how much system memory tbbmalloc itself is using, right?
I appreciate the difficulty in the presence of parallel allocations and deallocations. Perhaps this page, which discusses this type of functionality in jemalloc, is a better analogy: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Basic-Allocator-Statistics