Intel® oneAPI Threading Building Blocks

Memory not freed when using tbb::concurrent_hash_map

wei6rong
Beginner

hi everyone!

I have run into a problem with concurrent_hash_map. We are using tbb::concurrent_hash_map to store key-value resources; the value is a struct that contains a pointer to dynamically allocated memory. Later we traverse the map and delete objects that have timed out.

My test code is:

#include "tbb/concurrent_hash_map.h"

typedef struct Publish_Temp_traffic_Info {
	float St_Deviation;
	float weight;
	time_t timestamp;
	char* traffic_info;
	Publish_Temp_traffic_Info() {
		timestamp = 0;
		weight = 0.0;
		traffic_info = NULL;
		St_Deviation = 0.0;
	}
} ST_PUBLISH_MEM_TRFINFO;

typedef tbb::concurrent_hash_map<int, ST_PUBLISH_MEM_TRFINFO> THASH_PUBLISH_MEM_INFO;

int indecount = 1000000;
int main() {
	THASH_PUBLISH_MEM_INFO testhash;
	ST_PUBLISH_MEM_TRFINFO info;

	for(int loop =0; loop < indecount;loop++) {
		THASH_PUBLISH_MEM_INFO::accessor aAccessor;
		info.traffic_info = new char[1024];

		if (testhash.insert(aAccessor,loop)) {
			aAccessor->second = info;
		}
		aAccessor.release();
	}
	sleep(10);
	for(int loop =0; loop < indecount;loop++) {
		THASH_PUBLISH_MEM_INFO::accessor aAccessor1;

		bool iRet = testhash.find(aAccessor1, loop);
		if (!iRet)
			continue;
		ST_PUBLISH_MEM_TRFINFO *pInfo = &(aAccessor1->second);
		if(NULL != pInfo->traffic_info){
			delete [] (pInfo->traffic_info);
			pInfo->traffic_info = NULL;
		}
		testhash.erase( aAccessor1 );
	}
	sleep(1);
}

When I debug and stop at the two sleep lines and check memory with "free -m", I find that the memory is not returned to the system. What is the problem?

Can anybody help?

I also boiled the code down to the following:

int indexcount = 1000000;
int main() {
    THASH_PUBLISH_MEM_INFO testhash;
    ST_PUBLISH_MEM_TRFINFO info;

    char * arr[1000000];
    std::pair<int, ST_PUBLISH_MEM_TRFINFO> kay_value;
    kay_value.second = info;

    THASH_PUBLISH_MEM_INFO::accessor acc;
    for(int loop =0; loop < indexcount;loop++) {
        arr[loop] = new char[1024];
        {
            kay_value.first = loop;
            if (testhash.insert(acc,kay_value))
            {
//            if (testhash.insert(acc,loop)) {
//            acc->second = info;
            }
            acc.release();
        }

//        delete [] (info.traffic_info);
//        info.traffic_info = NULL;
    }
    sleep(1);

//    testhash.clear();

    for(int loop =0; loop < indexcount;loop++){
        delete [] (arr[loop]);
    }

    sleep(1);

    return 1;

}

 

When we comment out the line "if (testhash.insert(acc,kay_value))" and stop at the second sleep(1), the "free -m" command shows

that the memory has been returned to the system.

So, am I using concurrent_hash_map correctly?

Any help would be appreciated.

 

 

RafSchietekat
Valued Contributor III

Please note that, except for special circumstances, it's probably better to scope an accessor variable to let the destructor do the releasing for you, so you generally shouldn't be calling release() yourself.

In the first piece of code, "info.traffic_info = new char[1024];" is leaked if the insert returns false. That may not be happening here, but it's problematic to have such code around at all: at the very least it's a distraction for anybody trying to find the cause of a real problem.
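
For illustration, a minimal sketch of what I mean, using the types from your post (just to show the shape, not a drop-in fix):

{
	THASH_PUBLISH_MEM_INFO::accessor aAccessor;  // lock is released by the accessor's destructor at the end of this block
	ST_PUBLISH_MEM_TRFINFO info;
	info.traffic_info = new char[1024];
	if (testhash.insert(aAccessor, loop)) {
		aAccessor->second = info;            // the map element now owns the buffer
	} else {
		delete [] info.traffic_info;         // key already present: don't leak the buffer
	}
}	// no explicit aAccessor.release() needed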

Perhaps what you're seeing is the difference between some TBB dynamic allocations and none at all (unless you redirect new/delete to TBB's scalable allocator, these allocations are from a separate allocation mechanism). The scalable allocator gets memory from the system in big chunks, and then administers it internally for program-level allocations. Probably TBB keeps at least the last chunk of memory around even if all allocations from it are released, because most likely it will be needed again soon or the program will end anyway.

I would suggest to try this again with a really big indexcount. Then you may see that some of those big chunks do get released to the system again.

(Added) Also note that one long-lived allocation can keep a whole chunk alive, so without garbage collection an element of luck is involved, and there is probably a trade-off between chunk overhead and likelihood of getting pinned down by a lonely allocation. But TBB does make an attempt to return memory that is no longer needed.

wei6rong
Beginner

Hi Raf,

Thank you for your comment.

The code in my previous post was only meant to reproduce the issue I am seeing. I have simplified it as follows:

#include "tbb/concurrent_hash_map.h"
#include "tbb/scalable_allocator.h"

class A {
public:
	A(){
//		std::cout << "A construct "<< endl;
	}
	~A(){
//		std::cout << "~A distruct "<< endl;
	}
	char aa[6144];
};

typedef struct Publish_Temp_traffic_Info {
	float St_Deviation;
	float weight;
	time_t timestamp;
	A* traffic_info;
	Publish_Temp_traffic_Info() {
		timestamp = 0;
		weight = 0.0;
		traffic_info = NULL;
		St_Deviation = 0.0;
	}
} ST_PUBLISH_MEM_TRFINFO;

typedef tbb::concurrent_hash_map<int, ST_PUBLISH_MEM_TRFINFO> THASH_PUBLISH_MEM_INFO;

int indexcount = 1000000;
int main() {
	THASH_PUBLISH_MEM_INFO testhash;
	ST_PUBLISH_MEM_TRFINFO info;

	char * arr[1000000];
	std::pair<int, ST_PUBLISH_MEM_TRFINFO> kay_value;
	kay_value.second = info;

	for(int loop =0; loop < indexcount;loop++) {
		assert(NULL !=(arr[loop] = new char[1024]));
	}

	for(int loop =0; loop < 1;loop++) {
		{
			kay_value.first = loop;
			THASH_PUBLISH_MEM_INFO::accessor acc;
			if (testhash.insert(acc,kay_value))
			{
//			    if (testhash.insert(acc,loop)) {
//			    acc->second = info;
			}
//			acc.release();
		}
	}

//	testhash.clear();

	sleep(0.1);
	for(int loop =0; loop < indexcount;loop++){
		delete [] (arr[loop]);
	}

	sleep(0.2);
	for(int loop =0; loop < indexcount;loop++) {
		assert(NULL !=(arr[loop] = new char[1024]));
	}

	sleep(0.3);
	for(int loop =0; loop < indexcount;loop++){
		delete [] (arr[loop]);
	}

  	sleep(0.4);

  	return 1;
}

If we comment out the line

			if (testhash.insert(acc,kay_value))

stop after the delete [] loop, and run "free -m", we can see that the memory is returned to the system.

But if we leave that line in, the memory is not returned to the system after the first delete [],

and when we allocate the memory again later, memory use does not increase, so it must be reusing the memory that should have been returned to the system.

So it seems that TBB changes the behavior of new []/delete [].

Because of stack limits we did not test an indexcount larger than 1000000, but I did repeat the test with "new char[1024]" replaced by "new A", and the result is the same.

jimdempseyatthecove
Honored Contributor III

Excuse me, but in your sample code above, when the // comments are removed, the {}'s do not match.

Raf, I am not a user of concurrent_hash_map; can you comment on my code suggestion below?

#if 0
// original code
THASH_PUBLISH_MEM_INFO::accessor acc;
if (testhash.insert(acc,kay_value))
{
  if (testhash.insert(acc,loop)) {
  acc->second = info; // ?? where is close }
}
acc.release();
#else
// Raf, is this correct?
THASH_PUBLISH_MEM_INFO::accessor acc;
if (testhash.insert(acc,kay_value))
{
  // ... (optionally do something via acc)
  acc.release(); // done with pair
  // now insert another
  if (testhash.insert(acc,loop)) {
    // do something with pair
    acc->second = info; 
    acc.release(); // done with pair
  }
}
#endif

In the original code, assuming there was a typo and the matching } was actually present, the flaw is that the accessor from the first insert does not go out of scope but is re-used on the second insert. What I do not know is whether reusing an accessor that currently holds a lock implicitly releases that lock. If the answer is no, then in the original code the first pair remains locked, and thus the hash table's memory cannot be returned.

Jim Dempsey

RafSchietekat
Valued Contributor III

Regarding #3, to be clear, my understanding was that "see the memory return[ed] to system" (or its equivalent before) is/was an interpretation of seeing a lower memory use than without commenting out the insert() instruction across build&test uses, and not actually first seeing significant memory consumption and then seeing it decrease during one build&test. Hence my conclusion that, without the insert(), TBB doesn't actually allocate anything. Was this correct?

But then I don't see what "new the memory again" means, because I don't see such an outer loop in the code example.

I also don't know what "stack limits" have to do with not trying a higher element count.

Regarding #4, it's safe to reuse an accessor (with properly balanced operations), just like it's safe to reuse a scoped_lock (with properly balanced acquire() and release()).

I don't see any accessor being reused without a release() (sometimes there's even a redundant release() before the end of the accessor's lifetime, which isn't a problem)? I don't know off the top of my head what would happen without an intervening release() (mishap, exception, or an attempt to do something sensible), but it doesn't seem to matter here (unless I overlooked something?).
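
To illustrate what I mean by balanced operations, a small sketch (using the types from this thread):

THASH_PUBLISH_MEM_INFO::accessor acc;
if (testhash.find(acc, 1)) {
	// ... use acc->second ...
	acc.release();	// done with this element before the accessor is reused
}
if (testhash.find(acc, 2)) {
	// ... use acc->second ...
}	// the destructor releases whatever the accessor still holds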

wei6rong
Beginner

Hi Raf,

Sorry for not explaining clearly; let me describe my test steps as follows.

The code is (please note that we do the hash insert only once):

int indexcount = 1000000;
int main() {
	THASH_PUBLISH_MEM_INFO testhash;
	ST_PUBLISH_MEM_TRFINFO info;

	char * arr[1000000];
	std::pair<int, ST_PUBLISH_MEM_TRFINFO> kay_value;
	kay_value.second = info;

	for(int loop =0; loop < indexcount;loop++) {
		assert(NULL !=(arr[loop] = new char[1024]));
	}

	for(int loop=0;loop<1;loop++) {
		{
			kay_value.first = loop;
			THASH_PUBLISH_MEM_INFO::accessor acc;
			if (testhash.insert(acc,kay_value))
			{
//              do nothing
			}
		}
	}

	sleep(0.1);
	for(int loop =0; loop < indexcount;loop++){
		delete [] (arr[loop]);
	}

	sleep(0.2);
	for(int loop =0; loop < indexcount;loop++) {
		assert(NULL !=(arr[loop] = new char[1024]));
	}

	sleep(0.3);
	for(int loop =0; loop < indexcount;loop++){
		delete [] (arr[loop]);
	}

  	sleep(0.4);

  	return 1;
}

 

If we comment out the line

if (testhash.insert(acc,kay_value))

and stop in the debugger at each sleep line, the "free -m" output is:

[root@devserver ~]# free -m
                       total       used       free     shared    buffers     cached
Mem:         15831       5907       9924          0        488        545
-/+ buffers/cache:       4873      10958
Swap:         9935       4004       5931
[root@devserver ~]#
[root@devserver ~]# free -m
                       total       used       free     shared    buffers     cached
Mem:         15831       4914      10917          0        488        545
-/+ buffers/cache:       3880      11951
Swap:         9935       4004       5931
[root@devserver ~]#
[root@devserver ~]# free -m
                       total       used       free     shared    buffers     cached
Mem:         15831       5908       9923          0        488        545
-/+ buffers/cache:       4874      10957
Swap:         9935       4004       5931
[root@devserver ~]# free -m
                       total       used       free     shared    buffers     cached
Mem:         15831       4914      10917          0        488        545
-/+ buffers/cache:       3880      11951
Swap:         9935       4004       5931

We can see the free memory change from 9924 -> 10917 -> 9923 -> 10917, which is the behavior we expected.

But if we leave the line

if (testhash.insert(acc,kay_value))

uncommented and again stop at each sleep line, the "free -m" output is:

[root@devserver ~]# free -m
                       total       used       free     shared    buffers     cached
Mem:         15831       5906       9925          0        488        545
-/+ buffers/cache:       4872      10959
Swap:         9935       4004       5931
[root@devserver ~]# free -m
                       total       used       free     shared    buffers     cached
Mem:         15831       5907       9924          0        488        545
-/+ buffers/cache:       4873      10958
Swap:         9935       4004       5931
[root@devserver ~]# free -m
                      total       used       free     shared    buffers     cached
Mem:         15831       5911       9920          0        488        549
-/+ buffers/cache:       4873      10958
Swap:         9935       4004       5931
[root@devserver ~]# free -m
                       total       used       free     shared    buffers     cached
Mem:         15831       5910       9921          0        488        549
-/+ buffers/cache:       4872      10959
Swap:         9935       4004       5931


The free memory changes from 9925 -> 9924 -> 9920 -> 9921. It seems that after the first delete [] loop the memory is not returned to the system,

and after the second allocation loop (new char[1024]) memory use does not increase, so it must be reusing the memory that should have been returned to the system. So my question is: why does a single testhash.insert(acc,kay_value) call (the loop runs only once) change the behavior of new []/delete []?

Regarding a test with a larger indexcount: since char * arr[1000000] is declared as a local variable, a larger indexcount would require allocating it on the heap, so instead we used class A as a replacement, as described above.

jimdempseyatthecove
Honored Contributor III

>>I don't see any accessor being reused without a release()

// original code (more comments)
THASH_PUBLISH_MEM_INFO::accessor acc;
if (testhash.insert(acc,kay_value))
{
  // acc holds lock on kay_value here
  // acc lock on kay_value not released before....
  if (testhash.insert(acc,loop)) {
    // does program hold two locks here?
    // one on kay_value and one on "loop"
    acc->second = info; // ?? where is close }
}
// releasing second acquired lock, not first
// (assuming 2nd insert above did not implicitly unlock first)
acc.release();

Jim Dempsey

RafSchietekat
Valued Contributor III

Regarding #7, I seem to have skipped that code in #4. My comments were only about the comments in #4 about preceding code. Sorry for that.

The "original" code in #4 however is not a reproduction of any code before it, as seemed to be claimed ("had a flaw"), because in the preceding code the second insert() was commented out, and I was assuming that the commented-out insert() did not play a role. Viewed by itself the "original" code in #4 would probably have undefined behaviour: there is a debug-only __TBB_ASSERT() in TBB's implementation asserting that the accessor is not currently active. So within #4 you would need the second code. But this does not seem relevant to the original problem.

I have to say I'm getting quite confused by all the assumptions I am having to make and trying to get right...

#6 is now very clear and specific, thanks for that.

I can only make a guess at this time. An experiment would be to replace the insert() with another "new char[1024];" that is not released. If this gives the same outcome, then maybe there's a plain new/malloc allocation somewhere behind the insert() that does the same thing, and these small allocations have the traditional brk implementation? I really don't know.

But there's also the question how much this matters. If RAM becomes scarce, the memory might be swapped out if it is not abandoned, so strictly speaking it might indeed result in longer execution. But if the program used that much memory in the first place, it is likely to use it again later on, and #6 already verifies that the memory is still available. How likely is it that the memory has to be swapped out before it gets reused, I wonder?

If somebody has a clear insight here, I would be interested to hear it, but otherwise right now I don't see that this is something urgent to worry about.

wei6rong
Beginner

But for programs running in the backend, new/delete happens very frequently. Sometimes we acquire a large amount of memory, and if it is not released back to the system quickly, our system monitor is triggered (because memory stays at a high watermark). We expect the memory to behave normally: after the delete [] operator, memory usage should drop back to its normal level.

So I hope somebody can help me with this.

RafSchietekat
Valued Contributor III

I wish I could just "help", but it's only best effort, I'm afraid...

How about using, e.g., mallinfo() to keep track of memory use, instead of "free -m"? You would have to explicitly apply it inside the program, and you would have to do it often enough to detect runaway memory use and its relief, but "free -m" apparently also has its limitations.

At this point you should probably first do the experiment I suggested earlier.

(Added) And you could also try to scope the concurrent_hash_map to see what happens after it is destroyed. If the segment allocations are delegated to malloc(), that might also make a noticeable difference. If destroying the concurrent_hash_map is the solution, then most likely "free -m" is the problem, so to say.
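
For example, a sketch of that scoping experiment (same types as before; set a breakpoint on the line after the block to inspect memory):

{
	THASH_PUBLISH_MEM_INFO testhash;
	// ... the same insert/erase test as before ...
}	// the concurrent_hash_map is destroyed here and its internal storage is freed
sleep(1);	// breakpoint here, then check free -m or mallinfo()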

Anton_M_Intel
Employee

tbb::concurrent_hash_map allocates some memory using cache_aligned_allocator (and this cannot be changed via its template arguments), which in turn uses the TBB allocator if it is available, so I think this is simply tbbmalloc caching the memory. Moreover, after insert() there is no clear() that would release the container's memory back to the allocator, though clear() only helps when tbbmalloc is not loaded (i.e. not available in the same directory as the tbb library) since, again, tbbmalloc caches the memory.

RafSchietekat
Valued Contributor III

That was my original hypothesis (tbbmalloc caching memory), but it's not supported by #6, so I've abandoned that line of reasoning.

wei6rong
Beginner

Hi all,

Regarding #6, I did some interesting experiments; maybe they provide some clues that will help you find the answer.

I used mallinfo() to keep track of memory use. The following is my test code:

#include <malloc.h>
#include <stdio.h>
#include <cassert>
#include <unistd.h>
#include <utility>
#include "tbb/concurrent_hash_map.h"
// THASH_PUBLISH_MEM_INFO and ST_PUBLISH_MEM_TRFINFO are the same typedefs as in the earlier posts
void getMemStatus()
{
	struct mallinfo info = mallinfo ();
	printf("arena = %d\n", info.arena);
	printf("ordblks = %d\n", info.ordblks);
//	printf("smblks = %d\n", info.smblks);
//	printf("hblks = %d\n", info.hblks);
//	printf("hblkhd = %d\n", info.hblkhd);
//	printf("usmblks = %d\n", info.usmblks);
//	printf("fsmblks = %d\n", info.fsmblks);
	printf("uordblks = %d\n", info.uordblks);
	printf("fordblks = %d\n", info.fordblks);
	printf("keepcost = %d\n", info.keepcost);
	printf("==========================\n", info.keepcost);
}

int indexcount = 1000000;
int main() {
	THASH_PUBLISH_MEM_INFO testhash;
	ST_PUBLISH_MEM_TRFINFO info;

	char * arr[1000000];
	std::pair<int, ST_PUBLISH_MEM_TRFINFO> kay_value;
	kay_value.second = info;

	for(int loop =0; loop < indexcount;loop++) {
		assert(NULL !=(arr[loop] = new char[1024]));
	}

//	char * aa = new char[1024]; // just for testing only
	for(int loop=0;loop<1;loop++) {
		{
			kay_value.first = loop;
			THASH_PUBLISH_MEM_INFO::accessor acc;
			if (testhash.insert(acc,kay_value))
			{
//              do nothing
			}
		}
	}

	getMemStatus();
	sleep(0.1);
	for(int loop =0; loop < indexcount;loop++){
		delete [] (arr[loop]);
	}

	getMemStatus();
	sleep(0.2);
	for(int loop =0; loop < indexcount;loop++) {
		assert(NULL !=(arr[loop] = new char[1024]));
	}

	getMemStatus();
	sleep(0.3);
	for(int loop =0; loop < indexcount;loop++){
		delete [] (arr[loop]);
	}

	getMemStatus();
  	sleep(0.4);

//  	while(1){
//  		sleep(0.5);
//  	}
  	return 1;
}

The output of getMemStatus() is consistent with the "free -m" results:

arena = 1040117760
ordblks = 2
uordblks = 1040004432
fordblks = 113328
keepcost = 113200
==========================
arena = 1040117760
ordblks = 3
uordblks = 4432
fordblks = 1040113328
keepcost = 113200
==========================
arena = 1040117760
ordblks = 2
uordblks = 1040004432
fordblks = 113328
keepcost = 113200
==========================
arena = 1040117760
ordblks = 3
uordblks = 4432
fordblks = 1040113328
keepcost = 113200
==========================

uordblks (the total size of memory occupied by chunks handed out by malloc)

+ fordblks (the total size of memory occupied by free, not-in-use chunks)

= arena (the total size of memory allocated with sbrk by malloc, in bytes)

We can see that after the delete [] loop the memory is returned to the heap (fordblks increases from 113328 to 1040113328) but is not released to the system, so "free -m" still shows the memory as occupied by the process.

The interesting thing is that if we replace

	for(int loop=0;loop<1;loop++) {
		{
			kay_value.first = loop;
			THASH_PUBLISH_MEM_INFO::accessor acc;
			if (testhash.insert(acc,kay_value))
			{
//              do nothing
			}
		}
	}

to

char * aa = new char[1024]; // just for testing only

the output is the same, so I think this has nothing to do with TBB.

And if we reverse the order of the two for loops as follows,

//	char * aa = new char[1024]; // just for testing only
	for(int loop=0;loop<1;loop++) {
		{
			kay_value.first = loop;
			THASH_PUBLISH_MEM_INFO::accessor acc;
			if (testhash.insert(acc,kay_value))
			{
//              do nothing
			}
		}
	}

	for(int loop =0; loop < indexcount;loop++) {
		assert(NULL !=(arr[loop] = new char[1024]));
	}

that is, the new [] loop is immediately followed by the delete [] loop, then the output is

arena = 1040117760
ordblks = 2
uordblks = 1040004432
fordblks = 113328
keepcost = 113200
==========================
arena = 139264
ordblks = 2
uordblks = 4432
fordblks = 134832
keepcost = 134704
==========================
arena = 1040121856
ordblks = 2
uordblks = 1040004432
fordblks = 117424
keepcost = 117296
==========================
arena = 139264
ordblks = 2
uordblks = 4432
fordblks = 134832
keepcost = 134704
==========================

we can see that the heap memory is released to the system.

I found a discussion at

http://stackoverflow.com/questions/12178961/if-when-does-the-does-deallocated-heap-memory-get-reclaimed

which I also paste here:

Usually there are 2 ways to allocate memory: if you malloc()/new a memory block above a certain size, the memory gets allocated from the OS via mmap() and returned as soon as it is free. Smaller blocks are allocated by increasing the process's data area by shifting the sbrk border upwards. This memory is only freed if a block over a certain size is free at the end of that segment.

E.g.: (pseudo code, I don't know C++ very well)

a = new char[1000];
b = new char[1000];

Memory map:

---------------+---+---+
end of program | a | b |
---------------+---+---+

If you free a now, you have a hole in the middle. It is not freed because it cannot be freed. If you free b, the process's memory may or may not be reduced; the unused remainder is returned to the system.

A test with a program as simple as

#include <stdlib.h>

int main()
{
    char * a = malloc(100000);
    char * b = malloc(100000);
    char * c = malloc(100000);
    free(c);
    free(b);
    free(a);
}

leads to a strace output like

brk(0)                                  = 0x804b000
brk(0x8084000)                          = 0x8084000
brk(0x80b5000)                          = 0x80b5000
brk(0x809c000)                          = 0x809c000
brk(0x8084000)                          = 0x8084000
brk(0x806c000)                          = 0x806c000

This shows that the brk value is first increased (for malloc()) and then decreased again (for free()).

This may explain what we are concerned about (why the deleted memory is not released to the system and is just marked as free in the heap).

So, do any of you have ideas?

 

RafSchietekat
Valued Contributor III

You could still try to see what happens when the last allocation goes away: first the "char * aa = new char[1024]; // just for testing only", then the insert() (add a block to limit the lifetime of the map). You could try to trace malloc() to see how exactly TBB's scalable allocator delegates to malloc() for bigger allocations (I'm not quite sure, because I think there have been changes since I last looked at the code, but it's the only explanation that makes sense right now). However, that's all just for curiosity.

An alternative experiment is to redirect new/delete to TBB's scalable allocator. All the kilobyte-sized allocations would come from mmap chunks (not the delegation to malloc that would happen only for bigger allocations), and there wouldn't be a hole anymore. However, now you may have some big mmap chunks holding isolated allocations, so it's not a guaranteed solution, and there may also be some allocation overhead because the scalable allocator buys performance by using a limited number of bins with predetermined allocation sizes (it would be interesting to be able to add more bins).
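
For reference, a minimal sketch of such a redirection (assuming tbbmalloc is available at run time; nothrow and placement forms, and error handling beyond bad_alloc, are omitted):

#include <new>
#include "tbb/scalable_allocator.h"

void* operator new(std::size_t size)
{
	if (void* p = scalable_malloc(size ? size : 1)) return p;
	throw std::bad_alloc();
}
void* operator new[](std::size_t size)   { return operator new(size); }
void  operator delete(void* p) throw()   { if (p) scalable_free(p); }
void  operator delete[](void* p) throw() { operator delete(p); }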

But my understanding is that this is a problem of monitoring, not overhead by having dirty free memory getting swapped out? Then I would suggest looking for alternatives to "free -m" that find out what mallinfo() obviously knows. If the program already communicates with the outside world, perhaps this could be as simple as adding a call to an operation that consults mallinfo().

(2014-05-08 Clarification) With "trace malloc", I meant setting a breakpoint at insert(), and then when reaching it setting a breakpoint at malloc().

wei6rong
Beginner

Hi Raf,

Thanks for the reply.

If we just new/delete memory, we can see it released to the system directly with the "free -m" command.

I changed my test code as follows:

#include <malloc.h>
#include <stdio.h>
#include <cassert>
#include <ctime>
#include <unistd.h>
#include <utility>
#include "tbb/concurrent_hash_map.h"
#include "tbb/scalable_allocator.h"

class A {
public:
	A(){
//		std::cout << "A construct "<< endl;
	}
	~A(){
//		std::cout << "~A distruct "<< endl;
	}
	char aa[2048];
};

typedef struct Publish_Temp_traffic_Info {
	int St_Deviation;
	float weight;
	time_t timestamp;
	A* traffic_info;
	Publish_Temp_traffic_Info() {
		timestamp = 0;
		weight = 0.0;
		traffic_info = NULL;
		St_Deviation = 0.0;
	}
} ST_PUBLISH_MEM_TRFINFO;

typedef tbb::concurrent_hash_map<int, ST_PUBLISH_MEM_TRFINFO> THASH_PUBLISH_MEM_INFO;

void getMemStatus()
{
	struct mallinfo info = mallinfo ();
	printf("arena = %d\n", info.arena);
	printf("ordblks = %d\n", info.ordblks);
//	printf("smblks = %d\n", info.smblks);
//	printf("hblks = %d\n", info.hblks);
//	printf("hblkhd = %d\n", info.hblkhd);
//	printf("usmblks = %d\n", info.usmblks);
//	printf("fsmblks = %d\n", info.fsmblks);
	printf("uordblks = %d\n", info.uordblks);
	printf("fordblks = %d\n", info.fordblks);
	printf("keepcost = %d\n", info.keepcost);
	printf("==========================\n", info.keepcost);
}

int indexcount = 1000000;
int main() {

	for(int aa=0; aa<10; ++aa) {
		THASH_PUBLISH_MEM_INFO testhash;
		ST_PUBLISH_MEM_TRFINFO info;

		std::pair<int, ST_PUBLISH_MEM_TRFINFO> kay_value;
		kay_value.second = info;
		for(int loop=0;loop<indexcount;loop++) {
			{
				kay_value.first = loop;
				assert(NULL != (kay_value.second.traffic_info = new A));
				THASH_PUBLISH_MEM_INFO::accessor acc;
				if (testhash.insert(acc,kay_value))
				{
	//              do nothing
				}
				else
				{
					printf("kay already exist\n");
				}
			}
		}

	    getMemStatus();
		sleep(0.1);

		for(int loop =0; loop < indexcount;loop++){
			THASH_PUBLISH_MEM_INFO::accessor acc;
			bool iRet = testhash.find(acc, loop);
			if (!iRet){
				printf("kay no found\n");
				continue;
			}
			if(NULL != acc->second.traffic_info){
//				printf ("%p\n",acc->second.traffic_info);
				delete (acc->second.traffic_info);
				acc->second.traffic_info = NULL;
			}
			testhash.erase( acc );
		}
//		testhash.clear();
		getMemStatus();
		sleep(0.2);

	}

  	return 1;
}

We can still see the memory released to the heap only (debug stop at the sleep lines and look at the mallinfo output),

but if we uncomment the line

//		testhash.clear();

we can see the memory released to the system (in both mallinfo and "free -m") at each iteration of the outer loop.

I am not sure whether this test matches your earlier suggestion,

and later we may do some tests that redirect new/delete to TBB's scalable allocator.

 

RafSchietekat
Valued Contributor III

I love it when a prediction comes together! -- paraphrasing John "Hannibal" Smith

jimdempseyatthecove
Honored Contributor III

What would be materially consequential is if the offending section of the test program were placed in a loop and, by running the loop, you found that memory consumption accumulates. It is not material whether the application's size goes back to exactly the state it was in before (considering that the scalable allocator caches returned memory). Other than on embedded systems, most OSes use a paging file system. Unused pages in the freed heap and/or scalable-allocator slabs are subject to being paged out (as is the rest of the program). The availability of RAM to other programs is more a function of the paging system than of anything else.

Jim Dempsey

RafSchietekat
Valued Contributor III

Not just embedded systems... at least one very successful supercomputer architecture deliberately omits support for multiuser, multitasking, virtual memory, etc.

But I think we've already moved on to the question of a better monitoring mechanism (after finding out why "free -m" doesn't work as expected). Any ideas there?

jimdempseyatthecove
Honored Contributor III

Grasping at straws here...

but if we uncomment the line of

//      testhash.clear(); 

we can see the memory released to the system (in both mallinfo and "free -m") at each iteration of the outer loop

 

The difference I see here is that uncommenting the line (executing clear()) performs a memory touch. Since the "complaint" is about memory being returned to the system (IOW released by the application), this may be a quirk of an interaction between the CRTL heap manager (or other system allocation routine) and the paging manager. IOW, the symptom is that the untouched memory is effectively placed into a state of limbo as far as "free -m" is concerned.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

You could verify this by peeking into the hash_map object to locate the buffer, and then page-walking and touching the buffer's pages.
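
Assuming you have already located the buffer's address and size (e.g. in the debugger), the touch itself could be a sketch like this:

#include <cstddef>

// Touch one byte per page so the pager has to materialize every page of the buffer.
void touch_pages(char* buffer, std::size_t bytes, std::size_t page_size = 4096)
{
	for (std::size_t offset = 0; offset < bytes; offset += page_size)
		((volatile char*)buffer)[offset] = buffer[offset];	// read and write back one byte
}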

 

jimdempseyatthecove
Honored Contributor III

... but this might be a better candidate (I do not have the source code to verify)

If the hash_map buffer is lazy-allocated, and I suspect it is, then the code may have an error along the lines of:

Increment buffer reference count
...
if(buffer == NULL) return; // with bumped reference count
perform wipe
Decrement buffer reference count

This may be an oversight of not using a scoped reference count ++/--

(another straw to grasp)

The above hypothesis could easily be tested by inserting one key/value pair into the hash table prior to the clear().

If this corrects the behavior, then it would be strong evidence for the above scenario.
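
A sketch of that test (the extra key is hypothetical, chosen outside the normal range so it cannot collide with the test data):

{
	THASH_PUBLISH_MEM_INFO::accessor acc;
	testhash.insert(acc, indexcount + 1);	// one extra element so the table is not empty
	acc->second = ST_PUBLISH_MEM_TRFINFO();	// default value; traffic_info stays NULL
}
testhash.clear();	// now clear a non-empty table and re-check free -m / mallinfo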

Jim Dempsey
