Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Memory leak in OpenMP function omp_init_lock()?

Tim
Beginner
Hello everybody,

For efficient locking I want to protect my data with the OpenMP runtime function omp_init_lock() and the corresponding set/unset functions instead of using #pragma omp critical sections [1].

For this reason I need a huge array of initialized omp_lock_t's. But after calling omp_init_lock() a lot of memory is allocated!

Example:
[cpp]#include <omp.h>
#include <cstdio>
#define SIZE 1000000
int main(int argc, char** argv){
  int i;
  omp_lock_t *lock = new omp_lock_t[SIZE];
  for (i = 0; i < SIZE; i++) {
    omp_init_lock(&lock[i]);
  }
  delete[] lock;
  return 0;
}[/cpp]

The problem here is NOT the new operator: I lose about 300 MB of memory after calling omp_init_lock() 1,000,000 times! And even after deleting all the locks, the memory still does not seem to be freed! If I do the same with GNU g++, no memory is allocated at all.

So, what happens in omp_init_lock()? Is this a bug?


[1] This is described in Intel White Paper: http://software.intel.com/en-us/articles/is-the-free-lunch-really-over-scalability-in-manycore-systems-part-2-using-locks-efficiently/

PS: A similar example can be found in the OpenMP specification (A.42)
Dale_S_Intel
Employee

I believe the issue is that omp_lock_t doesn't really have a destructor (I think it's a C thing :-). You need to actually call omp_destroy_lock() on each element of the array. Also, I think our locks are padded to cache line sizes to avoid false sharing, so you might find that each lock takes more space than the corresponding gnu lock.
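
As a quick check (not part of the original exchange), one could print the declared size of the lock type under both compilers to see whether it is padded; note this only shows the size of omp_lock_t itself, not any memory the runtime might allocate internally inside omp_init_lock():

[cpp]#include <omp.h>
#include <cstdio>

// Minimal sketch, not from the thread: compare the static size of a lock
// under icpc and g++. Heap memory allocated internally by omp_init_lock()
// would not show up in this number.
int main() {
    printf("sizeof(omp_lock_t) = %zu bytes\n", sizeof(omp_lock_t));
    return 0;
}[/cpp]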

Can you try calling omp_destroy_lock() for each lock and let me know if that works as expected for you?

Thanks!

Dale

Tim
Beginner
Thank you for your support, but I don't think that's the problem. You are right, I forgot to call omp_destroy_lock() in this simple example. I tried it again without the dynamic array and with the destroy routine, but even after calling omp_destroy_lock() I have a leak of 315 MB!

New example:

[cpp]#include <omp.h>
#include <cstdio>
#define SIZE 1000000
int main(int argc, char** argv){
  int i;
  omp_lock_t lock[SIZE];
  for (i = 0; i < SIZE; i++) { omp_init_lock(&lock[i]);
  }
  for (i = 0; i < SIZE; i++) { omp_destroy_lock(&lock[i]);
  }
  return 0;
}[/cpp]

I set breakpoints at lines 7, 9 and 12 (before the init loop, before the destroy loop, and at the end of main). At line 7 the 315 MB is NOT allocated; at line 9 it is. The problem is that at line 12 the memory is still not freed, although I called omp_destroy_lock()!

The point is that I need 5.5 million locks for locking a matrix in our FEM simulation. This costs 1.7 GB of memory with icpc, but around 0 MB (it might be more, but not that much!) with GNU g++. As I said, this locking concept is described in an Intel white paper, which is why I thought it would work with the Intel compiler ;-).

So I guess it is a bug, given that the memory is lost even after destroying the locks, isn't it?

Cheers,

Tim
jimdempseyatthecove
Honored Contributor III

>>The point is that I need 5.5 million locks for locking a matrix in our FEM Simulation.

If you are using that many locks you need to rework your coding. Zoning the matrix tends to work well. Look at the Blogs section in Parallel Programming, the n-Bodies series by Robert Reed. The sample program (in the earliest post) includes Bodies007.cpp.

Although this sample program illustrates gravitational interaction between n bodies (in several ways), it also shows how you can divide up your FEM element-to-element interactions using virtually no locks and only a few synchronization points.
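
A minimal sketch of the zoning idea (the zone size, the names, and the row-based mapping below are illustrative assumptions, not taken from Robert Reed's sample):

[cpp]#include <omp.h>
#include <vector>

// Illustrative zoning sketch: one lock per block of rows instead of one
// lock per matrix element. ZONE and the mapping are assumptions.
#define ZONE 1024   // rows that share a single lock

int main() {
    const int rows   = 5500000;
    const int nzones = (rows + ZONE - 1) / ZONE;

    std::vector<omp_lock_t> zone_lock(nzones);
    for (int z = 0; z < nzones; z++) omp_init_lock(&zone_lock[z]);

    #pragma omp parallel for
    for (int r = 0; r < rows; r++) {
        omp_lock_t* l = &zone_lock[r / ZONE]; // all rows in a zone share one lock
        omp_set_lock(l);
        /* ... accumulate into row r of the matrix ... */
        omp_unset_lock(l);
    }

    for (int z = 0; z < nzones; z++) omp_destroy_lock(&zone_lock[z]);
    return 0;
}[/cpp]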

Jim Dempsey
jimdempseyatthecove
Honored Contributor III

Tim,

If you still need the locks you can easily enough roll your own, one bit or one byte per lock. Or use n locks per thread (e.g., one thread may work on up to 2, 6, or n elements at a time). The lock could also be co-resident with a data item in your element: e.g., if you have a mass item, you could declare the element locked when its mass is negative. This requires 0 extra bits per lock (but also requires correct coding).
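
A minimal sketch of the one-byte-per-lock idea, using C++ atomics as a stand-in for a hand-rolled test-and-set (the names and the std::atomic choice are assumptions, not Jim's code):

[cpp]#include <atomic>
#include <cstddef>
#include <vector>

// Illustrative one-byte spinlock per element; 0 = free, 1 = held.
// std::atomic is used here as a generic test-and-set primitive.
struct ByteLock {
    std::atomic<unsigned char> flag{0};
    void lock()   { while (flag.exchange(1, std::memory_order_acquire)) { /* spin */ } }
    void unlock() { flag.store(0, std::memory_order_release); }
};

int main() {
    const std::size_t n = 5500000;      // e.g., one lock per matrix row
    std::vector<ByteLock> locks(n);     // roughly one byte of lock state per element

    locks[42].lock();
    // ... update the data guarded by lock 42 ...
    locks[42].unlock();
    return 0;
}[/cpp]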

Jim Dempsey