The weird thing is that when I use more than one OpenMP thread, the large chunks of memory are not freed cleanly (not returned to the system). With only one thread, freeing works fine.
Another problem is that the Windows version of the compiler advertises the /Qopt-malloc-options option, but I have tried it in vain: I wonder whether it is a fake option that can only be used on Linux or macOS.
The major source of these problems is programmer error.
Check not only your deallocations but your allocations. A particularly nasty one is allocating into a pointer from multiple threads when the storage location of the pointer is a shared variable (misplaced {}'s).
Try performing your allocations within a struct/class object whose destructor releases the allocated memory (like the string class). And be sure to place the object in the correct scope.
Jim Dempsey
What do you mean by allocating by multiple threads? I am quite sure that I do not use any dynamic allocation in an OpenMP region; in fact there are very few OpenMP regions in my program. Should I call omp_set_num_threads(1) every time I allocate/free memory?
Another clue: I do not have this malloc/free problem on the Linux platform; both single-threaded and multi-threaded runs work correctly. Moreover, I know exactly which memory chunks are not returned to the system, because chunks that big are used very rarely in the program.
When you want each thread to have its own array:
double* array = 0; // *** bad, pointer in wrong scope
// (ok to do this only when shared(array) is intended on the pragma)
#pragma omp parallel
{
array = new double[count]; // *** bad, all threads share the same pointer
// *** 2nd and later threads overwrite the pointer
...
delete [] array; // *** bad, 2nd and later threads return the same memory
}
------------------------------------
#pragma omp parallel
{
double* array = 0;
array = new double[count]; // *** good when you want each thread to have a separate copy
...
delete [] array; // *** good, each thread returns its separate copy
}
--------------------
double* array = 0; // OK because of private(array) on the pragma
#pragma omp parallel private(array)
{
array = new double[count]; // *** good when you want each thread to have a separate copy
...
delete [] array; // *** good, each thread returns its separate copy
}
--------------------
double* array = 0;
#pragma omp parallel private(array)
{
array = new double[count]; // *** good when you want each thread to have a separate copy
...
}
delete [] array; // *** bad, the main thread deletes only the outer (null) pointer; the threads' private copies leak
There is nothing wrong with new/delete inside parallel regions; in fact it may be required when you want each thread to have separate data (e.g., for temporary arrays).
Jim Dempsey
The problem I observed comes from two big arrays allocated in a non-parallel zone (they have never been used in any parallel region). free() does not return these memory chunks to the system immediately (or the heap becomes very fragmented), and the program later stops for lack of memory when I want to allocate another big array.
I experienced a very similar problem before on the AIX platform and was advised to use the disclaim() function to "declare" and "return" these memory chunks to the system. On Linux, one can use the mallopt() function to reset some memory-allocation parameters and reproduce the same symptoms.
When you have a single-threaded application that is tight on memory (as you suggest your application is),
then when you use OpenMP or any threading tool, each thread is instantiated with its own stack. The low end of these stack sizes can be a few MB, but you can set the stack size small or large; the default is the stack size of the main thread. You might want to experiment with adjusting the stack size for the non-main threads.
Jim Dempsey
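Adjusting the worker-thread stack size is usually done through environment variables before launching the program (OMP_STACKSIZE is the standard OpenMP variable; KMP_STACKSIZE is the Intel-runtime equivalent; the program name below is a placeholder):

```shell
export OMP_STACKSIZE=2M    # standard OpenMP variable: per-worker-thread stack size
export KMP_STACKSIZE=2M    # Intel runtime's equivalent setting
# ./your_program           # placeholder: launch with the new stack size in effect
echo "$OMP_STACKSIZE"
```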
Thanks a lot!
I'm gonna try it...
I've just tried it (with 2 MB and 2 OpenMP threads); the problem persists.
Another potential problem is at what point in your application the OpenMP thread pool is established.
If the thread pool is established on the first entry into the first parallel region, and if that region is deep in your code, the new stack spaces might be allocated at some midpoint in your allocations, potentially causing undesired fragmentation of your heap. An easy way to fix this is to insert, just after entry to main, a parallel region that does something that does not get optimized out:
#pragma omp parallel
{
if(omp_get_thread_num() < 0) exit(0); // never true; forces early thread-pool creation
}
You may need to trace your allocations/deallocations to find the problem, and/or insert some well-crafted _ASSERTs:
YourAllocatorAssumesSerial(...)
{
_ASSERT(omp_in_parallel() == 0);
// now allocate...
...
}
And you may need code to check for leaks and/or allocations when not required:
// static pointer
double* array = NULL;
...
YourAllocationRoutine()
{
_ASSERT(array==NULL);
array = new double[yourSize];
...
}
If each thread mistakenly called the allocation routine, you would have a leak.
See what you can do to reduce your footprint (maybe optimize for reduced size).
Jim Dempsey
Thanks.
The 3rd test can be interesting; I can always preserve a few static pointers for the big arrays, and that may change something.
The first test is just to confirm the abnormal behavior, isn't it?
Thanks again for the precious advice.
Om
Lanzors,
_ASSERT(expression) expands to code only in Debug builds, so your Release build has no overhead.
However, as much as you try to keep your allocations under control, when you hand this code off to someone else to support, they may not be as careful as you are. The _ASSERT is there to catch these types of potential errors (now or in the future). You should get in the habit of using _ASSERT throughout your code to test for all kinds of errors, principally argument checking, but in some places results checking or convergence testing.
Also, you might try setting the "Low Fragmentation Heap".
See the MS C++ help on
heap functions | HeapSetInformation
Once you have read that, follow the link to "Low Fragmentation Heap".
From the MS C++ Help:
[cpp]// The following example shows how to enable the low-fragmentation heap.
#include <windows.h>
#include <stdio.h>

void main()
{
    ULONG HeapFragValue = 2;

    if (HeapSetInformation(GetProcessHeap(),
                           HeapCompatibilityInformation,
                           &HeapFragValue,
                           sizeof(HeapFragValue)))
    {
        printf("Success!\n");
    }
    else
    {
        printf("Failure (%d)\n", GetLastError());
    }
}[/cpp]
Jim Dempsey
Jim, I totally agree with you about the purpose of assertions, and thanks again for your new advice about the MS heap functions: I will look into them for a solution.
Om, I have already made a very small test, but it cannot reproduce the problem; I suspect it depends on the complexity of the allocation scheme in the program. Anyway, I shall try it again if I cannot find the solution.
Because I have other jobs to do for now, I have just fixed this problem with a workaround: before allocating a big array, I do a simple estimation with a loop of malloc/free... But I will surely come back to this problem and let everybody know.
Sorry for the disturbance.
I need to ask a question related to OpenMP.
I am running a program that takes a layout as input, does some computation, then writes the output to a text file.
If the layout is small, OpenMP works well and produces the same output as running the code serially, but if the layout is large it does not produce the same output as the serial run, and if I run it more than once it produces different output each time. The problem is not a race, because, as I said, on a small layout it works perfectly.
Moreover, if I change the code for the large layout to make it do fewer computations, the output produced is the same as the serial run.
I really want to know what the problem is. Is it that OpenMP does not support large data sizes?
amr,
Please repost this as a new topic. Tacking it onto a thread that is 8 years old may get little attention.
There are generally two circumstances to be aware of when coding parallel:
1) Accumulation of floating-point round-off errors may differ when the accumulation is performed in a different order, or on sub-sets that are subsequently reduced to a single result.
2) When the multiple threads perform output, the output will not necessarily be in the same order as when performed serially.
Your small layout may work by chance.
In a parallel program, the multiple threads do not start your compute section instantaneously nor operate in lock-step. If the code section is small and has relatively few places of (potential) race conditions then the probability of actual race condition is low. As the code in the parallel region gets larger, and the occurrence of (potential) race conditions increase, then the probability of race condition increases.
Jim Dempsey