I'm not the first one to ask, but I have a stack-overflow problem with some operations on 'big' matrices declared as pointers (MATMUL(), TRANSPOSE(), allocation of two matrices, ...), because contiguous temporary copies are created on the stack.
I've read most of the previous questions asked about this subject, and I understand that there are several solutions: increasing the size of the stack, using DO loops, using the 'heap arrays' option...
Can you tell me what the drawbacks of the 'heap arrays' solution are?
Thanks.
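For context, here is a minimal sketch (illustrative names, not code from this thread) of the kind of pattern that can trigger the overflow, and a workaround that avoids compiler options: an expression like `B = MATMUL(TRANSPOSE(A), A)` may cause the compiler to build contiguous temporaries on the stack, while staging the transpose in an ALLOCATABLE array places it on the heap.

```fortran
program stack_temp_demo
  implicit none
  integer, parameter :: n = 2000
  real(8), pointer     :: a(:,:), b(:,:)
  real(8), allocatable :: at(:,:)   ! explicit heap temporary

  allocate(a(n,n), b(n,n))
  call random_number(a)

  ! This one-liner may create large contiguous temporaries on the
  ! stack (for TRANSPOSE(a) and the MATMUL result), overflowing it:
  ! b = matmul(transpose(a), a)

  ! Workaround: stage the transpose in an ALLOCATABLE array, which
  ! lives on the heap, so the stack only sees small descriptors.
  allocate(at(n,n))
  at = transpose(a)
  b  = matmul(at, a)
  deallocate(at)
end program stack_temp_demo
```

Whether a given expression actually spills a temporary to the stack is compiler- and option-dependent; the 'heap arrays' option simply forces such temporaries onto the heap instead.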
So between these solutions:
1- devectorise and use nested loops for all operations when manipulating big matrices,
and
2- use the heap option,
does 2 seem the best?
Thanks
Mel,
There is a third option, which I use for my Space Elevator simulation. It is complicated a little by the fact that OpenMP is involved.
This simulator code is (or can be) a memory hog and is very CPU intensive. One of the components within the simulation is a tether, simulated as finite elements of segments and beads (e.g., like a spring with a mass at each end). Each bead on a tether (the system may have from 8 to many more tethers) carries about 80 real(8) variables, or about 640 bytes of data per bead/segment. A tether may have 10,000 beads (more when making high-fidelity runs). The number of beads per tether is not the same.
Due to the memory intensiveness of the simulation, it is not practical to maintain a set of scratch temporaries per tether. The route chosen was to have scratch temporaries per thread. Additionally, some performance enhancement was attained by assigning tethers to threads/processors by way of processor affinity. This means some memory conservation can be attained by having different-sized scratch temporaries per thread.
Because heap allocation has considerable overhead, the scratch temporaries are persistent and are dynamically resized only when required.
As the simulation begins, when the function requiring the scratch memory is called, it calls a helper function, specifying its dimension requirements; that helper returns a thread-dependent pointer to a user-defined type containing pointers to arrays allocated to at least the extent required.
This function call is relatively lightweight when the memory previously allocated is sufficient for the current requirements. The fast path through the function is:
get thread number
get pointer(thread number)
test pointer%sizeAllocated
return pointer
The actual code has sanity checks, a first-time-call flag, an OpenMP critical section, allocation code if the size test fails, etc. The fast path through is a function call to get the OpenMP thread team member number, get a pointer (an integer operation), test an integer, and return a pointer. Very lightweight when compared to heap allocation.
The only disadvantage of this scheme is that you cannot use "size(foo%Array)" to obtain the number of elements in the array that are used.
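A minimal sketch of this scheme, assuming OpenMP; the type and routine names (scratch_t, get_scratch, sizeAllocated) are illustrative, not the actual simulator code:

```fortran
module scratch_mod
  use omp_lib
  implicit none
  type scratch_t
     integer :: sizeAllocated = 0
     real(8), pointer :: work(:) => null()
  end type scratch_t
  type(scratch_t), allocatable, target, save :: pool(:)  ! one slot per thread
contains
  function get_scratch(n) result(p)
    integer, intent(in) :: n
    type(scratch_t), pointer :: p
    ! First-time setup guarded by a critical section.
    !$omp critical (scratch_init)
    if (.not. allocated(pool)) allocate(pool(0:omp_get_max_threads()-1))
    !$omp end critical (scratch_init)
    ! Fast path: thread number -> pointer lookup -> one integer compare.
    p => pool(omp_get_thread_num())
    if (p%sizeAllocated < n) then
       ! Slow path: grow-only reallocation; after the largest request
       ! per thread, this branch is never taken again.
       if (associated(p%work)) deallocate(p%work)
       allocate(p%work(n))
       p%sizeAllocated = n
    end if
  end function get_scratch
end module scratch_mod
```

Each thread only ever touches its own pool slot after initialization, so no lock is needed on the fast path; sizeAllocated records the capacity, which is why size() on the array does not tell you how many elements are in use.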
Jim Dempsey
Eliminating unnecessary temporaries and copies, as Steve suggests, is important.
You should also examine the data layout, and rearrange it if necessary, so that the code can use vector (SSEn) instructions. For Fortran you want the fastest-varying array index on the left (for C/C++, on the right).
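A small sketch of what that means in practice (illustrative names): Fortran stores arrays column-major, so the leftmost index should vary in the innermost loop for unit-stride, vectorizable access.

```fortran
subroutine add_matrices(a, b, c, n)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: a(n,n), b(n,n)
  real(8), intent(out) :: c(n,n)
  integer :: i, j
  do j = 1, n        ! outer loop: columns
     do i = 1, n     ! inner loop: rows; a(i,j) is contiguous in i,
        c(i,j) = a(i,j) + b(i,j)   ! so the compiler can vectorize
     end do
  end do
end subroutine add_matrices
```

Swapping the loop order (i outer, j inner) strides through memory by n elements per iteration and typically defeats both the cache and the vectorizer.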
If you have memory available, then a simple way to improve performance would be to give the local declaration the SAVE attribute, then on entry check whether the current array size(s) are sufficient for the current call. If not, deallocate and reallocate to the new, larger size. Once the largest size has been allocated, there will be virtually no overhead.
If you move the array from a local declaration to a module declaration, then clean-up code can run and release the unused memory, if that is important to you.
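A minimal sketch of that grow-only SAVEd scratch array (routine name and the work done with it are illustrative):

```fortran
subroutine process(x)
  implicit none
  real(8), intent(in) :: x(:)
  real(8), allocatable, save :: work(:)   ! persists across calls
  integer :: n
  n = size(x)
  ! Grow-only reallocation: once the largest size seen has been
  ! allocated, neither branch is taken again.
  if (.not. allocated(work)) then
     allocate(work(n))
  else if (size(work) < n) then
     deallocate(work)
     allocate(work(n))
  end if
  work(1:n) = 2.0d0 * x                   ! use the scratch array
end subroutine process
```

Note that a SAVEd local is shared by all threads, so under OpenMP this should be combined with a per-thread scheme like the one described above rather than used as-is.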
Jim Dempsey