I'm working on a Fortran application that uses OpenMP fairly extensively for threading. One part of the code executes many ALLOCATE and DEALLOCATE statements in quick succession for lots of small blocks. All threads will likely be doing the same, since it is inside an "!$omp parallel do" loop. I know this may not be ideal, but it is currently unavoidable. However, when I profile with VTune, I see a large amount of time reported for "for_allocate" and "for_deallocate" inside "libifcoremt.a" and, by comparison, significantly less time in the actual libc allocation routines (roughly an order of magnitude difference), which is not what I would expect. Has anyone else seen this sort of behaviour?
Digging a little further with VTune suggests that almost all of the time sits on a single memory read instruction, preceded by a write to an adjacent location. My suspicion is that if all threads are doing this, it constitutes classic false sharing: the cache line bounces between caches, hence the delay. Checking the symbols, it looks like it probably does something like:
for__protect_cm_ops = 0; if (for__protect_signal_ops == 1) { ...
Does this sound plausible? If so, is there any way I can work around it? I assume these are flags for something?
Thanks, Andy.
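The false-sharing hypothesis in the question can be illustrated with a small stand-alone program. This is only a sketch of the general effect, not a reproduction of what "for_allocate" does internally: it times per-thread counters that sit next to each other in memory against counters padded apart. The 64-byte cache line size, the program name, and the array names are assumptions for illustration. It needs OpenMP enabled at compile time (for example -qopenmp or -fopenmp).

```fortran
program false_sharing_sketch
  use omp_lib
  use iso_fortran_env, only: int64
  implicit none
  ! Assumption: 64-byte cache lines, so 8 x 8-byte integers fill one line.
  integer, parameter :: pad = 8
  integer, parameter :: n_iter = 5000000
  integer(int64), allocatable :: packed(:), padded(:,:)
  integer :: nthreads, tid, i
  double precision :: t0, t1, t2

  nthreads = omp_get_max_threads()
  allocate(packed(nthreads), padded(pad, nthreads))
  packed = 0
  padded = 0

  ! Case 1: each thread updates its own counter, but neighbouring
  ! counters share a cache line.
  t0 = omp_get_wtime()
  !$omp parallel private(tid, i)
  tid = omp_get_thread_num() + 1
  do i = 1, n_iter
     !$omp atomic
     packed(tid) = packed(tid) + 1   ! atomic forces a real memory update
  end do
  !$omp end parallel
  t1 = omp_get_wtime()

  ! Case 2: same work, but each counter is 64 bytes from the next one.
  !$omp parallel private(tid, i)
  tid = omp_get_thread_num() + 1
  do i = 1, n_iter
     !$omp atomic
     padded(1, tid) = padded(1, tid) + 1
  end do
  !$omp end parallel
  t2 = omp_get_wtime()

  print *, 'adjacent counters:', t1 - t0, ' s'
  print *, 'padded counters:  ', t2 - t1, ' s'
end program false_sharing_sketch
```

If the adjacent case is noticeably slower with several threads, that is consistent with cache lines bouncing between cores; the actual ratio depends on the hardware and the thread count.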
If your ALLOCATE and DEALLOCATE calls behave as if they contain internal critical regions, and the blocks can all be the same (sufficiently small) size, would it help to allocate once outside the parallel region and list the array in a PRIVATE clause so that each thread gets its own copy?
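A minimal sketch of that suggestion, assuming every block can share one fixed size. The names block_size, n_items, and scratch are hypothetical. It relies on the OpenMP 3.0 and later rule that a PRIVATE allocatable array which is allocated on entry gives each thread its own copy with the same bounds and undefined contents; older compilers may behave differently.

```fortran
program private_scratch_sketch
  implicit none
  integer, parameter :: block_size = 32    ! hypothetical fixed block size
  integer, parameter :: n_items = 1000     ! hypothetical amount of work
  integer, allocatable :: scratch(:)
  integer :: i, j, total

  ! One ALLOCATE before the parallel region instead of one per iteration.
  allocate(scratch(block_size))
  total = 0

  ! Each thread receives its own private scratch array with the same bounds.
  !$omp parallel do private(scratch, j) reduction(+:total)
  do i = 1, n_items
     do j = 1, block_size
        scratch(j) = i                     ! fill the per-thread block
     end do
     total = total + sum(scratch) / block_size
  end do
  !$omp end parallel do

  deallocate(scratch)

  ! Each iteration contributes i, so the sum over 1..1000 is 500500.
  if (total /= 500500) error stop 'unexpected total'
  print *, 'total =', total
end program private_scratch_sketch
```

The loop body never calls ALLOCATE or DEALLOCATE, so the threads should not contend in the runtime's allocation wrappers while the loop runs.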
Alternative:
type(Node), allocatable, target :: Nodes(:)
type(Node), pointer :: aNode
...
! Nodes not allocated here
! Use FIRSTPRIVATE here (issue with PRIVATE and unallocated array)
!$omp parallel FIRSTPRIVATE(Nodes), PRIVATE(i, iNode, nNodes, aNode, ...)
nNodes = WorstCaseNumberOfNodesForThisThread()
allocate(Nodes(nNodes))   ! one allocation per thread, outside the loop
iNode = 0
!$omp do
DO I=1,Whatever
  ! replace allocate(aNode)
  iNode = iNode + 1
  if(iNode .gt. nNodes) STOP ! fix code
  aNode => Nodes(iNode)
  ! Now use aNode as you did before
END DO
!$omp end do
deallocate(Nodes)         ! one deallocation per thread
!$omp end parallel
Jim Dempsey
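The per-thread pool idea above can be sketched as a complete program. The Node type contents, n_work, and the worst-case pool size (every iteration landing on one thread) are assumptions made only so the example is self-contained; a real code would size the pool from its own bounds. PRIVATE(Nodes) is used here on the assumption of an OpenMP 3.0 or later compiler, where an unallocated private allocatable stays unallocated in each thread.

```fortran
program node_pool_sketch
  implicit none
  type :: Node                     ! hypothetical node contents
     integer :: id = 0
     real    :: payload = 0.0
  end type Node

  integer, parameter :: n_work = 10000
  type(Node), allocatable, target :: Nodes(:)
  type(Node), pointer :: aNode
  integer :: i, iNode, nNodes, used

  used = 0
  !$omp parallel private(Nodes, aNode, iNode, nNodes) reduction(+:used)
  nNodes = n_work                  ! assumed worst case for a single thread
  allocate(Nodes(nNodes))          ! one ALLOCATE per thread
  iNode = 0
  !$omp do
  do i = 1, n_work
     ! Take the next free slot instead of calling allocate(aNode).
     iNode = iNode + 1
     if (iNode > nNodes) error stop 'node pool exhausted'
     aNode => Nodes(iNode)
     aNode%id = i
     aNode%payload = real(i)
  end do
  !$omp end do
  used = used + iNode              ! slots handed out by this thread
  deallocate(Nodes)                ! one DEALLOCATE per thread
  !$omp end parallel

  if (used /= n_work) error stop 'unexpected node count'
  print *, 'nodes handed out:', used
end program node_pool_sketch
```

The trade-off is memory: each thread reserves its worst case up front, in exchange for replacing many small runtime allocations with a pointer assignment.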
