The child thread shares a common block with the parent thread, although the parent thread doesn't modify it while the child is active. My trace output indicates that when the problem occurs, the common block that the child reads is incomplete. It's as if the data in the common block is being copied somewhere else in memory for the child (I don't know why it would) but the child starts reading it before the copy is complete. If I put a 100 millisecond sleep at the start of the child thread the problem goes away, but of course that's not a good fix. Any ideas on what is happening here, and how I should fix it?
The data are "being copied somewhere else" -- into multiple locations in the cache hierarchy. But the cache coherency model in the Windows x86 / x64 world should guarantee that this is invisible to software.
Without seeing the code, the only thing I know is that the data haven't been flushed into coherent memory before the child thread is eligible for scheduling. We need to see the code to know why. One possibility is that the common area is being initialized, in part, by an optimized copy loop that uses a specialized memory write instruction. That code *should* end with a "fence" instruction to force coherency if that's what's happening. Plus the child thread initiation should involve system calls that would act as fences anyway.
So no, I can't think of a reason why the program as you describe it would act that way on typical hardware. We need more information.
-swn
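For reference, here is a minimal sketch of the baseline expectation described above: if every write to the shared area finishes before the child is launched, the launch itself orders those writes ahead of the child's reads. It is written in C++ with `std::thread` only because the thread contains no code; the original program is Fortran and uses an unspecified thread API, so `SharedBlock`, `child_sum`, and `init_then_launch` are hypothetical names. In C++ the completion of the `std::thread` constructor synchronizes with the start of the thread function, which plays the role of the "system calls that would act as fences".

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Fortran common block discussed in the thread.
struct SharedBlock {
    std::vector<int> values;
};

// The child only reads the block and reports the sum through `result`.
void child_sum(const SharedBlock& block, long& result) {
    long sum = 0;
    for (int v : block.values) sum += v;
    result = sum;
}

// Fill the block completely, then launch the child, then wait for it.
long init_then_launch(int n) {
    assert(n >= 0);
    SharedBlock block;
    for (int i = 1; i <= n; ++i) block.values.push_back(i);  // all writes happen first

    long result = 0;
    // The child is created only after initialization has finished.
    std::thread child(child_sum, std::cref(block), std::ref(result));
    child.join();  // join() makes the child's write to `result` visible here
    return result;
}
```

Calling `init_then_launch(100)` returns the sum of 1 through 100. If a program structured like this still shows incomplete data, the initialization is probably not really finished at the point of launch.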
It is likely that you are starting your child thread before you are done initializing the common block.
Try adding a flag at the end of the common block statically initialized to 0.
Not seeing your code (or template of your code) makes it difficult to make an assessment.
You may also be doing something like
main launches
1) GUI app thread
2) child thread
main waits for GUI done
And your GUI code assumes the GUI app thread runs first
And/Or your child thread assumes the GUI app thread runs first
And the data initialization code is in the GUI app thread initialization section prior to its message loop
The assumption that the 1st thread launched is the 1st thread run is false.
Jim Dempsey
I forgot to add, after the GUI app finishes its initialization, set the flag to 1
Jim
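The flag idea can be sketched as follows. This is an illustration in C++, not the original Fortran: `CommonBlock`, `child`, and `run_handshake` are made-up names, and `std::atomic` with release/acquire ordering is an assumption about how one would express the flag in standard C++. The child is deliberately launched before the data is initialized to show that the flag, not the launch order, decides when the child may read.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical stand-in for the common block, with the flag as its last member.
struct CommonBlock {
    int data[4] = {0, 0, 0, 0};
    std::atomic<int> ready{0};  // 0 = not initialized yet, 1 = initialization finished
};

// Child: wait until the flag becomes 1, then read the data.
void child(CommonBlock& cb, int& sum_out) {
    while (cb.ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();  // give up the time slice while waiting
    }
    int sum = 0;
    for (int v : cb.data) sum += v;
    sum_out = sum;
}

// Parent: launch the child first, initialize afterwards, then set the flag.
int run_handshake() {
    CommonBlock cb;
    int sum = -1;
    std::thread t(child, std::ref(cb), std::ref(sum));

    for (int i = 0; i < 4; ++i) cb.data[i] = 10 * (i + 1);  // 10, 20, 30, 40
    cb.ready.store(1, std::memory_order_release);  // publish: data is complete

    t.join();
    assert(sum != -1);  // the child has run and written its result
    return sum;
}
```

The release store pairs with the acquire load, so once the child sees `ready == 1` it also sees every write to `data` that preceded the store.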
Jim's suggestion is an example of lockless concurrent programming and it appears to be a correct one. It will fix the problem if his guess as to the root cause is correct. If it's something more exotic it may or may not fix or mask the problem.
In detail:
Allocate a memory cell (variable) that is small enough and properly aligned so that it is guaranteed to be updated atomically. Atomicity is not strictly required but it simplifies the analysis.
Initialize it to a known value (zero) before forking; this value signals the slave that the master has not yet completed initialization of the shared data area.
Slave "spins" waiting for the semaphore cell to be updated to a second agreed-upon value, signifying completion of the initialization. The spin loop should include some kind of sleep or yield call so your program will run in a uniprocessor environment in finite time.
Slave now owns the data structure, including the semaphore. When it is finished it may signal this fact to the master either by writing a third agreed-upon value into the semaphore cell or by some other means. The master may poll the semaphore periodically, e.g. on idle events or using a timer.
Updates to the semaphore cell are atomic by construction. The programmer must ensure that these updates are serialized with respect to all other updates to shared data. Unless you are using special cache control instructions this will be the case (on conventional x86 / x64 platforms) for any C or C++ code that has sequence points before and after the synchronization mechanism and the shared data is all declared using the "volatile" type modifier. Otherwise, you get what you pay for. Those special instructions exist to improve the performance of things like array initialization so there is a non-zero chance they are in play unless you can rule it out.
I don't know exactly what the equivalent caveats are in Fortran, but in general the language permits more movement of memory operations (with respect to their apparent source-level ordering) than C and C++ do. The documentation for the mechanism you are using to implement the fork operation should lead you to suitable incantations to ensure serializability of memory access. For OMP the suggestion to mark the common block as shared in the OMP directive sounds like a good idea to me.
-swn
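The steps above can be sketched roughly as follows. This is a C++ illustration under stated assumptions, not code from the original Fortran program: the names `Shared`, `slave`, `run_protocol`, and the three constants are hypothetical, and the doubling of `input` is placeholder work. It uses `std::atomic` rather than the `volatile` modifier mentioned above, because the C++ standard gives `volatile` no inter-thread ordering guarantees, whereas release/acquire operations on an atomic provide the required serialization portably.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Agreed-upon values for the semaphore cell (illustrative names).
constexpr int NOT_READY = 0;        // master is still initializing
constexpr int READY_FOR_SLAVE = 1;  // initialization complete, slave may proceed
constexpr int SLAVE_DONE = 2;       // slave has finished with the shared data

struct Shared {
    std::atomic<int> cell{NOT_READY};  // the semaphore cell
    int input = 0;                     // written by the master
    int output = 0;                    // written by the slave
};

void slave(Shared& s) {
    // Spin with a short sleep so a uniprocessor machine still makes progress.
    while (s.cell.load(std::memory_order_acquire) != READY_FOR_SLAVE) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    s.output = s.input * 2;  // the slave owns the shared data here
    s.cell.store(SLAVE_DONE, std::memory_order_release);  // signal the master
}

int run_protocol(int input) {
    Shared s;  // cell starts at NOT_READY before the fork
    std::thread t(slave, std::ref(s));

    s.input = input;  // master finishes initialization
    s.cell.store(READY_FOR_SLAVE, std::memory_order_release);

    // Master polls the cell; a GUI program might do this on idle or a timer.
    while (s.cell.load(std::memory_order_acquire) != SLAVE_DONE) {
        std::this_thread::yield();
    }
    int result = s.output;  // safe: ordered after the slave's release store
    t.join();
    assert(s.cell.load(std::memory_order_relaxed) == SLAVE_DONE);
    return result;
}
```

In a real GUI application the master would not block in a polling loop as `run_protocol` does; the loop stands in for the periodic check on idle events or a timer.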
Cartman (Eric?:)
When you converted to multi-threaded you effectively moved (changed) the execution sequence. This is no fault of the original designers. You (or whoever enhanced the code) must assume responsibility for the faux pas.
While using a compiler option to generate code that checks for use of uninitialized variables may help to catch some of the errors, it won't catch
1) race conditions where the problem does not show up when you are looking for it, but will show up when you are not looking for it.
2) Situations where a variable is initialized to a default value after the application starts, and is later reset by "your parent thread" for use by the "child thread", but where the "child thread" (impatient as they often are) uses the default value instead of the intended value because it reads the variable before the parent thread finishes resetting it.
This is another responsibility the person converting the code must assume.
Often the only way to do this (reliably) is to walk the code and examine every variable and ascertain if there is a sequencing problem.
Jim Dempsey
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
I'm rather enjoying this paper on the problem of grappling mentally with parallelism as implemented in current languages. A bit esoteric for the present discussion but some participants may find it interesting.
-swn