I cilk_spawn-ed a call from main() in my C source, and when I compiled and ran the program it crashed. I went back, saw my error, and added cilk_sync; on the next line. After recompiling, it ran perfectly.
I thought cilk_sync was put in automatically, so why did it crash on the first run?
Also, I have a for loop that has a function call in it. Do I gain anything by adding cilk_spawn to the call and then cilk_sync on the next line? Or should I just leave well enough alone?
Any help appreciated.
Newport_j
As for the for loop, does your code look like the following?
for (int i = 0; i < 10; ++i) {
cilk_spawn g();
cilk_sync;
}
In this case, you don't really gain anything by spawning g() and immediately syncing, except extra overhead. Also, the cilk_sync in the first iteration will sync any unsynced spawn that happened before the for loop.
If there is parallelism inside g() (i.e., g() itself contains a cilk_spawn), Cilk can still exploit that parallelism even when g() is simply called from inside the for loop.
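A minimal sketch of that last point, assuming Cilk Plus syntax: a hypothetical `range_sum()` plays the role of `g()` and contains its own `cilk_spawn`, so the plain serial loop that calls it still exposes parallelism. `USE_CILK` is a hypothetical switch; without it, the keywords are defined away so the file compiles as its serial elision with any C compiler and computes the same result.

```c
#include <assert.h>

#ifdef USE_CILK
#include <cilk/cilk.h>
#else
/* Serial elision (assumption: no Cilk compiler available): removing the
   keywords leaves a valid serial C program with the same result. */
#define cilk_spawn
#define cilk_sync
#endif

/* Hypothetical stand-in for g(): sums lo..hi-1 by divide and conquer.
   The cilk_spawn in here is where the parallelism lives. */
static long long range_sum(long long lo, long long hi)
{
    assert(lo <= hi);
    if (hi - lo <= 1000) {              /* small base case: plain loop */
        long long s = 0;
        for (long long k = lo; k < hi; ++k)
            s += k;
        return s;
    }
    long long mid = lo + (hi - lo) / 2;
    long long left = cilk_spawn range_sum(lo, mid); /* may be stolen */
    long long right = range_sum(mid, hi);
    cilk_sync;                          /* wait for left before using it */
    return left + right;
}

/* The loop calls range_sum() serially: no spawn/sync is needed here,
   because the callee already exposes parallelism internally. */
static long long run_loop(void)
{
    long long total = 0;
    for (int i = 0; i < 10; ++i)
        total += range_sum(0, 100000LL * (i + 1));
    return total;
}
```

run_loop() returns the same total with or without USE_CILK; only the scheduling of the work differs.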
A cilk_spawn in a loop is not a good implementation. It's important to remember that stealing is expensive. Work stealing works best when the work is broken into good-sized chunks, so that it is distributed among the cores quickly and the cores can then work without having to steal any more.
Consider two implementations. We'll assume that "g()" does an appreciable amount of work, and that we've got 2 cores:
Implementation A:
for (int i = 0; i < 2048; i++) { cilk_spawn g(i); } cilk_sync;
Implementation B:
cilk_for (int i = 0; i < 2048; i++) { g(i); }
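To make the two fragments concrete, here is a self-contained sketch, assuming a hypothetical `g()` that writes `i*i` into its own slot of an output array, so the iterations are independent and race-free. `USE_CILK` is again a hypothetical switch between `<cilk/cilk.h>` and the serial elision. Both versions compute the same array; they differ only in how the work gets distributed among workers.

```c
#include <assert.h>

#ifdef USE_CILK
#include <cilk/cilk.h>
#else
/* Serial elision (assumption): keywords removed, same results. */
#define cilk_spawn
#define cilk_sync
#define cilk_for for
#endif

#define N 2048

/* Hypothetical stand-in for g(i): each call writes only out[i]. */
static void g(long long *out, int i)
{
    out[i] = (long long)i * i;
}

/* Implementation A: one spawn per iteration. Each spawn leaves the rest
   of the loop (the continuation) up for stealing, so steals can occur
   on nearly every iteration. */
static void impl_a(long long *out)
{
    for (int i = 0; i < N; i++) {
        cilk_spawn g(out, i);
    }
    cilk_sync;
}

/* Implementation B: cilk_for lets the runtime split the range. */
static void impl_b(long long *out)
{
    cilk_for (int i = 0; i < N; i++) {
        g(out, i);
    }
}
```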
In implementation A, you'll spawn the call to g(), and then some other processor will need to steal the continuation. You'll end up stealing all 2048 calls to g(). Steals are expensive, and this will result in a lot of overhead.
Implementation B uses a cilk_for loop, which is implemented by a recursive divide-and-conquer algorithm. The range is split in two: one half is spawned and the other half is called. So the first core will spawn one half, and the second core will steal the other half. Both cores then work on their respective tasks without (much) additional overhead. By default, cilk_for uses a grainsize of range/(8P), where P is the number of workers. This gives you a good number of tasks (so the other cores can help if the workload is uneven) while minimizing the overhead introduced by the spawns.
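The recursive splitting can be sketched in plain C. This is not the actual runtime implementation; it assumes the range/(8P) grainsize stated above and the rule "spawn one half, call the other". `dc_for`, `default_grain`, `record_chunk` and `USE_CILK` are hypothetical names. With 2048 iterations and P = 2 the grainsize is 2048/16 = 128, so the range ends up in 16 leaf chunks produced by 15 spawns (one per split): enough tasks to rebalance if one worker falls behind, but far fewer than the 2048 spawns of implementation A.

```c
#include <assert.h>

#ifdef USE_CILK
#include <cilk/cilk.h>
#else
/* Serial elision (assumption): keywords removed, same results. */
#define cilk_spawn
#define cilk_sync
#endif

/* Hypothetical loop body: called once per leaf chunk [lo, hi). */
typedef void (*chunk_fn)(int lo, int hi, void *ctx);

/* Grainsize as described in the post: range / (8P), never below 1. */
static int default_grain(int range, int p)
{
    assert(range >= 0 && p > 0);
    int g = range / (8 * p);
    return g > 0 ? g : 1;
}

/* Divide and conquer: split in two, spawn one half, call the other half
   directly, until a piece is no larger than the grainsize. */
static void dc_for(int lo, int hi, int grain, chunk_fn body, void *ctx)
{
    if (hi - lo <= grain) {
        body(lo, hi, ctx);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    cilk_spawn dc_for(lo, mid, grain, body, ctx);
    dc_for(mid, hi, grain, body, ctx);
    cilk_sync;
}

/* Example body: records each leaf's length at its start index. Leaves
   are disjoint, so concurrent calls never write the same slot. */
static void record_chunk(int lo, int hi, void *ctx)
{
    int *len = ctx;
    len[lo] = hi - lo;
}
```

For example, calling `dc_for(0, 2048, default_grain(2048, 2), record_chunk, len)` with a zeroed `int len[2048]` leaves 16 nonzero entries, each equal to 128.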
- Barry
What compiler and version are you using?
Thanks,
-Balaji V. Iyer.
I am using Ubuntu Linux 11.04 with the icc 12.1.0 compiler.
Newport_j
