Some more general questions

newport_j · 03-02-2012

I cilk_spawn-ed a call from main() in my C source, and when I compiled and ran the program it crashed. I went back, saw my error, and added cilk_sync; on the next line. After recompiling, it ran perfectly.

I thought cilk_sync was put in automatically, so why did it crash on the first run?

Also, I have a for loop with a function call in it. Do I gain anything by adding cilk_spawn to the call and then cilk_sync on the next line, or should I just leave well enough alone?

Any help appreciated.

Newport_j

Jim_S_Intel · 03-02-2012

There should be an implicit cilk_sync at the end of every function. I'm not quite sure what error you are seeing. Can you include more details or reproduce the problem with a simpler example?

As far as the for loop, does your example look like the following?

for (int i = 0; i < 10; ++i) {
    cilk_spawn g();
    cilk_sync;
}

In this case, you don't really gain anything by spawning g() and immediately syncing; you only add overhead. Also, the cilk_sync in the first iteration will wait for any outstanding spawn issued before the for loop.

If there is parallelism inside g() (i.e., g() itself contains a cilk_spawn), Cilk can still exploit that parallelism even when the loop simply calls g() without spawning it.

Barry_T_Intel · 03-05-2012

A cilk_spawn in a loop is not a good implementation. It's important to remember that stealing is expensive. Work stealing works best when you break the work into good-sized chunks, so that the work is distributed among the cores quickly and they can then run without having to steal any more.

Consider two implementations. We'll assume that "g()" does an appreciable amount of work, and that we've got 2 cores:

Implementation A:

for (int i = 0; i < 2048; i++)
{
 cilk_spawn g(i);
}
cilk_sync;

Implementation B:

cilk_for (int i = 0; i < 2048; i++)
{
 g(i);
}

In implementation A, you'll spawn the call to g(), and then some other processor will need to steal the continuation (the rest of the loop). That repeats on every iteration, so you can end up with up to 2048 steals, roughly one per call to g(). Steals are expensive, and this will result in a lot of overhead.

Implementation B uses a cilk_for loop, which is implemented by a recursive divide-and-conquer algorithm. The range is broken in two: one half is spawned, and the other half is called. So the first core will spawn half, and the second core will steal the other half. Then both cores work on their respective tasks without (much) additional overhead. By default, cilk_for uses a grainsize of range/(8P), where P is the number of workers. This gives you a good number of tasks (so the other cores can help if the workload is uneven) while minimizing the overhead introduced by the spawns.

- Barry

Balaji_I_Intel · 03-05-2012

Hello Newport_j,
What compiler and version are you using?

Thanks,

-Balaji V. Iyer.

newport_j · 03-05-2012

I am using Ubuntu Linux 11.04 with the icc compiler, version 12.1.0.

Newport_j