
Why does the number of available workers change the execution time of a program with a single cilk_spawn?

Tasos_K_
Beginner

While optimizing matrix manipulation code in C, I used CilkPlus to run two functions in parallel; they are data independent and somewhat computationally intensive. cilk_spawn is used in only one place in the code, as follows:

//(test_function declarations)

cilk_spawn highPrep(d, x, half);

d = temp_0;
r = malloc(sizeof(int)*(half));
temp_1 = r;
x = x_alloc + F_EXTPAD;
lowPrep(r, d, x, half);

cilk_sync;

//test_function return

According to the documentation I have read so far, cilk_spawn is expected to take the highPrep() function and execute it on a different hardware thread if one is available (only "maybe", since CilkPlus does not enforce parallelism). Meanwhile, execution continues with the rest of the code, including the function lowPrep(), until the cilk_sync is reached. At that point the threads synchronize before execution proceeds.

The tests were run on a Xeon E5-2680 dedicated to these experiments. When I change the environment variable CILK_NWORKERS and try values such as 2, 4, 8, and 16, the execution time of test_function increases as the number of available workers grows beyond 2.

I would expect the number of available threads not to change anything in the execution of this code. If 2 threads are available, I would expect highPrep to be executed on a thread other than the main one, and any additional threads to remain idle.

Could anyone help in understanding what is going wrong here? 

Thank you in advance.

Barry_T_Intel
Employee

Grows how much?

The "idle" workers aren't idle. They're looking for work to do, which means accessing the deques of your busy workers. That might slow things down a little, but shouldn't affect it much.

    - Barry

Tasos_K_
Beginner

It grows by up to 3-4X compared with the serialized (no-Cilk) version, and by up to 2X compared with the P=1 worker version.

I got to the bottom of this with help from the excellent guide "Why is Cilk™ Plus not speeding up my program?". The spawned function was not computationally intensive enough to hide the spawn overhead. In particular, the spawned code had a lifetime of ~30 microseconds, including the additional function call that wrapped it, which is too short for Cilk to make an impact. Also, the compiler optimized the code differently, especially with regard to autovectorization (this was even more evident when the code was tested on the Intel Xeon Phi).
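The granularity problem described above can be sketched as a cutoff: spawn only when there is enough work, and otherwise call both functions serially. This is a rough illustration, not the original code. `sum_range`, `sum_two_halves`, and the `spawn_cutoff` parameter are hypothetical stand-ins for highPrep()/lowPrep(), the right cutoff has to be found by measurement, and the `__cilk` guard assumes the compiler predefines that macro when Cilk Plus is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Cilk Plus compilers predefine __cilk. Without one, the
   keywords are elided and the code runs as its serial equivalent. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Hypothetical stand-in for highPrep()/lowPrep(): sums a[0..n) into *out. */
static void sum_range(const int *a, size_t n, long *out)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    *out = s;
}

/* Sums the two halves of a[0..n). The halves run as a spawned child plus
   its continuation only when n reaches spawn_cutoff (a tunable, hypothetical
   threshold); smaller inputs take the plain serial path and pay no
   spawn overhead. */
long sum_two_halves(const int *a, size_t n, size_t spawn_cutoff)
{
    assert(a != NULL || n == 0);

    long lo = 0, hi = 0;
    size_t half = n / 2;

    if (n >= spawn_cutoff) {
        cilk_spawn sum_range(a, half, &lo);   /* child: may run in parallel */
        sum_range(a + half, n - half, &hi);   /* continuation */
        cilk_sync;                            /* wait for the child */
    } else {
        sum_range(a, half, &lo);
        sum_range(a + half, n - half, &hi);
    }
    return lo + hi;
}
```

A caller would time both paths on representative inputs and pick `spawn_cutoff` so that each spawned piece runs for much longer than the spawn overhead.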

The slowdown with more workers was something I noticed while experimenting with CilkPlus, and I wanted to understand the reason behind it. Your answer is exactly what I was looking for. To remove the slowdown, I overrode the CILK_NWORKERS environment variable with a __cilkrts_set_param("nworkers","N") call before the spawn, where N is the desired number of workers.
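Capping the worker count from code, as described above, might look like the sketch below. `__cilkrts_set_param` is declared in `<cilk/cilk_api.h>` and takes the value as a string. The helper name `limit_cilk_workers`, the `__cilk` guard, and the serial fallback are my assumptions. As far as I know, the call only takes effect if it is made before the runtime has started, that is, before the first spawn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: Cilk Plus compilers predefine __cilk. */
#ifdef __cilk
#include <cilk/cilk_api.h>
#endif

/* Hypothetical helper: asks the Cilk Plus runtime to use n workers.
   Returns 0 on success, -1 for n == 0, or the runtime's nonzero
   error code. In a build without Cilk Plus there is nothing to
   configure, so any n > 0 succeeds. */
int limit_cilk_workers(unsigned n)
{
    char value[16];   /* large enough for any 32-bit unsigned in decimal */

    if (n == 0)
        return -1;

    int len = snprintf(value, sizeof value, "%u", n);
    assert(len > 0 && (size_t)len < sizeof value);

#ifdef __cilk
    /* The runtime expects the worker count as a decimal string. */
    return __cilkrts_set_param("nworkers", value);
#else
    return 0;
#endif
}
```

For the single-spawn case in this thread, calling `limit_cilk_workers(2)` at the start of `main` would be one reasonable choice, since at most two strands can ever run at once; the value 2 is my assumption, not something stated in the post.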

Thank you for your answer,

- Tasos
