Hello,
I'd appreciate any help anyone can give. I have been using the Cilk Plus support built into GCC (Red Hat 6.2.1-2) and have had some good results, but there is some strange behaviour I don't understand, and I could use some pointers as to whether I'm doing something wrong. Here is a simple test:
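#include <unistd.h>
#include <stdio.h>
#include <cilk/cilk.h>
#include <cilk/cilk_api.h>

/* Trivial task to spawn; it returns nothing, so it is declared void. */
void dummyfunc(void){
    printf("This is an empty function ");
}

int main(void){
    /* Serial busy loop with no Cilk code. Unsigned arithmetic wraps
       modulo 2^32, so each inner loop ends when i comes back to 1. */
    unsigned int k=11;
    for(int j=0;j<4;j++){
        for(unsigned int i=k;i!=1;i*=k);
    }
    /* The only Cilk code: four spawns followed by a sync. */
    for(int i=0;i<4;i++) cilk_spawn dummyfunc();
    cilk_sync;
    return 0;
}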
Then I compile it like this:
<machine>% gcc -fcilkplus -c test.c -o test.o
<machine>% g++ -fcilkplus -lcilkrts test.o -o test
And here are some results:
<machine>% setenv CILK_NWORKERS 1
<machine>% perf stat ./test
This is an empty function This is an empty function This is an empty function This is an empty function
Performance counter stats for './test':
12922.468472 task-clock (msec) # 0.998 CPUs utilized
1,308 context-switches # 0.101 K/sec
1 cpu-migrations # 0.000 K/sec
321 page-faults # 0.025 K/sec
38,768,870,903 cycles # 3.000 GHz
30,151,782,133 stalled-cycles-frontend # 77.77% frontend cycles idle
9,083,979,840 stalled-cycles-backend # 23.43% backend cycles idle
21,524,149,285 instructions # 0.56 insns per cycle
# 1.40 stalled cycles per insn
4,303,701,602 branches # 333.040 M/sec
48,129 branch-misses # 0.00% of all branches
12.943003310 seconds time elapsed
<machine>% setenv CILK_NWORKERS 8
<machine>% perf stat ./test
This is an empty function This is an empty function This is an empty function This is an empty function
Performance counter stats for './test':
107029.355833 task-clock (msec) # 7.984 CPUs utilized
10,882 context-switches # 0.102 K/sec
32 cpu-migrations # 0.000 K/sec
394 page-faults # 0.004 K/sec
308,709,974,102 cycles # 2.884 GHz
108,096,162,060 stalled-cycles-frontend # 35.02% frontend cycles idle
48,663,214,367 stalled-cycles-backend # 15.76% backend cycles idle
441,641,535,720 instructions # 1.43 insns per cycle
# 0.24 stalled cycles per insn
90,218,344,192 branches # 842.931 M/sec
51,095,827 branch-misses # 0.06% of all branches
13.405537268 seconds time elapsed
<machine>% setenv CILK_NWORKERS 32
<machine>% perf stat ./test
This is an empty function This is an empty function This is an empty function This is an empty function
Performance counter stats for './test':
392491.711496 task-clock (msec) # 15.965 CPUs utilized
551,420,816 context-switches # 1.405 M/sec
367 cpu-migrations # 0.001 K/sec
546 page-faults # 0.001 K/sec
1,060,481,304,342 cycles # 2.702 GHz
385,856,059,460 stalled-cycles-frontend # 36.38% frontend cycles idle
232,571,157,589 stalled-cycles-backend # 21.93% backend cycles idle
1,404,659,473,232 instructions # 1.32 insns per cycle
# 0.27 stalled cycles per insn
277,478,980,098 branches # 706.968 M/sec
383,200,719 branch-misses # 0.14% of all branches
24.583960051 seconds time elapsed
The machine has 16 hardware threads. The point is that the serial loop, which has nothing to do with Cilk, appears to use as many CPUs as there are workers (up to what the machine provides), even though that part of the code contains no Cilk at all. I don't think this was happening earlier, and I can't figure out what is going on.
Any help would be great.
Matthew
Without looking too deeply at the details, here are some general thoughts:
The compiler generates code to start up the Cilk runtime if it sees any cilk_spawn or cilk_for within a function. Regardless of how much spawning is going on, the runtime will spin up P worker threads, where P is the value of CILK_NWORKERS. Given the chance, each of these workers will saturate a CPU while looking for work to do. There is an exponential backoff, but I'm guessing that your program finishes too quickly for the backoff to make a noticeable difference. Note, however, that if anything else is happening on the machine, the idle workers will yield to that other work. CPU utilization is thus deceptively high on an otherwise unloaded computer.
The tradeoff (which we may not have gotten quite right) is between finding and executing work as aggressively as possible and saving energy/keeping CPU utilization small when idle.
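/* dummyfunc() and the #includes are the same as in the original post. */

/* All spawning lives in its own function, so main() itself contains
   no cilk_spawn or cilk_for. */
void cilk_func(void){
    for(int i=0;i<4;i++) cilk_spawn dummyfunc();
    cilk_sync;
}

int main(void){
    /* Serial busy loop; unsigned arithmetic wraps modulo 2^32, so each
       inner loop ends when i comes back to 1. */
    unsigned int k=11;
    for(int j=0;j<4;j++)
        for(unsigned int i=k;i!=1;i*=k);
    cilk_func();
    printf("\n");
    return 0;
}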
The latest open-source Cilk runtime alleviates this issue to some degree (not perfectly), so it is worth trying the mainline version of GCC.