Hello,
I appreciate any help anyone can give. I have been using the Cilk Plus support built into the gcc compiler (Red Hat 6.2.1-2) and have had some good results, but there is some strange behaviour I don't understand, and I could use some pointers as to whether I'm doing something wrong. Here is a simple test:
    #include <unistd.h>
    #include <stdio.h>
    #include <cilk/cilk.h>
    #include <cilk/cilk_api.h>

    int dummyfunc(){
        printf("This is an empty function ");
    }

    int main(){
        int p = 12289;
        int k=11;
        for(int j=0;j<4;j++){
            for(int i=k;i!=1;i*=k);
        }
        for(int i=0;i<4;i++)
            cilk_spawn dummyfunc();
        cilk_sync;
    }
Then I compile like:
    <machine>% gcc -fcilkplus -c test.c -o test.o
    <machine>% g++ -fcilkplus -lcilkrts test.o -o test
And here are some results:
    <machine>% setenv CILK_NWORKERS 1
    <machine>% perf stat ./test
    This is an empty function This is an empty function This is an empty function This is an empty function
    Performance counter stats for './test':

             12922.468472 task-clock (msec)          #    0.998 CPUs utilized
                    1,308 context-switches           #    0.101 K/sec
                        1 cpu-migrations             #    0.000 K/sec
                      321 page-faults                #    0.025 K/sec
           38,768,870,903 cycles                     #    3.000 GHz
           30,151,782,133 stalled-cycles-frontend    #   77.77% frontend cycles idle
            9,083,979,840 stalled-cycles-backend     #   23.43% backend cycles idle
           21,524,149,285 instructions               #    0.56 insns per cycle
                                                     #    1.40 stalled cycles per insn
            4,303,701,602 branches                   #  333.040 M/sec
                   48,129 branch-misses              #    0.00% of all branches

             12.943003310 seconds time elapsed

    <machine>% setenv CILK_NWORKERS 8
    <machine>% perf stat ./test
    This is an empty function This is an empty function This is an empty function This is an empty function
    Performance counter stats for './test':

            107029.355833 task-clock (msec)          #    7.984 CPUs utilized
                   10,882 context-switches           #    0.102 K/sec
                       32 cpu-migrations             #    0.000 K/sec
                      394 page-faults                #    0.004 K/sec
          308,709,974,102 cycles                     #    2.884 GHz
          108,096,162,060 stalled-cycles-frontend    #   35.02% frontend cycles idle
           48,663,214,367 stalled-cycles-backend     #   15.76% backend cycles idle
          441,641,535,720 instructions               #    1.43 insns per cycle
                                                     #    0.24 stalled cycles per insn
           90,218,344,192 branches                   #  842.931 M/sec
               51,095,827 branch-misses              #    0.06% of all branches

             13.405537268 seconds time elapsed

    <machine>% setenv CILK_NWORKERS 32
    <machine>% perf stat ./test
    This is an empty function This is an empty function This is an empty function This is an empty function
    Performance counter stats for './test':

            392491.711496 task-clock (msec)          #   15.965 CPUs utilized
              551,420,816 context-switches           #    1.405 M/sec
                      367 cpu-migrations             #    0.001 K/sec
                      546 page-faults                #    0.001 K/sec
        1,060,481,304,342 cycles                     #    2.702 GHz
          385,856,059,460 stalled-cycles-frontend    #   36.38% frontend cycles idle
          232,571,157,589 stalled-cycles-backend     #   21.93% backend cycles idle
        1,404,659,473,232 instructions               #    1.32 insns per cycle
                                                     #    0.27 stalled cycles per insn
          277,478,980,098 branches                   #  706.968 M/sec
              383,200,719 branch-misses              #    0.14% of all branches

             24.583960051 seconds time elapsed
The machine has 16 hardware threads. The point is that code that has nothing to do with Cilk utilizes as much of the CPU as possible, up to the number of workers, even in regions of the program that contain no Cilk constructs. This didn't seem to happen earlier, and I can't figure out what is going on.
Any help would be great.
Matthew
Without looking too deeply at the details, here are some general thoughts:
The compiler generates code to start up the Cilk runtime if it sees any cilk_spawn or cilk_for within the function. Regardless of how much spawning is going on, the runtime will spin up P threads, where P is the value of CILK_NWORKERS. These workers will each saturate a CPU, given the chance, looking for work to do. There is an exponential backoff, but I'm guessing that your program runs too quickly to notice that. Note, however, that if anything else is happening on the machine, the idle workers will yield to that other work. CPU utilization is thus deceptively high on an unloaded computer.
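As a rough check on the figures in the question, the "CPUs utilized" value that perf reports is the task-clock (CPU time summed over all threads) divided by the elapsed wall-clock time. The helper below is only an illustrative sketch; the name `cpus_utilized` is made up for this example and is not part of perf or the Cilk runtime.

```c
#include <stdio.h>

/* Hypothetical helper (not a perf or Cilk API): ratio of CPU time summed
 * over all threads to wall-clock time, i.e. perf's "CPUs utilized". */
static double cpus_utilized(double task_clock_msec, double elapsed_sec)
{
    return task_clock_msec / (elapsed_sec * 1000.0);
}

/* Print the ratio for one run, labelled by its CILK_NWORKERS setting. */
static void report_run(int nworkers, double task_clock_msec, double elapsed_sec)
{
    printf("CILK_NWORKERS=%d: %.3f CPUs utilized\n",
           nworkers, cpus_utilized(task_clock_msec, elapsed_sec));
}
```

Calling `report_run(8, 107029.355833, 13.405537268)` with the numbers from the 8-worker run reproduces the roughly 8 CPUs utilized that perf shows, while the elapsed time stays close to the 1-worker run. That pattern is consistent with the explanation above: one worker executes the serial loop and the others burn CPU time looking for work.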
The tradeoff (which we may not have gotten quite right) is between finding and executing work as aggressively as possible and saving energy/keeping CPU utilization small when idle.
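To make the tradeoff concrete, here is a minimal sketch of an exponential backoff policy for an idle worker. It is an assumption-laden illustration, not the Cilk runtime's actual code: the function name `next_backoff_us`, the microsecond unit, and the cap are all hypothetical.

```c
/* Hypothetical idle-worker policy (NOT the Cilk runtime's implementation):
 * after each failed attempt to find work, double the wait before the next
 * attempt, up to a fixed cap. */
static unsigned next_backoff_us(unsigned current_us, unsigned max_us)
{
    if (current_us == 0)
        return 1;              /* first failure: start with the smallest wait */
    if (current_us > max_us / 2)
        return max_us;         /* doubling would exceed the cap */
    return current_us * 2;     /* otherwise double the previous wait */
}
```

A small cap keeps workers responsive to new work but leaves them spinning most of the time; a large cap lowers idle CPU use at the cost of picking up new work more slowly.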
    void cilk_func(){
        for(int i=0;i<4;i++)
            cilk_spawn dummyfunc();
        cilk_sync;
    }

    int main(){
        int k=11;
        for(int j=0;j<4;j++)
            for(int i=k;i!=1;i*=k);
        cilk_func();
        printf("\n");
        return 0;
    }
The latest open-source Cilk runtime alleviates this issue to some degree (not perfectly), so it is worth trying the mainline version of GCC.