Software Archive
Read-only legacy content

Race conditions in cilk_for

Jose_F_
Beginner
828 Views

Hi everyone,

I'm trying to run some code using Cilk Plus. The idea is to initialize an array in parallel using cilk_for, and then read it, again in parallel, to compute some results. However, I get race conditions between instructions in the two consecutive cilk_for loops. Below is an example that reproduces the error:

#include <stdlib.h>
#include <stdio.h>

#include <cilk/cilk.h>
#include <cilk/cilk_api.h>
#include <cilk/common.h>

int main(int argc, char** argv) {

  struct sublist_node {
    int head;
    int next;
  };
  
  uint i = 0;
  uint s = 100;
  struct sublist_node* sublist = malloc(s*sizeof(struct sublist_node));
  
  cilk_for(i = 0; i < s; i++) {
    sublist[i].head = i;
    sublist[i].next = -1;
  }

  cilk_for(i = 0; i < s; i++) {
    int curr = sublist[i].next;
  }

  return EXIT_SUCCESS;
}

 

When I run Cilkscreen I get this output:

Cilkscreen Race Detector V2.0.0, Build 4421

Race condition on location 0xf0f014
  write access at 0x4007fd: (/home/jose/Dropbox/Doctorado/Succinct Graph/bitbucket/code/race.c:21, __cilk_for_001.2888+0x41)
  read access at 0x40083c: (/home/jose/Dropbox/Doctorado/Succinct Graph/bitbucket/code/race.c:25, __cilk_for_002.2912+0x2b)
    called by 0x4007b0: (/home/jose/Dropbox/Doctorado/Succinct Graph/bitbucket/code/race.c:26, main+0xe3)
1 error found by Cilkscreen
Cilkscreen suppressed 62 duplicate error messages

I'm using gcc version 4.9.0 20130520 (experimental) (GCC).  

It would be nice if anyone could give me a hint.

Thanks,

José Fuentes.

 

9 Replies
Abdul__J_
Beginner

There is a race condition at line 25 on the variable 'curr'. Because the iterations run in parallel, each thread tries to modify 'curr', which produces a data race on this variable. Can you explain a bit more about what you are trying to accomplish?

Abdul__J_
Beginner

If you are just trying to check whether each 'sublist[i].next' is initialized properly, you can simply read the values one by one and print each of them. There are three ways to do that: a) simply change 'cilk_for(' to 'for('; b) if you want the iterations to run in parallel, use mutual exclusion locks; c) make 'curr' a reducer/holder variable so that each thread has its own separate copy. You can try these options to avoid the data race.

TimP
Honored Contributor III

No, curr is thread local, as its scope is entirely inside the cilk_for.

However, i also needs to be thread local, as in

cilk_for(int i = 0; ....

The shared i shown in the code above should be rejected when compiled as C++ and should draw a warning when compiled as C.

This should be no problem, as gcc 4.9 should adhere to the default value of std=gnu99, which supports that syntax. Intel C would require explicit setting of std=c99.

Using an unsigned int as the cilk_for counter and array index is likely to be sub-optimal.  There's not much point in using cilk_for if you take measures to prevent optimization.
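A minimal sketch of this suggestion applied to the original program, assuming the `[i]` subscripts were lost when the code was posted: each cilk_for declares its own signed `int` counter in the loop header, so no loop variable is shared between iterations. The function names `init_sublist` and `read_next`, and the `out` array used to keep each iteration's `curr`, are hypothetical. When the compiler does not define `__cilk` (i.e. Cilk Plus is not enabled), `cilk_for` falls back to a plain `for` (the serial elision), so the sketch also builds as ordinary C99.

```c
#include <stdlib.h>

#ifdef __cilk
#include <cilk/cilk.h>
#else
/* Serial elision: without Cilk Plus support, cilk_for is an ordinary for. */
#define cilk_for for
#endif

struct sublist_node {
  int head;
  int next;
};

/* First loop: each iteration writes only its own element, and the counter
   i is declared in the loop header, so every iteration has a private copy. */
void init_sublist(struct sublist_node *sublist, int s) {
  cilk_for (int i = 0; i < s; i++) {
    sublist[i].head = i;
    sublist[i].next = -1;
  }
}

/* Second loop: reads what init_sublist wrote. The cilk_for in init_sublist
   ends with an implicit sync, so all of its writes are complete before this
   function is called. Each iteration stores its curr in its own slot of out
   (a hypothetical way to keep the values for later use). */
void read_next(const struct sublist_node *sublist, int *out, int s) {
  cilk_for (int i = 0; i < s; i++) {
    int curr = sublist[i].next;
    out[i] = curr;
  }
}
```

Calling `init_sublist(sublist, 100)` and then `read_next(sublist, out, 100)` on `malloc`ed arrays of 100 elements leaves `out[i] == -1` for every `i`; free both arrays afterwards.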

Abdul__J_
Beginner

Oh yes, sorry! I didn't see that. In that case it should work if you just declare 'int i = 0' separately inside each cilk_for loop.

Barry_T_Intel
Employee

I agree with Tim that the problem is the shared "i".

One way to debug problems like these is to run the application under gdb with one worker. Then you can use gdb to find out what is at the location Cilkscreen reports, 0xf0f014 in this case.

   - Barry
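If the address gdb symbolizes turns out to lie inside the heap block returned by `malloc`, a small helper can map it to an element and field of the array. The sketch below is hypothetical (`locate_in_sublist` is not part of any Cilk or gdb API) and assumes you already know the base address of `sublist`, for example from `print sublist` in gdb.

```c
#include <stddef.h>
#include <stdint.h>

struct sublist_node {
  int head;
  int next;
};

/* Hypothetical debugging helper: if addr points into sublist[0..n), store
   the element index in *index and the field name in *field, and return 1.
   Return 0 if addr lies outside the array. */
int locate_in_sublist(const struct sublist_node *sublist, size_t n,
                      uintptr_t addr, size_t *index, const char **field) {
  uintptr_t base = (uintptr_t)sublist;
  uintptr_t end = (uintptr_t)(sublist + n);
  if (addr < base || addr >= end)
    return 0;

  size_t offset = (size_t)(addr - base);
  size_t within = offset % sizeof(struct sublist_node);
  *index = offset / sizeof(struct sublist_node);
  /* Bytes at or past the start of `next` belong to `next`; the rest to `head`. */
  *field = within >= offsetof(struct sublist_node, next) ? "next" : "head";
  return 1;
}
```

For instance, with 4-byte `int`s each node is 8 bytes, so an address 0x14 (20) bytes past the base would map to element 2, field `next`.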

Jose_F_
Beginner

I modified the declaration of "i". However, the race condition persists.

The modified code is this:

#include <stdlib.h>
#include <stdio.h>

#include <cilk/cilk.h>
#include <cilk/cilk_api.h>
#include <cilk/common.h>

int main(int argc, char** argv) {

  struct sublist_node {
    int head;
    int next;
  };
  
  uint s = 100;
  struct sublist_node* sublist = malloc(s*sizeof(struct sublist_node));
  
  cilk_for(uint i = 0; i < s; i++) {
    sublist[i].head = i;
    sublist[i].next = -1;
  }

  cilk_for(uint i = 0; i < s; i++) {
    int curr = sublist[i].next;
  }

  return EXIT_SUCCESS;
}

And the race condition detected by Cilkscreen is the same. This is just an example; I need the variable 'curr' later. My main concern is that I don't understand how I can get race conditions between two consecutive cilk_for loops, which execute one after the other. I assume the first cilk_for completes before the second one starts, syncing all the threads. Am I right?

Just to be clear, I'm using the cilk branch of GCC.

Cheers,

Barry_T_Intel
Employee

The compiler extracts the body of the cilk_for loop into a lambda function and passes it to the Cilk runtime, which executes it using a divide-and-conquer algorithm. So there is always an implicit cilk_sync at the end of a cilk_for loop.

Did you try running the application under gdb with 1 worker and symbolizing the address you're racing on? That will probably give you the clue you need.

   - Barry
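The divide-and-conquer execution described above can be pictured with a serial C model. This is only a sketch, not the Cilk runtime's actual code: `dac_for`, `loop_body`, `count_visit` and the grain-size parameter are hypothetical, and the recursion that the runtime would run in parallel runs serially here. The comments mark where a `cilk_spawn` and the implicit `cilk_sync` would sit, which is why a cilk_for cannot return, and a following cilk_for cannot start, until every iteration has finished.

```c
#include <stddef.h>

/* Hypothetical stand-in for the compiler-extracted loop body: it receives
   the iteration index plus a pointer to the variables the body captures. */
typedef void (*loop_body)(long i, void *captured);

/* Serial model of divide-and-conquer execution over [lo, hi). */
void dac_for(long lo, long hi, long grain, loop_body body, void *captured) {
  if (grain < 1)
    grain = 1; /* avoid endless splitting of a one-iteration range */

  if (hi - lo <= grain) {
    /* Small enough: run this chunk of iterations directly. */
    for (long i = lo; i < hi; i++)
      body(i, captured);
    return;
  }

  long mid = lo + (hi - lo) / 2;
  /* In a parallel runtime, one half would be cilk_spawn'ed here... */
  dac_for(lo, mid, grain, body, captured);
  /* ...while the current worker continues with the other half. */
  dac_for(mid, hi, grain, body, captured);
  /* cilk_sync would go here: wait for the spawned half before returning,
     so every iteration is done when the outermost call returns. */
}

/* Example body: counts how many times each index is visited. */
void count_visit(long i, void *captured) {
  int *visits = captured;
  visits[i]++;
}
```

Calling `dac_for(0, 37, 4, count_visit, visits)` with a zeroed `int visits[37]` visits every index in [0, 37) exactly once, however the range is split.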

Jim_S_Intel
Employee

I don't see a race in the code, and I don't get a reported race on your program above with ICC 15.0 and a (probably relatively old) version of Cilkscreen.

What optimization level are you compiling at?   Does the "race" appear if you compile at -O0, versus a higher optimization level?   It is conceivable that there could be some optimization pass which is causing something unexpected to happen...

I wasn't aware that Cilkscreen works with GCC 4.9; as far as I know, no one has yet ported the appropriate metadata from the GCC 4.8 cilkplus branch into GCC mainline. [This issue has come up several times in recent forum posts.] I'm not up to date on the status of any of the branches, though.
Cheers,

Jim

Bradley_K_
New Contributor I

I also was unable to get cilkscreen to report a race when I compiled with gcc 5.1.1 (From Fedora 22).

Like Jim, I don't really expect cilkscreen to do anything reasonable with code generated by gcc.

Is this really the code that you are getting errors on?  You say you need the 'curr' variable later.  Does that mean you use curr after the second cilk_for finishes?
