Re: Can you spot the race condition? (part deux)

Henry_G_Intel · ‎05-13-2003

There was such good feedback for the previous race condition example that I decided to post another common threading error. The program below sometimes reports a correct value for the counter variable even though it contains a race condition. Do you see how to fix it?

#include 
#include 

#define N 4

static int counter = 0;

void* increment (void* arg)
{
   pthread_mutex_t* lock;

   lock = (pthread_mutex_t *)malloc(sizeof(pthread_mutex_t));
   pthread_lock_init (lock, NULL);

   pthread_mutex_lock (lock);
      counter++;
   pthread_mutex_unlock (lock);

   return NULL;
}

int main ()
{
   int j;
   pthread_t tid;

   for (j = 0; j < N; j++)
      pthread_create (&tid, NULL, increment, NULL);

   for (j = 0; j < N; j++)
      pthread_join (tid, NULL);

   printf ("The final count is %d
", counter);
}

If you see the error, be the first to post the solution. If you would prefer to see the example with Win32 threads, let me know and I'll post an equivalent program.

Henry_G_Intel · ‎05-14-2003

Wow, that was quick. You're correct. The mutex must be shared between all threads for mutual exclusion to be effective. In the flawed example code, each thread initialized its own private mutex. Here's one of the two possible solutions that you mentioned:

#include 
#include 
#define N 4
static int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
void* increment (void* arg)
{
   pthread_mutex_lock (&lock);
      counter++;
   pthread_mutex_unlock (&lock);
   return NULL;
}
int main ()
{
   int j;
   pthread_t tid;
   for (j = 0; j < N; j++)
      pthread_create (&tid, NULL, increment, NULL);
   for (j = 0; j < N; j++)
      pthread_join (tid, NULL);
   printf ("The final count is %d
", counter);
}

bronx · ‎05-15-2003

> pthread_mutex_lock (&lock);
> counter++;
> pthread_mutex_unlock (&lock);

as you know such code isn't exception safe, IMO it will be way better do define something like


class SafeLock
{
  pthread_mutex_t *mutex;
public:
  SafeLock (pthread_mutex_t &vMutex) : mutex(&vMutex)
    {pthread_mutex_lock(mutex);}}
  ~SafeLock () {pthread_mutex_unlock(mutex);}
};

// typical client :

static pthread_mutex_t GlobalMutex;

void Increment ()
{
  SafeLock lock(GlobalMutex);
  counter++;
}

Henry_G_Intel · ‎05-15-2003

Nice catch! Checking error codes is extremely important in multithreaded programs. The programmer has to protect the shared variable (counter) in the event that pthread_mutex_lock fails. For example, if the mutex is uninitialized, multiple threads could be allowed into the critical section.

By the way, don't forget to initialize your mutex.

Please explain why your example is exception safe. If the lock operation in the constructor fails, it still looks like a data race is possible on the counter variable.

Henry_G_Intel · ‎05-15-2003

I now see how your code is exception safe. In the event of an exception, the increment statement cannot be executed. Very nice.

bronx · ‎05-16-2003

> By the way, don't forget to initialize your mutex.

sure, my code was incomplete in this respect and probably others points too

> Please explain why your example is exception safe. If
> the lock operation in the constructor fails, it still
> looks like a data race is possible on the counter
> variable.

the goal is to be sure to release the mutex (or leave the critical section) when an exception is thrown after the mutex is seized (or CS entered) but before normal termination.

For sure it's a bit overkill for a mere counter increment, but the notation is very simple in client code and the overhead negligible, so it is probably a good idea to use it for all locks

o_neuendorf · ‎05-16-2003

Besides all the comments which were made already I have one little comment to add.
The keyword volatile is a very often overlooked one. But it's very usefull in the context of a global, shared variable like a counter or a flag. It prevents the compiler from doing some optimization, which can lead to strange results in the context of sevaral threads.
Although your example won't get into trouble, a more complicated one could very well suffer. For example, if there would be a loop around counter++.

Regards,
Olaf

bronx · ‎05-16-2003

> The keyword volatile is a very often overlooked one.
> But it's very usefull in the context of a global,
> shared variable like a counter or a flag. It prevents
> the compiler from doing some optimization, which can
> lead to strange results in the context of sevaral
> threads.

Indeed, it's potentially a particularly nasty issue. I remember a very hard bug hunting session with some development in C for TI's C3X DSP targets (ages ago...), after long hours of investigation a single "volatile" fixed all the problems.

Though the ICL 7.1 code generator is quite conservative in this respect, the global is updated (or read again) even if accessed within an inner loop. On a side note, one optimization that I use routinely is to declare explicitely a local copy to enable a register allocation for the transient data.

Instead of using a "volatile" qualifier my personnal favorite is to encapsulate all the data in classes and to access them with member functions, AFAIK it is not legitimate for the code generator to remove the access to the actual data even if the code is inlined.

Anyway, "volatile" doesn't garentee an atomic transaction, how about "volatile _int64 Counter" for IA-32 targets ? The best for guarded data is to use strict encapsulation IMO.

Henry's example will become something like :


class Counter
{
  int n;
public:
  Counter () : n(0) {}
  const int &value () const {return n;}
  void increment () {SafeLock lock(GlobalMutex); n++;}
};