Re: Future Class

mwhenson · ‎01-20-2009

How might you write a Future class using TBB? ie

template
class tbb_future
{
public:
tbb_future(T (*f)()); //call the passed function
T get_value() //get the value and block until computation is finished
}

ideally get_value will not just spin and can be called by multiple threads...

Mike

RafSchietekat · ‎01-21-2009

It should be fairly straightforward to use tasks and dependencies, if your program will tolerate performance-wise that later callers of get_value() will be blocked until the result has been evaluated (not being able to do anything useful in the meantime), unless somebody could remind me how to simulate diamond dependencies with the current state of TBB. Do you want to try that without knowing in advance that it will give a satisfying result?

mwhenson · ‎01-21-2009

Quoting - Raf Schietekat

It should be fairly straightforward to use tasks and dependencies, if your program will tolerate performance-wise that later callers of get_value() will be blocked until the result has been evaluated (not being able to do anything useful in the meantime), unless somebody could remind me how to simulate diamond dependencies with the current state of TBB. Do you want to try that without knowing in advance that it will give a satisfying result?

It's actually harder (at least for me) than it initially appears. In example, the following deadlocks if too many tasks call get_value() before completed. Any better ideas than those below?

[cpp]template
class tbb_future
{
public:
   tbb_future(T (*f)())
   {
      tbb::task*t = new(tbb::task::allocate_root())tbb::empty_task;

      root = t;

      root->set_ref_count(2);

      tbb::task*future = new(root->allocate_child())future_task(f,&value);

      root->spawn(*future);

   }

   T get_value()
   {
      if(root)
      {
         tbb::spin_mutex::scoped_lock lock(myMutex);
         if(root)
         {
            root->wait_for_all();
            root = 0;
            lock.release();
         }
      }
      return value;
      
   }
private:
   tbb::atomic<:TASK>root;
   T value;
   tbb::spin_mutex myMutex;

   class future_task : public tbb::task
   {
   public:
      future_task(T (*f)(),T*val):func(f),value(val)
      {

      }

      virtual tbb::task*execute()
      {
         *value = func();
         return 0;
      }

      virtual ~future_task(){}
   private:
      T(*func)();
      T*value;
   };
   friend future_task;
};[/cpp]

This is the only implementation I could come up with that works:

[cpp]template
class tbb_future
{
public:
   tbb_future(T (*f)())
   {
      done = false;
      tbb::task*future = new(tbb::task::allocate_root())future_task(f,&queue);
      future->spawn(*future);

   }

   T get_value()
   {
      if(done)
         return value;

      T t;
      queue.pop(t);
      if(!done)
      {
         value = t;
         done = true;
      }
      queue.push(t);
      return value;
   }
private:
   tbb::concurrent_queue queue;
   tbb::atomicdone;
   T value;

   class future_task : public tbb::task
   {
   public:
      future_task(T (*f)(),tbb::concurrent_queue*q):func(f),queue(q)
      {

      }

      virtual tbb::task*execute()
      {
         queue->push(func());
         return 0;
      }

      virtual ~future_task(){}
   private:
      T(*func)();
      tbb::concurrent_queue*queue;
   };
   friend future_task;
};[/cpp]

RafSchietekat · ‎01-21-2009

The first implementation looks like what I had in mind. I would not use a spin_mutex, though, because it is assumed that the future is heavy enough that a task-based future will be beneficial, so tbb::mutex seems more appropriate. It looks that root is leaked just after wait_for_all(); lock.release() is merely a distraction. What happens if you destroy root afterwards, or instead use spawn_root_and_wait_for_all(root)? I haven't looked at the second implementation yet.

It also occurred to me (only just now, sorry for that) that the code might be calling get_value() at a greater depth than where future_task was spawned, which might cause a deadlock (anyone from the TBB team?). Maybe you can trace some depth values and see whether that provides a clue, before that possibility is explored any further.

ARCH_R_Intel · ‎01-21-2009

Doesn't the version using a concurrent_queue deadlock if there are no worker threads? E.g. the scheduler is initialized with "tbb::task_scheduler_init init(1);" In general, there has to be calls to one of the "wait" methods of class task to ensure progress in the single-threaded case.

We looked at implementing futures in the original version of TBB. The problem was that all the efficient implementations deadlocked for interesting cases where futures used other futures. The root problem is that futures, as typically specified, leave it to the system to figure out an evaluation order, and there are subtle conflicts between "figure it out" and the way task stealing can trap a partially evaluated task on a thread's stack.

RafSchietekat · ‎01-21-2009

Interesting... now what if get_value() could steal a future_task's work while that hasn't started executing, i.e., not the wimpish "stealing" of a task that has actually been carefully made available by the "victim", but really stealing something that both parties want to get to first? The work wouldn't be passed to the future_task when it is created, but the future_task would have to go back to the future :-) to get its assignment, if it is the first, otherwise some other code calls get_value() first and that way starts working on the assignment, which it stole from under the nose of the future_task. Wouldn't that dynamic spawning at least significantly decrease the opportunities for deadlock? What obstacles would remain (barring situations where futures are caught in a deadly embrace)?

mwhenson · ‎01-21-2009

Quoting - Arch Robison (Intel)

Doesn't the version using a concurrent_queue deadlock if there are no worker threads? E.g. the scheduler is initialized with "tbb::task_scheduler_init init(1);" In general, there has to be calls to one of the "wait" methods of class task to ensure progress in the single-threaded case.

We looked at implementing futures in the original version of TBB. The problem was that all the efficient implementations deadlocked for interesting cases where futures used other futures. The root problem is that futures, as typically specified, leave it to the system to figure out an evaluation order, and there are subtle conflicts between "figure it out" and the way task stealing can trap a partially evaluated task on a thread's stack.

Hmmm, interesting. I'm not sure I understand exactly, why does the system have to figure out the evaluation order?

Here's a solution that I thought would work, based on the Two Mouths example from the TBB book. But it claims that deadlock is detected if two tasks call get_value(), when I do a tbb_schedule_init(1). And it sometimes deadlocks at wait_for_all() when I do a tbb_schedule_init(2) - and future_task::execute() never executes! Do you know what's going on here? I must have some misconception of TBB... I've appreciated the comments so far. Thanks!

[cpp]template
class tbb_future
{
public:
   tbb_future(T (*f)())
   {
      done = false;
      future = new(tbb::task::allocate_root())future_task(f,this);
      future->spawn(*future);
   }

   T get_value()
   {
      if(!done)
      {
         cout << "acquiring" << endl;
         tbb::spin_mutex::scoped_lock lock(mutex);
         cout << "acquired" << endl;
         tbb::task*temp_root = 0;
         if(!done)
         {
            temp_root = new (tbb::task::allocate_root())tbb::empty_task;
            tbb::empty_task*child = new (temp_root->allocate_child())tbb::empty_task;

            temp_root->set_ref_count(2);

            future->add_task(child);


         }
         lock.release();
         if(temp_root)
         {
            //what happens if child task is spawned before this line is run?
            cout << "waiting" << endl;
            temp_root->wait_for_all();
         }
      }
      return value;
   }
private:


   void finish(const T&v)
   {
      tbb::spin_mutex::scoped_lock lock(mutex);
      value = v;
      done = true;
   }

   class future_task : public tbb::task
   {
   public:
      future_task(T (*f)(),tbb_future*fut):func(f),future(fut)
      {

      }

      virtual tbb::task*execute()
      {
         cout << "executing" << endl;
         future->finish(func());

         cout << "finished!" << endl;

         for(size_t i=0;i < dependent_tasks.size();++i)
            spawn(*dependent_tasks);

         return 0;
      }

      void add_task(tbb::task*t)
      {
         dependent_tasks.push_back(t);
      }


      virtual ~future_task(){}
   private:
      T(*func)();
      tbb_future*future;
      tbb::concurrent_vector<:TASK>dependent_tasks;
   };
   tbb::atomicdone;
   T value;
   tbb::spin_mutex mutex;
   tbb::atomicfuture;
   
   friend future_task;
};[/cpp]

Here's now I run it:

[cpp]class get_future : public tbb::task
{
public:
   get_future(tbb_future&f):fut(f)
   {
   }

   tbb::task*execute()
   {
      cout << "getting value" << endl;
      
      cout << "got value: " << fut.get_value() << endl;
      return 0;
   }

   tbb_future&fut;
};

int func()
{
   cout << "sleeping" << endl;
   Sleep(2000);
   cout << "sleeping" << endl;
   Sleep(2000);

   return 4;
}

void main()
{
tbb::tbb_scheduler_init init;[/cpp]

[cpp]  tbb::task*s;

   tbb_futurefut(func);
   s = new (tbb::task::allocate_root())get_future(fut);s->spawn(*s);
   s = new (tbb::task::allocate_root())get_future(fut);s->spawn(*s);
   s = new (tbb::task::allocate_root())get_future(fut);s->spawn(*s);
   s = new (tbb::task::allocate_root())get_future(fut);s->spawn(*s);
   s = new (tbb::task::allocate_root())get_future(fut);s->spawn(*s);

   Sleep(100);

   int val = fut.get_value();

   std::cout << endl << val << endl;[/cpp]

}

Thanks!

RafSchietekat · ‎01-22-2009

Again, I see a task being leaked (temp_root). The name "future" for a pointer to a "future_task" is a bit confusing (I would use m_future_task myself). I didn't think of this possibility for a worker to be able to do useful work while waiting for a resultfuture's value, but I still believe that you may run into problems by pinning all your hope on the future_task, which may be caught up in an unfortunate entanglement (see my earlier suggestion).

(Added) (Meanwhile, I saw and responded to Gunjan Rawal's request.)

mwhenson · ‎01-22-2009

Quoting - Raf Schietekat

Again, I see a task being leaked (temp_root). The name "future" for a pointer to a "future_task" is a bit confusing (I would use m_future_task myself). I didn't think of this possibility for a worker to be able to do useful work while waiting for a resultfuture's value, but I still believe that you may run into problems by pinning all your hope on the future_task, which may be caught up in an unfortunate entanglement (see my earlier suggestion).

(Added) (Meanwhile, I saw and responded to Gunjan Rawal's request.)

Hmm you're right about the task leaking. What suggestion are you referring to, checking the task_depth? If I understand correctly, that shouldn't matter in this case at all? Shouldn't wait_for_all() be able to execute any task with a ref_count of 0? What am I missing? Thanks!

Mike

RafSchietekat · ‎01-22-2009

If you create a future in task A and task A subsequently spawns B which does get_value() before the future_task has started executing, then you'll have a deadlock because the thread that's waiting will not consider the future_task as an option for keeping busy (maybe you'll need some more distance, but you get the picture). This situation would require another worker thread (and TBB is about optional concurrency, so everything should work with just one thread), that isn't caught up in work at a greater depth than future_task either. So there are many ways for this to cause problems, and this seems clear enough now not to need verification anymore.

Hence my suggestion to give a future_task a chance to create concurrency, but to take the assignment away from it if it hasn't started work by get_value() time. It's like a workaround for not being able to dynamically update a directed acyclic graph of dependencies (more than one places in the code may call get_value()) and get the future_task respawned at a more appropriate depth (if that concept then still holds).

ARCH_R_Intel · ‎01-22-2009

I don't remember the exact sequence that caused the hang problems we saw, but I do remember the example. The example was computing the binomial coefficient B(n,k) recursively; i.e., by Pacal's triangle.

Construct a future for each B(i,j) required for the recursive computation of B(n,k). It works out to a rectangular subregion of Pascal's triangle. Don't worry about arithmetic wrap-around. E.g., actually compute B(n,k) mod 2^32.
Link up the futures so that evaluation of B(i,j) computes 1 if j=0 or i==j. Otherwise compute it as spawning the the futures for B(i-1,j-1) and B(i-1,j) and adding their result. The key point is that there should be only one future created and evaluated for each B(i,j). The dependence graph becomes rectangular grid.
Start the future for B(n,n/2) for a suitably high value of n.

I recall the hang problem was related to the way that a thread working on afuture X could end up waiting for another future Y to complete, and to keep itself busy, steal another future Z that ended up needing the result of X.

RafSchietekat · ‎01-22-2009

This is not a situation I would have considered. But doesn't TBB preclude stealing tasks at the same depth or shallower? So wouldn't the computation proceed in the order dictated by the order of spawning, except for a window the size of the number of workers, and wouldn't this either result in all dependencies being resolved or soon resolved, or hopelessly stuck waiting for futures that have not started executing, in which case my suggestion could come to the rescue? (Sorry if there any obvious holes in this reasoning, I'm stuck in fact-finding mode for now.)

mwhenson · ‎01-22-2009

Quoting - Raf Schietekat

If you create a future in task A and task A subsequently spawns B which does get_value() before the future_task has started executing, then you'll have a deadlock because the thread that's waiting will not consider the future_task as an option for keeping busy (maybe you'll need some more distance, but you get the picture). This situation would require another worker thread (and TBB is about optional concurrency, so everything should work with just one thread), that isn't caught up in work at a greater depth than future_task either. So there are many ways for this to cause problems, and this seems clear enough now not to need verification anymore.

No, I don't understand. Where is this information in the TBB Documents? The documentation for the method wait_for_all() says nothing about not being able to task steal from other threads, nor can I find anything that suggests these conditions do not allow task stealing from another thread. Thanks again, I have another idea I'm working on, we'll see...

mwhenson · ‎01-22-2009

Quoting - Raf Schietekat

This is not a situation I would have considered. But doesn't TBB preclude stealing tasks at the same depth or shallower? So wouldn't the computation proceed in the order dictated by the order of spawning, except for a window the size of the number of workers, and wouldn't this either result in all dependencies being resolved or soon resolved, or hopelessly stuck waiting for futures that have not started executing, in which case my suggestion could come to the rescue? (Sorry if there any obvious holes in this reasoning, I'm stuck in fact-finding mode for now.)

Thanks for your patience. I stole your idea and have the future start a new task if one hasn't started yet. I also fixed the task leaking... This appears to work, though I'm not sure about the case that Arch mentions, nor would I know how to force a test of that case. Thanks for any comments!

[cpp]template
class tbb_future
{
public:
   tbb_future(T (*f)()):func(f)
   {
      started = false;
      done = false;
      my_future = new(tbb::task::allocate_root())future_task(f,this);
      my_future->spawn(*my_future);
   }

   T get_value()
   {
      if(!done)
      {
         tbb::spin_mutex::scoped_lock lock(mutex);
         tbb::task*temp_root = 0;
         if(!done)
         {
            if(started)
            {
               temp_root = new (tbb::task::allocate_root())tbb::empty_task;
               tbb::empty_task*child = new (temp_root->allocate_child())tbb::empty_task;

               temp_root->set_ref_count(2);

               my_future->add_task(child);
               lock.release();

            }
            //it's never even started, but may have a different owner and never be able to.
            //recreate a new task and wait on it
            else
            {
               my_future = new(tbb::task::allocate_root())future_task(func,this);
               lock.release();
               tbb::task::spawn_root_and_wait(*my_future);
            }

         }
         if(temp_root)
         {
            //what happens if child task is spawned before this line is run?
            temp_root->wait_for_all();
            temp_root->destroy(*temp_root);
         }
      }
      return value;
   }
private:

   bool start()
   {
      tbb::spin_mutex::scoped_lock lock(mutex);
      bool ret = started;
      started = true;
      return ret;
   }

   void finish(const T&v)
   {
      tbb::spin_mutex::scoped_lock lock(mutex);
      value = v;
      done = true;
   }

   class future_task : public tbb::task
   {
   public:
      future_task(T (*f)(),tbb_future*fut):func(f),my_future(fut)
      {

      }

      virtual tbb::task*execute()
      {
         //if there's another task that already did / is doing this computation, just return
         if(!my_future->start())
         {
            my_future->finish(func());

            for(size_t i=0;i < dependent_tasks.size();++i)
               spawn(*dependent_tasks);
         }

         return 0;
      }

      void add_task(tbb::task*t)
      {
         dependent_tasks.push_back(t);
      }


      virtual ~future_task(){}
   private:
      T(*func)();
      tbb_future*my_future;
      tbb::concurrent_vector<:TASK>dependent_tasks;
   };
   tbb::atomicdone;
   tbb::atomicstarted;
   T value;
   tbb::spin_mutex mutex;
   tbb::atomicmy_future;
   
   T(*func)();
   friend future_task;
};[/cpp]

RafSchietekat · ‎01-23-2009

How about direct invocation instead of the extra future_task in get_value()?

mwhenson · ‎01-26-2009

Quoting - Raf Schietekat

How about direct invocation instead of the extra future_task in get_value()

Hmmm, good suggestion, but I think this is necessary so that subsequent callers of get_value() can add a dependent_task to know when the computation is done.

So this future class is guarenteed to work except when it depends on another future class, where it could (due to task stealing) be stuck waiting for itself? This seems to be a very subtle potential for race conditions in a task stealing scheduler that I would never have considered...

RafSchietekat · ‎01-26-2009

"Hmmm, good suggestion, but I think this is necessary so that subsequent callers of get_value() can add a dependent_task to know when the computation is done." Wouldn't that still work if you directly call execute()?

"stuck waiting for itself" I don't know... The example provided by Arch didn't seem to apply to TBB as it is now, or does it? Maybe it's not too difficult to formulate and follow some workable rules to make a future a useful Threading Building Block?

mwhenson · ‎01-26-2009

Quoting - Raf Schietekat
"stuck waiting for itself" I don't know... The example provided by Arch didn't seem to apply to TBB as it is now, or does it? Maybe it's not too difficult to formulate and follow some workable rules to make a future a useful Threading Building Block? this is necessary so that subsequent callers of get_value() can add a dependent_task to know when the computation is done.

Ahh I thought you meant call func() directly... yeah calling execute should work, good idea I hadn't considered that.

I really don't know whether it will work with TBB as it is now... that's why I was hoping to find out from the forum; but it's also important that it works as TBB will be...

mwhenson · ‎01-26-2009

Quoting - mwhenson

Ahh I thought you meant call func() directly... yeah calling execute should work, good idea I hadn't considered that.

I really don't know whether it will work with TBB as it is now... that's why I was hoping to find out from the forum; but it's also important that it works as TBB will be...

Also, .NET has (will have) the Task Parallel Library which is a work stealing library with futures. So it must be possible to implement futures with a work stealing scheduler. Anyone know how they do it?

Mike

mwhenson · ‎01-26-2009

What we really need is a way to create a task that is guarenteed not to steal... ie a task constructor with some do_not_steal option that if set to true would never steal on a wait_for_xxx method. This would make my code correct (future_task would call the constructor with this option), though we wouldn't be able to use Raf's direct invocation optimization. This may lower some potential parallism, but if the other side is potential deadlock, this would be worth it. What do you think?

RafSchietekat · ‎01-26-2009

"calling execute should work, good idea" I actually consider it mandatory to avoid deadlock that the caller take responsibility for executing the code itself, or verifying that it is being executed, not just spawned.

"So it must be possible to implement futures with a work stealing scheduler." Elementary, my dear Henson! Surely everybody knows the Coffman conditions for deadlock... well, mutual exclusion corresponds to get_value() waiting until somebody else comes up with the answer, so if instead you just go ahead anyway (or wait only after verifying that some other code is actually working on the future), not caring that you might be doing redundant work or sitting idle, you will avoid inducing deadlock.

"What we really need is a way to create a task that is guarenteed not to steal..." Not really: TBB will only steal deeper tasks, so that takes care of one of the other Coffman conditions.

Well, I'm not 100% sure yet, but it seems plausible.

P.S.: Actually, Holmes never said that.