Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

How to monitor/manage task pool

chentianran
Beginner
464 Views
Greetings,
My current Pthread-based program uses per-thread task pools with a simple work-stealing strategy, not unlike the task-scheduler approach in TBB, so it should be easy to rewrite it with TBB. But one thing I'm not sure how to do in TBB is...
... my task pools tend to grow very fast, so I need a way to monitor the total size of the pools and the option to either serialize the tasks to disk or send them away over the network. I can do the serialization and network work myself, so all I need is some kind of hook function that is called upon task creation and a way to remove tasks from the pools. How can I do that? Thanks in advance!
6 Replies
Dmitry_Vyukov
Valued Contributor I
[cpp]// Sketch: decide at execution time whether to run or offload.
// is_overload(), serialize_and_offload() and actually_execute()
// are user-supplied.
struct my_task : tbb::task
{
  virtual tbb::task* execute()
  {
    if (is_overload())
      serialize_and_offload(this); // ship the work elsewhere
    else
      actually_execute();          // run it locally
    return NULL;                   // no task bypass
  }
};[/cpp]
chentianran
Beginner
Thank you for replying.
This may work, but it has two problems:
1) If I understood the scheduling policy correctly, this means the offload would always happen at the highest depth, which is not what I want. I'd prefer to have control over which tasks are offloaded, or at least the option of offloading from the shallow end first.
2) This only lets me offload one task at a time. The number of tasks in the pools can be huge, so I think I need a way to offload a big batch at once when the count goes over a threshold.
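A rough plain-C++ sketch of the batching policy being asked for here (no TBB; `Task`, the threshold, and the offload callback are all hypothetical placeholders): new tasks enter at the deep end of a deque, and once the pool crosses a threshold, a batch of the oldest (shallowest) tasks is handed to the offload callback.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Placeholder for the real task type.
struct Task { int id; };

class BatchingPool {
public:
    BatchingPool(std::size_t threshold, std::size_t batch,
                 std::function<void(std::vector<Task>&)> offload)
        : threshold_(threshold), batch_(batch), offload_(offload) {}

    // New tasks are pushed on the deep end (back).
    void push(const Task& t) {
        pool_.push_back(t);
        if (pool_.size() > threshold_) {
            // Offload a batch taken from the shallow end (front) first.
            std::vector<Task> victims;
            for (std::size_t i = 0; i < batch_ && !pool_.empty(); ++i) {
                victims.push_back(pool_.front());
                pool_.pop_front();
            }
            offload_(victims);
        }
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::size_t threshold_, batch_;
    std::function<void(std::vector<Task>&)> offload_;
    std::deque<Task> pool_;
};
```

The point of the sketch is only the policy: batching amortizes the cost of serialization, and draining from the shallow end tends to ship the coarsest-grained work.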
Alexey-Kukanov
Employee
There is no direct support for such usage model.

If you use the GPL'd version of TBB, you are free to implement it yourself. I would recommend using task_scheduler_observer for adding the hook function, and calling the function before the task pool is resized (look for grow_task_pool() in src/scheduler.cpp). The return value could indicate what the TBB task scheduler should do: either forget the old tasks and reuse the space, or keep them and grow the pool.
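Sketched outside of TBB, the proposed hook-before-resize contract could look something like this plain C++ illustration (all names are invented for the example; this is not TBB code, and a real patch would live inside the scheduler's grow path):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Placeholder for the real task type.
struct Task { int id; };

// Returning true from the hook means "I have dealt with the old tasks,
// reclaim their slots"; false means "keep everything and grow".
class HookedPool {
public:
    HookedPool(std::size_t capacity,
               std::function<bool(HookedPool&)> before_grow)
        : capacity_(capacity), before_grow_(before_grow) {}

    void push(const Task& t) {
        if (slots_.size() >= capacity_) {
            if (before_grow_(*this))
                slots_.clear();   // hook offloaded/reclaimed the backlog
            else
                capacity_ *= 2;   // hook asked the pool to grow instead
        }
        slots_.push_back(t);
    }

    std::vector<Task>& slots() { return slots_; } // visible to the hook
    std::size_t capacity() const { return capacity_; }

private:
    std::size_t capacity_;
    std::function<bool(HookedPool&)> before_grow_;
    std::vector<Task> slots_;
};
```
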
Dmitry_Vyukov
Valued Contributor I
Then you may consider implementing your own tasking library that satisfies your requirements... Wait, you already have one. So what do you want? :)
RafSchietekat
Valued Contributor III
I don't think we have enough information to advise well. Changing TBB yourself means a continuous merging effort to track new releases, unless the change is broadly useful and gets integrated (unlikely).

But who says the tasks should be immediately spawn()'ed? Maybe you can keep a data structure of tasks (probably something other than tbb::task) that is accessible for serialisation, and use something like parallel_do() to execute them, if that matches your requirements?

But then you have to find a good enough reason to switch: avoiding oversubscription when you are using TBB features anyway (a good reason!), reusing the effort of a dedicated team of people (unless you are smart and dedicated enough to do your specific job better than a general-purpose library ever could), long-term maintenance requirements (see "dedicated"), ...
chentianran
Beginner
I'm definitely not looking to maintain a fork of TBB (and I don't think I'm capable of implementing the feature either). Well, allow me to restate my question: I need two features:
1) Monitor the total number of tasks
2) Remove (large number of) tasks from the pool manually
(1) is not as important, as I can have an extra thread keep polling the number of tasks anyway, or simply have a hook called on task spawning. (2) is the important part. Is there any way I can do that?
I think this could be a useful feature, at least for people doing heavy computation on clusters whose nodes have multiple cores/processors. This approach seems quite natural to me: it allows work balancing/dealing at the cluster level while letting TBB handle work distribution at the node level.
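The monitoring half, feature (1), needs no scheduler support at all. A plain C++ sketch (names are illustrative; `spawn_impl` stands in for the real spawn call): route every spawn through a wrapper that maintains an atomic count of live tasks, which any thread can poll against a threshold.

```cpp
#include <atomic>
#include <functional>

// Wrapper that counts live tasks across all threads.
// spawn_impl is a stand-in for the real scheduler's spawn; each task
// must call on_task_done() when it finishes.
class CountingSpawner {
public:
    explicit CountingSpawner(std::function<void()> spawn_impl)
        : spawn_impl_(spawn_impl) {}

    void spawn() {
        live_.fetch_add(1, std::memory_order_relaxed);
        spawn_impl_();
    }
    void on_task_done() {
        live_.fetch_sub(1, std::memory_order_relaxed);
    }
    // Pollable from a monitoring thread; compare against a threshold
    // to decide when to start offloading.
    long live() const { return live_.load(std::memory_order_relaxed); }

private:
    std::atomic<long> live_{0};
    std::function<void()> spawn_impl_;
};
```
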