Intel® oneAPI Threading Building Blocks

TBB in interactive application

jmuffat
Beginner
Hi all,

As this is my first message here, let me start by praising TBB and saying how happy I am to have found this package! It has made life a lot easier for me already, and I've only just scratched the surface.

I am working on an application that is a bit peculiar in that it wants both a fluid user interface and a lot of background processing. It is a photo viewer/organiser; you can take a look at http://www.gpuviewer.com, should you wish to know more.

Here is my problem:
- in the foreground, the app displays photos (which requires intensive computation to decompress JPG/RAW)
- in the background, the app needs to walk your hard drive and perform various tasks.

The background work is ideally suited to TBB: splitting the work into small tasks is easy, and organising them in "for" loops, pipelines and the like won't be a problem.

Now, what (I think) I need to do, when the UI needs a photo decompressed, is to pause the current flow of tasks, run the decompression job as a task, and then resume what was previously going on.

My guess is that I might be able to do this at the TBB scheduler level, somewhere in the logic where a thread decides what to do next when it is done with a task.

Ok, so here are my questions:
- has anybody faced a similar case before?
- does what I write above make any sense?
- does anybody have any suggestions?

Thanks in advance
AJ13
New Contributor I
I have faced a similar problem with pausing TBB task execution while running a simulation. If that is truly the functionality you need, you can basically reuse components that I have written already. The constructs are open source, and I'm doing some performance benchmarks now, although the high-level interface won't change anytime soon.
Alexey-Kukanov
Employee

TBB tasks are non-preemptive, i.e. once a task has started executing, it will run to completion before the thread running it can start another task.

The way to set a sort of priority on a TBB task is to change its depth. There are a couple of methods in class task to do that. Increasing a task's depth makes it be taken for execution earlier by its own thread, but it also makes it (and its descendants) be considered for stealing later than tasks with smaller depth. So it might not work ideally for you.
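For illustration, spawning a child with extra depth looks roughly like this (a sketch only; DecodeTask, ParentTask and the depth delta of 4 are made up for the example, and the depth methods belong to the classic task interface):

    #include "tbb/task.h"

    // Hypothetical leaf task standing in for a JPG/RAW decode.
    class DecodeTask : public tbb::task {
    public:
        tbb::task* execute() {
            // ... decompress one photo ...
            return NULL;
        }
    };

    class ParentTask : public tbb::task {
    public:
        tbb::task* execute() {
            DecodeTask& t = *new( allocate_child() ) DecodeTask;
            set_ref_count( 2 );   // one child plus the wait below
            // Extra depth: the local thread takes the task sooner, but
            // idle threads will consider it for stealing later.
            t.add_to_depth( 4 );  // "4" is arbitrary, for illustration
            spawn( t );
            wait_for_all();
            return NULL;
        }
    };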

In the recent developer updates, there is a prototype of new functionality that allows cancelling the execution of a particular group of tasks. Soon we will release an updated version of it that should be closer to the final one. If your background tasks can be organised so that their execution can be safely cancelled, and a new invocation can then resume from the place where it stopped, you might consider this technique. If you have more questions about it, please ask in a couple of weeks.
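To give a feel for the technique, here is a rough sketch using the task_group_context interface that this functionality became (names may differ in the prototype from the developer updates; ScanBody and the function names are stand-ins for the real disk-walking work):

    #include "tbb/task.h"
    #include "tbb/parallel_for.h"
    #include "tbb/blocked_range.h"
    #include "tbb/partitioner.h"

    // Placeholder body standing in for the disk-walking work.
    struct ScanBody {
        void operator()( const tbb::blocked_range<int>& r ) const {
            for( int i = r.begin(); i != r.end(); ++i ) {
                // ... examine file i ...
            }
        }
    };

    tbb::task_group_context background_ctx;

    void run_background_scan( int nFiles ) {
        // Everything spawned under background_ctx forms one cancellable group.
        tbb::parallel_for( tbb::blocked_range<int>( 0, nFiles ),
                           ScanBody(), tbb::auto_partitioner(), background_ctx );
    }

    // From the UI side, when an urgent decompression comes in:
    void interrupt_background_scan() {
        background_ctx.cancel_group_execution();
        // ... run the urgent job, reset() the context, and restart the
        //     scan from where it left off (application bookkeeping) ...
    }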

jmuffat
Beginner
I am completely aware of the non-preemptive nature of tasks in TBB and think it is a particularly good approach. I wouldn't like to cancel the currently scheduled ones, as restarting them could end up being a big waste of resources, especially in the sort of worst-case scenario where the user keeps moving around: the app could end up spending more time cancelling and restarting than doing actual work...

I'm starting to see the problem with using depth as priority... I haven't quite got my head around task stealing yet. I guess I should dive deeper into the innards of the scheduler and then come back to this; a couple of weeks may well have passed by then ;)
jmuffat
Beginner
I ended up doing something simpler than I expected, so I thought I should mention it here in case someone is interested, or in case there are hidden problems with it.

I created a "background work" thread that waits on a semaphore (semTodo), runs a tbb::pipeline when there is work to do and loops back to waiting on the semaphore, until app is closed.

When there is something to do, I run a tbb::pipeline with two stages:
- one serial filter that pops jobs
- one parallel filter that runs jobs

When the second filter finishes a job, it signals a second semaphore (semDone).

The first filter actually waits on both semTodo and semDone: a signal on semTodo means it returns the next job to the following stage, while a signal on semDone means it checks whether the pipeline is empty, returning NULL if so (thus closing the pipeline). [note: of course, before the pipeline is run, semTodo is signalled again, otherwise we'd lose count of one job]

I have set the pipeline's inFlightCount parameter to a relatively high value (32), so as to minimize time spent in the first stage (when there is a lot to do).
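In case it helps, here is a stripped-down sketch of the arrangement. The Semaphore wrapper (over a POSIX sem_t), the Job type, submit() and the inFlight counter are simplifications of what I actually have; the filter modes are those of the classic tbb::pipeline interface:

    #include <semaphore.h>               // POSIX; substitute your platform's semaphore
    #include "tbb/pipeline.h"
    #include "tbb/concurrent_queue.h"
    #include "tbb/atomic.h"

    // Thin wrapper over a counting semaphore, just for the sketch.
    struct Semaphore {
        sem_t s;
        Semaphore()     { sem_init( &s, 0, 0 ); }
        ~Semaphore()    { sem_destroy( &s ); }
        void post()     { sem_post( &s ); }
        void wait()     { sem_wait( &s ); }
        bool try_wait() { return sem_trywait( &s ) == 0; }
    };

    struct Job { virtual void run() = 0; virtual ~Job() {} };

    Semaphore semTodo;                   // one token per queued job
    Semaphore semDone;                   // one token per completed job
    tbb::concurrent_queue<Job*> jobs;    // filled by the UI/scanner threads
    tbb::atomic<int> inFlight;           // jobs currently inside the pipeline

    // Producer side (UI or scanner thread):
    void submit( Job* j ) {
        jobs.push( j );
        semTodo.post();
    }

    // Serial stage: pop the next job, or close the pipeline once drained.
    class PopFilter : public tbb::filter {
    public:
        PopFilter() : tbb::filter( tbb::filter::serial_in_order ) {}
        void* operator()( void* ) {
            for(;;) {
                if( semTodo.try_wait() ) break;  // a new job is available
                semDone.wait();                  // a job just finished...
                if( semTodo.try_wait() ) break;  // ...and another arrived
                if( inFlight == 0 ) return NULL; // drained: close the pipeline
            }
            Job* j = NULL;
            jobs.try_pop( j );                   // matches the semTodo token
            ++inFlight;
            return j;
        }
    };

    // Parallel stage: run the job on whichever worker picked it up.
    class RunFilter : public tbb::filter {
    public:
        RunFilter() : tbb::filter( tbb::filter::parallel ) {}
        void* operator()( void* item ) {
            Job* j = static_cast<Job*>( item );
            j->run();
            delete j;
            --inFlight;
            semDone.post();                      // wake the serial stage
            return NULL;
        }
    };

    // The "background work" thread: sleeps outside TBB until work arrives.
    void background_thread_main() {
        for(;;) {                                // exit condition omitted
            semTodo.wait();                      // idle: TBB fully available
            semTodo.post();                      // put the token back (see note)
            tbb::pipeline p;
            PopFilter pop; RunFilter run;
            p.add_filter( pop );
            p.add_filter( run );
            p.run( 32 );                         // the inFlightCount above
            p.clear();
        }
    }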

It works fine and exhibits the following:
- when there is nothing to do, the thread sleeps outside of TBB (i.e. TBB is idle and totally available to the main app thread)
- when there is work to do, it is spread over all cores, with very little time wasted in the first stage
- any job request sent while the pipeline is active will be honoured as soon as possible
- this plays well even if there is just one core

When other threads are using TBB too, my impression is that things get done efficiently and that the contention involved is pretty reasonable for a UI with background tasks running.

I'm not sure this is the best approach to doing background work with TBB, but it has proved a good one for me.
robert-reed
Valued Contributor II

It sounds like a perfectly reasonable approach, and similar to ideas I've seen before for activating TBB only when there's work to do, maximizing the advantage of spin locks without leaving things spinning when there's none.

Have you collected any data on scalability?

jmuffat
Beginner
I still haven't really found a way to collect truly usable data from my application. I'm in the middle of converting everything to TBB and, until the big structures are all in place, I'm still running a mix of it and my own thread pool... This doesn't help with instrumentation...

I reckon things scale well to 4 cores, as the app felt noticeably more responsive when I added the TBB code above. I hear it scales less well to 8 cores, and that doesn't surprise me all that much: photos have to come from the disk, and that's always going to be my main bottleneck...
RafSchietekat
Valued Contributor III
I'm not sure I understand your approach: doesn't this block a TBB thread much of the time (there is no way to tell TBB that other work can be executed while the initial filter is waiting on a semaphore)? Or do you initialise task_scheduler_init with default_num_threads()+1 threads and have the pipeline run throughout the lifetime of the task_scheduler_init object? I am rather suspicious of any pipeline that is not about keeping data local to a thread through multiple filters (a serial filter followed by a parallel filter doesn't count), because then you might as well use parallel_while/parallel_do (the former would be more convenient here, so I don't see why it should be deprecated), which wouldn't solve the blocking, of course. What did I miss?
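For comparison, a minimal parallel_do version of such work needs no serial pop stage, though it assumes the jobs have already been collected into a container and does nothing about the blocking either (Job, RunJob and drain are stand-ins for this sketch):

    #include "tbb/parallel_do.h"
    #include <vector>

    // Stand-in for one decompression job.
    struct Job {
        void run() const { /* ... decode one photo ... */ }
    };

    struct RunJob {
        void operator()( const Job& j ) const { j.run(); }
    };

    void drain( const std::vector<Job>& batch ) {
        // Each job goes straight to an idle worker; no serial stage needed.
        tbb::parallel_do( batch.begin(), batch.end(), RunJob() );
    }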

robert-reed
Valued Contributor II

Perhaps I was overlooking some details, but I thought I understood J. Muffat's explanation. At the core is the use of a pair of filters: the serial input filter gets called when the "background work" thread initiates the pipeline, and it passes a lump of decompression work along to the parallel filter, probably as a continuation task. While that thread occupies itself with the work, the TBB task scheduler dispatches another pool thread to the serial input task. If semTodo indicates the availability of more work, the new thread repeats the process, stacking up parallel threads doing decompression work. If, at the dispatch of any input task, it should discover nothing waiting at semTodo, it could wait for semDone, or even spin, alternately checking both as suggested in the text above. Checking both would provide a bit of hysteresis to cover bursty incoming work. When both are empty, the current input filter returns NULL and the pipeline shuts down.

There may be some additional details needed to bulletproof the process. I'd imagine semDone to be more of an atomic reference count than a full-blown semaphore, probably incremented by the input filter and decremented by the processing filter. And there may be some additional details needed to ensure the background work thread doesn't interfere with semTodo while the sequence of input filters is active, but I do not see any holes in the basic process. Perhaps I'm missing something as well?

RafSchietekat
Valued Contributor III
Well, the serial filter spends much of its time waiting, occupying a TBB thread, so... It would be OK if a task could say to TBB: never mind me, I'm slacking, just get another worker thread going until I'm done or I tell you otherwise; but the number of worker threads is fixed at task_scheduler_init time, and if you don't have a continuously slacking task to warrant initialising task_scheduler_init with more than default_num_threads() threads, oversubscription occurs. Maybe TBB could analyse the behaviour of a worker thread (a long-running execute with low CPU usage) and draw its own conclusions, but that is not currently the case, I think.
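To make the workaround concrete, this is all it takes at initialisation, assuming one accepts the extra software thread and keeps the init object alive for as long as TBB work may run:

    #include "tbb/task_scheduler_init.h"

    int main() {
        // One thread beyond the hardware default, on the assumption that
        // one TBB thread sits blocked in the serial input filter instead
        // of computing. The init object must outlive all TBB work.
        tbb::task_scheduler_init init(
            tbb::task_scheduler_init::default_num_threads() + 1 );

        // ... start the background thread, run pipelines, etc. ...
        return 0;
    }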

The pipeline should be closed immediately when input is exhausted (the overhead would be less significant than sacrificing a physical thread to waiting), notifying the background thread of the fact, so that it can spawn a new pipeline when new input arrives (assuming this is really something suitable for a pipeline and not a misuse of it; otherwise parallel_while/parallel_do would be more appropriate).

In general, spawned tasks should not get the actual work items (they would execute in the opposite order, wouldn't they?); they should continue executing until work is exhausted (to reduce overhead), and if possible multiply according to the work available (I guess parallel_while/parallel_do can take care of that). So far so good, but they should also never wait for more work to arrive: maybe real-time applications would be willing to sacrifice a physical thread for lower latency, I don't know, but it seems like a high price to pay and should be well considered, and I don't even know whether that would be a suitable context for TBB.
