TBB for Qt CD ripper?

jvoegele · ‎07-11-2009

I am building a CD ripper application in C++ and Qt. I would like to parallelize the application such that multiple tracks can be encoded concurrently. Therefore, I have structured the application in such a way that encoding a track is a "Task", and I'm working on a mechanism to run some number of these Tasks concurrently. I could, of course, accomplish this using threads and write my own Task queue or work manager, but I thought TBB might be a better tool for the job. I have a couple of questions, however.

Is encoding a WAV file into a FLAC, Ogg Vorbis, or Mp3 file something that would work well as a tbb::task? The tutorial document states that "if threads block frequently, there is a performance loss when using the task scheduler because". I don't think my encoding tasks would block for mutexes frequently, but they will need to access disk relatively frequently, since they must read the WAV data from disk in order to encode. Is this level of disk activity problematic in the sense described by the tutorial?
Does TBB work well with Qt? When using Qt threads, you can use Qt's signals/slots mechanism transparently across threads. Would the same be true if I were using tbb::tasks instead of Qt threads? Would there be any other "gotchas"?

Thanks for any insights you can provide.

AJ13 · ‎07-13-2009

Hi,

I'm happy to see that someone has started to integrate Qt applications with TBB. Hopefully your application is open source too, at which point I'll be very very happy. I'll try to help you as much as I can.

Let's start with your first question regarding encoding as a tbb::task. First, you are right that blocking a TBB task is a no-no. tbb::tasks are there for computational work, which in your case will be computation for the encoding. I'm not sure how hard encoding is, computationally speaking. The tasks should represent fairly small units of computation for load balancing; the smaller the units are, the better the load balancing will work.

I'll try to illustrate this with an example. You described encoding an entire track as a tbb::task. Let's say you have 4 tracks, and 4 processors. Suppose that 3 of these tracks are fairly short, and finish quickly. The scheduling algorithm cannot break up the work any further, so you will have 3 idle cores while a single core bears all the computational work. Sometimes you will get a speed-up, sometimes not. Now, suppose that instead you break up work to represent encoding say two seconds of track (I'm not sure how encoding works, so forgive me if this isn't right). Each tbb::task would include an id for the 2 second segment, say the track number it belongs to and the offset of the 2 seconds into the track. Now you break up encoding of tracks into many many 2s segments... this eliminates the former problem that occurs when one track is longer than others.
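
The segment idea above can be sketched in plain C++. This is only a minimal illustration: the `Segment` struct and `make_segments` function are hypothetical names, lengths are counted in samples rather than seconds, and it assumes the encoder can actually work on independent segments, which may not hold for a given codec or library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor for one small unit of encoding work.
struct Segment {
    std::size_t track;   // index of the track this segment belongs to
    std::size_t offset;  // offset of the segment's first sample within the track
    std::size_t length;  // samples in this segment (the last one may be shorter)
};

// Split every track into segments of at most `segment_len` samples.
std::vector<Segment> make_segments(const std::vector<std::size_t>& track_lengths,
                                   std::size_t segment_len) {
    assert(segment_len > 0);
    std::vector<Segment> out;
    for (std::size_t t = 0; t < track_lengths.size(); ++t) {
        for (std::size_t off = 0; off < track_lengths[t]; off += segment_len) {
            const std::size_t remaining = track_lengths[t] - off;
            out.push_back({t, off, remaining < segment_len ? remaining : segment_len});
        }
    }
    return out;
}
```

With one long track and three short ones, the long track simply contributes more segments to the shared pool of work, so no core has to sit idle while another finishes a whole track on its own.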

Now we have the I/O problem. You will have to determine whether your problem is I/O bound or computationally bound. The tbb::pipeline seems like it could help you with this problem. You could set up a pipeline where you read a segment from disk into memory, encode it, then write the result back to disk.
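
To make the read, encode, write structure concrete, here is a rough sketch that uses only the C++ standard library (`std::async`) as a stand-in; it is not tbb::pipeline and does not use any TBB API. The names `Chunk`, `encode_chunk`, and `run_pipeline` are made up for the example, the "encoder" is a placeholder, and in-memory vectors stand in for the input and output files.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <future>
#include <utility>
#include <vector>

using Chunk = std::vector<int>;  // stand-in for a block of PCM samples

// Placeholder "encoder"; a real one would call into an encoding library.
Chunk encode_chunk(Chunk in) {
    for (int& s : in) s /= 2;
    return in;
}

// Read -> encode -> write, with at most `max_in_flight` chunks encoding at once.
// `input` plays the role of the WAV file, the returned vector the output file.
std::vector<int> run_pipeline(const std::vector<int>& input,
                              std::size_t chunk_size,
                              std::size_t max_in_flight) {
    assert(chunk_size > 0 && max_in_flight > 0);
    std::vector<int> output;
    std::deque<std::future<Chunk>> in_flight;

    // Write stage: serial and in order (oldest chunk first).
    auto drain_one = [&] {
        Chunk done = in_flight.front().get();
        in_flight.pop_front();
        output.insert(output.end(), done.begin(), done.end());
    };

    for (std::size_t pos = 0; pos < input.size(); pos += chunk_size) {
        // Read stage: serial.
        const std::size_t end =
            pos + chunk_size < input.size() ? pos + chunk_size : input.size();
        Chunk chunk(input.begin() + static_cast<std::ptrdiff_t>(pos),
                    input.begin() + static_cast<std::ptrdiff_t>(end));

        // Encode stage: runs on another thread while reading continues.
        in_flight.push_back(
            std::async(std::launch::async, encode_chunk, std::move(chunk)));

        // Bound the number of chunks held in memory at the same time.
        if (in_flight.size() == max_in_flight) drain_one();
    }
    while (!in_flight.empty()) drain_one();
    return output;
}
```

The limit on chunks in flight plays the same role as a pipeline's token limit: it keeps the reader from running far ahead of the encoder and filling memory with unencoded data.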

Finally, Qt threads. What are you using the Qt threads for? I am assuming this is for the graphical interface only? It would likely be best to just link your encoding module separately, so that it can manage its own threading via TBB. That is to say, write your encoder as a separate module in C++ with TBB, and define an external interface to the module so that threading is entirely hidden. Then you can call this module from your Qt application so that it does not know it is even calling a threaded program. Technically this could lead to oversubscription, which happens when there are more threads than processors. Whether or not this is a problem depends entirely upon how and why you are using threads.

I hope that this is a satisfactory response. Currently I am traveling in Europe. I should have access to the Internet for the next few days, and after that I will disappear until I get back to Canada on the 27th.

AJ

robert_jay_gould · ‎07-13-2009

You might have seen it already, but a few weeks back there was a thread about using TBB together with Qt. Although it is not specific to your encoding issues, it might help you with the Qt side.

http://software.intel.com/en-us/forums/showthread.php?t=66046

jvoegele · ‎07-14-2009

AJ - thank you for the very helpful and informative response. I'll try to answer all of your questions in order.

First, yes my application is open source, but is in the very early stages and I haven't published it in any public source repository. I'll try to whip it into shape a little bit and push it up to github or somewhere.

Next, you are correct that my application design has tasks that are far too coarsely grained. I should have fine-grained tasks to best take advantage of TBB. I initially chose the coarse-grained design for two reasons:

I was thinking more in terms of application level tasks, or "units of work" as the user would understand them. At this level, tasks are complete units of work that can be started, paused, and cancelled, and most importantly report status and progress back to the GUI thread so that the user can see how tasks are progressing. At this level you can also have dependencies and priorities for tasks, but it is a very different thing than the more fine-grained tasks defined by TBB.
I am using external libraries to do the actual encoding operations, e.g. libvorbis, libflac, etc. These libraries tend to have high-level interfaces with operations such as "encode this WAV file using the given bitrate, etc.". With my current application design, I don't really have the flexibility to break down the tasks into smaller units because the API doesn't expose such lower-level operations.

So that explains why I've chosen to design the application the way that I have. However, I am not tied to this design and would be happy to redesign some portions in order to work better with TBB. That may mean that I have to actually go into the code for the encoding libraries (which are all open-source, fortunately) and hack them a bit to get them to work with TBB. I'd prefer not to have to do this if I could avoid it, but I'm not sure that I can. Any thoughts on that?

As for the I/O problem, my experience indicates that encoding a WAV file into FLAC, Ogg Vorbis, or Mp3 is mostly CPU bound, but there is still a good deal of I/O. The encoders will read a chunk of data from the rather largish WAV file and then do a fairly large amount of computation on the chunk of data. I do not know the innards of these encoders well enough to provide any more detail, but having used them it appears they spend more time on CPU than waiting on disk I/O.

Finally, Qt threads. I am currently using them only because I had started writing my own version of tbb::task before I discovered TBB. If I can get TBB working for this application I don't think I'd need to use Qt threads at all. However, one very important feature of Qt threads is that Qt's signals/slots mechanism works transparently across threads. If a signal that is emitted in a worker thread is connected to a slot in, say, the GUI thread, Qt transparently turns the signal into an event and places it into the GUI thread's event queue. I'm not sure if this support requires use of Qt threads, or if Qt is smart enough to determine that there are separate threads involved even if some other threading library is being used.
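
The mechanism described above (a worker thread posts an event, the GUI thread runs it later) can be modeled in standard C++ as below. This is not Qt code and says nothing about how Qt treats threads it did not create; `EventQueue` and `emit_and_deliver` are hypothetical names used only to show the shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GUI thread's event queue (not a Qt class).
class EventQueue {
public:
    // Called from any worker thread: enqueue the handler instead of running it.
    void post(std::function<void()> event) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push(std::move(event));
    }

    // Called only from the owning ("GUI") thread: run everything queued so far.
    std::size_t process_pending() {
        std::queue<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, events_);
        }
        std::size_t handled = 0;
        for (; !batch.empty(); batch.pop(), ++handled) batch.front()();
        return handled;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> events_;
};

// Start `workers` threads that each "emit" one progress event, then deliver
// the events on the calling thread. Returns the sum of the reported values.
int emit_and_deliver(int workers) {
    assert(workers >= 0);
    EventQueue gui_queue;
    int total = 0;  // modified only by handlers, which run on the calling thread

    std::vector<std::thread> pool;
    for (int id = 1; id <= workers; ++id)
        pool.emplace_back([&gui_queue, &total, id] {
            gui_queue.post([&total, id] { total += id; });
        });
    for (std::thread& t : pool) t.join();

    gui_queue.process_pending();
    return total;
}
```

The worker only pays for a short lock while enqueueing; the handler itself, and anything slow it does to the GUI, runs on the thread that owns the queue.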

Thanks again for your help. I really think TBB would be a great boon to my application if I can overcome these hurdles.

jvoegele · ‎07-14-2009

Thanks for pointing this out. It looks like there's some very good information in that thread. I'll need to read it through a couple of times to see if I can apply any of that information to my own circumstances. :)

jvoegele · ‎07-17-2009

I've been thinking about this a bit, and I think the biggest concern I have now is how to report progress back to the GUI. I envision the GUI for my application as displaying a progress bar for each track that is being (or going to be) encoded. If I break up the encoding of each track into multiple tasks (either explicitly or by using something like a pipeline), I don't know how I can update the progress bar for each track. How does one handle progress reporting with TBB? Is there a safe way to do callbacks or post events to the GUI thread in any way?

AJ13 · ‎07-28-2009

Just as an FYI I'm back from vacation, you can reach me as Hydrant on IRC, on #tbb at irc.freenode.net.

So far as progress reporting, the best thing, I think, is to update several atomic variables (as infrequently as possible, otherwise you'll create a bottleneck). You don't want any of the worker threads to block, so calling a call-back that blocks to update a GUI would be bad, if it's a potentially blocking or very long call. However, with Qt, I don't think many things would block because the signal-slot mechanism is handled in a special way iirc. I think you'd just emit a signal, and AFAIK the signals are handled by a separate thread...I think.
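
As a rough sketch of the atomic-variable approach, using `std::atomic` and `std::thread` from the standard library rather than TBB: `TrackProgress` and `encode_track` are hypothetical names, the encoding step is left as a comment, and the batch size is an arbitrary knob for how rarely a worker touches the shared counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// One counter per track; workers add to it, the GUI thread only reads it.
struct TrackProgress {
    std::atomic<std::size_t> segments_done{0};
    std::size_t segments_total = 0;

    int percent() const {
        if (segments_total == 0) return 100;
        return static_cast<int>(
            100 * segments_done.load(std::memory_order_relaxed) / segments_total);
    }
};

// Simulate `workers` threads that together finish every segment of a track.
// Each worker publishes its count once per `batch` segments, not per segment.
void encode_track(TrackProgress& progress, std::size_t workers, std::size_t batch) {
    assert(workers > 0 && batch > 0);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&progress, w, workers, batch] {
            std::size_t unpublished = 0;
            for (std::size_t s = w; s < progress.segments_total; s += workers) {
                // ... encode segment s here ...
                if (++unpublished == batch) {
                    progress.segments_done.fetch_add(unpublished,
                                                     std::memory_order_relaxed);
                    unpublished = 0;
                }
            }
            // Publish whatever is left over at the end.
            progress.segments_done.fetch_add(unpublished, std::memory_order_relaxed);
        });
    }
    for (std::thread& t : pool) t.join();
}
```

A GUI timer could call `percent()` a few times per second to refresh each progress bar, so the workers never wait on the GUI at all.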

AJ

robert-reed · ‎07-30-2009

This may seem strange coming from a TBB advocate, Jason, but what does TBB add to the mix in your application? If all you are interested in is tbb::task, which is an escape for TBB applications that need some means for some portion of their operation to run a blocking or asynchronous task, not the primary feature of TBB, perhaps you should just stick with Qt.

A more fundamental question I have, though, regards the idea of multi-threading the suggested libraries, VORBIS and FLAC (I don't know either--I've spent a few minutes looking at each of their FAQs). In my quick scan, I found no suggestion whether these libraries are capable of operating in a threaded environment. They might, or they might not be thread safe. If they rely on global variables or have other side effects that preclude reentrancy, then it would make more sense to build your application as a multi-processing code rather than a multi-threading code (i.e., spawn separate processes for separate clips). You would have to deal with the aforementioned load balance issues, which may be a problem if your clips are varied in length and few in number, but if there are a lot of them, you can schedule clips to keep HW threads busy. There is the progress reporting issue, which would be compounded by using multiple processes and requiring multi-process shared memory or an MPI protocol to solve the communications issues.

I second AJ's comments about progress update frequency. The question I would ask is: how much resolution do you need in your progress bar? Is four segments enough? Or does your application need 100? Progress reporting is a balance of forces between observability and freedom from interruption: the more frequently your code reports, the more disruption occurs in the task being tracked. I've seen several server applications whose principal bottleneck was their log reporting data flow. If you're able to do this on a multi-threaded span, the issue will be one of cache line thrashing. You'd like to write progress reports as soon as they're available and not have to wait to get back to the task. If your thread owns the cache line, it can repeatedly and quickly overwrite it as needed (particularly important if the quanta upon which you're reporting progress vary in size). However, at some point your master thread is going to want to know status and its issue of a read request will cause a snoop that will force the task HW thread to flush the modified cache line to memory, which will eat into its memory bandwidth. So infrequent reads and private cache line writes by each worker thread could be a workable scheme.
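
One way to sketch the "private cache line per worker" scheme in standard C++ is shown below. The 64-byte line size is an assumption (common on x86, not universal), `PaddedCounter`, `run_workers`, and `total_progress` are hypothetical names, and the over-aligned vector elements rely on C++17 aligned allocation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86 but not universal.
constexpr std::size_t kCacheLine = 64;

// Each worker gets its own cache line, so its writes do not invalidate
// a line that another worker is writing to.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::size_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Reader side: sum all slots, ideally only when a status update is wanted.
std::size_t total_progress(const std::vector<PaddedCounter>& slots) {
    std::size_t sum = 0;
    for (const PaddedCounter& c : slots)
        sum += c.value.load(std::memory_order_relaxed);
    return sum;
}

// Writer side: worker w overwrites only slots[w] as it completes units of work.
void run_workers(std::vector<PaddedCounter>& slots, std::size_t units_per_worker) {
    assert(!slots.empty());
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < slots.size(); ++w)
        pool.emplace_back([&slots, w, units_per_worker] {
            for (std::size_t done = 1; done <= units_per_worker; ++done)
                slots[w].value.store(done, std::memory_order_relaxed);
        });
    for (std::thread& t : pool) t.join();
}
```

Because each worker is the sole writer of its slot, a plain store is enough and no read-modify-write is needed; the cost of sharing is paid only when the reader polls.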