Intel® oneAPI Threading Building Blocks

Suggestions: hide a pipeline


Dear all, I'm still learning TBB, still with a very simple project:

  • read a text file line by line
  • tokenize each line and put tokens into a container

Right now I've implemented this with a pipeline, where the reader is a serial filter and the tokenizer is a parallel filter that puts tokens into a concurrent_vector. Now I'd like to provide a "parallel tokenizer" that can work with other read/write facilities, without an explicit pipeline. For instance, other software could use this parallel tokenizer without its author having to write a pipeline explicitly.

I am open to suggestions! Is it possible to make a sort of "task" that handles tokenization?

Here's a crazy (failing) idea: a singleton with a queue, and a pipeline with a container started by the constructor, with one parallel filter (essentially the same implementation):

  • another piece of code calls put(string) on the singleton
  • put(string) simply pushes the string into a concurrent_queue
  • the parallel filter retrieves strings from the queue, tokenizes each one, and puts the tokens into the container

However, this crazy idea won't work. For one, how can I stop the pipeline? I could use another function that stops the flow_control, but I'm not sure. In essence, I believe this could be done with tasks, but the example for tasks (Fibonacci) isn't really fitting.

Thanks for any hints you can give me!



Update: I think it's impossible to do this with a pipeline. I tried but I cannot make it stop at will.

So, do you have any idea how I can make this work? Basically it's something like this:

class magic {
public:
   magic();
      // magically start the compute function in parallel

   void insert(int i);
      // this function is used by a user to insert values in a queue

   void stop();
      // magically stop the computation

private:
   void compute() {
      // this function is magically called in parallel batches
      int i;
      queue.pop(i);
      // this is safe with concurrent tasks accessing the queue
      // do some work with the integer
   }

   tbb::concurrent_bounded_queue<int> queue;
};
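
To make the skeleton above concrete, here is a minimal sketch of one way it might be filled in. It deliberately swaps TBB for the C++ standard library (std::thread, std::mutex, std::condition_variable), so it is not a TBB solution and says nothing about how a pipeline could be stopped. The class name Magic, the num_workers parameter and the work callback are all hypothetical, introduced only for illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the "magic" class, built on the standard library
// rather than TBB. Worker threads start in the constructor and run until stop().
class Magic {
public:
    // `work` is applied to every inserted value. It may be called from several
    // threads at once, so it must be thread-safe.
    Magic(std::size_t num_workers, std::function<void(int)> work)
        : work_(std::move(work)) {
        for (std::size_t n = 0; n < num_workers; ++n) {
            workers_.emplace_back([this] { compute(); });
        }
    }

    ~Magic() { stop(); }

    // Called by the user to hand a value to the workers.
    void insert(int i) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(i);
        }
        ready_.notify_one();
    }

    // Lets the workers drain the queue, then joins them.
    // Calling it a second time (from the same thread) is harmless.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (std::thread& t : workers_) {
            if (t.joinable()) t.join();
        }
    }

private:
    // Runs on each worker thread: pop one value, process it, repeat.
    void compute() {
        for (;;) {
            int i;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested and nothing left
                i = queue_.front();
                queue_.pop();
            }
            work_(i);  // do the work outside the lock
        }
    }

    std::function<void(int)> work_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

In this sketch stop() waits for values that were already inserted to be processed before it returns; dropping pending work instead would be a different design choice.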

Thanks for any suggestion!


First observation: reading a file may block, but TBB is not aware of that, and so a hardware thread may largely go to waste.

To reuse code, keep it simple: start with a tokenizer that only knows about, e.g., an input string and a vector of output strings, and leave it to the user of the tokenizer to do whatever is necessary to connect things. Don't complicate things unless the performance gain to the program as a whole is significant.
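
As a rough illustration of that "keep it simple" interface, a tokenizer that knows only about an input string and an output vector could look something like the sketch below. The function name tokenize and the choice to split on whitespace are assumptions; the thread does not say what the real tokenization rules are.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on whitespace and appends each token
// to `out`. It knows nothing about files, queues, or threads.
void tokenize(const std::string& line, std::vector<std::string>& out) {
    std::istringstream stream(line);
    std::string token;
    while (stream >> token) {
        out.push_back(token);
    }
}
```

Because the function touches only its own arguments, a caller is free to invoke it from a pipeline filter, a parallel loop, or plain serial code, giving each concurrent call its own output vector (or a concurrent container) as needed.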