Intel® oneAPI Threading Building Blocks

Suggestions: hide a pipeline

Sensei_S_
Beginner

Dear all, I'm still learning TBB, still with a very simple project:

  • read a text file line by line
  • tokenize each line and put tokens into a container

Right now I've implemented this with a pipeline, where the reader is a serial filter and the tokenizer is a parallel filter that puts tokens into a concurrent_vector. Now I'd like to provide a "parallel tokenizer" that can work with other read/write facilities, without an explicit pipeline. For instance, this parallel tokenizer could be used in other software without the user having to write a pipeline explicitly.

I am open to suggestions! Is it possible to make a sort of "task" that handles tokenization?

Here's a crazy (failing) idea: a singleton with a queue, and a pipeline with a container started by the constructor, with one parallel filter (essentially the same implementation):

  • another piece of code calls put(string) on the singleton
  • put(string) simply pushes the string into a concurrent_queue
  • the parallel filter retrieves strings from the queue, tokenizes them, and puts the tokens into the container

However, this crazy idea won't work. For one, how can I stop the pipeline? I could add another function that calls stop() on the flow_control, but I'm not sure. In essence, I believe this could be done with tasks, but the example for tasks (Fibonacci) isn't really fitting.

Thanks for any hints you can give me!

Cheers!

2 Replies
Sensei_S_
Beginner

Update: I think it's impossible to do this with a pipeline. I tried but I cannot make it stop at will.

So, do you have any idea how I can make this work? Basically it's something like this:

#include <tbb/concurrent_queue.h>

class magic
{
public:

   magic()
   {
      // magically start the compute function in parallel
   }

   void insert(int i)
   {
      // called by the user to insert values into the queue
      queue.push(i);
   }

   void stop()
   {
      // magically stop the computation
   }

private:

   void compute()
   {
      // this function is magically called in parallel batches
      int i;
      // safe with concurrent tasks accessing the queue;
      // pop() blocks until an item is available
      queue.pop(i);
      // do some work with the integer
   }

   tbb::concurrent_bounded_queue<int> queue;
};

Thanks for any suggestion!

RafSchietekat
Black Belt

First observation: reading a file may block, but TBB is not aware of that, and so a hardware thread may largely go to waste.

To reuse code, keep it simple: start with a tokenizer that only knows about, e.g., an input string and a vector of output strings, and the user of the tokenizer should then do whatever is necessary to connect things. Don't complicate unless the performance gain to the program as a whole is significant.
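
As an illustration of that advice, here is a minimal sketch of a tokenizer that only knows about an input string and a vector of output strings. The function name tokenize and the choice of whitespace as the delimiter are assumptions made for this example, not something specified in the thread.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name and whitespace delimiter are assumptions):
// splits `line` on whitespace and appends each token to `out`.
// It knows nothing about files, queues, pipelines, or threads, so the
// caller decides how to feed it and how to parallelize around it.
void tokenize(const std::string& line, std::vector<std::string>& out)
{
    const std::size_t n = line.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip whitespace before the next token.
        while (i < n && std::isspace(static_cast<unsigned char>(line[i]))) ++i;
        const std::size_t start = i;
        // Advance to the end of the token.
        while (i < n && !std::isspace(static_cast<unsigned char>(line[i]))) ++i;
        // Append the token, if any characters were consumed.
        if (i > start) out.emplace_back(line, start, i - start);
    }
}
```

Because the function touches only its arguments, it can be called concurrently on different lines as long as each call gets its own output vector. A caller could then use it from a parallel pipeline filter, a parallel loop over lines, or plain serial code, and merge the per-call vectors however suits the program.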
