Re: Problem at parallel_reduce

nuntawat · ‎04-30-2009

When I wrote the statements below and compiled them, I got an error message saying "no overloaded function takes 2 arguments"

[cpp]ParallizingHT ht(blocked_list);
parallel_reduce(blocked_range(0, block_list->size()), ht, auto_partitioner());[/cpp]

at line 112 of the parallel_reduce.h file (located in the installation directory). What should I do?

RafSchietekat · ‎04-30-2009

Difficult to say what line 112 is in your version, but my guess is that ParallizingHT doesn't have a splitting constructor (see Tutorial and Reference Manual).

nuntawat · ‎04-30-2009

Quoting - Raf Schietekat

Difficult to say what line 112 is in your version, but my guess is that ParallizingHT doesn't have a splitting constructor (see Tutorial and Reference Manual).

OK, I'll check it. In the parallel_reduce example shown on page 23 of the Intel Threading Building Blocks Tutorial, what are the criteria for deciding which variables should be split and then merged in the join method (that example has only two variables)? And can I split and merge a pointer? Thanks very much.

RafSchietekat · ‎04-30-2009

You can have any number of aggregate values, e.g., both a sum and a count to later compute the average, and these must be initialised to neutral values in the splitting constructor. I don't know what aggregate operation you would perform specifically on a pointer.
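A minimal sketch of the sum-and-count idea, written without TBB so that it compiles on its own. Everything here is an assumption for illustration: `split` is a local stand-in for `tbb::split`, `AverageBody` and `reduce_range` are hypothetical names, and `reduce_range` is a serial driver that only mimics the split / process / join pattern. With real TBB, `operator()` would take a `const blocked_range<size_t>&` and the driver would be `parallel_reduce`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in for tbb::split, so the sketch builds without TBB.
struct split {};

// Hypothetical body with the three members a reduction body needs:
// operator(), a splitting constructor, and join().
class AverageBody {
    const std::vector<double>* data_;  // shared, read-only input
public:
    double sum;         // first aggregate value
    std::size_t count;  // second aggregate value

    explicit AverageBody(const std::vector<double>& data)
        : data_(&data), sum(0.0), count(0) {}

    // Splitting constructor: share the input, but reset every
    // aggregate to its neutral value (0 for a sum, 0 for a count).
    AverageBody(AverageBody& other, split)
        : data_(other.data_), sum(0.0), count(0) {}

    // Accumulate over the half-open index range [begin, end).
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i != end; ++i) {
            sum += (*data_)[i];
            ++count;
        }
    }

    // Merge the partial result of a body that was split off.
    void join(const AverageBody& rhs) {
        sum += rhs.sum;
        count += rhs.count;
    }
};

// Serial driver (assumption): split the range in two, process each
// half with its own body, then join the halves.
void reduce_range(AverageBody& body, std::size_t begin, std::size_t end,
                  std::size_t grain) {
    assert(grain > 0);
    if (end - begin <= grain) {
        body(begin, end);
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    AverageBody right(body, split());
    reduce_range(body, begin, mid, grain);
    reduce_range(right, mid, end, grain);
    body.join(right);
}

double average(const std::vector<double>& data) {
    AverageBody body(data);
    reduce_range(body, 0, data.size(), 2);
    return body.count ? body.sum / body.count : 0.0;
}
```

Calling `average(v)` on a vector runs the whole split / join sequence. The pointer `data_` is copied rather than reduced, which is the usual role of a pointer in such a body: it identifies the shared input, while only `sum` and `count` are aggregated.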

nuntawat · ‎05-07-2009

I have another question. When I run the example shown on pages 18-19 of the Intel Threading Building Blocks Tutorial, the Visual C++ debugger shows "Debug Assertion Failed" at line 944 of module tbb_debug.dll. The message mentions OneTimeInitializationsDone and says that the thread did not activate a task_scheduler_init object, but I did not change anything in the example code. What is the problem?

Alexey-Kukanov · ‎05-08-2009

Quoting - nuntawat

I have another question. When I run the example shown on pages 18-19 of the Intel Threading Building Blocks Tutorial, the Visual C++ debugger shows "Debug Assertion Failed" at line 944 of module tbb_debug.dll. The message mentions OneTimeInitializationsDone and says that the thread did not activate a task_scheduler_init object, but I did not change anything in the example code. What is the problem?

The code on pages 18-19 does not include main(). An example of how to do it right can be found on page 10.

nuntawat · ‎05-21-2009

I have some more questions. I would like to use queuing_rw_mutex inside a class that is implemented as a body for the parallel algorithms (parallel_for, parallel_reduce, etc.). It looks like the code snippet shown below:

[cpp]// Including Visual C++ libraries here.
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"
#include "tbb/queuing_rw_mutex.h"

typedef queuing_rw_mutex currentMutex_t;

class Sample
{
private:
  // a lot of private variables are defined here

  list *block_list;

public:
  static currentMutex_t mutex;

  void operator() (const blocked_range& range) const
  {
    // Begin Queuing Mutex Lock
    currentMutex_t::scoped_lock lock(mutex, false);

    for(list::iterator iter = (*block_list).begin(); iter != (*block_list).end(); iter++)
    {
      /* PARALLEL COMPUTING FOR LINE BELOW */
      local_hough_transform(*iter, lock);
    }
  }

  void local_hough_transform(unsigned char **xc, currentMutex_t::scoped_lock lock) const
  {
    // Do something here
  }

  // Constructor and join method are written here
};[/cpp]

1. Is it correct to initialize the lock object at the beginning of operator() (and not inside the for loop that iterates over the list object for the given blocked_range<data_type> range)?
2. (If 1 is correct) If a method is called from a statement inside operator(), how do I pass the lock object to it? In this case I tried to pass it as a formal parameter, as in the code shown above, but when I compiled I got error C2248: 'tbb::internal::no_copy::no_copy' : cannot access private member declared in class 'tbb::internal::no_copy' (tbb\include\tbb\queuing_rw_mutex.h, line 134).

RafSchietekat · ‎05-21-2009

It seems that "mutex" is protecting an instance variable, so shouldn't it also be non-static (otherwise different instances may needlessly block each other, unless there is a real need that I missed)? It probably also should be private, not public. If you intend to protect the list as a whole then "lock" is where it should be.

Then there's the matter of why you would want to pass the lock around. Currently the lock starts out as shared. I presume that during the iteration you may find out that you really need exclusive access, and that you then want to upgrade it. You could do that by passing the lock by reference (it has been protected against accidental copying). But then you should also reorganise the code so that if the upgrade does not happen atomically (which cannot be guaranteed without risking deadlock) your code abandons "iter" (which may be compromised because other code may have changed the list during the gap between shared and exclusive ownership of the mutex) and probably starts the loop from scratch.
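A minimal sketch of passing a non-copyable lock by reference, which is the fix for error C2248 in the question. To stay self-contained it uses the standard library as a stand-in: `std::mutex` plays the role of `queuing_rw_mutex` and `std::unique_lock` the role of its `scoped_lock`. The class `Counter` and its members are hypothetical. The sketch does not model reader/writer upgrading, so the restart logic described above is not shown.

```cpp
#include <cassert>
#include <list>
#include <mutex>

typedef std::mutex Mutex;              // stand-in for queuing_rw_mutex
typedef std::unique_lock<Mutex> Lock;  // stand-in for its scoped_lock

// Hypothetical class: the mutex is a private, per-instance member
// because it protects this instance's own data.
class Counter {
    Mutex mutex_;
    std::list<int> items_;
    int total_;

    // The helper receives the caller's lock by reference. Declaring
    // the parameter as `Lock lock` (by value) would require a copy,
    // which a non-copyable lock type rejects at compile time.
    void add_locked(int value, Lock& lock) {
        assert(lock.owns_lock());  // the caller must already hold it
        total_ += value;
    }

public:
    Counter() : total_(0) {}

    void add(int value) {
        Lock lock(mutex_);
        items_.push_back(value);
    }

    int sum_all() {
        // One lock protects the whole list for the whole iteration.
        Lock lock(mutex_);
        total_ = 0;
        for (std::list<int>::iterator it = items_.begin();
             it != items_.end(); ++it) {
            add_locked(*it, lock);
        }
        return total_;
    }
};
```

In the code from the question, the equivalent change would be to declare the second parameter of `local_hough_transform` as `currentMutex_t::scoped_lock& lock`.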

adunsmoor · ‎05-21-2009

Quoting - nuntawat

I have some more questions. I would like to use queuing_rw_mutex inside a class that is implemented as a body for the parallel algorithms (parallel_for, parallel_reduce, etc.).

Be careful using the same class to implement the body for parallel_for and parallel_reduce. The interface looks very similar but how TBB uses them is quite different. You should be aware of that in order to make sure your class is safely callable from multiple threads.

Update: this explanation of parallel_for is incorrect. See the next post (#9) for a better explanation of how parallel_for actually works.

parallel_for takes a single instance of your class and calls the operator() method on it with different ranges. Depending on available threads you may have the same instance processing different ranges at the same time. Because of that, the operator() method needs to be "reentrant". If you need write access to any of your instance variables (or global data) then you should use a lock. If you are simply operating on objects in the range then you shouldn't need locks.

parallel_reduce, on the other hand, uses the splitting constructor when it is able to spread work around between threads and join() to coalesce the results back together. So, operator() can safely modify instance variables of your class without a lock.

Alexey-Kukanov · ‎05-21-2009

Quoting - adunsmoor

parallel_for takes a single instance of your class and calls the operator() method on it with different ranges. Depending on available threads you may have the same instance processing different ranges at the same time. Because of that, the operator() method needs to be "reentrant". If you need write access to any of your instance variables (or global data) then you should use a lock. If you are simply operating on objects in the range then you shouldn't need locks.

That would be correct for parallel_do (except that it does not operate on ranges), but it is far from the truth for parallel_for, which actually makes another copy of the body each time it splits a range in two. So each leaf task processing a portion of the original iteration space does so with its own instance of the body class; thus there are no conflicts when instance variables are changed in operator()(). But in return, because of the multiple copies, the body for parallel_for had better be lightweight (ideally, with no instance data at all).
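The copy-per-split behaviour can be illustrated with a small serial model. This is not TBB code and makes no claim about how many copies the real scheduler creates, since that depends on the partitioner and on task stealing. `CountingBody` and `split_and_run` are hypothetical names; the model only shows why a body that is expensive to copy becomes costly when every split copies it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical body that counts how often it is copied and how many
// leaf ranges are processed.
struct CountingBody {
    static int copies;
    static int leaves;

    CountingBody() {}
    CountingBody(const CountingBody&) { ++copies; }

    // const, like the operator() of a parallel_for body.
    void operator()(std::size_t begin, std::size_t end) const {
        assert(begin <= end);
        ++leaves;
    }
};

int CountingBody::copies = 0;
int CountingBody::leaves = 0;

// Serial model (assumption): each time the range is split in two,
// the second half gets its own copy of the body.
void split_and_run(const CountingBody& body, std::size_t begin,
                   std::size_t end, std::size_t grain) {
    if (end - begin <= grain) {
        body(begin, end);
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    CountingBody other(body);  // one copy per split
    split_and_run(body, begin, mid, grain);
    split_and_run(other, mid, end, grain);
}
```

In this model, running `split_and_run` over the range [0, 8) with a grain size of 2 performs three splits, so the body is copied three times and four leaf ranges are processed.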

RafSchietekat · ‎05-21-2009

Note that in #7 I tried to consider #6 as a question purely about queuing_rw_mutex (because I could not make sense of the context), but in doing that I may have blocked out a bit too much and made matters worse regarding the lifetime of "mutex" (maybe it should be a pointer just like block_list?), if my answer was relevant at all. Later responses seem less daunted to consider that context: how about a brief explanation?

Alexey-Kukanov · ‎05-21-2009

I stepped in just to correct the information about parallel_for behavior. I think your (Raf's) answer matches the question best so far. Yes, scoped locking inside the function call operator is semantically correct, and if the scoped lock object must be passed down the call chain, that can be done by reference. I am also hesitant to speculate about why such a scheme is necessary at all, since there is not enough information to propose a better one.