Intel® oneAPI Threading Building Blocks

Continuation

pierredaye
Hello,
First of all, I am new to TBB, so excuse me if my question seems easy.
I have a process that I want to parallelize. Put simply, I have a bunch of nodes that each represent an ordinary differential equation (so the amount of computation done by each node is significant). Those nodes are interconnected, and I want to solve all of them for each time step. I found a way to write the problem as a single loop, provided I can send a finished task back to the ready pool.
Here is the idea I had:
  1. Create a dummy parent
  2. The dummy parent generates N children (I cannot use a tree architecture for various reasons)
  3. Spawn the tasks
  4. Depending on a double condition (Have I reached the final time? && Am I too fast?), update the internal state of each child and send it back to the ready pool of the parent.
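The steps above can be sketched sequentially, without TBB, to show the intended scheduling logic. This is only an illustration under assumptions: the names `Node`, `too_fast`, and `run_network` are hypothetical, a `std::deque` stands in for the ready pool, and a simple counter stands in for the ODE step. It is not the TBB implementation from this post.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical node state: t is the node's current time step and sum is a
// stand-in for the ODE state that a real node would integrate.
struct Node {
    int index;
    int t;
    int sum;
};

// "Am I too fast?": a node may advance only if no other node is behind it.
bool too_fast(const Node& n, const std::vector<int>& progress) {
    for (std::size_t i = 0; i < progress.size(); ++i) {
        if (i != static_cast<std::size_t>(n.index) &&
            progress[n.index] > progress[i]) {
            return true;
        }
    }
    return false;
}

// Advances every node to t_final with a single loop over a ready pool.
// Returns how many times a node was taken out of the pool.
int run_network(int n_nodes, int t_final, std::vector<int>& progress) {
    progress.assign(n_nodes, 0);
    std::deque<Node> ready;  // assumption: a FIFO queue models the ready pool
    for (int i = 0; i < n_nodes; ++i) {
        ready.push_back(Node{i, 0, 0});
    }

    int pops = 0;
    while (!ready.empty()) {
        Node n = ready.front();
        ready.pop_front();
        ++pops;
        if (!too_fast(n, progress)) {
            n.sum += 1;  // placeholder for one ODE integration step
            n.t += 1;
            progress[n.index] = n.t;
        }
        // "Have I reached the final time?": if not, resend to the ready pool.
        if (n.t < t_final) {
            ready.push_back(n);
        }
    }
    return pops;
}
```

In this sequential form the loop always terminates, because the node with the least progress is never "too fast" and can always advance. The parallel version has to preserve that property even when blocked nodes occupy every worker thread.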
To do so, I used the following implementation:
NetworkClassTask.cpp
#include "NetworkClassTask.hpp"
RootTask::RootTask(int NNode, IntAVector* ResultIN) {
    root = allocate_task();
    NNODE = NNode;
    AllResults = ResultIN;
    root->NNODE = NNode;
    root->AllResults = AllResults;
}
task* RootTask::execute() {
    int count = 1;
    for (int ii = 0; ii < NNODE; ii++) {
        ++count;
        listT.push_back(
            *new (task::allocate_additional_child_of((*this))) BaseTask(ii, root));
    }
    set_ref_count(count);
    spawn_and_wait_for_all(listT);
    return NULL;
}
RootTask* RootTask::allocate_task() {
    return scalable_allocator<RootTask>().allocate(1);
}
BaseTask::BaseTask(int idx, RootTask* parent) {
    daddy = parent;
    IND = idx;
    sum = 0;
    TF = 10;
    T = 0;
}
task* BaseTask::execute() {
    bool OK = true;
    int NNode = daddy->NNODE;
    //printf("IND:%d\n",IND);
    int CVal = (*(daddy->AllResults))[IND];
    for (int ii = 0; ii < NNode; ii++)
    {
        if (ii != IND)
            if (CVal > (*(daddy->AllResults))[ii])
                OK = false;
    }
    if (OK) {
        task_list TMP = daddy->listT;
        for (int ii = 0; ii < 10; ii++)
            sum += 1;
        (*(daddy->AllResults))[IND] = sum;
        printf(
            "Node IND:%d\t t:%d\t tf:%d\t sum:%d \tNNODE:%d \n",
            IND, T, TF, sum, NNode);
        T++;
        if (T < TF)
            recycle_as_child_of(*parent());
        else
            return NULL;
    } else {
        recycle_as_child_of(*parent());
    }
    return this;
}
NetworkClassTask.hpp
#ifndef NETWORKCLASSTASK_HPP_
#define NETWORKCLASSTASK_HPP_
#include "tbb/scalable_allocator.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"
#include "tbb/task.h"
#include "tbb/concurrent_vector.h"
#include
#include
#include
#include
const bool tbbmalloc = true;
const bool stdmalloc = false;
using namespace tbb;
using namespace std;
typedef concurrent_vector > IntAVector;
class RootTask: public task {
public:
    int NNODE;
    RootTask* root;
    IntAVector* AllResults;
    RootTask(int NNode, IntAVector* ResultIN);
    task* execute();
    static RootTask* allocate_task();
    task_list listT;
};
class BaseTask: public task {
public:
    int IND;
    int sum;
    int TF;
    int T;
    RootTask* daddy;
    BaseTask(int idx, RootTask* parent);
    task* execute();
};
#endif /* NETWORKCLASSTASK_HPP_ */
TestNetworkTask.cpp
#include "TestNetworkTask.hpp"
#include "tbb/task_scheduler_init.h"
int main() {
    int NNode = 20;
    task_scheduler_init my_tbb;
    IntAVector AllResults;
    AllResults.reserve(NNode);
    for (int ii = 0; ii < NNode; ii++)
        AllResults[ii] = 0;
    task& my_root = *new (task::allocate_root()) RootTask(NNode, &AllResults);
    my_root.execute();
    return 0;
}
TestNetworkTask.hpp
#ifndef TESTNETWORKTASK_HPP_
#define TESTNETWORKTASK_HPP_
#include "NetworkClassTask.hpp"
#endif /* TESTNETWORKTASK_HPP_ */
The code compiles fine, and the program runs as long as NNode in TestNetworkTask.cpp is smaller than my number of processors. But if I have more nodes than CPUs, the program hangs once it reaches the number of CPUs...
Can someone help me find the mistake?
Thanks
Pierre