Intel® oneAPI Threading Building Blocks

CPU usage not improved

yingemmachen
Beginner
Hi,

I used parallel_for in my Visual C++ program. When I ran the program, Windows Task Manager showed that the total number of threads increased, but the CPU usage did not change much (still about 15%).

Do you have any idea why this happens? My PC has an Intel Core i7 CPU.

thanks,

Ying
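For reference, Task Manager reports CPU usage as a share of all logical processors. Assuming a quad-core Core i7 with Hyper-Threading (8 logical processors; the exact model was not stated), one fully busy thread shows up as 100 / 8 = 12.5%, so a steady ~15% means roughly one processor's worth of work: the extra threads exist but are mostly idle or waiting. The helper names below are hypothetical, a small sketch of that arithmetic.

```cpp
#include <cassert>
#include <thread>

// Percentage of total CPU that Task Manager shows when exactly one
// logical processor is fully busy.
double singleThreadShare(unsigned logicalProcessors) {
    return 100.0 / logicalProcessors;
}

// Approximate number of logical processors kept busy for a given
// overall CPU percentage.
double busyProcessors(double cpuPercent, unsigned logicalProcessors) {
    return cpuPercent / 100.0 * logicalProcessors;
}

// Logical processors on the machine running this code
// (the standard allows 0 if the value is not computable).
unsigned logicalProcessorsHere() {
    return std::thread::hardware_concurrency();
}
```

With 8 logical processors, `busyProcessors(15.0, 8)` is about 1.2, i.e. a little more than one thread's worth of work.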
Dmitry_Vyukov
Valued Contributor I
Most likely something is wrong with your program.
What exactly? There are far too many possibilities to enumerate them all.

ARCH_R_Intel
Employee
You might try cutting your example down to something that you can post as an attachment in this forum, and see if anyone has ideas. Often when I'm cutting down a problematic example, the root problem dawns on me before I'm even through cutting.
yingemmachen
Beginner
Thanks.

My code is similar to the following. The structure is simple, but the function pParent->Calc is complicated: it calls a commercial library we bought, for which we don't have the source code. Any suggestions are welcome and appreciated.


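#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
using namespace tbb;

// Parent, getParentFromChildID, pAllChildren, pAllParents, numParents
// and NUMRUN are defined elsewhere in the program.

// Body object for parallel_for: computes result[j] for every j in its chunk.
class ApplyCalc {
    Parent *pParent;
    int index;
    double *result;
public:
    void operator() (const blocked_range<int>& r) const {
        for (int j = r.begin(); j != r.end(); ++j) {
            result[j] = pParent->Calc(j, index);
        }
    }
    ApplyCalc(Parent *p, int i, double *r) :
        pParent(p), index(i), result(r) {}
};

void calcResult(double **allResult) {
    // numChildren: hypothetical name; the original loop bound was lost in the forum formatting
    for (int i = 0; i < numChildren; ++i) {
        int index;
        Parent *pParent = getParentFromChildID(pAllChildren[i]->getID(),
                                               pAllParents,
                                               numParents,
                                               &index);
        double *result = new double[NUMRUN];
        // Only the inner loop over runs is parallel.
        parallel_for(blocked_range<int>(0, NUMRUN),
                     ApplyCalc(pParent, index, result));
        for (int j = 0; j < NUMRUN; ++j) {
            allResult[i][j] = result[j];  // was allResult[i,j] = result (comma operator)
        }
        delete[] result;
    }
}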
Dmitry_Vyukov
Valued Contributor I
Try applying parallel_for to the *outer* loop.
You call pParent->Calc() in parallel. Even if it's thread-safe, it most likely uses mutexes internally, which kills scalability.
Parallelizing outer loops is generally preferable. It also gives coarser granularity (more work per task) and better locality.

