Intel® oneAPI Threading Building Blocks

efficiency question in TBB

anine
Beginner

I am measuring the performance of a TBB program by running the same parallel loop 10,000 times.
The first few iterations are always slower than the rest. What could cause this?

Test method:
task_scheduler_init init;

for (int i = 0; i < testcount; i++) {
    tick_count t0 = tick_count::now();
    // auto_partitioner lets TBB choose the chunk size
    parallel_for(blocked_range<int>(0, size), ApplyFoo(source, dest, minus), auto_partitioner());
    tick_count t1 = tick_count::now();
    sum = sum + (t1 - t0).seconds();
    fprintf(fp, " %d : %.12f\n", i, (t1 - t0).seconds());
} // end for

where
ApplyFoo computes minus = source - dest over the given range
testcount = 10000
size = 640*480

Timings (seconds):

1 time : 0.011139 <-------
2 time : 0.001736
3 time : 0.001742
4 time : 0.001273
5 time : 0.001272
.....
average : 0.001284

Thank you.

1 Reply
robert_jay_gould
Beginner
Quoting - anine

The first few iterations are always slower than the rest. What could cause this?

The first time through, you are loading up your cache.


During the first run your program is still loading instructions, acquiring memory, and filling the cache. In the following runs everything is already in place and served from the cache, so the loop runs smoothly.
When you need to compare the results of two operations, a good option is to mix up the order of the tests. That keeps either one from benefiting from warm-cache effects that you would not see in a real, more complicated program.
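One common way to keep that start-up cost out of the average is to run a few untimed warm-up iterations first. The sketch below is only an illustration, not part of TBB: `time_after_warmup` and `mean_seconds` are hypothetical helper names, and it uses `std::chrono` rather than `tick_count` so that it compiles without the library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a TBB API): calls `work` warmup + runs times and
// returns the wall-clock duration, in seconds, of each of the last `runs` calls.
template <typename Work>
std::vector<double> time_after_warmup(Work work, std::size_t warmup, std::size_t runs) {
    assert(runs > 0);
    typedef std::chrono::steady_clock steady;

    // Warm-up calls: executed so caches and memory get touched, but not timed.
    for (std::size_t i = 0; i < warmup; ++i) {
        work();
    }

    std::vector<double> seconds;
    seconds.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const steady::time_point t0 = steady::now();
        work();
        const steady::time_point t1 = steady::now();
        seconds.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    return seconds;
}

// Arithmetic mean of the recorded samples.
inline double mean_seconds(const std::vector<double>& samples) {
    assert(!samples.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        sum += samples[i];
    }
    return sum / static_cast<double>(samples.size());
}
```

For the test in the question, `work` would be a callable that wraps the `parallel_for` call, with a small `warmup` count (the timings above suggest the first three or so runs are affected).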
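Mixing up the tests could look something like the following sketch, which alternates which of two candidates runs first in each round, so that neither always gets the warmer cache. The names `seconds_for` and `time_interleaved` are hypothetical and not part of TBB. This is one possible reading of the advice, not a definitive benchmarking method.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical helper: wall-clock seconds taken by a single call to `work`.
template <typename Work>
double seconds_for(Work& work) {
    typedef std::chrono::steady_clock steady;
    const steady::time_point t0 = steady::now();
    work();
    const steady::time_point t1 = steady::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Hypothetical helper: runs `a` and `b` once per round, swapping which one
// goes first on every other round, and returns the mean seconds per call
// as (mean for a, mean for b).
template <typename WorkA, typename WorkB>
std::pair<double, double> time_interleaved(WorkA a, WorkB b, std::size_t rounds) {
    assert(rounds > 0);
    double total_a = 0.0;
    double total_b = 0.0;
    for (std::size_t i = 0; i < rounds; ++i) {
        if (i % 2 == 0) {
            total_a += seconds_for(a);
            total_b += seconds_for(b);
        } else {
            total_b += seconds_for(b);
            total_a += seconds_for(a);
        }
    }
    const double n = static_cast<double>(rounds);
    return std::make_pair(total_a / n, total_b / n);
}
```

Here `a` and `b` would be the two operations being compared, for example the same loop body run with two different partitioners.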
