Loop speedup

What is the best way to speed up a loop?

I have parallelized the loop, but the speedup is < 1; in fact, the serial version is faster than the parallel version.

Can someone give an example that shows the efficiency of parallel_for?

This is my code:

task_scheduler_init ( 5 ); // maybe the error is here?
num_threads = 5;
chunk_inner=int((nel+Ndelay)/num_threads); // maybe the error is here?
parallel_for (blocked_range(0,Ndelay+nel,chunk_inner), First_Loop (i,nel,direct_i,direct_q),simple_partitioner());

Thanks a lot
The first line creates a temporary task_scheduler_init instance and destroys it immediately, so the request for 5 threads probably does not persist, although I'm not sure. Instead, give the object a name (task_scheduler_init is a type, not a function) and pass no argument, so that TBB works out the number of threads by itself. Or simply omit the line and rely on implicit initialisation.

chunk_inner is a misnomer, because it is used as a grainsize. Letting it depend on the number of worker threads is also bad practice (with more threads the chunks become smaller and more numerous, which increases the parallel overhead); instead, make it a constant size.
