Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Parallel merge sort

stupidquestion
Beginner
182 Views
Hi,
I want to implement the merge sort algorithm using TBB.
[cpp]void mergeSort(float* someArray, int firstEl, int lastEl)
{
	if(firstEl < lastEl)
	{
		int q = (int) floor((firstEl + lastEl) / 2.0);
		mergeSort(someArray, firstEl, q);
		mergeSort(someArray, q + 1, lastEl);
		merge(someArray, firstEl, q, lastEl);
	}
}

void parallelMergeSort(float* someArray, int firstEl, int lastEl)
{
	if(lastEl - firstEl >= 50000)
	{
		int q = (int) floor((firstEl + lastEl) / 2.0);
		parallel_invoke( [=] {mergeSort(someArray, firstEl, q);},
			[=] {mergeSort(someArray, q + 1, lastEl);} );
		merge(someArray, firstEl, q, lastEl);
	}
	else
	{
		mergeSort(someArray, firstEl, lastEl);
	}
}
[/cpp]
But standart serial algorithm is faster than my parallel algorithm. How howcan Iimprove the performance?
And how I can provide efficient implementation for any number of cores and for any array size(line 14).
I apologizefor my English.
0 Kudos
2 Replies
Kirill_R_Intel
Employee
182 Views
In your sample you just divide the whole job into two parts by paralle_invoke. To have the code scalable, you may consider using another parallel mechanism that allow runtime scaling to any number of available cores.

Implementation of the "merge" function is not described here, but standard implementations have several "for" loops. You can change them to tbb::parallel_for. In this case the loops will be executed in parallel and the data will be divided automatically between worker threads. In this case you can remove parallel_invoke.

Another way is using tbb tasks, but maybe a little bit more complicated.

Also what is the reason of comparison with 50000 in line 14? Have you tried to change this value, as well as the whole data array size?
Invoking parallel processing brings some overhead. If amount of processed data is little, the overhead can be more than positive effect and you can see slowdown. It will work efficiently if amount of data is big enough - try to increase it, maybe you'll get improvement.

Regards,
Kirill
Alexey_K_Intel3
Employee
182 Views
In addition to what Kirill suggested, I would also try recursive parallelism; for a first attempt, it can be as simple as replacing mergeSort with parallelMergeSort in the lambda functionspassed to parallel_invoke.

And, the cutoff threshold of 50000 is likely too big. For example, TBB implementation of parallel_sort uses threshold of 500. You may run some experiments but I'd guess that the value of about 1000 should be sufficient, and possibly can even be reduced.
Reply