topic Re: QuickSort OpenMP Parallel in Intel® Moderncode for Parallel Architectures

QuickSort OpenMP Parallel

zhangzhe65 — Thu, 16 Apr 2009 17:35:02 GMT

why QuickSort OpenMP Parallel
int _tmain(int argc, _TCHAR* argv[])
{

begin=clock();///CLK_TCK;

quicksort(R,0,N-1);

printf("");
for(i=0;i printf("%d ",R);
printf("\n");
end=clock();//CLK_TCK;
printf("time=%lf\n",end-begin);

return 0;
}

int partition(int R[],int l,int h)
{
int i,j,temp;
i=l;j=h;temp=R;
do
{
for(;(R>=temp)&&(i if(i;
for(;(R<=temp)&&(i if(i;
}while(i!=j);
R=temp;
return i;
}

void quicksort(int R[],int s1,int t1)
{
int i;
if(s1 {
i=partition(R,s1,t1);
//#pragma omp parallel sections
{
#pragma omp section
quicksort(R,s1,i-1);
#pragma omp section
quicksort(R,i+1,t1);
}
}
}
why parallel time is slower than serial time ? please tell me whether there are better time funtions than clock() or not ? Thank you

Re: QuickSort OpenMP Parallel

TimP — Thu, 16 Apr 2009 18:57:57 GMT

clock(), on many platforms, adds up time spent in all threads, so the usual objective for threaded parallel is to reduce elapsed time at the cost of some increase in the time measured by clock(). Your OpenMP should make a suitable choice of timing function in omp_get_wtime() (typically, gettimeofday()), or, for finer resolution, you can use rdtsc counter with precautions against comparing different timers on a multiple socket platform.

Re: QuickSort OpenMP Parallel

pvonkaenel — Fri, 17 Apr 2009 10:51:02 GMT

Quoting - tim18

On Windows I tend to use QueryPerformanceCounter() with QueryPerformanceFrequency() for fine resolution timings. RDTSC is nice, but you need to use asm to access it, and as Tim mentioned, you need to make sure all calls to it come from the same socket.

Tim, I have been assuming that multiple cores on a single die share the same RDTSC timer. Do you know if this is correct?

Peter

Re: QuickSort OpenMP Parallel

TimP — Fri, 17 Apr 2009 13:12:30 GMT

Microsoft and Intel C/C++ compilers for x86/x64 provide an intrinsic __rdtsc() so it is not necessariy to know the asm sequences. I don't know if the SSE4.2 option might in the future switch it over to the new equivalent instruction, or whether the old rdtsc will continue indefinitely to give equivalent results. There is a single timer shared by multiple cores on a CPU.
On Intel 64-bit CPUs (from Prescott on) rdtsc actually counts FSB ticks, and on "Intel microarchitecture code-named Nehalem," QPI ticks, with the reported result scaled in accordance with the nominal core speed. For example, the rdtsc rate doesn't speed up under Turbo mode.
Power-on BIOS logic synchronizes the timers on a multiple socket platform, so they shouldn't differ by more than a few FSB ticks, as long as they share the same FSB clock. So, on ordinary dual CPU platforms, the discrepancy introduced by accidental comparison of the 2 different timers would be within the time delay of switching threads.

Re: QuickSort OpenMP Parallel

Thomas_W_Intel — Fri, 17 Apr 2009 19:45:25 GMT

Concerning the question, why the serial version is faster than the parallel, you might need to have a closer look at the overhead. The first time the parallel section is executed, the OpenMP run-time environment starts the thread pool, which needs a considerable amount of time. For time measurements, it is therefore advisable to havea warm-up call to the parallel routine before the time measurement.

Additionally, you should take into account that parallelization is only beneficial for large data sets. If you sort only 100 integers, you are probably better off using a serial algorithm. If you haven't done so already, you should choose a big data set to time your sort routine. Additionally, you shouldswitch tothe serial variant in your recursion, if the data sets become too small.

You should also be aware that the maximal possible speed-up of a parallel quicksort algorithm is limited by the first partitioning, which is serial. Amdahl's law might bite you, especially if you are running on a machine with a lot of cores.

If you are completely stuck, the Intel Thread Profiler can help you in visualizing the thread behavior.

Kind regards
Thomas

Re: QuickSort OpenMP Parallel

pramodblackbird — Wed, 22 Apr 2009 14:56:08 GMT

working on a quick sort parallel prgm with increased efficiency...once completed, i'll put it on ISN,...

Re: QuickSort OpenMP Parallel

teeks99 — Wed, 22 Apr 2009 15:03:14 GMT

When I do timing tests I always try to keep the I/O (printf's in this case) out of the area that is timed. They can take a (relatively) large amount of time, and the time they take can be somewhat arbitrary (based on how full buffers are, the status of the console, etc). It looks like you're actually trying to time the quicksort() call, so just put your timer around that.

Re: QuickSort OpenMP Parallel

robert-reed — Wed, 22 Apr 2009 15:38:35 GMT

Quoting - pramodblackbird

working on a quick sort parallel prgm with increased efficiency...once completed, i'll put it on ISN,...

Are you aware that Intel already provides a parallel quick sort as part of the Open Source/commercial Intel Threading Building Blocks? The parallel_sort function recursively partitions the sort range and then spawns work on those partitions as separate tasks, scheduled in the Intel TBB thread pool.

Re: QuickSort OpenMP Parallel

pramodblackbird — Wed, 22 Apr 2009 17:30:31 GMT

thnx a lot...