int _tmain(int argc, _TCHAR* argv)
int partition(int R,int l,int h)
void quicksort(int R,int s1,int t1)
//#pragma omp parallel sections
#pragma omp section
#pragma omp section
why parallel time is slower than serial time ? please tell me whether there are better time funtions than clock() or not ? Thank you
On Windows I tend to use QueryPerformanceCounter() with QueryPerformanceFrequency() for fine resolution timings. RDTSC is nice, but you need to use asm to access it, and as Tim mentioned, you need to make sure all calls to it come from the same socket.
Tim, I have been assuming that multiple cores on a single die share the same RDTSC timer. Do you know if this is correct?
On Intel 64-bit CPUs (from Prescott on) rdtsc actually counts FSB ticks, and on "Intel microarchitecture code-named Nehalem," QPI ticks, with the reported result scaled in accordance with the nominal core speed. For example, the rdtsc rate doesn't speed up under Turbo mode.
Power-on BIOS logic synchronizes the timers on a multiple socket platform, so they shouldn't differ by more than a few FSB ticks, as long as they share the same FSB clock. So, on ordinary dual CPU platforms, the discrepancy introduced by accidental comparison of the 2 different timers would be within the time delay of switching threads.
Concerning the question, why the serial version is faster than the parallel, you might need to have a closer look at the overhead. The first time the parallel section is executed, the OpenMP run-time environment starts the thread pool, which needs a considerable amount of time. For time measurements, it is therefore advisable to havea warm-up call to the parallel routine before the time measurement.
Additionally, you should take into account that parallelization is only beneficial for large data sets. If you sort only 100 integers, you are probably better off using a serial algorithm. If you haven't done so already, you should choose a big data set to time your sort routine. Additionally, you shouldswitch tothe serial variant in your recursion, if the data sets become too small.
You should also be aware that the maximal possible speed-up of a parallel quicksort algorithm is limited by the first partitioning, which is serial. Amdahl's law might bite you, especially if you are running on a machine with a lot of cores.
If you are completely stuck, the Intel Thread Profiler can help you in visualizing the thread behavior.
Are you aware that Intel already provides a parallel quick sort as part of the Open Source/commercial Intel Threading Building Blocks? The parallel_sort function recursively partitions the sort range and then spawns work on those partitions as separate tasks, scheduled in the Intel TBB thread pool.