- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
#include
#include
#include
#include
#define MAX 10000000
void mergesort(float a[],int,int);
void merge(float a[],int,int,int);
float a[MAX],b[MAX];
char ar[MAX];
int main()
{
FILE *fp;
int i,n=0,j;
clock_t start,stop;
double startTime,endTime,wt;
int iCPU=omp_get_num_procs();
omp_set_num_threads(iCPU);
printf("\nNo of cpu's=%d\n",iCPU);
printf("Parallel version of Merge sort.\nEnter the data file name :\n");
scanf("%s",ar);
if((fp=fopen(ar,"r"))==NULL)
{
printf("Can't open file.\n");
exit(1);
}
i=fscanf(fp,"%f",&a
while(i != EOF)
{
//printf("%f,",a
n++;
i=fscanf(fp,"%f",&a
}
fclose(fp);
startTime=omp_get_wtime();
start=clock();
mergesort(a,0,n-1);
stop=clock();
endTime=omp_get_wtime();
printf("\nNo of elements=%d\nstart time=%10.9f",n,((float)(start)/ CLOCKS_PER_SEC));
printf("\nstop=%10.9f",((float)(stop)/CLOCKS_PER_SEC));
printf("\nElapsed time=%10.9f",(double)(endTime-startTime));
printf("\nTime taken :%10.9f secs\n",(((float)(stop-start))/CLOCKS_PER_SEC));
}
void mergesort(float a[],int low,int high)
{
int mid;
if (low
mid=(low+high)/2;
#pragma omp parallel
{
#pragma omp sections nowait
{
#pragma omp section
mergesort(a,low,mid);
#pragma omp section
mergesort(a,mid+1,high);
}}
merge(a,low,mid,high);
}
}
void merge(float a[],int low,int mid,int high)
{
int i,h,j,k;
h=i=low;
j= mid + 1;
while((h<=mid) && (j<=high))
if (a
b[i++]=a[h++];
else
b[i++]=a[j++];
if (h>mid)
for(k=j;k<=high;k++)
b[i++]=a
else
for(k=h;k<=mid;k++)
b[i++]=a
for(k=low;k<=high;k++)
a
}
Can anybody pl. tell me why I am not getting scaling with this code. I am using intel C++ (windows) on dual core machine.
Thanx in advance.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think we need some data file to test this. Could you please provide the same? You can attach the data file to this issue.
Thanks,
Om
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You may analyze this code using Intel Thread profiler or Intel Parallel Amplifier.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You may analyze this code using Intel Thread profiler or Intel Parallel Amplifier.
Ya ! as you said I have attached a data file of one lack nos. Just check now
Thanx Om..
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Have you verified your setup for OpenMP is creating threads? (options right, environment variable right etc...)
Hint, place a break in the different sections and see if you enter the section from a different thread (first iteration).
Second, your system has two hardwarethreads, you have a recursive algorithm designed to spawn threads as it nests deeper, assuming you have nested OpenMP levels enabled (on your 2 core system). If you have nested levels disabled, then on second level entry to mergesort goes through the OpenMP parallel then sections only to determine it cannot create a new thread pooland must run serial. This is consuming run time to make this decision.
If you have nested levels enabled, and if OpenMP is creating dozens (or more)of threads, then excessive overhead occures in task switching.
Recursive threading algorithims work well only when you take the effort NOT to schedule threads when additional threads are not available. In OpenMP this is problematic.
A reasonably good solution is to enter a parallel region, obtain the number of threads, split the range into that number of pieces, then each thread takes one piece and then calls your recursive routine (without thread splitting) to perform the sort.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Have you verified your setup for OpenMP is creating threads? (options right, environment variable right etc...)
Hint, place a break in the different sections and see if you enter the section from a different thread (first iteration).
Second, your system has two hardwarethreads, you have a recursive algorithm designed to spawn threads as it nests deeper, assuming you have nested OpenMP levels enabled (on your 2 core system). If you have nested levels disabled, then on second level entry to mergesort goes through the OpenMP parallel then sections only to determine it cannot create a new thread pooland must run serial. This is consuming run time to make this decision.
If you have nested levels enabled, and if OpenMP is creating dozens (or more)of threads, then excessive overhead occures in task switching.
export LD_PRELOAD=
run
view guide.gvs
It definitely makes threads, each spending as much time as the serial run.
with OMP_NESTED set, it recurses until OMP_STACK is exhausted.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
export LD_PRELOAD=
run
view guide.gvs
It definitely makes threads, each spending as much time as the serial run.
with OMP_NESTED set, it recurses until OMP_STACK is exhausted.
Certainly
Consider having a paltry 65,536 number of records. The sort (as written) would split to 16 levels and require 32,768 +/- treads! (each thread, and following generation of threads,performing splits until it has 2 records)
Algo's cannot blindly divide to conquer (as they usuallyend up dividing and fail). The division process has to be conditional based on the availability of hardware threads to perform the process.
The merge sort (as well as a parallel recursive Fibonacci series) are good learning examples. These angorithms can be done in parallel, but only when the code is sensitive to the available resources. BTW recursive algorithms tend to be easy to write but also tend to be much slower than looping iterative methods.
The parallel merge sort will work well with a one time divide N-ways (by number of available HW threads) then perform sort without further division.
When you have a means to determine the availability of additional threads after the N-way split (part way through the sort) then each thread could determine if a split is warranted. e.g. split only when additional thread available .and remaining count is .gt. x.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Certainly
Consider having a paltry 65,536 number of records. The sort (as written) would split to 16 levels and require 32,768 +/- treads! (each thread, and following generation of threads,performing splits until it has 2 records)
Algo's cannot blindly divide to conquer (as they usuallyend up dividing and fail). The division process has to be conditional based on the availability of hardware threads to perform the process.
The merge sort (as well as a parallel recursive Fibonacci series) are good learning examples. These angorithms can be done in parallel, but only when the code is sensitive to the available resources. BTW recursive algorithms tend to be easy to write but also tend to be much slower than looping iterative methods.
The parallel merge sort will work well with a one time divide N-ways (by number of available HW threads) then perform sort without further division.
When you have a means to determine the availability of additional threads after the N-way split (part way through the sort) then each thread could determine if a split is warranted. e.g. split only when additional thread available .and remaining count is .gt. x.
Jim Dempsey
I wil first find available threads and then sort .
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page