Hello,
I am writing a C program, where each parallel thread executes the following code:
for (i = nStart; i < nEnd; i++)
{
    k = A[i].key - 1;
    next = new(stack_elem);
    next->number = A[i].number;
    next->prev = bstacks[k];
    bstacks[k] = next;
}
on a part of the input array A.
#pragma omp parallel private(i, tid)
{
    #pragma omp sections nowait
    {
        #pragma omp section
        {
            DWORD_PTR mask = (1 << 0);
            DWORD_PTR prevmask;
            prevmask = SetThreadAffinityMask(GetCurrentThread(), mask);
            BucketSort(A, 0, NMAX/2, bstacks1);
        }
        #pragma omp section
        {
            DWORD_PTR mask = (1 << 2);
            DWORD_PTR prevmask;
            prevmask = SetThreadAffinityMask(GetCurrentThread(), mask);
            BucketSort(A, NMAX/2, NMAX, bstacks2);
        }
    }
}
When I measure the execution time, it turns out that with 4 cores/4 threads the program is slower than with 2 cores/2 threads. The best performance is obtained when I run the code on 2 cores that share an L2 cache. Is there a way to optimize the program when it is run on 4 cores?
Thank you,
Svetlana Marinova
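If each thread is performing
    next = new(stack_elem);
for each element, then you should be aware that new has a critical section: the threads typically serialize on the shared heap lock, so adding threads adds contention rather than speed.
Consider the following code, which allocates all of a thread's nodes in one call (I will let you work out the details for delete):
    stack_elem* p_stack_elem = new stack_elem[nEnd - nStart + 1];
    if (!p_stack_elem) FixThis();
    for (i = nStart; i < nEnd; i++) {
        k = A[i].key - 1;
        next = &p_stack_elem[i - nStart];
        next->number = A[i].number;
        next->prev = bstacks[k];
        bstacks[k] = next;
    }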
Jim Dempsey
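The pooled allocation suggested above can be sketched as a self-contained function. This is only an illustration under assumptions: the `item` and `stack_elem` layouts, the name `BucketSortPooled`, and the use of `std::vector` as the pool are hypothetical and do not come from the original program. The loop is taken to cover the half-open range [nStart, nEnd).

```cpp
#include <cstddef>
#include <vector>

// Assumed layouts; the original post does not show these definitions.
struct stack_elem { int number; stack_elem* prev; };
struct item { int key; int number; };   // key is assumed to lie in 1..bucket count

// Pushes A[nStart..nEnd) onto the per-bucket stacks in bstacks, taking every
// node from one pool owned by the calling thread instead of calling new per
// element. The pool must outlive the stacks and must not be resized afterwards.
void BucketSortPooled(const item* A, std::size_t nStart, std::size_t nEnd,
                      stack_elem** bstacks, std::vector<stack_elem>& pool)
{
    pool.resize(nEnd - nStart);          // single allocation for all nodes
    for (std::size_t i = nStart; i < nEnd; ++i) {
        int k = A[i].key - 1;
        stack_elem* next = &pool[i - nStart];
        next->number = A[i].number;
        next->prev = bstacks[k];         // link to the previous top of bucket k
        bstacks[k] = next;               // new node becomes the top
    }
}
```

Each thread would call this with its own `pool` and its own `bstacks` array, so the only heap call per thread is the single `resize`, and freeing the nodes amounts to destroying the vector.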
svetlana_m:
for (i = nStart; i < nEnd; i++)
{
    k = A[i].key - 1;
    next = new(stack_elem);
    next->number = A[i].number;
    next->prev = bstacks[k];
    bstacks[k] = next;
}
Also watch out for false-sharing.
In this example false-sharing can occur on variables bstacks1, bstacks2.
If you have something like this:
stack_elem* bstacks1;
stack_elem* bstacks2;
Then separate the variables this way, so that they no longer sit on the same cache line:
size_t const arch_cache_line_size = 64;
stack_elem* bstacks1;
char pad [arch_cache_line_size];
stack_elem* bstacks2;
Or better, allocate the bstacks1/bstacks2 variables directly on the stacks of the threads.
This can have great impact on performance/scalability!
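As an alternative to a manual pad array, C++11 `alignas` can give each variable its own cache line. This is a rough sketch under assumptions: the 64-byte line size is the value used in the post and should be checked for the target CPU, and the names `PaddedHead`, `kCacheLineSize` and `heads` are hypothetical. It also needs a C++11 compiler, which is newer than the toolchain discussed in this thread.

```cpp
#include <cstddef>

struct stack_elem;                               // only pointers are stored here

constexpr std::size_t kCacheLineSize = 64;       // assumed cache line size

// alignas forces both the alignment and (through padding) the size of the
// struct up to one full cache line, so two instances never share a line.
struct alignas(kCacheLineSize) PaddedHead {
    stack_elem* head = nullptr;
};

static_assert(sizeof(PaddedHead) == kCacheLineSize,
              "each head should occupy exactly one cache line");

// One slot per thread: heads[0] stands in for bstacks1, heads[1] for bstacks2.
PaddedHead heads[2];
```

With this layout, thread 0 writing `heads[0].head` and thread 1 writing `heads[1].head` touch different cache lines, which is the same effect the `pad` array is meant to achieve.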
I'm a beginner.
I don't know how to get started with OpenMP programming.
I need to do a final project on a serial algorithm using OpenMP programming.
Can anyone give me any notes that I can learn and practise from?
Furthermore, I need to know how to write code to determine the number of cores in my PC.
Regards.
Better, get the textbook "Using OpenMP" by Chapman, Jost, and van der Pas.
Did you look at my examples? http://sites.google.com/site/tprincesite/levine-callahan-dongarra-vectors
There are examples in Fortran, C, and C++ (not Intel C++) of the use of omp_get_num_procs(), in case you want that method to find the number of cores. It's a very simple task, but it exposes bugs in gfortran and icpc. Also shown is how a program can read data from cpuinfo, on operating systems that present such information.
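For readers who only want the processor count, a minimal sketch follows. The wrapper name `CountLogicalProcessors` is hypothetical. `omp_get_num_procs()` is the OpenMP runtime call mentioned above, and the fallback to `std::thread::hardware_concurrency()` is an addition for builds without OpenMP that requires C++11. Both normally report logical processors, which can exceed the number of physical cores when hyper-threading is enabled.

```cpp
#include <thread>
#ifdef _OPENMP
#include <omp.h>      // only available when the compiler's OpenMP option is on
#endif

// Returns the number of processors visible to the program (at least 1).
int CountLogicalProcessors()
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    // hardware_concurrency() may return 0 when the value cannot be determined.
    unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? static_cast<int>(n) : 1;
#endif
}
```

A program could print the result once at startup, for example with `std::printf("%d\n", CountLogicalProcessors());`, before deciding how many threads to request.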
Thanks for the info.
I use Visual Studio 2005 but don't have the Intel compiler. Only the Intel compiler supports OpenMP, am I right?
Do I need to install the Intel compiler to run that program?
My examples which use Microsoft C++ also require Fortran. You could use a 30-day trial version. VS2005 does support OpenMP, and it worked for me with my examples, although the documented compatibility is between VS2008 and Intel OpenMP. I set up the Windows build to compare the performance of Microsoft and Intel C++, using the same Fortran driver code. Perhaps you can look at the C++ code if you are not interested in installing any compiler beyond MSVC. The C99 code won't work with the Microsoft compiler unless you modify it to C89 or C++.
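One quick way to check whether a given compiler setup has OpenMP switched on is to test the standard `_OPENMP` macro, which a conforming compiler defines only when its OpenMP option is active (for MSVC, the `/openmp` switch). The function name `OpenMPVersionDate` below is hypothetical, and this is only a sketch.

```cpp
// Returns the yyyymm date of the supported OpenMP specification,
// or 0 when the code was compiled without OpenMP support.
long OpenMPVersionDate()
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

If this returns 0, any `#pragma omp` lines in the program were ignored and the code ran serially, which is worth ruling out before comparing timings.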
