Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Execution with 4 cores is slower than with 2 cores

svetlana_m
Beginner

Hello,

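I am writing a C++ program in which each parallel thread executes the following code on its own part of the input array A:

for(i=nStart; i<nEnd; i++)
{
    k=A[i].key-1;                  // bucket index for this element
    next = new(stack_elem);        // one heap allocation per element
    next->number=A[i].number;
    next->prev=bstacks;            // push the new node onto the stack
    bstacks=next;
}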
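The loop above depends on declarations the post does not show. Here is a self-contained sketch that assumes hypothetical Item and stack_elem types. It also assumes that bstacks is an array of per-bucket stack heads indexed by k, which is what computing k suggests. PushToBuckets and FreeBuckets are illustrative names, not part of the original program.

```cpp
#include <cassert>

// Hypothetical types: the thread does not show how A's elements or
// stack_elem are declared, so these definitions are assumptions.
struct Item {
    int key;     // bucket key, assumed to lie in 1..nBuckets
    int number;  // payload copied into the bucket
};

struct stack_elem {
    int number;
    stack_elem* prev;  // element pushed before this one (nullptr at the bottom)
};

// Pushes A[nStart..nEnd) onto per-bucket linked stacks, one heap node per element.
// Assumes bstacks is an array of nBuckets stack heads indexed by k = key - 1.
void PushToBuckets(const Item* A, int nStart, int nEnd,
                   stack_elem** bstacks, int nBuckets) {
    for (int i = nStart; i < nEnd; i++) {
        int k = A[i].key - 1;               // zero-based bucket index
        assert(k >= 0 && k < nBuckets);
        stack_elem* next = new stack_elem;  // one allocation per element
        next->number = A[i].number;
        next->prev = bstacks[k];            // link to the old top of bucket k
        bstacks[k] = next;                  // new node becomes the top
    }
}

// Releases every node; needed because each node came from its own new.
void FreeBuckets(stack_elem** bstacks, int nBuckets) {
    for (int k = 0; k < nBuckets; k++) {
        while (stack_elem* top = bstacks[k]) {
            bstacks[k] = top->prev;
            delete top;
        }
    }
}
```

Because each element gets its own new, every thread calls the allocator once per element. That per-element cost is what the first reply below addresses.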

#pragma omp parallel private(i,tid)
{
    #pragma omp sections nowait
    {
        #pragma omp section
        {
            // pin this thread to logical processor 0
            DWORD_PTR mask=(1<<0);
            DWORD_PTR prevmask;
            prevmask=SetThreadAffinityMask(GetCurrentThread(),mask);
            BucketSort( A,0,NMAX/2,bstacks1);      // first half of A
        }
        #pragma omp section
        {
            // pin this thread to logical processor 2
            DWORD_PTR mask=(1<<2);
            DWORD_PTR prevmask;
            prevmask=SetThreadAffinityMask(GetCurrentThread(),mask);
            BucketSort( A,NMAX/2,NMAX,bstacks2);   // second half of A
        }
    }
}
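The two hard-coded sections above assume exactly two threads and fixed processor numbers. The sketch below shows the same split for any thread count. It reuses the hypothetical Item and stack_elem types and the assumed per-bucket heads. Each OpenMP thread gets a contiguous slice of A and its own row of bucket heads, and ParallelBuckets is an illustrative name. It only generalizes the partitioning. It still calls new once per element and does not pad the per-thread heads. Compiled without OpenMP, the pragmas are ignored and the code runs on one thread.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical element and node types (the post does not show them).
struct Item { int key; int number; };
struct stack_elem { int number; stack_elem* prev; };

// Each thread pushes a contiguous slice of A onto its own row of bucket
// heads, so no two threads write the same head pointer. Returns one row
// of nBuckets heads per thread; the caller walks or merges them afterwards.
std::vector<std::vector<stack_elem*>> ParallelBuckets(const std::vector<Item>& A,
                                                      int nBuckets) {
    int nThreads = 1;
#ifdef _OPENMP
    nThreads = omp_get_max_threads();
#endif
    std::vector<std::vector<stack_elem*>> heads(
        nThreads, std::vector<stack_elem*>(nBuckets, nullptr));
    const int n = static_cast<int>(A.size());
#pragma omp parallel num_threads(nThreads)
    {
        int tid = 0, T = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        T = omp_get_num_threads();
#endif
        // Half-open slice [begin, end), like 0..NMAX/2 and NMAX/2..NMAX above.
        int begin = static_cast<int>(static_cast<long long>(n) * tid / T);
        int end = static_cast<int>(static_cast<long long>(n) * (tid + 1) / T);
        std::vector<stack_elem*>& mine = heads[tid];
        for (int i = begin; i < end; i++) {
            int k = A[i].key - 1;  // zero-based bucket index
            assert(k >= 0 && k < nBuckets);
            stack_elem* next = new stack_elem;
            next->number = A[i].number;
            next->prev = mine[k];
            mine[k] = next;
        }
    }
    return heads;
}
```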

When I measure the execution time, it turns out that with 4 cores/4 threads the program is slower than with 2 cores/2 threads. The best performance is obtained when I run the code on 2 cores that share an L2 cache. Is there a way to optimize the program when it runs on 4 cores?

Thank you,

Svetlana Marinova

jimdempseyatthecove
Honored Contributor III

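If each thread is performing

next = new(stack_elem);

for each element, then you should be aware that new has a critical section. The heap allocator typically guards its internal state with a lock, so concurrent calls to new from several threads are serialized and the threads end up waiting on each other.

Consider the following code (I will let you work out the details for delete):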

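// one allocation for the whole range; std::nothrow (from <new>) makes new
// return NULL on failure instead of throwing, so the check below is meaningful
stack_elem* p_stack_elem = new(std::nothrow) stack_elem[nEnd-nStart+1];
if(!p_stack_elem) FixThis();
for(i=nStart; i<nEnd; i++)
{
    k=A[i].key-1;
    next = &p_stack_elem[i-nStart];   // take the next node from the pool
    next->number=A[i].number;
    next->prev=bstacks;
    bstacks=next;
}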
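Packaged as a self-contained function, the pooled allocation might look like the sketch below. The Item and stack_elem definitions, the bucket array indexed by k, and the name PushToBucketsPooled are assumptions for illustration. The point is that the nodes for the whole range come from one new[] call, so each thread enters the allocator once instead of once per element. The pool must then be released with one matching delete[], never node by node.

```cpp
#include <cassert>
#include <new>

struct Item { int key; int number; };                 // hypothetical
struct stack_elem { int number; stack_elem* prev; };  // hypothetical

// Allocates all nodes for [nStart, nEnd) in a single new[] call and pushes
// them onto the per-bucket stacks. Returns the pool (nullptr on failure);
// the caller frees it later with exactly one delete[].
stack_elem* PushToBucketsPooled(const Item* A, int nStart, int nEnd,
                                stack_elem** bstacks, int nBuckets) {
    stack_elem* pool = new (std::nothrow) stack_elem[nEnd - nStart];
    if (!pool) return nullptr;  // allocation failed; bstacks left untouched
    for (int i = nStart; i < nEnd; i++) {
        int k = A[i].key - 1;   // zero-based bucket index
        assert(k >= 0 && k < nBuckets);
        stack_elem* next = &pool[i - nStart];  // node taken from the pool
        next->number = A[i].number;
        next->prev = bstacks[k];
        bstacks[k] = next;
    }
    return pool;
}
```

Usage: stack_elem* pool = PushToBucketsPooled(A, nStart, nEnd, heads, nBuckets); then use the stacks, then delete[] pool once all of them are no longer needed.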
Jim Dempsey
Dmitry_Vyukov
Valued Contributor I

Also watch out for false sharing.
In this example false sharing can occur on the variables bstacks1 and bstacks2. If they sit on the same cache line, every write by one thread invalidates that line in the other core's cache.
If you have something like this:
stack_elem* bstacks1;
stack_elem* bstacks2;

Then separate the variables this way:
size_t const arch_cache_line_size = 64;
stack_elem* bstacks1;
char pad [arch_cache_line_size];
stack_elem* bstacks2;

Or better, allocate the bstacks1/bstacks2 variables directly on the threads' own stacks.
This can have a great impact on performance and scalability!
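Another way to keep the heads on separate cache lines is to align each head with C++11 alignas. This sketch assumes a 64-byte line. On compilers without alignas, such as older MSVC releases, __declspec(align(64)) serves the same purpose. Grouping the heads in one struct also fixes their relative layout, which two independent global variables separated by a pad array do not guarantee. PaddedHead and PerThreadHeads are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct stack_elem;  // node type from the thread; only pointers are needed here

// Assumed cache-line size (64 bytes on current Intel x86 cores).
constexpr std::size_t kCacheLine = 64;

// alignas pads each head out to a full cache line, so two threads updating
// different heads never write to the same line.
struct alignas(kCacheLine) PaddedHead {
    stack_elem* head = nullptr;
};

// One padded head per thread; inside a struct the member order is fixed.
struct PerThreadHeads {
    PaddedHead bstacks1;
    PaddedHead bstacks2;
};

static_assert(sizeof(PaddedHead) == kCacheLine, "one head per cache line");
static_assert(alignof(PerThreadHeads) == kCacheLine, "struct starts on a line boundary");
```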

kasut_jepun
Beginner

I'm a complete beginner.

I don't know how to get started with OpenMP programming.

I need to do a final project on a serial algorithm using OpenMP programming.

Can anyone give me some notes that I can learn from and practice with?

Furthermore, I need to know how to write code that determines the number of cores in my PC.

Regards

TimP
Honored Contributor III

http://openmp.org/wp/

Better still, the textbook "Using OpenMP" by Chapman, Jost, and van der Pas.

Have you looked at my examples? http://sites.google.com/site/tprincesite/levine-callahan-dongarra-vectors

There are examples in Fortran, C, and C++ (not Intel C++) that use omp_get_num_procs(), in case you want to use that method to find the number of cores. It's a very simple task, but it exposes bugs in gfortran and icpc. The examples also show how a program can read data from cpuinfo on operating systems that provide such information.
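For the core-count question, here is a minimal C++ sketch. It uses omp_get_num_procs() when OpenMP is enabled and otherwise falls back to std::thread::hardware_concurrency() (C++11). Both report logical processors, so on a machine with Hyper-Threading the result is larger than the number of physical cores. CountProcessors is a hypothetical name.

```cpp
#include <cassert>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of logical processors available to the program.
int CountProcessors() {
#ifdef _OPENMP
    return omp_get_num_procs();  // OpenMP runtime query
#else
    // hardware_concurrency() may return 0 when the value is unknown.
    unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? static_cast<int>(n) : 1;  // assume 1 when unknown
#endif
}
```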

kasut_jepun
Beginner

Thanks for the info.

I'm using Visual Studio 2005, but it doesn't include the Intel compiler, and I thought only the Intel compiler supports OpenMP. Am I right?

Do I need to install the Intel compiler to run that program?

TimP
Honored Contributor III

My examples that use Microsoft C++ also require Fortran. You could use a 30-day trial version of the Intel compiler. VS2005 does support OpenMP, and it worked for me with my examples, although the documented compatibility is between VS2008 and Intel OpenMP. I set up the Windows build to compare the performance of Microsoft and Intel C++ using the same Fortran driver code. If you are not interested in installing any compiler beyond MSVC, perhaps you can look at the C++ code. The C99 code won't work with the Microsoft compiler unless you convert it to C89 or C++.
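To confirm that a given compiler really builds OpenMP code, a small check like the sketch below can help. OpenMP-enabled compilers define the _OPENMP macro. With Microsoft C++ this requires the /openmp switch, and the value is 200203 for OpenMP 2.0. Without it, the pragmas are ignored and the parallel region runs on a single thread. OpenMPVersion and ThreadsInParallelRegion are hypothetical names.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the _OPENMP value (a yyyymm spec date) or 0 if OpenMP is off.
long OpenMPVersion() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;  // compiled without OpenMP: the pragmas are ignored
#endif
}

// Counts how many threads actually execute a parallel region.
int ThreadsInParallelRegion() {
    int count = 0;
#pragma omp parallel
    {
#pragma omp atomic
        count++;
    }
    return count;  // 1 when OpenMP is disabled
}
```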
