topic multicore performance in Intel® Moderncode for Parallel Architectures

multicore performance

TC2009074 — Fri, 29 May 2009 15:52:26 GMT

Is there any research work on the bandwidth-hungry behaviour of multicore systems?

Re: multicore performance

gaston-hillar — Mon, 01 Jun 2009 20:00:33 GMT

Quoting - tc2009074

Is there any research work on the bandwidth-hungry behaviour of multicore systems?

Hi tc2009074,

What kind of specific information are you looking for? A comparison against what?

Re: multicore performance

Alain_D_Intel — Thu, 11 Jun 2009 12:29:06 GMT

If your application is bandwidth-hungry, memory bandwidth will also drive your scalability.

Generally speaking, your machine has a maximum global memory bandwidth (the STREAM benchmark is a good way to measure it).
If your application consumes xx% of it with 1 thread, you can't expect a speedup greater than 100/xx.
This is often why scalability curves flatten into a "plateau" after a few threads.
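As a rough illustration of the 100/xx rule above, here is a minimal Python sketch. The function names and the 25% figure are made up for the example, and the model assumes perfectly linear scaling until the bandwidth limit is hit, which real applications rarely achieve.

```python
def max_speedup(bandwidth_share_percent: float) -> float:
    """Upper bound on speedup when ONE thread already uses the given
    share (in percent) of the machine's peak memory bandwidth."""
    if not 0 < bandwidth_share_percent <= 100:
        raise ValueError("share must be in (0, 100]")
    return 100.0 / bandwidth_share_percent


def bounded_speedup(threads: int, bandwidth_share_percent: float) -> float:
    """Idealized speedup: linear in the thread count until the
    bandwidth ceiling is reached, then flat (the "plateau")."""
    return min(float(threads), max_speedup(bandwidth_share_percent))


if __name__ == "__main__":
    # Hypothetical application: one thread consumes 25% of peak bandwidth.
    for n in (1, 2, 4, 8, 16):
        print(n, "threads ->", bounded_speedup(n, 25.0), "x")
```

With a 25% single-thread share, the model stops scaling at 4 threads, whatever the core count.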
To avoid this "ceiling" effect, you can:
- increase your global bandwidth: DIMMs, chipset, BIOS settings, or a different machine
- modify your algorithm to reduce pressure on memory: even if it's slower on 1 core, it could be faster once parallelized
- reorganize the data layout to be more "cachable" and put less pressure on memory
- etc.
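To make the data-layout point concrete, here is a back-of-the-envelope Python sketch. It estimates how many cache lines a loop fetches when it reads a single field from every record, with the records stored either interleaved (array of structures, "aos") or field by field (structure of arrays, "soa"). The 64-byte line size, the record sizes and the function name are assumptions for illustration, and the model ignores alignment, prefetching and cache reuse.

```python
import math

CACHE_LINE_BYTES = 64  # assumed line size; check your own CPU


def lines_touched(n_records: int, record_bytes: int,
                  field_bytes: int, layout: str) -> int:
    """Estimate cache lines fetched when scanning ONE field of each record."""
    if layout == "aos":
        # Fields are interleaved, so the scan drags whole records through
        # the cache even though only one field per record is used.
        if record_bytes >= CACHE_LINE_BYTES:
            # Roughly one line per record (field assumed not to straddle lines).
            return n_records
        return math.ceil(n_records * record_bytes / CACHE_LINE_BYTES)
    if layout == "soa":
        # The field is stored contiguously, so every fetched byte is useful.
        return math.ceil(n_records * field_bytes / CACHE_LINE_BYTES)
    raise ValueError("layout must be 'aos' or 'soa'")


if __name__ == "__main__":
    n = 1_000_000
    # Hypothetical record: four 8-byte fields (32 bytes), one of them scanned.
    print("AoS lines:", lines_touched(n, 32, 8, "aos"))
    print("SoA lines:", lines_touched(n, 32, 8, "soa"))
```

In this simplified model the contiguous layout moves a quarter of the data for the same scan, so the same memory bandwidth can feed more threads before the plateau appears.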

It's a vast and very interesting subject. If you can describe your application, it would help us to help you.

Cheers.

Re: multicore performance

jimdempseyatthecove — Thu, 11 Jun 2009 17:16:36 GMT

Quoting - Alain Dominguez (Intel)

Add to the list: experiment with reorganizing the code layout to be more "cachable", i.e. code for reduced size. Sometimes unrolling loops will slow the code down because it spills out of the L1 cache.

Jim Dempsey