Re: multicore performance

TC2009074 · ‎05-29-2009

Is there any research work on the bandwidth-hungry behaviour of multicore systems?

gaston-hillar · ‎06-01-2009

Hi tc2009074,

What kind of specific information are you looking for? A comparison against what?

Alain_D_Intel · ‎06-11-2009

If you have a bandwidth-hungry application ==> bandwidth will also drive your scalability.

Generally speaking, your machine has a maximum global memory bandwidth (e.g. the STREAM benchmark is a good way to estimate it).
If your application consumes xx % of it with 1 thread, you can't expect a speedup greater than 100/xx.
That is often the reason scalability figures show a "plateau" shape after a few threads.
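
As a rough sketch of that 100/xx bound: the function names below are made up for illustration, and the model assumes perfect linear scaling until the memory bus saturates, which real applications rarely achieve.

```python
def max_speedup(percent_bw_one_thread: float) -> float:
    """Upper bound on speedup if one thread already uses xx % of peak bandwidth."""
    if not 0.0 < percent_bw_one_thread <= 100.0:
        raise ValueError("percentage must be in (0, 100]")
    return 100.0 / percent_bw_one_thread


def idealized_speedup(threads: int, percent_bw_one_thread: float) -> float:
    """Idealized curve: linear scaling until the bandwidth ceiling is hit."""
    return min(float(threads), max_speedup(percent_bw_one_thread))


if __name__ == "__main__":
    # Hypothetical application using 25 % of peak bandwidth with 1 thread.
    for n in (1, 2, 4, 8, 16):
        print(n, "threads ->", idealized_speedup(n, 25.0))
```

With 25 % per thread, the curve stops rising at 4 threads, which is the "plateau" described above.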
To avoid this "ceiling" effect, you should:
- increase your global bandwidth: DIMMs, chipset, BIOS settings, or a change of machine (for example)
- modify your algorithm to reduce pressure on memory ==> even if it's slower on 1 core, it could be faster after parallelization
- reorganize the data layout to be more "cacheable" and put less pressure on memory
- etc ...
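
To illustrate the data layout point, here is a back-of-the-envelope estimate of memory traffic when a loop reads one field from every record. It is only a sketch: the 64-byte cache line is an assumption (typical for x86), the helper name is hypothetical, and it ignores prefetching, alignment offsets and cache reuse.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def bytes_fetched(n_records: int, stride_bytes: int, field_bytes: int) -> int:
    """Estimate bytes pulled from memory to read one field of every record.

    stride_bytes is the distance between consecutive copies of the field:
    the full record size for an array of structures, or just the field
    size when the field is stored in its own contiguous array.
    """
    lines = set()
    for i in range(n_records):
        first = (i * stride_bytes) // CACHE_LINE_BYTES
        last = (i * stride_bytes + field_bytes - 1) // CACHE_LINE_BYTES
        lines.update(range(first, last + 1))  # every line the field touches
    return len(lines) * CACHE_LINE_BYTES


if __name__ == "__main__":
    n = 1000
    aos = bytes_fetched(n, stride_bytes=64, field_bytes=8)  # 64-byte records
    soa = bytes_fetched(n, stride_bytes=8, field_bytes=8)   # packed field array
    print("array of structures:", aos, "bytes")
    print("structure of arrays:", soa, "bytes")
```

In this made-up case the packed layout moves 8 times fewer bytes for the same useful data, which is the kind of reduction in memory pressure the list item refers to.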

It's a vast and very interesting subject ==> if you can describe your application, that would help us to help you.

Cheers.

jimdempseyatthecove · ‎06-11-2009

Add to the list: experiment with reorganizing the code layout to be more "cacheable", i.e. code for reduced size. Sometimes unrolling loops will slow down the code because the unrolled code spills out of the L1 cache.

Jim Dempsey