Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

multicore performance

TC2009074
Beginner
191 Views
any reseach work on bandwidth hungry behaviour of multicore systems ?
0 Kudos
3 Replies
gaston-hillar
Black Belt
191 Views
Quoting - tc2009074
any reseach work on bandwidth hungry behaviour of multicore systems ?

Hi tc2009074,

What kind of specific information are you looking for? A comparison against what?

Alain_D_Intel
Employee
191 Views

If you've bandwidth hungry application ==> bandwidth will drive also your scalability.

Generally speaking, you've a maximum global memory bandwidth for your machine (ie: stream benchmark is a good evaluator of it).
If your application consume xx % of it for 1 thread, you can't expect a scalability greater than 100/xx.
It's often the case when scalability figures have a "plateau" shape after few threads.
To avoid this "ceiling" effect, you should:
- increase your global bandwidth: DIMM,chipset,BIOS settings or machine change (ie)
- modify your algorithm to diminish pressure on memory ==> even if it's slower at 1 core, you know it could be faster after parallization
- reorganizing data layout to be more "cachable" and put less pressure on memory
- etc ...

It's a vast and very interesting subject ==> if you've a description of your application, it could help us to help you

Cheers.
jimdempseyatthecove
Black Belt
191 Views

If you've bandwidth hungry application ==> bandwidth will drive also your scalability.

Generally speaking, you've a maximum global memory bandwidth for your machine (ie: stream benchmark is a good evaluator of it).
If your application consume xx % of it for 1 thread, you can't expect a scalability greater than 100/xx.
It's often the case when scalability figures have a "plateau" shape after few threads.
To avoid this "ceiling" effect, you should:
- increase your global bandwidth: DIMM,chipset,BIOS settings or machine change (ie)
- modify your algorithm to diminish pressure on memory ==> even if it's slower at 1 core, you know it could be faster after parallization
- reorganizing data layout to be more "cachable" and put less pressure on memory
- etc ...

It's a vast and very interesting subject ==> if you've a description of your application, it could help us to help you

Cheers.

Add to the list to experiment with reorganizing the code layout to bemore "cachable" -code for reduced size. Sometimes unrolling of loops will slow down the code due to spill-out of L1 cache.

Jim Dempsey
Reply