06-11-2009 05:29 AM
If you have a bandwidth-hungry application ==> memory bandwidth will also drive your scalability.
Generally speaking, your machine has a maximum global memory bandwidth (e.g. the STREAM benchmark is a good way to measure it).
If your application consumes xx % of it with 1 thread, you can't expect a speedup greater than 100/xx.
This is often why scalability curves flatten into a "plateau" after a few threads.
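The 100/xx bound can be written as a tiny model: speedup grows linearly with thread count until the threads together saturate the memory bus, then stays flat. This is a minimal sketch, assuming perfectly linear scaling below the ceiling and ignoring cache effects; the function names are hypothetical and the 25 % figure below is illustrative, not a measurement.

```c
#include <assert.h>

/* Upper bound on speedup for a memory-bound code.
 * pct_one_thread: share (in %) of the machine's peak memory bandwidth
 * that ONE thread consumes, e.g. measured against a STREAM-like figure. */
double bandwidth_speedup_ceiling(double pct_one_thread)
{
    assert(pct_one_thread > 0.0 && pct_one_thread <= 100.0);
    return 100.0 / pct_one_thread;
}

/* Simple "plateau" model: linear speedup with n threads until the
 * bandwidth ceiling is reached, then no further gain. */
double expected_speedup(int n_threads, double pct_one_thread)
{
    double ceiling = bandwidth_speedup_ceiling(pct_one_thread);
    return (n_threads < ceiling) ? (double)n_threads : ceiling;
}
```

For example, if one thread already uses 25 % of peak bandwidth, `expected_speedup(8, 25.0)` gives 4.0: the last four of the eight threads buy nothing.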
To avoid this "ceiling" effect, you can:
- increase your global bandwidth, e.g. DIMMs, chipset, BIOS settings, or a change of machine
- modify your algorithm to reduce memory pressure ==> even if it's slower on 1 core, it may be faster once parallelized
- reorganize your data layout to be more cache-friendly and put less pressure on memory
- etc.
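Two of the items above can be sketched in C: switching an array-of-structures to a structure-of-arrays (data layout), and fusing two passes over an array into one (algorithm change). This is an illustrative sketch, assuming 8-byte doubles, no struct padding, and arrays much larger than the caches; the struct and function names are hypothetical. The fused version may round differently from the two-pass one if the compiler contracts `a[i] * s + c` into an FMA.

```c
#include <stddef.h>

/* Array-of-structures: summing only 'mass' still pulls x, y, z
 * through the memory bus because they share cache lines with it. */
struct particle_aos {
    double x, y, z, mass;
};

double total_mass_aos(const struct particle_aos *p, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i].mass;   /* uses 8 of every 32 bytes fetched */
    return sum;
}

/* Structure-of-arrays: each field is contiguous, so the same loop
 * streams only the bytes it needs (roughly 4x less traffic here). */
struct particles_soa {
    double *x, *y, *z, *mass;
};

double total_mass_soa(const struct particles_soa *p, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p->mass[i];
    return sum;
}

/* Two passes: a[] is read and written twice when it doesn't fit in cache. */
void scale_then_offset(double *a, size_t n, double s, double c)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= s;
    for (size_t i = 0; i < n; i++)
        a[i] += c;
}

/* Fused loop: one read and one write per element, half the traffic. */
void scale_offset_fused(double *a, size_t n, double s, double c)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * s + c;
}
```

Neither change reduces the arithmetic; they only cut the bytes moved per useful operation, which is what matters once you hit the bandwidth plateau.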
It's a vast and very interesting subject ==> if you can give us a description of your application, it would help us help you.
06-11-2009 10:16 AM
I'd add to the list: experiment with reorganizing the code layout to be more "cacheable", i.e. code for reduced size. Sometimes loop unrolling will slow the code down because the larger loop body spills out of the L1 (instruction) cache.
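To make the trade-off concrete, here is a rolled loop next to a hand-unrolled (by 4) version of the same sum. This is only a sketch with hypothetical function names: both compute the same result, but the unrolled one has a larger body plus a remainder loop. With GCC, `-Os` (optimize for size) and `-fno-unroll-loops` are ways to keep the compiler itself from growing code this way; whether that helps has to be measured per application.

```c
#include <stddef.h>

/* Rolled loop: smallest code, one compare-and-branch per element. */
long sum_rolled(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by 4: fewer branches per element, but several times more
 * instructions. Repeated across many hot loops, this growth can push
 * code out of the L1 instruction cache and cost more than it saves. */
long sum_unrolled4(const int *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* remainder when n is not a multiple of 4 */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```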