- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
any reseach work on bandwidth hungry behaviour of multicore systems ?
Link Copied
3 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - tc2009074
any reseach work on bandwidth hungry behaviour of multicore systems ?
Hi tc2009074,
What kind of specific information are you looking for? A comparison against what?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - Alain Dominguez (Intel)
If you've bandwidth hungry application ==> bandwidth will drive also your scalability.
Generally speaking, you've a maximum global memory bandwidth for your machine (ie: stream benchmark is a good evaluator of it).
If your application consume xx % of it for 1 thread, you can't expect a scalability greater than 100/xx.
It's often the case when scalability figures have a "plateau" shape after few threads.
To avoid this "ceiling" effect, you should:
- increase your global bandwidth: DIMM,chipset,BIOS settings or machine change (ie)
- modify your algorithm to diminish pressure on memory ==> even if it's slower at 1 core, you know it could be faster after parallization
- reorganizing data layout to be more "cachable" and put less pressure on memory
- etc ...
It's a vast and very interesting subject ==> if you've a description of your application, it could help us to help you
Cheers.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - Alain Dominguez (Intel)
If you've bandwidth hungry application ==> bandwidth will drive also your scalability.
Generally speaking, you've a maximum global memory bandwidth for your machine (ie: stream benchmark is a good evaluator of it).
If your application consume xx % of it for 1 thread, you can't expect a scalability greater than 100/xx.
It's often the case when scalability figures have a "plateau" shape after few threads.
To avoid this "ceiling" effect, you should:
- increase your global bandwidth: DIMM,chipset,BIOS settings or machine change (ie)
- modify your algorithm to diminish pressure on memory ==> even if it's slower at 1 core, you know it could be faster after parallization
- reorganizing data layout to be more "cachable" and put less pressure on memory
- etc ...
It's a vast and very interesting subject ==> if you've a description of your application, it could help us to help you
Cheers.
Add to the list to experiment with reorganizing the code layout to bemore "cachable" -code for reduced size. Sometimes unrolling of loops will slow down the code due to spill-out of L1 cache.
Jim Dempsey

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page