Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

multicore performance

TC2009074
Beginner
849 Views
any reseach work on bandwidth hungry behaviour of multicore systems ?
0 Kudos
3 Replies
gaston-hillar
Valued Contributor I
849 Views
Quoting - tc2009074
any reseach work on bandwidth hungry behaviour of multicore systems ?

Hi tc2009074,

What kind of specific information are you looking for? A comparison against what?

0 Kudos
Alain_D_Intel
Employee
849 Views

If you've bandwidth hungry application ==> bandwidth will drive also your scalability.

Generally speaking, you've a maximum global memory bandwidth for your machine (ie: stream benchmark is a good evaluator of it).
If your application consume xx % of it for 1 thread, you can't expect a scalability greater than 100/xx.
It's often the case when scalability figures have a "plateau" shape after few threads.
To avoid this "ceiling" effect, you should:
- increase your global bandwidth: DIMM,chipset,BIOS settings or machine change (ie)
- modify your algorithm to diminish pressure on memory ==> even if it's slower at 1 core, you know it could be faster after parallization
- reorganizing data layout to be more "cachable" and put less pressure on memory
- etc ...

It's a vast and very interesting subject ==> if you've a description of your application, it could help us to help you

Cheers.
0 Kudos
jimdempseyatthecove
Honored Contributor III
849 Views

If you've bandwidth hungry application ==> bandwidth will drive also your scalability.

Generally speaking, you've a maximum global memory bandwidth for your machine (ie: stream benchmark is a good evaluator of it).
If your application consume xx % of it for 1 thread, you can't expect a scalability greater than 100/xx.
It's often the case when scalability figures have a "plateau" shape after few threads.
To avoid this "ceiling" effect, you should:
- increase your global bandwidth: DIMM,chipset,BIOS settings or machine change (ie)
- modify your algorithm to diminish pressure on memory ==> even if it's slower at 1 core, you know it could be faster after parallization
- reorganizing data layout to be more "cachable" and put less pressure on memory
- etc ...

It's a vast and very interesting subject ==> if you've a description of your application, it could help us to help you

Cheers.

Add to the list to experiment with reorganizing the code layout to bemore "cachable" -code for reduced size. Sometimes unrolling of loops will slow down the code due to spill-out of L1 cache.

Jim Dempsey
0 Kudos
Reply