Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

multicore performance

TC2009074
Beginner
139 Views
Is there any research work on the bandwidth-hungry behaviour of multicore systems?
3 Replies
gaston-hillar
Black Belt
Quoting - tc2009074
Is there any research work on the bandwidth-hungry behaviour of multicore systems?

Hi tc2009074,

What kind of specific information are you looking for? A comparison against what?

Alain_D_Intel
Employee

If you have a bandwidth-hungry application ==> bandwidth will also drive your scalability.

Generally speaking, your machine has a maximum global memory bandwidth (e.g. the STREAM benchmark is a good way to measure it).
If your application consumes xx % of it with 1 thread, you can't expect a speedup greater than 100/xx.
This is often why scalability curves show a "plateau" shape after a few threads.
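
To make the 100/xx rule concrete, here is a minimal Python sketch. The function names are hypothetical, and it assumes an idealized model: perfect linear scaling until the memory bandwidth limit is reached, with every thread consuming the same share of bandwidth.

```python
def bandwidth_ceiling(percent_used_by_one_thread):
    """Upper bound on speedup when 1 thread already uses this
    percentage of the machine's peak memory bandwidth (the 100/xx rule)."""
    if not 0.0 < percent_used_by_one_thread <= 100.0:
        raise ValueError("percentage must be in (0, 100]")
    return 100.0 / percent_used_by_one_thread


def estimated_speedup(threads, percent_used_by_one_thread):
    """Idealized speedup: linear in the thread count until the
    bandwidth ceiling is hit, flat (the 'plateau') afterwards."""
    return min(float(threads), bandwidth_ceiling(percent_used_by_one_thread))


if __name__ == "__main__":
    # Assumed example: 1 thread uses 25 % of the peak bandwidth.
    for n in (1, 2, 4, 8, 16):
        print(n, "threads ->", estimated_speedup(n, 25.0))
```

With 25 % used by one thread, the estimate grows up to 4 threads and stays at 4.0 beyond that. Real applications usually flatten out earlier and more gradually than this model suggests.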
To avoid this "ceiling" effect, you can:
- increase your global bandwidth: DIMMs, chipset, BIOS settings, or a different machine (for example)
- modify your algorithm to reduce pressure on memory ==> even if it's slower on 1 core, it could be faster after parallelization
- reorganize the data layout to be more "cachable" and put less pressure on memory
- etc ...
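
As a rough illustration of the data layout point, the sketch below counts how many cache lines are touched when a loop reads one 8-byte field from each of 1000 records. It is only a counting model, not a benchmark. The 64-byte cache line, the 32-byte record size, and the function name are assumptions chosen for the example.

```python
CACHE_LINE_BYTES = 64  # assumption: a common line size on x86 processors


def cache_lines_touched(n_items, stride_bytes, field_bytes,
                        line_bytes=CACHE_LINE_BYTES):
    """Count the distinct cache lines read when one field of
    `field_bytes` is accessed in each of `n_items` items laid out
    `stride_bytes` apart, starting at a line-aligned address."""
    lines = set()
    for i in range(n_items):
        first_byte = i * stride_bytes
        last_byte = first_byte + field_bytes - 1
        for line in range(first_byte // line_bytes,
                          last_byte // line_bytes + 1):
            lines.add(line)
    return len(lines)


if __name__ == "__main__":
    n = 1000
    # Array of structures: each record holds 4 doubles (32 bytes),
    # but the loop only needs the first one.
    aos = cache_lines_touched(n, stride_bytes=32, field_bytes=8)
    # Structure of arrays: the needed doubles are stored contiguously.
    soa = cache_lines_touched(n, stride_bytes=8, field_bytes=8)
    print("AoS lines:", aos, "SoA lines:", soa)
```

Under these assumptions the contiguous layout touches 125 lines instead of 500, so the same loop pulls a quarter of the data from memory.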

It's a vast and very interesting subject ==> if you have a description of your application, it would help us to help you.

Cheers.
jimdempseyatthecove
Black Belt


Add to the list: experiment with reorganizing the code layout to be more "cachable", i.e. code for reduced size. Sometimes unrolling loops will slow down the code because it spills out of the L1 cache.

Jim Dempsey