Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Which strategy is better fit for recent Intel CPUs?

www_q_
Beginner
345 Views

Regarding performance: assume we have a block of data that will be frequently accessed by every thread, and the data are read-only, meaning threads do nothing besides read them. Is it beneficial to create one copy of the data for each thread, assuming the cache capacity is sufficient to accommodate everything involved?

If the frequently accessed data are shared by all threads (instead of one copy per thread), wouldn't this increase the chance that the data stay properly cached?

More specifically, how do recent Intel CPUs (e.g. Sandy Bridge and later) handle cache conflicts: if multiple threads issue read requests to the same cache line, is there significant latency when one thread tries to read a cache line that is currently being read by another thread?
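The shared-copy case in the question can be sketched as follows (a minimal illustration with function and variable names of my own choosing, not from this thread): several threads read one shared read-only block. Since no thread writes, the cache lines holding the block can reside in every core's private caches simultaneously in the Shared state, so after the initial fills the reads generate no coherence traffic.

```cpp
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// All threads read the same shared read-only block; no copies are made.
std::int64_t sum_shared(const std::vector<int>& data, unsigned nthreads) {
    std::vector<std::int64_t> partial(nthreads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            // Accumulate in a local variable so the only shared write is
            // one store per thread (avoids false sharing on `partial`).
            std::int64_t local = 0;
            for (int v : data) local += v;
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```

Note the read-only block itself needs no locking or per-thread duplication for correctness; the question is purely about performance.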

5 Replies
SergeyKostrov
Valued Contributor II
>>...a block of data that will be frequently accessed by each thread...

I have two questions: How big is the data set? Is access to its elements random or sequential?
TimP
Honored Contributor III
For read-only data, a single copy in Sandy Bridge L3 cache should be quite effective, not incurring any extra delay when multiple cores on the same CPU get copies into their exclusive caches. For threads running on different CPUs, it's not so clear whether an advantage might be achieved if the data could be copied into RAM local to each CPU; it would probably depend on access patterns. In the case you seem to be describing, where you say cache capacity is sufficient, it seems the answer would be no.
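The per-thread-copy alternative from the question can be sketched for contrast (again a toy illustration with names of my own choosing): each thread duplicates the read-only block before reading it. The result is identical to sharing one copy, but the combined cache footprint grows by a factor of the thread count, which is one reason a single shared copy in L3 is usually preferable on a single socket, as described above.

```cpp
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Each thread reads its own private copy of the read-only block.
// Correctness is unchanged; only the memory/cache footprint differs.
std::int64_t sum_private_copies(const std::vector<int>& data,
                                unsigned nthreads) {
    std::vector<std::int64_t> partial(nthreads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            std::vector<int> copy(data);   // per-thread private copy
            std::int64_t local = 0;
            for (int v : copy) local += v;
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```

On a multi-socket NUMA system the copies could be placed in RAM local to each CPU, which is the scenario where the trade-off might tip the other way, depending on access patterns.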
www_q_
Beginner
TimP (Intel) wrote:

For read-only data, a single copy in Sandy Bridge L3 cache should be quite effective, not incurring any extra delay when multiple cores on the same CPU get copies into their exclusive caches. For threads running on different CPUs, it's not so clear whether an advantage might be achieved if the data could be copied into RAM local to each CPU; it would probably depend on access patterns. In the case you seem to be describing, where you say cache capacity is sufficient, it seems the answer would be no.

Many thanks. Another question: does hyper-threading-enabled hardware such as Sandy Bridge have twice the register resources (e.g. 32 YMM registers instead of the 16 claimed)?
TimP
Honored Contributor III
Yes, each hyper-thread has access to its own YMM registers; at least, each thread reserves its own registers from those provided to support renaming.
Bernard
Valued Contributor I
>>>Yes, each hyper-thread has access to its own YMM registers; at least, each thread reserves its own registers from those provided to support renaming.>>>

Is this the case only in the latest Sandy Bridge architecture?