Regarding performance: suppose we have a block of data that will be frequently accessed by each thread, and the data are read-only, meaning threads won't do anything besides reading it. Is it beneficial to create one copy of the data for each thread (assuming the data stay read-only and the cache capacity is sufficient to accommodate everything involved), or not?
If the frequently accessed data are shared by all threads (instead of one copy per thread), wouldn't that increase the chance of the data being properly cached?
And more specifically, how do recent (e.g. Sandy Bridge and later) Intel CPUs handle cache conflicts? Assuming multiple access requests are issued by multiple threads on the same cache line, is there significant latency if one thread tries to read a cache line currently being read by another thread?
TimP (Intel) wrote: Many thanks. Another question: does hyper-threading-enabled hardware like Sandy Bridge have 2x the register resources or not (e.g. 32 instead of the 16 YMM registers claimed)?
For read-only data, a single copy in the Sandy Bridge L3 cache should be quite effective, incurring no extra delay when multiple cores on the same CPU pull copies into their private caches. For threads running on different CPUs, it's less clear whether an advantage might be gained by copying the data into RAM local to each CPU; it would probably depend on access patterns. In the case you seem to be describing, where you say cache capacity is sufficient, the answer would seem to be no.