Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Q&A: Maximum FPS: Three Tips for Faster Code

Intel_Software_Netw1
283 Views

These questions were received by Intel Software Network Support about the original article located at http://www.intel.com/cd/ids/developer/asmo-na/eng/19123.htm, followed by the responses received:

Q. I'm abeginner when it comesto caching. Considering a 32-bit address as discussed in this article, an 8KB cache which is 4-way associative and each cache line is 64 bytes, hence 0 - 5 => offset into 64 byte cache line 6 - 10 => index into set, total 32 sets 11 - 31 => must be tags used for a match (cache hit). Please give some information on how exactly a cache
hit occurs. Thanks.

A. A cache hit occurring is both a simple and complex operation. It is an automated part of instruction processing by the CPU. Simply put, for memory read operations, if the cacheline containing the memory requested by a machine instruction exists in the cache, then the data is retrieved from the cache. This is a cache hit. Similarly, for write operations, if the cacheline exists in the cache then the memory store is done to the cached version of the memory.

The actual implementation of a cache is more complex, and varies from processor to processor in its details. For memory read operations, when a machine instruction requires data from memory, it begins a memory fetch, which first checks the address of the needed memory against the tag arrays of the ways. The tag arrays store the upper address bits of the cached line for each line in each way.

Pseudo code for that would look like this:

// Cache parameters

#define CACHE_SIZE 8*1024

#define NUM_WAYS 4

#define CACHELINE_SIZE 64

#define NUM_SETS (CACHE_SIZE / CACHELINE_SIZE / NUM_WAYS)

// Think of a cache as an array structured like this:

struct cache {

uint32 tag[NUM_SETS][NUM_WAYS];

};

bool cache_hit(uint32 address) {

uint32 set = (address / CACHELINE_SIZE) % NUM_SETS;

uint32 tag_bits = address / NUM_SETS / CACHELINE_SIZE;

// In hardware, all of the ways are looked up in parallel,

// and a hit is determined on a tag-match

for (uint32 way = 0; way < NUM_WAYS; way++) {

if (cache->tag[set][way] = = tag_bits) {

return true; // The data is in the cache

}

}

return false; // The data was not found in the cache

}

If there is a match, then the data is fetched from the matching way. For write operations, the same tag check is done, and if there is a match, the write is done to the cached line. There are a bunch of other details here (such are write-through vs. write-back, and cacheline dirty bits) that vary not only from processor to processor, but by configuration of the cache and memory subsystem. Additionally, certain pages may be marked non-cacheable if, for example, they belong to memory that is modified by devices outside the processor.

The Wikipedia article is a good general background. http://en.wikipedia.org/wiki/CPU_cache (with the caveats that one should never take Wikipedia as a primary source, or fully accurate). Other topics that might be of interest include cacheline eviction policies, different types of caches, cache hierarchy, cache coherence on multicore systems, and the use of prefetch instructions for performance.

--------

Gina B.
Intel Software Network Support
http://www.intel.com/software
email: ISN.support@intel.com

Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

0 Kudos
0 Replies
Reply