Solved: naive question? power two array sizes and cache

rreis · ‎06-02-2009

I've been reading that last dimension of an array, say A(nx,ny) shouldn't be a power of two value because of cache efficiency. So, if I wanted to solve a 1024x1024 problem, what should I do? Create a 1024x1025 array and ignore the last row? Sorry if this is a basic question but I would really like an answer...

best and thanks,

TimP · ‎06-02-2009

The problem, if there is one, would be a cache mapping problem. or a memory bank conflict.
For cache mapping: If you had a 1-way associative cache, with the matrix rows stored at multiples of the critical size value, only 1 row of the matrix could reside in cache at one time. I once worked with a CPU like that, where all addresses with the same last few bits mapped to the same line in cache. For obvious reasons, CPUs of the last 15 years have been more versatile than that, with at least 4-way associative cache.
The biggest problem of this nature in the last 8 years was the original 32-bit P4, with its 64K aliasing. 2 memory addresses, even if being accessed by different threads on different logicals, would see a cache conflict if they were separated by multiple of 65536 bytes. Some of the more popular OS started thread stacks spaced by exact multiples of this value.
Systems without cache were more often susceptible to memory bank conflicts. Cray 1 set up memory in 17 banks, so as to make it more unlikely that a program could hit the bad intervals accidentally. Needless to say, that detracted from cost effectiveness.
No doubt, your web page would give you some ideas on how to find bad combinations of address intervals and numbers of cache lines.

View solution in original post

TimP · ‎06-02-2009

Quoting - rreis

I've been reading that last dimension of an array, say A(nx,ny) shouldn't be a power of two value because of cache efficiency. So, if I wanted to solve a 1024x1024 problem, what should I do? Create a 1024x1025 array and ignore the last row?

The answer depends on factors which you haven't divulged, so you may have to test it yourself for your intended environment, as some of the linpack tests do. You probably mean the last dimension in C, first dimension in Fortran. There's also a good chance that current compilers can make any necessary adjustments automatically, if you don't hide the information from the compiler.

jimdempseyatthecove · ‎06-02-2009

1024 is a power of 2

real :: A(1:1024,1:1024), B(0:1023, 0:1023), C(1024,1024)

A, B and Care 1024 x 1024 arrays

In Fortran the left most index represenst the cells of closest (usually, but not necessarily, adjacent) proximity. There may be some advantage in making the left most index represent an inclusive numeric range that when multiplied by the cell size (4 for REAL(4), 8 for REAL(8), ? for ???) produces a number that is evenly divisible by the cache line size. You may want to experiment as on some processors are not as sensitive to alignment as others.

Jim

rreis · ‎06-02-2009

tim, yes, you're correct, thinking about it it should be the first dimension. I trying to understand this. I've read and listened to people speaking about powers of two being bad for cache. It kind of confused me because memory comes in powers of two and so it would fit nicely into the cache, no? Maybe it fits too well? What I recon is that probably I need to read more about the mechanism behind cache management...

jim, what I've read (or of what I could understand) to have a perfect fit of the line wouldn't be desirable (hence the power of two question). the motive is what I'm trying to understand (for instance, just found out this webpage http://scribblethink.org/Computer/cachekiller.html )...

TimP · ‎06-02-2009

The problem, if there is one, would be a cache mapping problem. or a memory bank conflict.
For cache mapping: If you had a 1-way associative cache, with the matrix rows stored at multiples of the critical size value, only 1 row of the matrix could reside in cache at one time. I once worked with a CPU like that, where all addresses with the same last few bits mapped to the same line in cache. For obvious reasons, CPUs of the last 15 years have been more versatile than that, with at least 4-way associative cache.
The biggest problem of this nature in the last 8 years was the original 32-bit P4, with its 64K aliasing. 2 memory addresses, even if being accessed by different threads on different logicals, would see a cache conflict if they were separated by multiple of 65536 bytes. Some of the more popular OS started thread stacks spaced by exact multiples of this value.
Systems without cache were more often susceptible to memory bank conflicts. Cray 1 set up memory in 17 banks, so as to make it more unlikely that a program could hit the bad intervals accidentally. Needless to say, that detracted from cost effectiveness.
No doubt, your web page would give you some ideas on how to find bad combinations of address intervals and numbers of cache lines.