- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've been reading that last dimension of an array, say A(nx,ny) shouldn't be a power of two value because of cache efficiency. So, if I wanted to solve a 1024x1024 problem, what should I do? Create a 1024x1025 array and ignore the last row? Sorry if this is a basic question but I would really like an answer...
best and thanks,
best and thanks,
1 Solution
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The problem, if there is one, would be a cache mapping problem. or a memory bank conflict.
For cache mapping: If you had a 1-way associative cache, with the matrix rows stored at multiples of the critical size value, only 1 row of the matrix could reside in cache at one time. I once worked with a CPU like that, where all addresses with the same last few bits mapped to the same line in cache. For obvious reasons, CPUs of the last 15 years have been more versatile than that, with at least 4-way associative cache.
The biggest problem of this nature in the last 8 years was the original 32-bit P4, with its 64K aliasing. 2 memory addresses, even if being accessed by different threads on different logicals, would see a cache conflict if they were separated by multiple of 65536 bytes. Some of the more popular OS started thread stacks spaced by exact multiples of this value.
Systems without cache were more often susceptible to memory bank conflicts. Cray 1 set up memory in 17 banks, so as to make it more unlikely that a program could hit the bad intervals accidentally. Needless to say, that detracted from cost effectiveness.
No doubt, your web page would give you some ideas on how to find bad combinations of address intervals and numbers of cache lines.
For cache mapping: If you had a 1-way associative cache, with the matrix rows stored at multiples of the critical size value, only 1 row of the matrix could reside in cache at one time. I once worked with a CPU like that, where all addresses with the same last few bits mapped to the same line in cache. For obvious reasons, CPUs of the last 15 years have been more versatile than that, with at least 4-way associative cache.
The biggest problem of this nature in the last 8 years was the original 32-bit P4, with its 64K aliasing. 2 memory addresses, even if being accessed by different threads on different logicals, would see a cache conflict if they were separated by multiple of 65536 bytes. Some of the more popular OS started thread stacks spaced by exact multiples of this value.
Systems without cache were more often susceptible to memory bank conflicts. Cray 1 set up memory in 17 banks, so as to make it more unlikely that a program could hit the bad intervals accidentally. Needless to say, that detracted from cost effectiveness.
No doubt, your web page would give you some ideas on how to find bad combinations of address intervals and numbers of cache lines.
Link Copied
4 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - rreis
I've been reading that last dimension of an array, say A(nx,ny) shouldn't be a power of two value because of cache efficiency. So, if I wanted to solve a 1024x1024 problem, what should I do? Create a 1024x1025 array and ignore the last row?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
1024 is a power of 2
real :: A(1:1024,1:1024), B(0:1023, 0:1023), C(1024,1024)
A, B and Care 1024 x 1024 arrays
In Fortran the left most index represenst the cells of closest (usually, but not necessarily, adjacent) proximity. There may be some advantage in making the left most index represent an inclusive numeric range that when multiplied by the cell size (4 for REAL(4), 8 for REAL(8), ? for ???) produces a number that is evenly divisible by the cache line size. You may want to experiment as on some processors are not as sensitive to alignment as others.
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
tim, yes, you're correct, thinking about it it should be the first dimension. I trying to understand this. I've read and listened to people speaking about powers of two being bad for cache. It kind of confused me because memory comes in powers of two and so it would fit nicely into the cache, no? Maybe it fits too well? What I recon is that probably I need to read more about the mechanism behind cache management...
jim, what I've read (or of what I could understand) to have a perfect fit of the line wouldn't be desirable (hence the power of two question). the motive is what I'm trying to understand (for instance, just found out this webpage http://scribblethink.org/Computer/cachekiller.html )...
jim, what I've read (or of what I could understand) to have a perfect fit of the line wouldn't be desirable (hence the power of two question). the motive is what I'm trying to understand (for instance, just found out this webpage http://scribblethink.org/Computer/cachekiller.html )...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The problem, if there is one, would be a cache mapping problem. or a memory bank conflict.
For cache mapping: If you had a 1-way associative cache, with the matrix rows stored at multiples of the critical size value, only 1 row of the matrix could reside in cache at one time. I once worked with a CPU like that, where all addresses with the same last few bits mapped to the same line in cache. For obvious reasons, CPUs of the last 15 years have been more versatile than that, with at least 4-way associative cache.
The biggest problem of this nature in the last 8 years was the original 32-bit P4, with its 64K aliasing. 2 memory addresses, even if being accessed by different threads on different logicals, would see a cache conflict if they were separated by multiple of 65536 bytes. Some of the more popular OS started thread stacks spaced by exact multiples of this value.
Systems without cache were more often susceptible to memory bank conflicts. Cray 1 set up memory in 17 banks, so as to make it more unlikely that a program could hit the bad intervals accidentally. Needless to say, that detracted from cost effectiveness.
No doubt, your web page would give you some ideas on how to find bad combinations of address intervals and numbers of cache lines.
For cache mapping: If you had a 1-way associative cache, with the matrix rows stored at multiples of the critical size value, only 1 row of the matrix could reside in cache at one time. I once worked with a CPU like that, where all addresses with the same last few bits mapped to the same line in cache. For obvious reasons, CPUs of the last 15 years have been more versatile than that, with at least 4-way associative cache.
The biggest problem of this nature in the last 8 years was the original 32-bit P4, with its 64K aliasing. 2 memory addresses, even if being accessed by different threads on different logicals, would see a cache conflict if they were separated by multiple of 65536 bytes. Some of the more popular OS started thread stacks spaced by exact multiples of this value.
Systems without cache were more often susceptible to memory bank conflicts. Cray 1 set up memory in 17 banks, so as to make it more unlikely that a program could hit the bad intervals accidentally. Needless to say, that detracted from cost effectiveness.
No doubt, your web page would give you some ideas on how to find bad combinations of address intervals and numbers of cache lines.

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page