Hello,
I am writing an OpenMP program for the Q6600 with 4 parallel threads. The threads do not use any shared data sets; each thread reads from and writes to its own data set. I want to achieve maximum performance with the parallel program. I assume that with the 16-way set-associative L2 cache in the Intel Q6600, the cache is divided into 4096 blocks, with 16 cache lines in each block. I read that main memory is also divided into blocks, and that every block of memory is mapped to a block in the L2 cache. I was wondering if there is more detailed information about this mapping between the cache and memory. Is it correct to assume that if two parallel threads are using data in the same block of main memory, and the threads are executing on cores that share an L2 cache, they will share only one block of the cache, not the entire 4 MB L2 cache? How can I compute the size of the blocks of main memory?
Thank you,
Svetlana Marinova
Both cores have access to any cache line in L2, except when the other core takes write ownership. Each core has its own L1 cache and its own read-combining and write-combining buffers, so access to L2 is indirect, except with "streaming stores." Memory is mapped in 4KB pages, unless huge pages are in use, as a Java heap manager may arrange. The DTLB keeps 256 recently used page mappings, so it is possible for the 2 cores to compete for DTLB entries as well as for L2 cache. If your application is such that each thread needs the entire L2 or DTLB, running 2 threads, each on a separate L2 (KMP_AFFINITY=scatter), may be useful.