Hi all, I am starting to use functions like _mm_clflush, _mm_clflushopt, and _mm_clwb.
Say I have defined a struct variable called 'mystruct' whose size is 512 bytes.
If I want to flush the cache line(s) containing 'mystruct', which is the right way to do it:
_mm_clflush(&mystruct);
or
for (int i = 0; i < sizeof(mystruct)/64; i++) {
    _mm_clflush( ((char *)&mystruct) + i );
}
Can anybody tell me which is the right way?
Many thanks for the help.
These intrinsics accept a single address and perform the requested operation on the cache line containing that address. So you will need 8 or 9 CLFLUSH operations for a 512 Byte structure (depending on its alignment).
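To see where "8 or 9" comes from, here is a minimal sketch of the arithmetic, assuming 64-byte cache lines. It counts the lines touched by a byte range from the line holding the first byte to the line holding the last byte. cache_lines_spanned is a hypothetical helper name, not part of any Intel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current Intel x86 processors. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: number of cache lines touched by [addr, addr + len). */
static size_t cache_lines_spanned(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first_line = addr / CACHE_LINE_SIZE;             /* line holding the first byte */
    uintptr_t last_line  = (addr + len - 1) / CACHE_LINE_SIZE; /* line holding the last byte  */
    return (size_t)(last_line - first_line + 1);
}
```

For example, a 512-byte struct that starts on a 64-byte boundary (address 0x1000) spans 8 lines, while the same struct starting 8 bytes past a boundary (0x1008) spans 9, so 9 CLFLUSH operations are needed in that case.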
I realize my code above has a mistake (in the second version).
It should be amended as follows (I want to flush every cache line that mystruct occupies):
for (int i = 0; i < sizeof(mystruct)/64; i++) {
    _mm_clflush( ((char *)&mystruct) + i*64 );
}
If the struct is not 64-byte-aligned, this code will not flush the cache line that holds the final, partial piece of the struct.
There are several approaches to implementing the more general code -- I almost always have to re-create and test these from scratch to make sure I got the logic right.
I *think* that all you need to do is add:
if ( ((uintptr_t)&mystruct) % 64 != 0 ) {      /* uintptr_t comes from <stdint.h> */
    _mm_clflush( ((char *)&mystruct) + 511 );   /* last byte of the 512-byte struct */
}
Thanks John.
So, to keep the code simple, I can just add this snippet after the 64-byte flushing loop and let it decide whether an additional cache-line flush is needed for the struct.
This extra code is not always required -- it depends on the specific combination of the length of the struct and its alignment relative to cache line boundaries. There are other ways to structure the logic -- you could compute floor(starting address / 64) and ceiling(ending address / 64) and use those as the loop bounds.
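A minimal sketch of the floor/ceiling approach, assuming 64-byte lines and an x86 compiler that provides _mm_clflush through <emmintrin.h>. Here the "ending address" is taken to be one past the last byte, so the ceiling gives an exclusive loop bound. flush_range is a hypothetical name, and it returns the number of flushes only so the behavior is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush (SSE2); x86/x86-64 only */

#define CACHE_LINE_SIZE 64u   /* assumption: 64-byte cache lines */

/* Hypothetical helper: flush every cache line overlapping [p, p + len).
 * Loop bounds are floor(start / 64) and ceil(end / 64), where end is
 * one past the last byte. Returns the number of CLFLUSH operations. */
static size_t flush_range(const void *p, size_t len)
{
    assert(len > 0);
    uintptr_t start = (uintptr_t)p;
    uintptr_t end   = start + len;                                   /* exclusive */
    uintptr_t first = start / CACHE_LINE_SIZE;                       /* floor */
    uintptr_t limit = (end + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE; /* ceiling */
    size_t count = 0;
    for (uintptr_t line = first; line < limit; line++) {
        /* Any address inside the line works; use the line's first byte. */
        _mm_clflush((const void *)(line * CACHE_LINE_SIZE));
        count++;
    }
    return count;
}
```

For the struct in this thread, flush_range(&mystruct, sizeof(mystruct)) would issue 8 flushes when mystruct is 64-byte aligned and 9 otherwise, with no special case for the tail.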
CLFLUSH is a relatively low-overhead operation, so flushing the highest address in the structure every time (rather than working through the logic) won't have a noticeable performance impact.
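The simpler alternative of always flushing the last byte after the 64-byte loop might look like the sketch below, under the same assumptions (64-byte lines, x86 with <emmintrin.h>). flush_object_simple is a hypothetical name. Stepping with off < len instead of sizeof/64 also covers sizes that are not a multiple of 64.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* _mm_clflush (SSE2); x86/x86-64 only */

/* Hypothetical helper: step through the object 64 bytes at a time, then
 * unconditionally flush its last byte. When the object is 64-byte aligned
 * and its size is a multiple of 64, the final flush repeats a line that was
 * already flushed, which is harmless. Returns the number of CLFLUSH issued. */
static size_t flush_object_simple(const void *p, size_t len)
{
    assert(len > 0);
    const char *base = (const char *)p;
    size_t count = 0;
    for (size_t off = 0; off < len; off += 64) {   /* 64 = assumed line size */
        _mm_clflush(base + off);
        count++;
    }
    _mm_clflush(base + len - 1);   /* covers a trailing partial line, if any */
    count++;
    return count;
}
```

For a 512-byte struct this always issues 9 flushes, trading one possibly redundant CLFLUSH for not having to check the alignment.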