We are trying to disable the memory cache by setting the CD bit in control register CR0 while the kernel is booting, following Chapter 11 of the Intel developer manual (http://goo.gl/ufvzA). We have a doubt about our code. The code does have the effect of slowing down execution. We're using 64-bit Debian Linux. Does it disable the cache on all cores or only one? Does it disable all cache levels (L1, L2, and L3) or only one? Any ideas?
/* This code (GAS syntax) was inserted into main.c during Linux kernel boot.
   The statements are combined into a single asm block so the compiler cannot
   reorder them or insert code between them. */
asm volatile(
    "push %rax\n\t"             /* save rax */
    "cli\n\t"                   /* disable interrupts while we do this */
    "movq %cr0, %rax\n\t"       /* read CR0 */
    "orq $0x40000000, %rax\n\t" /* set CD (bit 30) but not NW */
    "movq %rax, %cr0\n\t"       /* cache fills are now disabled */
    "wbinvd\n\t"                /* flush and invalidate the caches */
    "orq $0x20000000, %rax\n\t" /* now set the NW bit (bit 29) as well */
    "movq %rax, %cr0\n\t"       /* cache is now disabled entirely */
    "pop %rax"                  /* restore rax */
);
Hello Isaias,
Call me chicken but I don't want to click on the URL you've cited.
Can you give a manual name, manual date and section and maybe a page number for your reference?
The less I have to waste time figuring out what you are talking about, the more likely you are to get an answer.
Pat
Thanks for asking. The manual is:
Intel® 64 and IA-32 Architectures
Software Developer’s Manual
Volume 3A:
System Programming Guide, Part 1
In particular, Chapter 11.
Patrick Fay (Intel) wrote:
Hello Isaias,
Call me chicken but I don't want to click on the URL you've cited.
Can you give a manual name, manual date and section and maybe a page number for your reference?
The less I have to waste time figuring out what you are talking about, the more likely you are to get an answer.
Pat
Thanks,
From Table 11-5, it looks to me like setting the CD bit (and clearing NW) in CR0 doesn't disable any of the caches (that is, L1, L2, and L3, if present, are still present and active).
But unless a memory address is already in the cache before you set the CD bit, all reads and writes will go to memory.
So I would expect to see a massive slowdown. Atom chips are apparently slightly different: Atom (it is not really clear to me) seems to simply disable any caching of data in L1/L2/L3 when CD=1.
Is this the same as your understanding, and are you seeing a massive slowdown?
Dare I ask why you would want to do this?
Pat
Why do you want to disable the cache? Do you want to do it in order to make some comparison against cached performance?
Thanks, Patrick and Iliyapolak.
The reason I want to switch off the cache is that I'm testing some UDFs from the MKL library in Postgres. The docs say the functions are tuned to use the cache to improve performance, so I want to see the difference between running with and without the cache, disabling it in software. It is important for me to know exactly which caches are disabled (on 1 core or on all 8 cores).
In fact, I do get a slowdown, but I want to know what is really happening. I'm reading the docs and checking the CPU info, etc.
I think you are assuming that you are disabling L1/L2 and/or L3. You aren't really. You are disabling the use of those caches: the system will behave as if you didn't have L1/L2/L3. Maybe this is a distinction without a difference.
I think the MKL logic blocks the data that is fetched from memory so that MKL gets maximum reuse of the fetched data before it is bumped out of the cache.
If you ran a memory latency test on the system over a range of sizes (such as sizes that should fit into L1, L2, or L3, if you have an L3), you should see the same latency as for an array size much greater than your last-level cache. That is, you should see latencies for what would normally be 'in cache' be as slow as something coming from memory.
I'm not sure that disabling caching is a very good test of the MKL library. By disabling ALL caching, you are really measuring "how slow can we make the chip go". A better test might be to measure how much memory bandwidth is used before and after the UDFs. (I don't really know what UDFs are... I assume they are patches or updates to the library.)
Pat
Hi Isaias,
What are the UDFs?
Regarding your cache-disabling experiment, I think that at the least you will be able to observe speculative and out-of-order execution of non-dependent code while your CPU is waiting for data to arrive.
Thanks, Iliyapolak.
In SQL databases, a user-defined function (UDF) provides a mechanism for extending the functionality of the DBMS by adding a function that can be evaluated in SQL statements.
Now I want to know the memory bandwidth rate, following your good tip, but I can't find any way to do it; probably that is an effect of running the code.
There are tools that report memory bandwidth, but I believe they read the factory value, not the actual rate during execution.
Hi again!
Regarding the tests I'm doing, you told me I can observe "speculative and out-of-order execution of non-dependent code when your CPU will be waiting for data arrival". In other words, does this mean that the effect of what I'm doing will show, more or less, the time it takes to run programs in a non-efficient order while waiting for data to arrive? The tools I found to measure the memory bandwidth rate (e.g. GPU-Z) give a fixed value, not a dynamic one. What can I do, as you suggested earlier, to measure the memory bandwidth rate with and without the change of the CD bit in control register CR0 at kernel boot?