Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Disabling the cache on Intel i7 Sandy Bridge

Isaias_Z_
Beginner
481 Views

We are trying to disable the memory cache by following chapter 11 of the Intel Software Developer's Manual (http://goo.gl/ufvzA), setting the CD bit in control register CR0 while the kernel is booting. We have a doubt about our code. The code does have the effect of slowing down execution. We're using Debian Linux 64-bit. Does it disable the cache on all cores or only one? Does it disable all cache levels (L1, L2 and L3) or only one? Any ideas?

/* This code (GAS syntax) was inserted into main.c during Linux kernel boot.
   A single asm volatile block keeps the compiler from reordering the
   instructions or touching %rax in between; the clobber list replaces
   the original push/pop of %rax. */

asm volatile(
    "cli\n\t"                      /* disable interrupts while we do this */
    "movq %%cr0, %%rax\n\t"        /* read CR0 */
    "orq  $0x40000000, %%rax\n\t"  /* set CD but not NW bit of CR0 */
    "movq %%rax, %%cr0\n\t"        /* caching is now disabled */
    "wbinvd\n\t"                   /* flush and invalidate the caches */
    "orq  $0x20000000, %%rax\n\t"  /* now set the NW bit */
    "movq %%rax, %%cr0"            /* cache is now off entirely */
    : : : "rax", "memory");

9 Replies
Patrick_F_Intel1
Employee

Hello Isaias,

Call me chicken but I don't want to click on the URL you've cited.

Can you give a manual name, manual date and section and maybe a page number for your reference?

The less I have to waste time figuring out what you are talking about, the more likely you are to get an answer.

Pat

 

Isaias_Z_
Beginner
Thanks for asking. The manual is:

Intel® 64 and IA-32 Architectures
Software Developer’s Manual
Volume 3A:
System Programming Guide, Part 1

In particular chapter 11.


Patrick_F_Intel1
Employee

Thanks,

From Table 11-5, it looks to me like setting the CD bit (and clearing NW) in CR0 doesn't disable any of the caches (that is, L1, L2 and L3, if present, are still present and active).

But unless a memory address is already in the cache before you set the CD bit, all reads and writes will go to memory.

So I would expect to see a massive slowdown. Atom chips are apparently slightly different: Atom (it is not really clear to me) seems to simply disable any caching of data in L1/L2/L3 when CD=1.

Is this the same as your understanding and are you seeing a massive slowdown?

Dare I ask why you would want to do this?

Pat

Bernard
Valued Contributor I

Why do you want to disable the cache? Do you want to do it in order to make some comparison against cached performance?

Isaias_Z_
Beginner

Thanks Patrick and Iliyapolak,
The reason I want to switch off the cache is that I'm testing some UDFs from the MKL library in Postgres. The docs say the functions are tuned to use the cache to improve performance, so I want to see the difference between running with and without the cache, disabling it in software. It is important for me to know exactly which caches are disabled (1 core or all 8).
I do indeed get a slowdown, but I want to know what really happens. I'm reading the docs, checking the CPU info, etc.

Patrick_F_Intel1
Employee

I think that you are thinking that you are disabling L1/L2 and/or L3. You aren't, really. You are disabling the use of those caches. The system will behave as if you didn't have L1/L2/L3. Maybe this is a distinction without a difference.

I think that the MKL logic blocks the data that is fetched from memory so that MKL gets maximum reuse of the fetched data before the data is bumped out of the cache.

If you ran a memory latency test on the system with a range of sizes (such as sizes that should fit into L1, L2 or L3, if you have an L3), then you should see the same latency as for an array size much greater than your last-level cache. That is, you should see what would normally be 'in cache' be as slow as something coming from memory.

I'm not sure that disabling caching is a very good test of the MKL lib. By disabling ALL caching, you are more seeing "how slow can we make the chip go". A better test might be to measure how much more (or less) memory bandwidth is used before and after the UDFs. (I don't really know what UDFs are... I assume they are patches or updates to the library.)

Pat

Bernard
Valued Contributor I

Hi Isaias,

What are the UDFs?

Regarding your disabling-cache experiment, I think that at least you will be able to test speculative and out-of-order execution of non-dependent code when your CPU will be waiting for data arrival.

Isaias_Z_
Beginner

Thanks, Iliyapolak.
In SQL databases, a user-defined function (UDF) provides a mechanism for extending the functionality of the DBMS by adding a function that can be evaluated in SQL statements.

Now I want to know the memory bandwidth rate, following your good tip, but I can't find a way; probably that is an effect of running the code. There are tools that report memory bandwidth, but I believe they read the rated (factory) value, not the current rate at execution time.

Isaias_Z_
Beginner

Hi again!

Regarding the tests I'm doing, you told me I would be able to test the "speculative and out-of-order execution of non-dependent code when your CPU will be waiting for data arrival". In other words, does this mean that the effect of what I'm doing will show, more or less, how long programs take when executed in an inefficient order while waiting for data to arrive? The tools I found to measure the memory bandwidth rate (e.g. GPU-Z) give a fixed value, not a dynamic value. What can I do, as you suggested earlier, to measure the memory bandwidth rate with and without the change of the CD bit in control register CR0 when the kernel is booting?
