
Does anyone know why the L1D cache miss rate is high?

liu__kevin
Beginner
1,014 Views

Hi,

I am testing an i7-6700 with VTune 2016 on SPEC 2006. From the event definitions, I believe the following should hold:

MEM_INST_RETIRED.ALL_LOADS = MEM_LOAD_RETIRED.L1_HIT + MEM_LOAD_RETIRED.L1_MISS

MEM_INST_RETIRED.ALL_LOADS : All retired load instructions.

MEM_LOAD_RETIRED.L1_HIT : Retired load instructions with L1 cache hits as data sources.

MEM_LOAD_RETIRED.L1_MISS : Retired load instructions that missed the L1 cache as data sources.

 

However, for some programs (LIBQUANTUM and MCF) where L1D cache miss rate is high, the three numbers are listed as follows.

(MEM_INST_RETIRED.ALL_LOADS,   MEM_LOAD_RETIRED.L1_HIT,   MEM_LOAD_RETIRED.L1_MISS )

LIBQUANTUM : 2.47E+10, 1.40E+10, 2.67E+09, respectively.

MCF : 1.15E+11, 7.57E+10, 2.43E+10, respectively.

 

You can see that for these programs the hits and misses do not add up to the total number of loads. Does anyone know the reason?

 

Thank you.
    

3 Replies
Alexandra_S_Intel

Hello, Kevin,

I'm copying and pasting my reply from the other thread you posted this question on. I'd like to remind you to please not post multiple copies of a question. If it is a follow-up question to one thread, please post it only in that thread (do not create a copy of it to post in a new thread). If it is an unrelated question, create a new thread (but do not post it on an old thread). We won't forget about you! If you're not getting a response, we're probably just busy!

 

It seems likely that the missing factor here is MEM_LOAD_RETIRED.FB_HIT.
Sometimes a load misses L1 but hits the fill buffer (FB) because a preceding miss to the same cache line is still in flight. As I understand it, these loads are not counted in the MEM_LOAD_RETIRED.L1_MISS event counter. Instead they are recorded in MEM_LOAD_RETIRED.FB_HIT.

So your equation should look like so, approximately:
MEM_INST_RETIRED.ALL_LOADS = MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.L1_HIT + MEM_LOAD_RETIRED.FB_HIT
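
To make the size of the gap concrete, here is a minimal Python sketch that computes how many loads are left unexplained by L1_HIT + L1_MISS, using only the counts from the question. The function name `unexplained_loads` and the dictionary layout are made up for illustration. FB_HIT was not measured in this thread, so the residual is only the amount FB_HIT would have to cover if the equation above holds; it is not a measured value.

```python
# Counts copied from the question: (ALL_LOADS, L1_HIT, L1_MISS).
counts = {
    "LIBQUANTUM": (2.47e10, 1.40e10, 2.67e9),
    "MCF": (1.15e11, 7.57e10, 2.43e10),
}


def unexplained_loads(all_loads, l1_hit, l1_miss):
    """Return (residual, share): loads not covered by L1_HIT + L1_MISS,
    and that residual as a fraction of ALL_LOADS."""
    residual = all_loads - (l1_hit + l1_miss)
    return residual, residual / all_loads


for name, (all_loads, l1_hit, l1_miss) in counts.items():
    residual, share = unexplained_loads(all_loads, l1_hit, l1_miss)
    # Assumption: if the FB_HIT explanation is right, a measured
    # MEM_LOAD_RETIRED.FB_HIT count should land close to this residual.
    print(f"{name}: residual = {residual:.3g} loads ({share:.1%} of all loads)")
```

With the posted numbers, the residual is roughly 8.0E+09 loads for LIBQUANTUM (about a third of all loads) and roughly 1.5E+10 for MCF (about 13%). Collecting MEM_LOAD_RETIRED.FB_HIT in the same run would show whether it accounts for most of that.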

That said, this will still not be exact. You can expect a small, statistically insignificant difference between the two sides because of the way event counting works.
Multiplexing and sampling, among other things, can cause you to miss a few events. It's a complicated topic, but basically, if we took note of every single event as it occurred, it would slow everything down to the point that collecting useful data would be impossible. Instead, hardware counters accumulate the events, and every once in a while we record how many occurred in that time block. To make things even more complicated, there are only a few hardware counters, so they have to switch between the different events being counted. You can get more information here: https://software.intel.com/en-us/articles/understanding-how-general-exploration-works-in-intel-vtune-amplifier-xe
Numerous other things can cause very small differences, and I won't list them all. I don't even claim to know them all, because there are a lot of them, and as I said, they produce such small mismatches that they can be more or less ignored.
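
As a rough illustration of why multiplexed counts are estimates rather than exact totals, here is a minimal sketch of the usual time-based extrapolation: a counter that was live for only part of the run is scaled up by the ratio of total time to live time. This is the general technique (Linux perf reports scaled counts this way); whether VTune applies exactly this formula is an assumption on my part, and the function name and the numbers are hypothetical.

```python
def scale_multiplexed_count(raw_count, time_enabled, time_running):
    """Estimate the full-run count for an event whose counter was only
    live for `time_running` out of `time_enabled` seconds.

    Assumes the event rate while the counter was switched out matches
    the rate while it was live, which is why the result is an estimate.
    """
    if time_running <= 0 or time_running > time_enabled:
        raise ValueError("need 0 < time_running <= time_enabled")
    return raw_count * time_enabled / time_running


# Hypothetical example: the event shared a counter with three others,
# so it was counted for 2.5 s of a 10 s run.
estimate = scale_multiplexed_count(raw_count=6.0e9,
                                   time_enabled=10.0,
                                   time_running=2.5)
print(f"estimated count: {estimate:.3g}")
```

Because each event in a multiplexed group is extrapolated from a different set of time slices, sums such as L1_HIT + L1_MISS + FB_HIT will not match ALL_LOADS exactly even when the underlying relationship holds.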

Does this answer your question?

liu__kevin

Hi Alexandra,

Thank you so much for the reminder.

Have a great weekend.

liu__kevin

Hi Alexandra,

Can you help me with this question?

https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe/topic/700685

Thank you.
