How many bytes of instructions are removed from the L1 cache?

Steven_P_1 · ‎10-08-2015

Hi, I have an assignment for my computer architecture class.

I have read Intel's Architecture Optimization Manual and still cannot find the answer. I need to know how many bytes of instructions are removed from the L1 instruction cache.

Say I run an FFT sequence/program of, say, X bytes.

1. Are the X bytes cached as a whole in the L1 cache, or only partially cached?

2. Are the X bytes fully (wholly) transmitted to the L2 cache?

3. Does the same procedure as in 1 and 2 hold when the FFT program/sequence is removed?

Frances_R_Intel · ‎10-08-2015

This is not really an appropriate forum for questions like this and your questions are not really well formed. But the quick answer to questions 1 and 2 (and assuming that in question 2, you mean transmitted from memory to L2) is "it depends". I don't know what book you are using for your class but my favorite is Hennessy and Patterson. Cache size, cache replacement policy, the size of your code section, the presence of any instruction jumps and, for the L2 question, data usage are all going to affect whether a particular section of code is in L1 at any point in your run.
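To illustrate the "it depends" answer, here is a minimal sketch of a toy instruction-cache model. Everything in it is an assumption chosen for illustration, not a description of any specific Intel part: a 32 KiB, 8-way cache with 64-byte lines, true LRU replacement (real hardware typically uses an approximation), straight-line code fetched from start to end with no jumps, and a hypothetical helper named `miss_count`.

```python
from collections import OrderedDict

LINE_BYTES = 64          # assumed cache-line size
CACHE_BYTES = 32 * 1024  # assumed L1I capacity
WAYS = 8                 # assumed associativity
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 64 sets under these assumptions


def miss_count(code_bytes, passes, start=0):
    """Count misses when a straight-line code region is fetched `passes` times."""
    # One OrderedDict per set; insertion order doubles as LRU order.
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    first_line = start // LINE_BYTES
    last_line = (start + code_bytes - 1) // LINE_BYTES
    misses = 0
    for _ in range(passes):
        for line in range(first_line, last_line + 1):
            ways = sets[line % NUM_SETS]
            if line in ways:
                ways.move_to_end(line)        # hit: mark as most recently used
            else:
                misses += 1
                if len(ways) == WAYS:
                    ways.popitem(last=False)  # evict the least recently used line
                ways[line] = True
    return misses


small = miss_count(16 * 1024, passes=10)  # code region smaller than the cache
large = miss_count(48 * 1024, passes=10)  # code region larger than the cache
```

Under these assumptions, the 16 KiB region misses only on its first pass (256 misses, one per line), while the 48 KiB region misses on every fetch (7680 misses), because each set is asked to hold 12 lines in 8 ways and LRU evicts a line just before it is needed again. Changing the capacity, associativity, replacement policy, or access pattern changes the result, which is why there is no single "X bytes" answer.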

Steven_P_1 · ‎10-08-2015

I did some further reading in the manual. It says that 2 micro-ops are macro-fused to form one micro-op. It further says that the macro-fused instruction is stored as a single micro-op in the Decoded L1 Instruction Cache (which can hold up to 1520 micro-ops). For example, the manual says that BRANCH and COMPARE instructions are combined to form 1 micro-op.

So the next questions are:

1. Is the micro-op stored in the L1I cache also transmitted to and stored in the L2 cache?

2. If that micro-op is stored in the L2 cache, does it occupy 1 micro-op of storage there?

3. Which unit in the Intel architecture controls the storage, removal and branching of instructions in the cache hierarchy?

4. Finally, what is the size of 1 micro-op?

jimdempseyatthecove · ‎10-09-2015

I think you may have to look at your post #3 to find your answer. Preface: I do not work for Intel and am not a CPU design engineer.

You stated "Decoded L1 Instruction Cache". I believe that this is quite different from the "L1 Instruction Cache". See: https://www.google.com.ar/patents/US7328330

The "Decoded" component is part of the instruction pipeline (per the mentioned article).

Whereas (my assumption) the "L1 Instruction Cache" is part of the memory hierarchy and holds a copy of the cache-line-aligned block of RAM (as fetched by the instruction pre-fetch process).
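If that assumption holds, code moves through the memory hierarchy in whole cache lines rather than in arbitrary byte counts. A small sketch of the arithmetic, assuming a 64-byte line size and using a hypothetical helper `lines_touched`:

```python
LINE_BYTES = 64  # assumed cache-line size; check the documentation for your CPU


def lines_touched(start_addr, size_bytes, line_bytes=LINE_BYTES):
    """Return how many aligned cache lines a code region of size_bytes overlaps."""
    if size_bytes <= 0:
        return 0
    first = start_addr // line_bytes                     # line holding the first byte
    last = (start_addr + size_bytes - 1) // line_bytes   # line holding the last byte
    return last - first + 1
```

For example, a 100-byte routine starting at address 0 overlaps 2 lines (128 bytes of cache), while the same routine starting at address 60 overlaps 3 lines (192 bytes). So the number of bytes brought in or evicted is a multiple of the line size and depends on alignment as well as on X.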

You will have to do some investigation to determine if my description above applies to Xeon Phi, but I suspect the choice of use of "Decoded" is critical in making this determination.

Jim Dempsey

TimP · ‎10-09-2015

You don't appear to have focused your questions on instruction vs. data cache.

In case your assignment deals with current Intel "big core" CPUs, you should see several hits in your search engine on topics such as Loop Stream Detector and micro-op cache. You appear to be very confused about op fusion; there are useful discussions in

http://www.agner.org/optimize/microarchitecture.pdf

You seem to be trying to go into undocumented features without paying attention to the part which is documented.

When you have focused your interest, you may be able to pick a more relevant forum section on this site.