Analyzers

Question about the Montecito Dual-Core Itanium2 processor events.

Scott1
Beginner

Hi all, I'm working with an SMP system that has four Montecito processors. As we know, loading an operand from DRAM causes pipeline bubbles. My question is: which event are those bubbles counted under, BE_EXE_BUBBLE_FR(GR)ALL or BE_L1D_FPU_BUBBLE_L1D?

I am confused because I ran two programs (say A and B), each as four simultaneous copies. Both the four-A and four-B combinations slowed down, as expected, because of limited memory-bus bandwidth. But in the four-A combination the event BE_EXE_BUBBLE_FR(GR)ALL increased a lot, while in the four-B combination the event BE_L1D_FPU_BUBBLE_L1D increased a lot. The extra bubbles in both combinations must come from the memory bus, because it is the only resource shared among the four Montecito processors (note that Hyper-Threading is not used).

So, what is the difference between "BE_EXE_BUBBLE_FR(GR)ALL" and "BE_L1D_FPU_BUBBLE_L1D" in terms of DRAM-load issues? Thanks very much!

srimks
New Contributor II
Hi.

The differences between BE_EXE_BUBBLE_FR(GR)ALL and BE_L1D_FPU_BUBBLE_L1D are as follows.

BE_EXE_BUBBLE_FR(GR)ALL: The parent counter, BE_EXE_BUBBLE, accumulates stall cycles in the EXE stage of the pipeline. These stalls occur mostly because data being loaded into the registers is not yet ready for consumption by the functional units. BE_EXE_BUBBLE_FR(GR)ALL is one of the sub-events of BE_EXE_BUBBLE.

BE_L1D_FPU_BUBBLE_L1D: This event accumulates stall cycles caused by the micropipelines associated with the L1D and FPU stalling the core pipeline at the DET stage. (Refer: Front-End Pipeline Stages.) The stall cycles accumulated by this counter are dominated by memory-access stalls that have a different architectural basis than those accumulated by the BE_EXE_BUBBLE event. BE_L1D_FPU_BUBBLE_L1D is one of the sub-events of BE_L1D_FPU_BUBBLE.
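
To see which of the two events absorbs the extra stalls for a given program, one option is to compare counts from a single-copy run against a four-copy run. The following is only a rough sketch: the helper names are made up for illustration, the event names assume that FR(GR)ALL expands to separate FRALL and GRALL sub-events, and the numbers are placeholders, not measurements.

```python
def bubble_deltas(solo, loaded):
    """Return (event, increase in stall cycles) pairs, largest increase first.

    `solo` and `loaded` map event names to stall-cycle counts collected
    from a single-copy run and a four-copy run of the same program.
    """
    deltas = {name: loaded[name] - solo.get(name, 0) for name in loaded}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def dominant_event(solo, loaded):
    """Name of the event whose stall cycles grew the most."""
    return bubble_deltas(solo, loaded)[0][0]


# Placeholder counts for illustration only (NOT real measurements).
solo = {"BE_EXE_BUBBLE_GRALL": 100, "BE_L1D_FPU_BUBBLE_L1D": 100}
four_copies = {"BE_EXE_BUBBLE_GRALL": 450, "BE_L1D_FPU_BUBBLE_L1D": 120}

for name, delta in bubble_deltas(solo, four_copies):
    print(f"{name}: +{delta} stall cycles")
```

Running the same comparison for program A and for program B shows, per program, which sub-event the memory-bus contention ends up being charged to.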

~BR

Scott1
Beginner
Thanks very much. I also found a very useful manual, Introduction to Microarchitectural Optimization for Itanium2 Processors, at http://cache-www.intel.com/cd/00/00/21/93/219348_software_optimization.pdf
It has detailed explanations of the events. I think everybody who is unclear about the Itanium events should read it.