- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am trying to read some events from the performance monitoring counters of processor Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz,
but I have not been able to find out the performance monitoring guidelines for Ivy Bridge processors in the Intel documentation.
Can anybody guide me in this regard?
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The Xeon E5-1650 is a Sandy Bridge processor, not an Ivy Bridge processor.
The output of cpuid shows that this processor has a "DisplayFamily_DisplayModel" encoding of 06_2AH.
All of the performance monitoring events for this family are presented in Section 19.4 of Volume 3 of the Intel SW Developer's Manual (document 325384, revision 047 is from June 2013).
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Read Intel Optimization Reference Manual appendix B.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
iliyapolak wrote:
Read Intel Optimization Reference Manual appendix B.
Thanks for your reply. Could you please point out where exactly I can find the performance monitoring information for Ivy Bridge
in this manual as I could not see much regarding Ivy Bridge in Appendix B . http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You are right.I double checked that B Appendix ant it does not have any info about the Ivy Bridge.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Revision 028 of the Optimization Reference Manual contains a statement in Appendix B about an Ivy Bridge performance monitoring event called "CYCLE_ACTIVITY.STALLS_LDM_PENDING".
This appears to relate to Event A3H of the performance counters, but the descriptions of this event in sections 19.2 (Haswell), 19.3 (Ivy Bridge), and 19.2 (Sandy Bridge) of Volume 3 of the SW Developers Manual are quite confusing. For example, in section 19.3 (Ivy Bridge), Event A3H, Umask 01H indicates "cycles with outstanding L2 load misses", while Umask 04H indicates "cycles with dispatch stalls". If combining these two masks into 05H is interpreted as a logical AND operation, then this should give a count of the stall cycles that occur while there is at least one load pending that has missed in the L2 cache.
In section 19.2 (Haswell), the Umask 04H event is not included, but the Umask 05H event is name "CYCLE_ACTIVITY.STALLS_L2_PENDING". The text description does not make sense ("Number of loads missed L2"), but this might indicate that the 01H and 04H masks can be used to count stall cycles while L2 misses are pending.
Looks like an opportunity for Intel to review the documentation....
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Are there some events which would be common between Sandy Bridge and Ivy Bridge architectures.
I am looking for events like L2 cache misses, uops retired etc.
Here is the output of cpuid for one of the cores of Intel Xeon E5 1650.
CPU 0:
vendor_id = "GenuineIntel"
version information (1/eax):
processor type = primary processor (0)
family = Intel Pentium Pro/II/III/Celeron/Core/Core 2/Atom, AMD Athlon/Duron, Cyrix M2, VIA C3 (6)
model = 0xd (13)
stepping id = 0x7 (7)
extended family = 0x0 (0)
extended model = 0x2 (2)
(simple synth) = Intel Core i7-3800/3900 (Sandy Bridge-E C2) / Xeon E5-1600/2600 (Sandy Bridge-E C2/M1), 32nm
miscellaneous (1/ebx):
process local APIC physical ID = 0x0 (0)
cpu count = 0x20 (32)
CLFLUSH line size = 0x8 (8)
brand index = 0x0 (0)
brand id = 0x00 (0): unknown
feature information (1/edx):
x87 FPU on chip = true
virtual-8086 mode enhancement = true
debugging extensions = true
page size extensions = true
time stamp counter = true
RDMSR and WRMSR support = true
physical address extensions = true
machine check exception = true
CMPXCHG8B inst. = true
APIC on chip = true
SYSENTER and SYSEXIT = true
memory type range registers = true
PTE global bit = true
machine check architecture = true
conditional move/compare instruction = true
page attribute table = true
page size extension = true
processor serial number = false
CLFLUSH instruction = true
debug store = true
thermal monitor and clock ctrl = true
MMX Technology = true
FXSAVE/FXRSTOR = true
SSE extensions = true
SSE2 extensions = true
self snoop = true
hyper-threading / multi-core supported = true
therm. monitor = true
IA64 = false
pending break event = true
feature information (1/ecx):
PNI/SSE3: Prescott New Instructions = true
PCLMULDQ instruction = true
64-bit debug store = true
MONITOR/MWAIT = true
CPL-qualified debug store = true
VMX: virtual machine extensions = true
SMX: safer mode extensions = true
Enhanced Intel SpeedStep Technology = true
thermal monitor 2 = true
SSSE3 extensions = true
context ID: adaptive or shared L1 data = false
FMA instruction = false
CMPXCHG16B instruction = true
xTPR disable = true
perfmon and debug = true
process context identifiers = true
direct cache access = true
SSE4.1 extensions = true
SSE4.2 extensions = true
extended xAPIC support = true
MOVBE instruction = false
POPCNT instruction = true
time stamp counter deadline = true
AES instruction = true
XSAVE/XSTOR states = true
OS-enabled XSAVE/XSTOR = true
AVX: advanced vector extensions = true
F16C half-precision convert instruction = false
RDRAND instruction = false
hypervisor guest status = false
cache and TLB information (2):
0x5a: data TLB: 2M/4M pages, 4-way, 32 entries
0x03: data TLB: 4K pages, 4-way, 64 entries
0x76: instruction TLB: 2M/4M pages, fully, 8 entries
0xff: cache data is in CPUID 4
0xb0: instruction TLB: 4K, 4-way, 128 entries
0xf0: 64 byte prefetching
0xca: L2 TLB: 4K, 4-way, 512 entries
processor serial number: 0002-06D7-0000-0000-0000-0000
deterministic cache parameters (4):
--- cache 0 ---
cache type = data cache (1)
cache level = 0x1 (1)
self-initializing cache level = true
fully associative cache = false
extra threads sharing this cache = 0x1 (1)
extra processor cores on this die = 0xf (15)
system coherency line size = 0x3f (63)
physical line partitions = 0x0 (0)
ways of associativity = 0x7 (7)
WBINVD/INVD behavior on lower caches = false
inclusive to lower caches = false
complex cache indexing = false
number of sets - 1 (s) = 63
--- cache 1 ---
cache type = instruction cache (2)
cache level = 0x1 (1)
self-initializing cache level = true
fully associative cache = false
extra threads sharing this cache = 0x1 (1)
extra processor cores on this die = 0xf (15)
system coherency line size = 0x3f (63)
physical line partitions = 0x0 (0)
ways of associativity = 0x7 (7)
WBINVD/INVD behavior on lower caches = false
inclusive to lower caches = false
complex cache indexing = false
number of sets - 1 (s) = 63
--- cache 2 ---
cache type = unified cache (3)
cache level = 0x2 (2)
self-initializing cache level = true
fully associative cache = false
extra threads sharing this cache = 0x1 (1)
extra processor cores on this die = 0xf (15)
system coherency line size = 0x3f (63)
physical line partitions = 0x0 (0)
ways of associativity = 0x7 (7)
WBINVD/INVD behavior on lower caches = false
inclusive to lower caches = false
complex cache indexing = false
number of sets - 1 (s) = 511
--- cache 3 ---
cache type = unified cache (3)
cache level = 0x3 (3)
self-initializing cache level = true
fully associative cache = false
extra threads sharing this cache = 0x1f (31)
extra processor cores on this die = 0xf (15)
system coherency line size = 0x3f (63)
physical line partitions = 0x0 (0)
ways of associativity = 0xf (15)
WBINVD/INVD behavior on lower caches = false
inclusive to lower caches = true
complex cache indexing = true
number of sets - 1 (s) = 12287
MONITOR/MWAIT (5):
smallest monitor-line size (bytes) = 0x40 (64)
largest monitor-line size (bytes) = 0x40 (64)
enum of Monitor-MWAIT exts supported = true
supports intrs as break-event for MWAIT = true
number of C0 sub C-states using MWAIT = 0x0 (0)
number of C1 sub C-states using MWAIT = 0x2 (2)
number of C2 sub C-states using MWAIT = 0x1 (1)
number of C3/C6 sub C-states using MWAIT = 0x1 (1)
number of C4/C7 sub C-states using MWAIT = 0x2 (2)
Thermal and Power Management Features (6):
digital thermometer = true
Intel Turbo Boost Technology = true
ARAT always running APIC timer = true
PLN power limit notification = true
ECMD extended clock modulation duty = true
PTM package thermal management = true
digital thermometer thresholds = 0x2 (2)
ACNT/MCNT supported performance measure = true
ACNT2 available = false
performance-energy bias capability = true
extended feature flags (7):
FSGSBASE instructions = false
BMI instruction = false
SMEP support = false
enhanced REP MOVSB/STOSB = false
INVPCID instruction = false
Direct Cache Access Parameters (9):
PLATFORM_DCA_CAP MSR bits = 1
Architecture Performance Monitoring Features (0xa/eax):
version ID = 0x3 (3)
number of counters per logical processor = 0x8 (8)
bit width of counter = 0x30 (48)
length of EBX bit vector = 0x7 (7)
Architecture Performance Monitoring Features (0xa/ebx):
core cycle event not available = false
instruction retired event not available = false
reference cycles event not available = false
last-level cache ref event not available = false
last-level cache miss event not avail = false
branch inst retired event not available = false
branch mispred retired event not avail = false
Architecture Performance Monitoring Features (0xa/edx):
number of fixed counters = 0x3 (3)
bit width of fixed counters = 0x30 (48)
x2APIC features / processor topology (0xb):
--- level 0 (thread) ---
bits to shift APIC ID to get next = 0x1 (1)
logical processors at this level = 0x2 (2)
level number = 0x0 (0)
level type = thread (1)
extended APIC ID = 0
--- level 1 (core) ---
bits to shift APIC ID to get next = 0x5 (5)
logical processors at this level = 0xc (12)
level number = 0x1 (1)
level type = core (2)
extended APIC ID = 0
0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
XSAVE features (0xd/0):
XCR0 lower 32 bits valid bit field mask = 0x00000007
bytes required by fields in XCR0 = 0x00000340 (832)
bytes required by XSAVE/XRSTOR area = 0x00000340 (832)
XCR0 upper 32 bits valid bit field mask = 0x00000000
YMM features (0xd/2):
YMM save state byte size = 0x00000100 (256)
YMM save state byte offset = 0x00000240 (576)
LWP features (0xd/0x3e):
LWP save state byte size = 0x00000000 (0)
LWP save state byte offset = 0x00000000 (0)
extended feature flags (0x80000001/edx):
SYSCALL and SYSRET instructions = true
execution disable = true
1-GB large page support = true
RDTSCP = true
64-bit extensions technology available = true
Intel feature flags (0x80000001/ecx):
LAHF/SAHF supported in 64-bit mode = true
brand = " Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz"
L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
instruction # entries = 0x0 (0)
instruction associativity = 0x0 (0)
data # entries = 0x0 (0)
data associativity = 0x0 (0)
L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
instruction # entries = 0x0 (0)
instruction associativity = 0x0 (0)
data # entries = 0x0 (0)
data associativity = 0x0 (0)
L1 data cache information (0x80000005/ecx):
line size (bytes) = 0x0 (0)
lines per tag = 0x0 (0)
associativity = 0x0 (0)
size (Kb) = 0x0 (0)
L1 instruction cache information (0x80000005/edx):
line size (bytes) = 0x0 (0)
lines per tag = 0x0 (0)
associativity = 0x0 (0)
size (Kb) = 0x0 (0)
L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
instruction # entries = 0x0 (0)
instruction associativity = L2 off (0)
data # entries = 0x0 (0)
data associativity = L2 off (0)
L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
instruction # entries = 0x0 (0)
instruction associativity = L2 off (0)
data # entries = 0x0 (0)
data associativity = L2 off (0)
L2 unified cache information (0x80000006/ecx):
line size (bytes) = 0x40 (64)
lines per tag = 0x0 (0)
associativity = 8-way (6)
size (Kb) = 0x100 (256)
L3 cache information (0x80000006/edx):
line size (bytes) = 0x0 (0)
lines per tag = 0x0 (0)
associativity = L2 off (0)
size (in 512Kb units) = 0x0 (0)
Advanced Power Management Features (0x80000007/edx):
temperature sensing diode = false
frequency ID (FID) control = false
voltage ID (VID) control = false
thermal trip (TTP) = false
thermal monitor (TM) = false
software thermal control (STC) = false
100 MHz multiplier control = false
hardware P-State control = false
TscInvariant = true
Physical Address and Linear Address Size (0x80000008/eax):
maximum physical address bits = 0x2e (46)
maximum linear (virtual) address bits = 0x30 (48)
maximum guest physical address bits = 0x0 (0)
Logical CPU cores (0x80000008/ecx):
number of CPU cores - 1 = 0x0 (0)
ApicIdCoreIdSize = 0x0 (0)
(multi-processing synth): multi-core (c=6), hyper-threaded (t=2)
(multi-processing method): Intel leaf 0xb
(APIC widths synth): CORE_width=5 SMT_width=1
(APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
(synth) = Intel Xeon E5-1600/2600 (Sandy Bridge-E C2/M1), 32nm
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The Xeon E5-1650 is a Sandy Bridge processor, not an Ivy Bridge processor.
The output of cpuid shows that this processor has a "DisplayFamily_DisplayModel" encoding of 06_2AH.
All of the performance monitoring events for this family are presented in Section 19.4 of Volume 3 of the Intel SW Developer's Manual (document 325384, revision 047 is from June 2013).

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page