Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Branch Trace Store

q1nex
Beginner
669 Views
Hi,

I need some help in enabling BTS.
My CPU is Core 2 Duo E8400, OS - Windows XP SP3 32bit.
I've reread the manual and rechecked everything over 9000 times. Everything seems to be correct.
Here is what I checked:

CPUID.1:EDX[21] = 1
IA32_MISC_ENABLE[7] (Performance Monitoring Available) = 1: Performance monitoring enabled
IA32_MISC_ENABLE[11] (Branch Trace Storage Unavailable) = 0: BTS is supported
IA32_APIC_BASE[11] (APIC global enable/disable flag) = 1: APIC enabled.
Spurious Interrupt Vector Register, bit 8 = 1: APIC Enabled.
Error Status Register = 0: there are no APIC errors.
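
The checklist above can be written as a single predicate over the raw register values. This is only a minimal sketch in C (used here so it can be compiled and checked in user mode; the code in this thread is fasm). The helper name `bts_prereqs_ok` and the idea of passing in values that were already read are assumptions for illustration; reading the MSRs and APIC registers themselves requires ring 0 and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the checklist above. */
#define CPUID1_EDX_DS           (1u << 21)    /* CPUID.1:EDX[21] */
#define MISC_ENABLE_PERFMON     (1ull << 7)   /* IA32_MISC_ENABLE[7] */
#define MISC_ENABLE_BTS_UNAVAIL (1ull << 11)  /* IA32_MISC_ENABLE[11] */
#define APIC_BASE_GLOBAL_ENABLE (1ull << 11)  /* IA32_APIC_BASE[11] */
#define APIC_SVR_ENABLE         (1u << 8)     /* Spurious Interrupt Vector Register, bit 8 */

/* Hypothetical helper: true only when every item of the checklist holds. */
static bool bts_prereqs_ok(uint32_t cpuid1_edx, uint64_t misc_enable,
                           uint64_t apic_base, uint32_t apic_svr,
                           uint32_t apic_esr)
{
    return (cpuid1_edx & CPUID1_EDX_DS) != 0 &&
           (misc_enable & MISC_ENABLE_PERFMON) != 0 &&
           (misc_enable & MISC_ENABLE_BTS_UNAVAIL) == 0 &&  /* must be clear */
           (apic_base & APIC_BASE_GLOBAL_ENABLE) != 0 &&
           (apic_svr & APIC_SVR_ENABLE) != 0 &&
           apic_esr == 0;                                   /* no APIC errors */
}
```

Note that passing all of these checks only shows that BTS is available; it says nothing about whether the DS area itself has the layout the CPU expects.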

The DS area is created; the IA32_DS_AREA MSR, the IA32_DEBUGCTL MSR and the LVT Performance Counter Register are set; the PMI handler is installed in the IDT. The base address of the APIC registers is also checked. For the DS area I tried both memory reserved in the driver image and memory allocated with ExAllocatePool().

When BTS is enabled, the core on which it was enabled slows down, but no records appear in the BTS buffer.
Here is some code in fasm (simplified parts).
DS structure:
[plain]
BTS_entries_num = 330
reserved_BTS_entries_num = 80

struct Branch_Record
  Branch_From      dd ?
  Branch_To        dd ?
  Branch_Predicted dd ?
ends

struct DS
  ;-------BTS buffer base
  BTS_buffer_base dd DS.BTS_buffer
  ;-------BTS index
  BTS_index dd DS.BTS_buffer
  ;-------BTS absolute maximum
  BTS_max dd BTS_entries_num *12 +DS.BTS_buffer
  ;-------BTS interrupt threshold
  BTS_int dd (BTS_entries_num - reserved_BTS_entries_num) *12 +DS.BTS_buffer
  ;-------PEBS save area
  dd 6 dup ?
  ld dd ?
  av = 128 - ((DS.ld+4) mod 128) ;bytes to align
  db av dup ? ;align 128
  BTS_buffer Branch_Record.dup BTS_entries_num ;BTS_entries_num of Branch_Record struct
ends
[/plain]
DS initialization:
[plain]
;----allocating memory
push sizeof.DS
push NonPagedPoolCacheAligned
call [ExAllocatePool]
mov [DS_addr], eax
;----DS initialization
lea ebx, [eax+DS.BTS_buffer]
mov [eax+DS.BTS_buffer_base], ebx
mov [eax+DS.BTS_index], ebx
add ebx, BTS_entries_num *sizeof.Branch_Record
mov [eax+DS.BTS_max], ebx
sub ebx, reserved_BTS_entries_num *sizeof.Branch_Record
mov [eax+DS.BTS_int], ebx
;----clearing memory
mov edi, [DS_addr]
add edi, DS.BTS_buffer
mov ecx, BTS_entries_num*sizeof.Branch_Record
xor eax, eax
cld
rep stosb
[/plain]
Setting LVT and IDT:
[bash]
vec_num = 24h
; fixed, edge sensitive, not masked
mov dword [0FFFE0340h], vec_num or (000b shl 8) or (0b shl 15) or (0b shl 16)
push esi
sidt [esp-2]
pop esi
add esi, vec_num*8
mov eax, IntHandler
mov word [esi], ax
bswap eax
xchg al, ah
mov word [esi+6], ax
mov ax, cs
mov word [esi+2], ax
mov byte [esi+4], 0
mov byte [esi+5], 10001111b
[/bash]
BTS enabling:
[plain]
mov ecx, 600h ;IA32_DS_AREA
rdmsr
mov eax, [DS_addr]
wrmsr
mov ecx, 01D9h ;IA32_DEBUGCTL
rdmsr
;        TR           BTS          BTINT        BTS_OFF_OS   BTS_OFF_USR
mov eax, (1 shl 6) or (1 shl 7) or (1 shl 8) or (1 shl 9) or (0 shl 10)
wrmsr
[/plain]
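
For readers who want to check the IA32_DEBUGCTL value in the listing above, here is a minimal sketch in C that builds the same low dword from named bits. The macro names and the helper `debugctl_bts_value` are hypothetical; only the bit positions come from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_DEBUGCTL (MSR 1D9h) bit positions, as used in the listing above. */
#define DEBUGCTL_TR          (1u << 6)   /* enable branch trace messages */
#define DEBUGCTL_BTS         (1u << 7)   /* store the messages in the BTS buffer */
#define DEBUGCTL_BTINT       (1u << 8)   /* interrupt when the threshold is reached */
#define DEBUGCTL_BTS_OFF_OS  (1u << 9)   /* when set, branches at CPL 0 are skipped */
#define DEBUGCTL_BTS_OFF_USR (1u << 10)  /* when set, branches at CPL > 0 are skipped */

/* Hypothetical helper: compose the value that WRMSR takes in EAX. */
static uint32_t debugctl_bts_value(int interrupt, int skip_os, int skip_user)
{
    uint32_t v = DEBUGCTL_TR | DEBUGCTL_BTS;  /* both are needed for BTS */
    if (interrupt) v |= DEBUGCTL_BTINT;
    if (skip_os)   v |= DEBUGCTL_BTS_OFF_OS;
    if (skip_user) v |= DEBUGCTL_BTS_OFF_USR;
    return v;
}
```

The listing corresponds to `debugctl_bts_value(1, 1, 0)`, which is 3C0h. Because BTS_OFF_OS is set in that value, branches executed at CPL 0 (including the driver's own code) are not recorded; only user-mode branches are.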
I also have some questions (the manual doesn't give CLEAR answers to them):
1. Can DS be on same page with code (if triggering self-modifying code actions doesn't worry me)?

2. (from manual) "The DS save area can be larger than a page, but the pages must be mapped to
contiguous linear addresses."

Does it mean that all 3 DS areas must be in pages that are contiguous on LINEAR space or does it mean that pages with DS must be MAPPED to contiguous PHYSICAL addresses? Because pages are mapped to physical addresses rather than linear...

3. (from manual) "In order to prevent generating an interrupt, when working with
circular BTS buffer, SW need to set BTS interrupt threshold to a value
greater than BTS absolute maximum (fields of the DS buffer
management area). It's not enough to clear the BTINT flag itself only."

In other words, BTINT doesn't control PMIs. So, what is the purpose of BTINT?

4. Can APIC registers only be accessed with mov, or are other instructions (and, or, etc.) acceptable?



P.S. Working code in any language (asm is preferred) will be useful.

Thanks,
q1nex
3 Replies
Patrick_F_Intel1
Employee
Hello q1nex,
I'm trying to find someone who can answer your questions.
Pat
q1nex
Beginner
Thanks, Pat. Hope you'll find somebody.

And one more question about BTS.
Nehalem and newer CPUs have 16 pairs of LBR MSRs, while Core 2 has only 4. Does this mean BTS performance on Nehalem will be almost 4 times higher?
Patrick_F_Intel1
Employee
Hello q1nex,
Here is the reply from our BTS guy (who was on vacation).
Note that the LBR facility is much faster than (and quite different from) the BTS facility.

The problem is that all BTS structures are 64-bit (even in 32-bit mode) starting from Merom (family 6, model 15), so all pointers in the asm control structures should be declared as DQ instead of DD:

[bash]
struct Branch_Record
  Branch_From      dq ?
  Branch_To        dq ?
  Branch_Predicted dq ?
ends

struct DS
  ;-------BTS buffer base
  BTS_buffer_base dq DS.BTS_buffer
  ;-------BTS index
  BTS_index dq DS.BTS_buffer
  ;-------BTS absolute maximum (a record is now 3*8 = 24 bytes, not 12)
  BTS_max dq BTS_entries_num *sizeof.Branch_Record +DS.BTS_buffer
  ;-------BTS interrupt threshold
  BTS_int dq (BTS_entries_num - reserved_BTS_entries_num) *sizeof.Branch_Record +DS.BTS_buffer
  ;-------PEBS save area
  dd 6 dup ?
  ld dd ?
  av = 128 - ((DS.ld+4) mod 128) ;bytes to align
  db av dup ? ;align 128
  BTS_buffer Branch_Record.dup BTS_entries_num ;BTS_entries_num of Branch_Record struct
ends
[/bash]

And to the other questions:

1. Can DS be on same page with code (if triggering self-modifying code actions doesn't worry me)?
Never checked it, but I can see no problem here.

2. (from manual) "The DS save area can be larger than a page, but the pages must be mapped to contiguous linear addresses." Does it mean that all 3 DS areas must be in pages that are contiguous on LINEAR space or does it mean that pages with DS must be MAPPED to contiguous PHYSICAL addresses? Because pages are mapped to physical addresses rather than linear...
Yes, the pages should be linearly contiguous.

3. (from manual) "In order to prevent generating an interrupt, when working with circular BTS buffer, SW need to set BTS interrupt threshold to a value greater than BTS absolute maximum (fields of the DS buffer management area). It's not enough to clear the BTINT flag itself only." In other words, BTINT doesn't control PMIs. So, what is the purpose of BTINT?
BTINT controls the generation of the interrupt. If it is 0, no interrupt will be generated.
Both BTINT and the threshold control the buffer operation: the buffer becomes circular if BTINT = 0 and Threshold > max_size; the buffer is non-circular and generates a PMI if Threshold < max_size and BTINT = 1; and the buffer is non-circular and does not generate a PMI if Threshold < max_size and BTINT = 0.

4. Can APIC registers only be accessed with mov, or are other instructions (and, or, etc.) acceptable?
APIC registers can be accessed using any instruction, but one has to take various side effects into account. For instance, an AND instruction will emit both load and store uOps, whereas mov instructions are more predictable; that's why they are recommended for use with the APIC.
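
To make the two technical points of the reply easier to check, here is a minimal sketch in C of the 64-bit BTS layout and of the buffer modes listed in answer 3. All type, field and function names are hypothetical; the sketch covers only the four BTS fields of the DS management area, and the fourth combination (BTINT = 1 with Threshold > max_size) is reported as "not covered" because the reply does not describe it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One BTS record in the 64-bit format: 3 x 8 = 24 bytes. */
struct bts_record {
    uint64_t from;
    uint64_t to;
    uint64_t predicted;   /* called Branch_Predicted in the thread */
};

/* BTS fields of the DS management area, 64-bit format.
   The PEBS fields that follow them are omitted from this sketch. */
struct ds_bts_area {
    uint64_t buffer_base;
    uint64_t index;
    uint64_t absolute_maximum;
    uint64_t interrupt_threshold;
};

/* Fill the BTS fields for `entries` records at linear address `buf`,
   with the threshold `reserved` records before the end, mirroring the
   DS initialization listing earlier in the thread. */
static void ds_bts_init(struct ds_bts_area *ds, uint64_t buf,
                        uint32_t entries, uint32_t reserved)
{
    ds->buffer_base = buf;
    ds->index = buf;
    ds->absolute_maximum = buf + (uint64_t)entries * sizeof(struct bts_record);
    ds->interrupt_threshold =
        ds->absolute_maximum - (uint64_t)reserved * sizeof(struct bts_record);
}

enum bts_mode {
    BTS_CIRCULAR,            /* BTINT = 0, threshold > max */
    BTS_NONCIRCULAR_PMI,     /* BTINT = 1, threshold < max */
    BTS_NONCIRCULAR_NO_PMI,  /* BTINT = 0, threshold < max */
    BTS_NOT_COVERED          /* combinations the reply does not describe */
};

/* Classify a configuration using only the three cases from the reply. */
static enum bts_mode bts_mode_of(int btint, uint64_t threshold, uint64_t max)
{
    if (!btint && threshold > max) return BTS_CIRCULAR;
    if (btint && threshold < max)  return BTS_NONCIRCULAR_PMI;
    if (!btint && threshold < max) return BTS_NONCIRCULAR_NO_PMI;
    return BTS_NOT_COVERED;
}
```

With 330 entries and 80 reserved, as in the original post, the threshold lands 250 * 24 bytes past the buffer base, and with BTINT = 1 that configuration is the non-circular, PMI-generating case.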