Hello.
I don't know where to post, so I'll post here. I'm writing a profiler in MASM, which I'll use in my games and other programs. I've built it as a two-part system: one part is a ring 0 driver, written in MASM, which reads and writes MSRs. The other is a front end in Free Pascal, which launches the driver, configures it, sends commands to read the performance counters, and unloads it.
The phase of driver errors and reboots is now behind me. The driver loads, configures the MSR reads and writes, performs them, and unloads. But the results it returns are very strange: the LLC miss count is a very large number, of the same magnitude as "UnHalted Core Cycles", "UnHalted Reference Cycles" and "Instruction Retired", with slight differences. The attachment shows the results that the program displays.
So I need some help; maybe I'm making some obvious mistake. Here is the MASM code of the DispatchControl procedure, which handles all DeviceIoControl calls:
DispatchControl proc uses esi edi pDeviceObject:PDEVICE_OBJECT, pIrp:PIRP
    ; DeviceIoControl was called
    ; We are in user process context here
    local status:NTSTATUS
    local dwBytesReturned:DWORD
    and dwBytesReturned, 0
    mov esi, pIrp
    assume esi:ptr _IRP
    IoGetCurrentIrpStackLocation esi
    mov edi, eax
    assume edi:ptr IO_STACK_LOCATION
    push ebx
    mov ebx, [esi].AssociatedIrp.SystemBuffer
    .if [edi].Parameters.DeviceIoControl.OutputBufferLength >= sizeof PERFORMANCE
        .if [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_END_PC_READ
            mov ecx, 38fh
            mov eax, 00000000h
            mov edx, 0h
            wrmsr
            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS
        .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_READ
            ;---LLC miss---------------------
            mov ecx, 0c1h
            rdmsr
            mov dword ptr [ ebx + 48 ], eax
            mov dword ptr [ ebx + 52 ], edx
            ;---BranchMissesRetired----------
            mov ecx, 0c2h
            rdmsr
            mov dword ptr [ ebx + 64 ], eax
            mov dword ptr [ ebx + 68 ], edx
            ;---Fixed function---------------
            ;---InstrRetired.Any-------------
            mov ecx, 309h
            rdmsr
            mov dword ptr [ ebx + 32 ], eax
            mov dword ptr [ ebx + 36 ], edx
            ;---CPU_CLK_Unhalted.Core--------
            mov ecx, 30ah
            rdmsr
            mov dword ptr [ ebx + 16 ], eax
            mov dword ptr [ ebx + 20 ], edx
            ;---CPU_CLK_Unhalted.Ref---------
            mov ecx, 30bh
            rdmsr
            mov dword ptr [ ebx + 24 ], eax
            mov dword ptr [ ebx + 28 ], edx
            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS
        .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE
            ;---Zeroing performance counters-
            mov eax, 0
            mov edx, 0
            mov ecx, 0c1h
            wrmsr
            mov ecx, 0c2h
            wrmsr
            ;---Fixed function---------------
            mov ecx, 309h
            wrmsr
            mov ecx, 30ah
            wrmsr
            mov ecx, 30bh
            wrmsr
            ;---Thread processor affinity----
            invoke KeGetCurrentThread
            add ebx, 72
            invoke ZwSetInformationThread, eax, ThreadAffinityMask, DWORD ptr [ ebx ], sizeof KAFFINITY
            ;---LLC miss---------------------
            mov ecx, 186h
            mov eax, 41412eh
            mov edx, 0
            wrmsr
            ;---BranchMissesRetired----------
            mov ecx, 187h
            mov eax, 4100c5h
            mov edx, 0
            wrmsr
            ;---Fixed function---------------
            ;---MSR_PERF_FIXED_CTR_CTRL------
            mov ecx, 38dh
            mov eax, 222h
            mov edx, 0
            wrmsr
            ;---MSR_PERF_GLOBAL_CTRL---------
            mov ecx, 38fh
            mov eax, 00000011h
            mov edx, 7h
            wrmsr
            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS
        .else
            mov status, STATUS_INVALID_DEVICE_REQUEST
        .endif
    .else
        mov status, STATUS_BUFFER_TOO_SMALL
    .endif
    ;---Returning from procedure---------------------
    pop ebx
    assume edi:nothing
    push status
    pop [esi].IoStatus.Status
    push dwBytesReturned
    pop [esi].IoStatus.Information
    assume esi:nothing
    fastcall IofCompleteRequest, esi, IO_NO_INCREMENT
    mov eax, status
    ret
DispatchControl endp
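For anyone checking the MSR values written in the listing above, a small user-mode C sketch may help decode them. It assumes the architectural layout of IA32_PERFEVTSELx described in the Intel SDM (event select in bits 0-7, unit mask in bits 8-15, USR at bit 16, OS at bit 17, EN at bit 22) and of IA32_PERF_GLOBAL_CTRL (bit n enables general-purpose counter n, bit 32+n enables fixed counter n). The helper names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in IA32_PERFEVTSELx (architectural layout, Intel SDM). */
#define EVTSEL_USR (1u << 16)  /* count while in ring 3 */
#define EVTSEL_OS  (1u << 17)  /* count while in ring 0 */
#define EVTSEL_EN  (1u << 22)  /* enable the counter */

/* Hypothetical helper: build the low dword of IA32_PERFEVTSELx. */
static uint32_t evtsel_encode(uint8_t event, uint8_t umask, uint32_t flags)
{
    return (uint32_t)event | ((uint32_t)umask << 8) | flags;
}

/* Hypothetical helper: build IA32_PERF_GLOBAL_CTRL from two bit masks. */
static uint64_t global_ctrl_encode(uint32_t pmc_mask, uint32_t fixed_mask)
{
    return (uint64_t)pmc_mask | ((uint64_t)fixed_mask << 32);
}

/* Hypothetical helper: is general-purpose counter n enabled? */
static int global_ctrl_pmc_enabled(uint64_t value, unsigned n)
{
    assert(n < 32);
    return (int)((value >> n) & 1u);
}
```

Under this layout, evtsel_encode(0x2E, 0x41, EVTSEL_USR | EVTSEL_EN) gives 41412eh and evtsel_encode(0xC5, 0x00, EVTSEL_USR | EVTSEL_EN) gives 4100c5h, which match the values in the listing. The value written to 38fh (edx = 7h, eax = 11h) has bits 0 and 4 set in the low dword, so PMC1 (read through 0c2h) would not be enabled by it; enabling both PMC0 and PMC1 would take eax = 3h. This is only a reading of the bit layout and is worth checking against the SDM for this particular CPU.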
The structure into which the driver writes its results has this type:
PERFORMANCE STRUCT
    tscEAX                   DWORD ?
    tscEDX                   DWORD ?
    RD_MSR_tscEAX            DWORD ?
    RD_MSR_tscEDX            DWORD ?
    UnHaltedCoreCycles       QWORD ?
    UnHaltedReferenceCycles  QWORD ?
    InstructionRetired       QWORD ?
    LLCReference             QWORD ?
    LLCMiss                  QWORD ?
    BranchInstructionRetired QWORD ?
    BranchMissesRetired      QWORD ?
    ProcessorCore            DWORD ?
PERFORMANCE ENDS
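Since the driver addresses the fields through hard-coded offsets such as [ ebx + 48 ], it can be useful to check those offsets mechanically. The following is a sketch of a C mirror of the structure, with compile-time checks on the offsets used in DispatchControl. The type name Performance is hypothetical; the field names and order are taken from the declaration above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the PERFORMANCE structure (same field order as the post). */
typedef struct {
    uint32_t tscEAX;
    uint32_t tscEDX;
    uint32_t RD_MSR_tscEAX;
    uint32_t RD_MSR_tscEDX;
    uint64_t UnHaltedCoreCycles;
    uint64_t UnHaltedReferenceCycles;
    uint64_t InstructionRetired;
    uint64_t LLCReference;
    uint64_t LLCMiss;
    uint64_t BranchInstructionRetired;
    uint64_t BranchMissesRetired;
    uint32_t ProcessorCore;
} Performance;

/* Offsets that the driver hard-codes as [ ebx + N ]. */
static_assert(offsetof(Performance, UnHaltedCoreCycles) == 16, "core cycles");
static_assert(offsetof(Performance, UnHaltedReferenceCycles) == 24, "ref cycles");
static_assert(offsetof(Performance, InstructionRetired) == 32, "instructions");
static_assert(offsetof(Performance, LLCMiss) == 48, "LLC miss");
static_assert(offsetof(Performance, BranchMissesRetired) == 64, "branch misses");
static_assert(offsetof(Performance, ProcessorCore) == 72, "processor core");
```

The offsets agree with the listing. The total size may differ between compilers because of trailing padding after ProcessorCore, so the sketch deliberately does not check it.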
The IOCTL constants are defined as follows:
IOCTL_CACHE_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 801h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 802h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 803h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_END_PC_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 804h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_CACHE_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 805h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 806h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 807h, METHOD_BUFFERED, FILE_READ_ACCESS )
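To confirm that the driver and the front end really agree on the numeric control codes, the CTL_CODE arithmetic can be reproduced in a few lines of C. This sketch assumes the standard DDK bit layout (device type in bits 16-31, access in bits 14-15, function in bits 2-13, method in bits 0-1) and the usual header values for the three constants; the function name ctl_code is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the Windows DDK headers. */
#define FILE_DEVICE_UNKNOWN 0x00000022u
#define METHOD_BUFFERED     0u
#define FILE_READ_ACCESS    0x0001u

/* Same bit layout as the DDK CTL_CODE macro. */
static uint32_t ctl_code(uint32_t device_type, uint32_t function,
                         uint32_t method, uint32_t access)
{
    assert(function < 0x1000u);  /* the function field is 12 bits wide */
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}
```

With these values, function 803h yields 22600Ch, 804h yields 226010h, and 807h yields 22601Ch, so printing the code on both sides is a quick way to rule out a mismatch.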
The Free Pascal code that interacts with the driver is as follows:
ZeroMemory ( @ BBefore, SizeOf ( TPerformanceData ) );
BBefore.ProcessorCore := CORE_TO_WORK_ON;
ZeroMemory ( @ BAfter, SizeOf ( TPerformanceData ) );
Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
for i := 0 to 10000 do
  if TempCardinal div 3 > 100 then TempCardinal := TempCardinal shr 1
  else TempCardinal := TempCardinal + i div 5;
Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_END_PC_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
//---Showing results-----------------------------------
ShowMessage ( 'UnHalted Core Cycles - ' + IntToStr ( BAfter.UnHaltedCoreCycles - BBefore.UnHaltedCoreCycles ) + #13#13 +
              'UnHalted Reference Cycles - ' + IntToStr ( BAfter.UnHaltedReferenceCycles - BBefore.UnHaltedReferenceCycles ) + #13#13 +
              'Instruction Retired - ' + IntToStr ( BAfter.InstructionRetired - BBefore.InstructionRetired ) + #13#13 +
              'LLC miss - ' + IntToStr ( BAfter.LLCMiss - BBefore.LLCMiss ) + #13#13 +
              'Branch Misses Retired - ' + IntToStr ( BAfter.BranchMissesRetired - BBefore.BranchMissesRetired ) );
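The results above are computed as plain "after minus before" differences of 64-bit fields. The hardware counters are narrower than 64 bits, so a wrap between the two reads would show up as a huge difference. Below is a minimal C sketch of a wrap-tolerant difference. It assumes the counter width is known: CPUID leaf 0AH reports it (EAX bits 16-23 for the general-purpose counters, EDX bits 5-12 for the fixed ones), and 40 is used here only as an illustrative value. The function name counter_delta is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two readings of a counter that is `width` bits wide,
   tolerating a single wrap-around between the readings. */
static uint64_t counter_delta(uint64_t before, uint64_t after, unsigned width)
{
    assert(width >= 1 && width <= 64);
    uint64_t mask = (width == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    /* Unsigned subtraction wraps modulo 2^64; masking reduces it to
       modulo 2^width, which is the counter's own arithmetic. */
    return (after - before) & mask;
}
```

For example, counter_delta(100, 350, 40) is 250, and a reading that wraps from FFFFFFFFF0h to 10h still gives 20h. This only addresses wrap-around; it would not explain a count that is wrong for other reasons, such as the two reads happening on different cores.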
The record used to interact with the driver is declared as follows:
TPerformanceData = record
  tscEAX, tscEDX : Cardinal;
  RD_MSR_tscEAX, RD_MSR_tscEDX : Cardinal; // 8
  UnHaltedCoreCycles : QWORD; // 00 3c - 16
  UnHaltedReferenceCycles : QWORD; // 01 3c - 24
  InstructionRetired : QWORD; // 00 c0 - 32
  LLCReference : QWORD; // 4f 2e - 40
  LLCMiss : QWORD; // 41 2e - 48
  BranchInstructionRetired : QWORD; // 00 c4 - 56
  BranchMissesRetired : QWORD; // 00 c5 - 64
  ProcessorCore : Cardinal; // 72
end;
The constants used to build the IOCTL codes are:
FUNCTION_CACHE_MISS_READ = $801;
FUNCTION_BRANCH_MISSPRED_READ = $802;
FUNCTION_LLC_AND_BRANCH_MISS_READ = $803;
FUNCTION_END_PC_READ = $804;
FUNCTION_CACHE_MISS_CONFIGURE = $805;
FUNCTION_BRANCH_MISSPRED_CONFIGURE = $806;
FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE = $807;
I install and run the driver as a service, and everything works correctly; I have checked that all DeviceIoControl calls are properly handled by the driver. At the start I pin the front end and the driver to the same core, using the constant CORE_TO_WORK_ON, which equals 0. The driver reads this value from the ProcessorCore field of the TPerformanceData record.
I work in Windows 7, and my processor is an E5200. If anybody has experience with this or sees any errors, please help me. I'll be very grateful.
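A note on the core pinning described above: Windows affinity settings take a bit mask rather than a core index, so core 0 corresponds to mask 1, and a mask of 0 selects no processor at all. The following C sketch shows the conversion; the helper name affinity_mask_for_core is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a zero-based core index into an affinity
   bit mask (bit n set means "may run on logical processor n"). */
static uintptr_t affinity_mask_for_core(unsigned core)
{
    assert(core < sizeof(uintptr_t) * 8);
    return (uintptr_t)1 << core;
}
```

Two related points may be worth checking, both stated here as assumptions about the documented behaviour rather than as a diagnosis. First, ZwSetInformationThread is documented as taking a thread handle and a pointer to the KAFFINITY value, whereas the listing passes the result of KeGetCurrentThread and the contents of [ ebx ]. Second, with METHOD_BUFFERED the I/O manager copies only the input buffer into the system buffer, and the front end passes nil and 0 for the input, so the ProcessorCore value set in BBefore may never reach the driver.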
Writing a profiler is quite an undertaking. I'm trying to find an expert who can help you out.
In the meantime, have you considered looking at any of the profilers we have? Amplifier XE springs to mind, or maybe look around whatif.intel.com; the Performance Tuning Utility (PTU) might be right up your street.
If you used one of our profilers, you would be free to write the games you wanted to profile.
Regards
Steve
> Writing a profiler is quite an undertaking.
I'm writing a simple profiler. It targets only my CPU with its Architectural Performance Counters and, at the beginning, it will measure only simple parameters, such as LLC misses and branch mispredictions.
The profilers that you offer cost money, and I don't have much to spend on them right now. Besides, I have a feeling that I've almost completed my profiler. My own profiler is also more flexible: I can use it in any way and in any place. And it is a very good learning exercise.
You will find quite a lot of useful information about setting up counters here: http://software.intel.com/en-us/articles/intel-performance-counter-monitor/
There is also example driver source code, along with plenty of other useful material.
I hope this helps.
Regards
Steve
Hello, Steve Hughes.
Thank you. I've already read this article, and I hoped that somebody more experienced in MSR programming could point out my errors. So now I'll study this driver more thoroughly.
