Developing Games on Intel Graphics

Little profiler in MASM

rodionkarimov
Beginner
487 Views

Hello.

I don't know where else to post this, so I'll post it here. I'm writing a profiler in MASM that I'll use in my games and other programs. The profiler is a two-part system: a ring 0 driver, written in MASM, which reads and writes MSRs, and a front end in Free Pascal, which loads the driver, configures it, sends it commands to read the performance counters, and unloads it.

I'm now past the stage of driver crashes and reboots. The driver loads, configures the MSRs, performs the reads and writes, and unloads. But the results it returns are very strange: the LLC miss count is a very large number, of the same magnitude as "UnHalted Core Cycles", "UnHalted Reference Cycles" and "Instruction Retired", with only slight differences. The attachment shows the results the program displays.

So I need some help; maybe I'm making some obvious mistakes. Here is the MASM code of the DispatchControl procedure, which handles all DeviceIoControl calls:

DispatchControl proc uses esi edi pDeviceObject:PDEVICE_OBJECT, pIrp:PIRP

; DeviceIoControl was called
; We are in user process context here

local status:NTSTATUS
local dwBytesReturned:DWORD

and dwBytesReturned, 0

mov esi, pIrp
assume esi:ptr _IRP

IoGetCurrentIrpStackLocation esi
mov edi, eax
assume edi:ptr IO_STACK_LOCATION

push ebx

mov ebx, [esi].AssociatedIrp.SystemBuffer

.if [edi].Parameters.DeviceIoControl.OutputBufferLength >= sizeof PERFORMANCE

.if [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_END_PC_READ

; IA32_PERF_GLOBAL_CTRL (38Fh) = 0: stop all counters
mov ecx, 38fh
mov eax, 00000000h
mov edx, 0h
wrmsr

mov dwBytesReturned, sizeof PERFORMANCE
mov status, STATUS_SUCCESS



.elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_READ

;---LLC miss---------------------
mov ecx, 0c1h
rdmsr

mov dword ptr [ ebx + 48 ], eax
mov dword ptr [ ebx + 52 ], edx

;---BranchMissesRetired----------
mov ecx, 0c2h
rdmsr

mov dword ptr [ ebx + 64 ], eax
mov dword ptr [ ebx + 68 ], edx

;---Fixed function---------------

;---InstrRetired.Any-------------
mov ecx, 309h
rdmsr

mov dword ptr [ ebx + 32 ], eax
mov dword ptr [ ebx + 36 ], edx

;---CPU_CLK_Unhalted.Core--------
mov ecx, 30ah
rdmsr

mov dword ptr [ ebx + 16 ], eax
mov dword ptr [ ebx + 20 ], edx

;---CPU_CLK_Unhalted.Ref---------
mov ecx, 30bh
rdmsr

mov dword ptr [ ebx + 24 ], eax
mov dword ptr [ ebx + 28 ], edx

mov dwBytesReturned, sizeof PERFORMANCE
mov status, STATUS_SUCCESS



.elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE



;---Zeroing performance counters-
mov eax, 0
mov edx, 0

mov ecx, 0c1h
wrmsr

mov ecx, 0c2h
wrmsr

;---Fixed function---------------
mov ecx, 309h
wrmsr

mov ecx, 30ah
wrmsr

mov ecx, 30bh
wrmsr



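;---Thread processor affinity----
; ZwSetInformationThread takes ( ThreadHandle, ThreadInformationClass,
; pointer to ThreadInformation, ThreadInformationLength );
; KeGetCurrentThread returns a PKTHREAD, not a HANDLE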
invoke KeGetCurrentThread

add ebx, 72
invoke ZwSetInformationThread, eax, ThreadAffinityMask, DWORD ptr [ ebx ], sizeof KAFFINITY

;---LLC miss: IA32_PERFEVTSEL0 = event 2Eh, umask 41h, USR + EN---
mov ecx, 186h
mov eax, 41412eh
mov edx, 0

wrmsr

;---BranchMissesRetired: IA32_PERFEVTSEL1 = event C5h, umask 00h, USR + EN---
mov ecx, 187h
mov eax, 4100c5h
mov edx, 0

wrmsr

;---Fixed function---------------

;---MSR_PERF_FIXED_CTR_CTRL------
; 222h: fixed counters 0-2 count at CPL > 0 (user mode) only
mov ecx, 38dh
mov eax, 222h
mov edx, 0
wrmsr

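;---MSR_PERF_GLOBAL_CTRL---------
; EDX = 7h sets bits 32-34 (fixed counters 0-2)
; EAX = 11h sets bits 0 (IA32_PMC0) and 4; IA32_PMC1 is enabled by bit 1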
mov ecx, 38fh
mov eax, 00000011h
mov edx, 7h
wrmsr

mov dwBytesReturned, sizeof PERFORMANCE
mov status, STATUS_SUCCESS



.else
mov status, STATUS_INVALID_DEVICE_REQUEST
.endif

.else
mov status, STATUS_BUFFER_TOO_SMALL
.endif



;---Returning from procedure---------------------
pop ebx

assume edi:nothing

push status
pop [esi].IoStatus.Status

push dwBytesReturned
pop [esi].IoStatus.Information

assume esi:nothing

fastcall IofCompleteRequest, esi, IO_NO_INCREMENT

mov eax, status
ret

DispatchControl endp
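The constants written in the IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE branch can be cross-checked by assembling them from their bit fields. The following is a minimal C sketch, assuming the architectural performance-monitoring MSR layout described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3; make_evtsel, make_fixed_ctrl and make_global_ctrl are hypothetical helper names for illustration, not part of any driver kit.

```c
#include <stdint.h>

/* Flag bits in IA32_PERFEVTSELx (MSR 186h, 187h, ...). */
#define EVTSEL_USR (1u << 16) /* count at CPL 1-3 (user mode) */
#define EVTSEL_OS  (1u << 17) /* count at CPL 0 (kernel mode) */
#define EVTSEL_EN  (1u << 22) /* enable this counter */

/* Low 32 bits of IA32_PERFEVTSELx: event in bits 0-7, umask in bits 8-15. */
static uint32_t make_evtsel(uint8_t event, uint8_t umask, uint32_t flags)
{
    return (uint32_t)event | ((uint32_t)umask << 8) | flags;
}

/* IA32_FIXED_CTR_CTRL (MSR 38Dh): one 4-bit field per fixed counter.
   The low two bits of each field: 1 = CPL 0, 2 = CPL > 0, 3 = both. */
static uint32_t make_fixed_ctrl(unsigned ctr0, unsigned ctr1, unsigned ctr2)
{
    return (ctr0 & 3u) | ((ctr1 & 3u) << 4) | ((ctr2 & 3u) << 8);
}

/* IA32_PERF_GLOBAL_CTRL (MSR 38Fh): bit n enables IA32_PMCn,
   bit 32 + n enables IA32_FIXED_CTRn. */
static uint64_t make_global_ctrl(uint32_t pmc_mask, uint32_t fixed_mask)
{
    return ((uint64_t)fixed_mask << 32) | pmc_mask;
}
```

Decoded this way, 41412Eh is event 2Eh with umask 41h and the USR and EN flags set (the OS flag, bit 17, is clear, so only user-mode activity is counted), and 222h puts all three fixed counters in user-mode-only counting. In IA32_PERF_GLOBAL_CTRL, a low dword of 11h sets enable bits 0 and 4, whereas enabling both IA32_PMC0 and IA32_PMC1 corresponds to a low dword of 3.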

The structure into which the driver writes its results is:

PERFORMANCE STRUCT
tscEAX DWORD ?
tscEDX DWORD ?

RD_MSR_tscEAX DWORD ?
RD_MSR_tscEDX DWORD ?

UnHaltedCoreCycles QWORD ?
UnHaltedReferenceCycles QWORD ?

InstructionRetired QWORD ?

LLCReference QWORD ?
LLCMiss QWORD ?

BranchInstructionRetired QWORD ?
BranchMissesRetired QWORD ?

ProcessorCore DWORD ?

PERFORMANCE ENDS
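DispatchControl writes to hard-coded offsets ([ebx + 16], [ebx + 48], and so on), so those offsets have to agree with this structure and with the Free Pascal record. One way to check them is to mirror the structure in C and compare offsetof values, as in this sketch; the Performance type and the OFF_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* C mirror of the MASM PERFORMANCE structure (DWORD -> uint32_t, QWORD -> uint64_t). */
typedef struct {
    uint32_t tscEAX;
    uint32_t tscEDX;
    uint32_t RD_MSR_tscEAX;
    uint32_t RD_MSR_tscEDX;
    uint64_t UnHaltedCoreCycles;
    uint64_t UnHaltedReferenceCycles;
    uint64_t InstructionRetired;
    uint64_t LLCReference;
    uint64_t LLCMiss;
    uint64_t BranchInstructionRetired;
    uint64_t BranchMissesRetired;
    uint32_t ProcessorCore;
} Performance;

/* Offsets used as [ebx + N] in DispatchControl. */
enum {
    OFF_CORE_CYCLES   = 16,
    OFF_REF_CYCLES    = 24,
    OFF_INSTR_RETIRED = 32,
    OFF_LLC_MISS      = 48,
    OFF_BRANCH_MISS   = 64,
    OFF_CORE_INDEX    = 72
};

/* Returns 1 when every hard-coded offset matches the structure layout. */
static int layout_matches(void)
{
    return offsetof(Performance, UnHaltedCoreCycles) == (size_t)OFF_CORE_CYCLES &&
           offsetof(Performance, UnHaltedReferenceCycles) == (size_t)OFF_REF_CYCLES &&
           offsetof(Performance, InstructionRetired) == (size_t)OFF_INSTR_RETIRED &&
           offsetof(Performance, LLCMiss) == (size_t)OFF_LLC_MISS &&
           offsetof(Performance, BranchMissesRetired) == (size_t)OFF_BRANCH_MISS &&
           offsetof(Performance, ProcessorCore) == (size_t)OFF_CORE_INDEX;
}
```

Only the offsets are compared: the total size may be 76 or 80 bytes depending on how the compiler pads the end of the structure, which does not affect the fields the driver touches.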

And the IOCTL constants are:

IOCTL_CACHE_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 801h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 802h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 803h, METHOD_BUFFERED, FILE_READ_ACCESS )

IOCTL_END_PC_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 804h, METHOD_BUFFERED, FILE_READ_ACCESS )

IOCTL_CACHE_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 805h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 806h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 807h, METHOD_BUFFERED, FILE_READ_ACCESS )
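The driver compares IoControlCode against these CTL_CODE values, while the Free Pascal front end builds the same codes with its own CtlCode function, so both sides must pack the fields identically. The sketch below reproduces the bit packing of the Windows DDK CTL_CODE macro in C, with the constant values from the Windows headers; ctl_code itself is a hypothetical helper name.

```c
#include <stdint.h>

/* Values as defined in the Windows headers. */
#define FILE_DEVICE_UNKNOWN 0x00000022u
#define METHOD_BUFFERED     0u
#define FILE_READ_ACCESS    0x0001u

/* Same packing as CTL_CODE(DeviceType, Function, Method, Access). */
static uint32_t ctl_code(uint32_t device_type, uint32_t function,
                         uint32_t method, uint32_t access)
{
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}
```

For example, FUNCTION_LLC_AND_BRANCH_MISS_READ ($803) should come out as 0022600Ch on both sides, and FUNCTION_END_PC_READ ($804) as 00226010h.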

Here is the Free Pascal code that interacts with the driver:

ZeroMemory ( @ BBefore, SizeOf ( TPerformanceData ) );
BBefore.ProcessorCore := CORE_TO_WORK_ON;

ZeroMemory ( @ BAfter, SizeOf ( TPerformanceData ) );

Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

for i := 0 to 10000 do if TempCardinal div 3 > 100 then TempCardinal := TempCardinal shr 1
else TempCardinal := TempCardinal + i div 5;

Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );

Status := DeviceIoControl ( hDevice, CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_END_PC_READ, METHOD_BUFFERED, FILE_READ_ACCESS ), nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );



//---Showing results-----------------------------------
ShowMessage ( 'UnHalted Core Cycles - ' + IntToStr ( BAfter.UnHaltedCoreCycles - BBefore.UnHaltedCoreCycles ) + #13#13 +
'UnHalted Reference Cycles - ' + IntToStr ( BAfter.UnHaltedReferenceCycles - BBefore.UnHaltedReferenceCycles ) + #13#13 +
'Instruction Retired - ' + IntToStr ( BAfter.InstructionRetired - BBefore.InstructionRetired ) + #13#13 +

'LLC miss - ' + IntToStr ( BAfter.LLCMiss - BBefore.LLCMiss ) + #13#13 +
'Branch Misses Retired - ' + IntToStr ( BAfter.BranchMissesRetired - BBefore.BranchMissesRetired ) );
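The front end subtracts the raw QWORD readings directly. Below is a minimal sketch of a more defensive delta, assuming both readings were taken on the same core (readings from different cores cannot be compared at all). Counter MSRs are narrower than 64 bits, with the width reported by CPUID leaf 0Ah (EAX[23:16] for general-purpose counters, EDX[12:5] for fixed counters), so the difference is masked to that width, which also handles a single wrap-around. Without the mask, an "after" value smaller than the "before" value becomes a number close to 2^64. counter_delta is a hypothetical helper.

```c
#include <stdint.h>

/* Difference between two raw readings of one counter on the same core.
   width is the counter's bit width as reported by CPUID leaf 0Ah. */
static uint64_t counter_delta(uint64_t before, uint64_t after, unsigned width)
{
    uint64_t mask = (width >= 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    /* Unsigned subtraction wraps modulo 2^64; masking reduces it modulo 2^width. */
    return (after - before) & mask;
}
```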

The record used to exchange data with the driver is:

TPerformanceData = record
tscEAX, tscEDX : Cardinal;
RD_MSR_tscEAX, RD_MSR_tscEDX : Cardinal; // 8

UnHaltedCoreCycles : QWORD; // 00 3c - 16
UnHaltedReferenceCycles : QWORD; // 01 3c - 24

InstructionRetired : QWORD; // 00 c0 - 32

LLCReference : QWORD; // 4f 2e - 40
LLCMiss : QWORD; // 41 2e - 48

BranchInstructionRetired : QWORD; // 00 c4 - 56
BranchMissesRetired : QWORD; // 00 c5 - 64

ProcessorCore : Cardinal; // 72

end;
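The event/umask pairs in the TPerformanceData comments are the seven Intel architectural performance events. Their availability, along with the number and width of the general-purpose counters, is reported by CPUID leaf 0Ah. The sketch below decodes raw EAX/EBX values from that leaf; executing CPUID itself (for example through a compiler intrinsic) is left out, and all names are hypothetical.

```c
#include <stdint.h>

/* The seven architectural events, in CPUID.0AH:EBX bit order. */
typedef struct {
    const char *name;
    uint8_t event;
    uint8_t umask;
} ArchEvent;

static const ArchEvent kArchEvents[7] = {
    { "UnHalted Core Cycles",       0x3C, 0x00 },
    { "Instruction Retired",        0xC0, 0x00 },
    { "UnHalted Reference Cycles",  0x3C, 0x01 },
    { "LLC Reference",              0x2E, 0x4F },
    { "LLC Misses",                 0x2E, 0x41 },
    { "Branch Instruction Retired", 0xC4, 0x00 },
    { "Branch Misses Retired",      0xC5, 0x00 },
};

/* Returns 1 if architectural event i is available. EAX[31:24] is the
   length of the EBX bit vector; a SET bit in EBX means NOT available. */
static int arch_event_available(uint32_t eax, uint32_t ebx, unsigned i)
{
    unsigned vector_len = (eax >> 24) & 0xFFu;
    if (i >= 7 || i >= vector_len)
        return 0;
    return ((ebx >> i) & 1u) == 0;
}

/* Number of general-purpose counters (EAX[15:8]) and their width (EAX[23:16]). */
static unsigned gp_counter_count(uint32_t eax) { return (eax >> 8) & 0xFFu; }
static unsigned gp_counter_width(uint32_t eax) { return (eax >> 16) & 0xFFu; }
```

Note the inverted sense of EBX: a set bit means the event is not available on this processor.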

The constants used to build the IOCTL codes are:

FUNCTION_CACHE_MISS_READ = $801;
FUNCTION_BRANCH_MISSPRED_READ = $802;
FUNCTION_LLC_AND_BRANCH_MISS_READ = $803;

FUNCTION_END_PC_READ = $804;

FUNCTION_CACHE_MISS_CONFIGURE = $805;
FUNCTION_BRANCH_MISSPRED_CONFIGURE = $806;
FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE = $807;

I install and run the driver as a service. That part works correctly, and I've checked that the driver properly handles every DeviceIoControl call. At the start, I bind the front end and the driver to the same core using the constant CORE_TO_WORK_ON, which equals 0. The driver reads this value from the ProcessorCore field of the TPerformanceData record.

I'm running Windows 7 on an E5200 processor. If anyone has experience with this or sees any errors, please help me. I'd be very grateful.
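Two details matter for pinning to CORE_TO_WORK_ON = 0. First, an affinity mask (KAFFINITY in the kernel) is a bitmask in which bit n selects processor n, so core 0 corresponds to a mask of 1, not 0; a mask of 0 selects no processor at all. Second, with METHOD_BUFFERED the I/O manager initializes SystemBuffer from the input buffer, so a field such as ProcessorCore reaches the driver only if the record is also passed as the input buffer (the DeviceIoControl calls in the post pass nil, 0). A minimal sketch of the index-to-mask conversion follows; core_to_affinity and its mask_bits parameter are hypothetical.

```c
#include <stdint.h>

/* Convert a zero-based processor index into a one-bit affinity mask.
   mask_bits is the width of the mask type (32 for a 32-bit KAFFINITY,
   64 for a 64-bit one). Returns 0, an empty mask, if the index does not fit. */
static uint64_t core_to_affinity(unsigned core, unsigned mask_bits)
{
    if (mask_bits > 64 || core >= mask_bits)
        return 0;
    return UINT64_C(1) << core;
}
```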

4 Replies
Stephen_H_Intel
Employee
Hi rodionkarimov



Writing a profiler is quite an undertaking; I'm trying to find an expert who can help you out.

In the meantime, have you considered any of the profilers we have? Amplifier XE springs to mind, or take a look around whatif.intel.com: the Performance Tuning Utility (PTU) might be right up your street.

If you used one of our profilers, wouldn't you be free to write the games you want to profile?

Regards

Steve
rodionkarimov
Beginner

> Writing a profiler is quite an undertaking.

I'm writing a simple profiler. It targets only my CPU, which has architectural performance counters, and at first it will measure only simple things such as LLC misses and branch mispredictions.

The profilers you suggest cost money, and right now I don't have enough to buy them. Besides, I feel that I have almost finished the profiler. My own profiler is also more flexible: I can use it however and wherever I want. And it is a very good educational exercise.

Stephen_H_Intel
Employee
Hi Rodionkarimov

You will find a lot of useful information about setting up the counters here: http://software.intel.com/en-us/articles/intel-performance-counter-monitor/

There is also example driver source code and plenty of other useful information.

I hope this helps.

Regards

Steve
rodionkarimov
Beginner

Hello, Steve Hughes.

Thank you. I've already read that article, and I was hoping that someone more experienced in MSR programming could point out my errors. Now I'll study that driver more thoroughly.
