Hello.
 I don't know where to post, so I'll post here. I'm writing a profiler that I plan to use in my games and other programs. It has two parts: a ring 0 driver, written in MASM, which reads and writes MSRs, and a front end in Free Pascal, which loads the driver, configures it, sends commands to read the performance counters, and unloads it.
 The phase of driver bugs and reboots is now behind me. The driver loads, configures the MSRs, performs the reads and writes, and unloads. But the results it returns are very strange: the LLC miss count is a very large number, of the same magnitude as "UnHalted Core Cycles", "UnHalted Reference Cycles" and "Instruction Retired", with only slight differences. The attachment shows the results the program displays.
 So I need some help; maybe I'm making some obvious mistake. Here is the MASM code of the DispatchControl procedure, which handles all DeviceIoControl calls:
 
DispatchControl proc uses esi edi pDeviceObject:PDEVICE_OBJECT, pIrp:PIRP
    ; DeviceIoControl was called
    ; We are in user process context here
    local status:NTSTATUS
    local dwBytesReturned:DWORD
    and dwBytesReturned, 0
    mov esi, pIrp
    assume esi:ptr _IRP
    IoGetCurrentIrpStackLocation esi
    mov edi, eax
    assume edi:ptr IO_STACK_LOCATION
    push ebx
    mov ebx, [esi].AssociatedIrp.SystemBuffer   ; METHOD_BUFFERED system buffer
    .if [edi].Parameters.DeviceIoControl.OutputBufferLength >= sizeof PERFORMANCE
        .if [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_END_PC_READ
            ;---MSR_PERF_GLOBAL_CTRL = 0 (stop all counters)
            mov ecx, 38fh
            mov eax, 00000000h
            mov edx, 0h
            wrmsr
            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS
        .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_READ
            ;---LLC miss (PMC0) -> offset 48
            mov ecx, 0c1h
            rdmsr
            mov dword ptr [ ebx + 48 ], eax
            mov dword ptr [ ebx + 52 ], edx
            ;---BranchMissesRetired (PMC1) -> offset 64
            mov ecx, 0c2h
            rdmsr
            mov dword ptr [ ebx + 64 ], eax
            mov dword ptr [ ebx + 68 ], edx
            ;---Fixed function---------------
            ;---InstrRetired.Any (fixed counter 0) -> offset 32
            mov ecx, 309h
            rdmsr
            mov dword ptr [ ebx + 32 ], eax
            mov dword ptr [ ebx + 36 ], edx
            ;---CPU_CLK_Unhalted.Core (fixed counter 1) -> offset 16
            mov ecx, 30ah
            rdmsr
            mov dword ptr [ ebx + 16 ], eax
            mov dword ptr [ ebx + 20 ], edx
            ;---CPU_CLK_Unhalted.Ref (fixed counter 2) -> offset 24
            mov ecx, 30bh
            rdmsr
            mov dword ptr [ ebx + 24 ], eax
            mov dword ptr [ ebx + 28 ], edx
            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS
        .elseif [edi].Parameters.DeviceIoControl.IoControlCode == IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE
            ;---Zeroing performance counters (PMC0, PMC1)
            mov eax, 0
            mov edx, 0
            mov ecx, 0c1h
            wrmsr
            mov ecx, 0c2h
            wrmsr
            ;---Fixed function counters 0..2
            mov ecx, 309h
            wrmsr
            mov ecx, 30ah
            wrmsr
            mov ecx, 30bh
            wrmsr
            ;---Thread processor affinity (ProcessorCore is at offset 72)
            invoke KeGetCurrentThread
            add ebx, 72
            invoke ZwSetInformationThread, eax, ThreadAffinityMask, DWORD ptr [ ebx ], sizeof KAFFINITY
            ;---LLC miss: event 2Eh, umask 41h, USR + EN -> PERFEVTSEL0
            mov ecx, 186h
            mov eax, 41412eh
            mov edx, 0
            wrmsr
            ;---BranchMissesRetired: event C5h, umask 00h, USR + EN -> PERFEVTSEL1
            mov ecx, 187h
            mov eax, 4100c5h
            mov edx, 0
            wrmsr
            ;---Fixed function---------------
            ;---MSR_PERF_FIXED_CTR_CTRL: 2h per counter = count in user mode
            mov ecx, 38dh
            mov eax, 222h
            mov edx, 0
            wrmsr
            ;---MSR_PERF_GLOBAL_CTRL: EAX = 11h (hex), EDX = 7h (fixed counters 0..2)
            mov ecx, 38fh
            mov eax, 00000011h
            mov edx, 7h
            wrmsr
            mov dwBytesReturned, sizeof PERFORMANCE
            mov status, STATUS_SUCCESS
        .else
            mov status, STATUS_INVALID_DEVICE_REQUEST
        .endif
    .else
        mov status, STATUS_BUFFER_TOO_SMALL
    .endif
    ;---Returning from procedure---------------------
    pop ebx
    assume edi:nothing
    push status
    pop [esi].IoStatus.Status
    push dwBytesReturned
    pop [esi].IoStatus.Information
    assume esi:nothing
    fastcall IofCompleteRequest, esi, IO_NO_INCREMENT
    mov eax, status
    ret
DispatchControl endp
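
The constants written to the event-select and global-control MSRs above can be checked by packing and unpacking the bit fields. The following is a minimal Free Pascal sketch, assuming the standard architectural layout (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, EN in bit 22). The helper names `EncodeEvtSel`, `MsrValue` and `SetBits` are hypothetical and not part of the driver. It only decodes the values; it is not a confirmed diagnosis of the strange results.

```pascal
program EvtSelSketch;
{$mode objfpc}{$H+}
uses SysUtils;

const
  EVTSEL_USR = Cardinal(1) shl 16;  // count while running in user mode
  EVTSEL_OS  = Cardinal(1) shl 17;  // count while running in ring 0
  EVTSEL_EN  = Cardinal(1) shl 22;  // enable the counter

// Hypothetical helper: pack event select, unit mask and flag bits.
function EncodeEvtSel(Event, UMask: Byte; Flags: Cardinal): Cardinal;
begin
  Result := Cardinal(Event) or (Cardinal(UMask) shl 8) or Flags;
end;

// Hypothetical helper: build the 64-bit value that WRMSR takes from EDX:EAX.
function MsrValue(EdxPart, EaxPart: Cardinal): QWord;
begin
  Result := (QWord(EdxPart) shl 32) or QWord(EaxPart);
end;

// Hypothetical helper: list the indices of the bits that are set.
function SetBits(Value: QWord): string;
var
  Bit: Integer;
begin
  Result := '';
  for Bit := 0 to 63 do
    if ((Value shr Bit) and 1) = 1 then
    begin
      if Result <> '' then
        Result := Result + ',';
      Result := Result + IntToStr(Bit);
    end;
end;

begin
  // LLC miss and branch misses retired, user mode only, enabled.
  WriteLn(IntToHex(Int64(EncodeEvtSel($2E, $41, EVTSEL_USR or EVTSEL_EN)), 6));
  WriteLn(IntToHex(Int64(EncodeEvtSel($C5, $00, EVTSEL_USR or EVTSEL_EN)), 6));
  // Global control as written by the driver (EDX = 7h, EAX = 11h) ...
  WriteLn('EAX = 11h sets bits ', SetBits(MsrValue($7, $11)));
  // ... compared with EAX = 3h, which sets bits 0 and 1.
  WriteLn('EAX = 3h  sets bits ', SetBits(MsrValue($7, $3)));
end.
```

The two event-select values come out as 41412E and 4100C5, matching the driver. The global-control value 11h is hexadecimal, so it sets bits 0 and 4 rather than bits 0 and 1. If the intent was to enable both PMC0 and PMC1, that would correspond to 3h; whether this explains the readings is something to verify on the actual hardware.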
 
 
The structure into which the driver writes the results has this type:
 
 
PERFORMANCE STRUCT
 tscEAX DWORD ?
 tscEDX DWORD ?
 RD_MSR_tscEAX DWORD ?
 RD_MSR_tscEDX DWORD ?
 UnHaltedCoreCycles QWORD ?
 UnHaltedReferenceCycles QWORD ?
 InstructionRetired QWORD ?
 LLCReference QWORD ?
 LLCMiss QWORD ?
 BranchInstructionRetired QWORD ?
 BranchMissesRetired QWORD ?
 ProcessorCore DWORD ?
PERFORMANCE ENDS
 
The IOCTL constants are defined as follows:
 
IOCTL_CACHE_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 801h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 802h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 803h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_END_PC_READ equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 804h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_CACHE_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 805h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_BRANCH_MISSPRED_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 806h, METHOD_BUFFERED, FILE_READ_ACCESS )
IOCTL_LLC_AND_BRANCH_MISS_CONFIGURE equ CTL_CODE ( FILE_DEVICE_UNKNOWN, 807h, METHOD_BUFFERED, FILE_READ_ACCESS )
 
 
The Free Pascal code that interacts with the driver is:
 
 
ZeroMemory ( @ BBefore, SizeOf ( TPerformanceData ) );
BBefore.ProcessorCore := CORE_TO_WORK_ON;
ZeroMemory ( @ BAfter, SizeOf ( TPerformanceData ) );
//---Configuring counters and taking the first reading--
Status := DeviceIoControl ( hDevice,
  CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE, METHOD_BUFFERED, FILE_READ_ACCESS ),
  nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
Status := DeviceIoControl ( hDevice,
  CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ),
  nil, 0, @ BBefore, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
//---Code under measurement-----------------------------
for i := 0 to 10000 do
  if TempCardinal div 3 > 100 then
    TempCardinal := TempCardinal shr 1
  else
    TempCardinal := TempCardinal + i div 5;
//---Taking the second reading and stopping counters----
Status := DeviceIoControl ( hDevice,
  CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ, METHOD_BUFFERED, FILE_READ_ACCESS ),
  nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
Status := DeviceIoControl ( hDevice,
  CtlCode ( FILE_DEVICE_UNKNOWN, FUNCTION_END_PC_READ, METHOD_BUFFERED, FILE_READ_ACCESS ),
  nil, 0, @ BAfter, SIZE_OF_BUFFER * 4, @ BytesReturned, nil );
//---Showing results-----------------------------------
ShowMessage ( 'UnHalted Core Cycles - ' + IntToStr ( BAfter.UnHaltedCoreCycles - BBefore.UnHaltedCoreCycles ) + #13#13 +
  'UnHalted Reference Cycles - ' + IntToStr ( BAfter.UnHaltedReferenceCycles - BBefore.UnHaltedReferenceCycles ) + #13#13 +
  'Instruction Retired - ' + IntToStr ( BAfter.InstructionRetired - BBefore.InstructionRetired ) + #13#13 +
  'LLC miss - ' + IntToStr ( BAfter.LLCMiss - BBefore.LLCMiss ) + #13#13 +
  'Branch Misses Retired - ' + IntToStr ( BAfter.BranchMissesRetired - BBefore.BranchMissesRetired ) );
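
The results shown above are differences between two raw counter readings, each assembled from the EDX:EAX pair that RDMSR returns. A minimal sketch of that arithmetic follows, assuming a 40-bit counter width (the width is reported by CPUID leaf 0Ah and should be checked on the actual CPU). `CombineEdxEax` and `CounterDelta` are hypothetical helper names.

```pascal
program CounterDeltaSketch;
{$mode objfpc}{$H+}
{$Q-}{$R-}  // unsigned wrap-around in the subtraction is intended

const
  // Assumption: counters are 40 bits wide; adjust to what CPUID leaf 0Ah reports.
  COUNTER_MASK: QWord = $FFFFFFFFFF;

// Hypothetical helper: join the two 32-bit halves returned by RDMSR.
function CombineEdxEax(EdxPart, EaxPart: Cardinal): QWord;
begin
  Result := (QWord(EdxPart) shl 32) or QWord(EaxPart);
end;

// Hypothetical helper: difference of two readings, tolerant of one wrap.
function CounterDelta(Before, After: QWord): QWord;
begin
  Result := (After - Before) and COUNTER_MASK;
end;

var
  First, Second: QWord;
begin
  // Ordinary case: the counter advanced by 1000 events.
  First  := CombineEdxEax($00, $00001000);
  Second := CombineEdxEax($00, $000013E8);
  WriteLn('Delta without wrap: ', CounterDelta(First, Second));

  // The counter wrapped past 2^40 between the two readings.
  First  := CombineEdxEax($FF, $FFFFFFF0);
  Second := CombineEdxEax($00, $00000010);
  WriteLn('Delta across wrap:  ', CounterDelta(First, Second));
end.
```

Masking the difference keeps a single wrap from showing up as a huge number. It does not help if the two readings come from different cores, since each core has its own counters.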
 
 
The record used to exchange data with the driver is:
 
 
TPerformanceData = record
 tscEAX, tscEDX : Cardinal;
 RD_MSR_tscEAX, RD_MSR_tscEDX : Cardinal; // 8
 UnHaltedCoreCycles : QWORD; // 00 3c - 16
 UnHaltedReferenceCycles : QWORD; // 01 3c - 24
 InstructionRetired : QWORD; // 00 c0 - 32
 LLCReference : QWORD; // 4f 2e - 40
 LLCMiss : QWORD; // 41 2e - 48
 BranchInstructionRetired : QWORD; // 00 c4 - 56
 BranchMissesRetired : QWORD; // 00 c5 - 64
 ProcessorCore : Cardinal; // 72
 end;
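
Because the driver writes to fixed byte offsets (16, 24, 32, 48, 64, 72), the record layout on the Pascal side has to match them. A minimal sketch for checking this at run time is shown below. `OffsetOf` is a hypothetical helper, and the record is declared exactly as above, without `packed`.

```pascal
program LayoutCheckSketch;
{$mode objfpc}{$H+}

type
  TPerformanceData = record
    tscEAX, tscEDX: Cardinal;
    RD_MSR_tscEAX, RD_MSR_tscEDX: Cardinal;
    UnHaltedCoreCycles: QWord;
    UnHaltedReferenceCycles: QWord;
    InstructionRetired: QWord;
    LLCReference: QWord;
    LLCMiss: QWord;
    BranchInstructionRetired: QWord;
    BranchMissesRetired: QWord;
    ProcessorCore: Cardinal;
  end;

// Hypothetical helper: byte offset of a field inside its record.
function OffsetOf(const Base; const Field): PtrUInt;
begin
  Result := PtrUInt(@Field) - PtrUInt(@Base);
end;

var
  Data: TPerformanceData;
begin
  WriteLn('UnHaltedCoreCycles       ', OffsetOf(Data, Data.UnHaltedCoreCycles));
  WriteLn('UnHaltedReferenceCycles  ', OffsetOf(Data, Data.UnHaltedReferenceCycles));
  WriteLn('InstructionRetired       ', OffsetOf(Data, Data.InstructionRetired));
  WriteLn('LLCMiss                  ', OffsetOf(Data, Data.LLCMiss));
  WriteLn('BranchMissesRetired      ', OffsetOf(Data, Data.BranchMissesRetired));
  WriteLn('ProcessorCore            ', OffsetOf(Data, Data.ProcessorCore));
  // The total size depends on the record alignment setting, so it is
  // printed rather than assumed; the driver compares the output buffer
  // length with its own sizeof PERFORMANCE.
  WriteLn('SizeOf(TPerformanceData) ', SizeOf(TPerformanceData));
end.
```

The field offsets should match the comments in the record above. The total size may differ from the MASM structure if the compiler pads the end of the record, which matters only for the buffer length passed to DeviceIoControl.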
 
 
The constants used to build the IOCTL codes are:
 
 
FUNCTION_CACHE_MISS_READ = $801;
 FUNCTION_BRANCH_MISSPRED_READ = $802;
 FUNCTION_LLC_AND_BRANCH_MISS_READ = $803;
 FUNCTION_END_PC_READ = $804;
 FUNCTION_CACHE_MISS_CONFIGURE = $805;
 FUNCTION_BRANCH_MISSPRED_CONFIGURE = $806;
 FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE = $807;
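
For the driver to recognise a request, the code built by the front end must equal the one the driver builds with CTL_CODE. The sketch below shows the standard Windows bit layout (device type in bits 16-31, access in bits 14-15, function in bits 2-13, method in bits 0-1). The `CtlCode` body here is an assumption about how the front end's own helper works, and `FunctionOf` is a hypothetical helper.

```pascal
program CtlCodeSketch;
{$mode objfpc}{$H+}
uses SysUtils;

const
  FILE_DEVICE_UNKNOWN = $22;
  METHOD_BUFFERED     = 0;
  FILE_READ_ACCESS    = 1;
  FUNCTION_LLC_AND_BRANCH_MISS_READ      = $803;
  FUNCTION_END_PC_READ                   = $804;
  FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE = $807;

// Same argument order as the CTL_CODE macro: type, function, method, access.
function CtlCode(DeviceType, Func, Method, Access: Cardinal): Cardinal;
begin
  Result := (DeviceType shl 16) or (Access shl 14) or (Func shl 2) or Method;
end;

// Hypothetical helper: recover the function number from a control code.
function FunctionOf(Code: Cardinal): Cardinal;
begin
  Result := (Code shr 2) and $FFF;
end;

var
  Code: Cardinal;
begin
  Code := CtlCode(FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_CONFIGURE,
                  METHOD_BUFFERED, FILE_READ_ACCESS);
  WriteLn('CONFIGURE code: ', IntToHex(Int64(Code), 8));
  WriteLn('function:       ', IntToHex(Int64(FunctionOf(Code)), 3));
  Code := CtlCode(FILE_DEVICE_UNKNOWN, FUNCTION_LLC_AND_BRANCH_MISS_READ,
                  METHOD_BUFFERED, FILE_READ_ACCESS);
  WriteLn('READ code:      ', IntToHex(Int64(Code), 8));
end.
```

Printing the codes on both sides (for example with a debug print in the driver) is a quick way to confirm that the two definitions agree.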
 
 
I install and run the driver as a service, and that all works; I have checked that every DeviceIoControl call is handled properly by the driver. At the start I pin the front end and the driver to the same core using the constant CORE_TO_WORK_ON, which equals 0. The driver reads this value from the ProcessorCore field of the TPerformanceData record.

I work on Windows 7 and my processor is an E5200. If anybody has experience with this or sees any errors, please help me. I'll be very grateful.
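
One detail worth checking in the pinning step: a thread affinity mask is a bit set, so core index 0 corresponds to mask 1, and a mask of 0 selects no processor at all. The sketch below converts a core index to a mask and, on Windows, pins the calling thread with the Win32 call SetThreadAffinityMask. `CoreToAffinityMask` is a hypothetical helper, and doing the pinning in the front end rather than in the driver is an assumption, not the design described above.

```pascal
program AffinitySketch;
{$mode objfpc}{$H+}
{$IFDEF WINDOWS}
uses Windows;
{$ENDIF}

const
  CORE_TO_WORK_ON = 0;  // core index, as in the front end

// Hypothetical helper: core index N maps to a mask with only bit N set.
function CoreToAffinityMask(Core: Cardinal): PtrUInt;
begin
  Result := PtrUInt(1) shl Core;
end;

begin
  WriteLn('Core ', CORE_TO_WORK_ON, ' -> mask ', CoreToAffinityMask(CORE_TO_WORK_ON));
{$IFDEF WINDOWS}
  // SetThreadAffinityMask returns the previous mask, or 0 on failure.
  if SetThreadAffinityMask(GetCurrentThread, CoreToAffinityMask(CORE_TO_WORK_ON)) = 0 then
    WriteLn('SetThreadAffinityMask failed')
  else
    WriteLn('Thread pinned');
{$ENDIF}
end.
```

Since DeviceIoControl requests are dispatched in the context of the calling thread, pinning that thread before the first request should keep the MSR accesses on one core. It is also worth confirming that the driver actually receives ProcessorCore: with METHOD_BUFFERED only the input buffer is copied into the system buffer, and the calls above pass nil as the input buffer.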
Writing a profiler is quite an undertaking; I'm trying to find an expert who can help you out.
In the meantime, have you considered looking at any of the profilers we have? Amplifier XE springs to mind, or maybe look around whatif.intel.com; Performance Tuning Utility (PTU) might be right up your street.
If you used one of our profilers, you would be free to write the games you wanted to profile.
Regards
Steve
> Writing a profiler is quite an undertaking.
I'm writing a simple profiler. It targets only my CPU with its Architectural Performance Counters and, to begin with, it will measure only simple parameters such as LLC misses and branch mispredictions.
The profilers you offer cost money, and I don't have much to spend on them right now. Besides, I have a feeling that my profiler is almost complete. My own profiler is also more flexible: I can use it in any way and in any place. And it is very good work for learning.
You will find quite a lot of useful information about setting up counters here: http://software.intel.com/en-us/articles/intel-performance-counter-monitor/
There is also example driver source code, along with lots of other useful information.
I hope this helps.
Regards
Steve
Hello, Steve Hughes.
Thank you. I've already read this article and had hoped that somebody more experienced in MSR programming could point out my errors. I'll now study this driver more thoroughly.