
Intel Processor Trace - PTWRITE instruction uses too many cycles

jerry15

Hi, I've run into some issues while using the PTWRITE instruction, which has been available on Intel Core processors since the 12th generation. I'm hoping experts in this field can provide some guidance.

 

From my understanding, the PTWRITE instruction is intended to write values from registers into the Intel PT buffer with minimal cycle overhead. I've been using the instruction in the form of `PTWRITE %rax` or `PTWRITE %eax`. However, in practical testing, I've found that its overhead is not as low as advertised.

 

I've written a loop in assembly that executes the `PTWRITE %eax` instruction (writing a 32-bit register). When tested on a Performance core with `perf stat -C 4 -e cycles,instructions`, it averages around 6.25 cycles per instruction. The cost is even higher for `PTWRITE %rax` (writing a 64-bit register), reaching 10 cycles per instruction. Additionally, if I use the FUP feature of Intel PT to output IP addresses together with PTWRITE, the overhead becomes greater still.
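As a rough sanity check on the figures above, the measured cycle counts can be compared with the size of the packet each PTWRITE generates. The sketch below assumes the PTW packet layout described in the Intel SDM (a 2-byte header followed by a 4-byte or 8-byte payload); the helper name `ptw_packet_bytes` is made up for illustration, and the result is only an observation about these two measurements, not a documented drain rate.

```python
# Assumption: a PTW packet is a 2-byte header plus a 4- or 8-byte payload,
# as described in the Intel SDM packet definitions. Verify before relying on it.
PTW_HEADER_BYTES = 2


def ptw_packet_bytes(operand_bits):
    """Return the assumed PTW packet size in bytes for a 32- or 64-bit operand."""
    if operand_bits not in (32, 64):
        raise ValueError("PTWRITE takes a 32-bit or 64-bit operand")
    return PTW_HEADER_BYTES + operand_bits // 8


# Cycles per PTWRITE on the Performance core, taken from the measurements above.
measured_cycles = {32: 6.25, 64: 10.0}

for bits, cycles in measured_cycles.items():
    size = ptw_packet_bytes(bits)
    print(f"{bits}-bit operand: {size} bytes in {cycles} cycles "
          f"= {size / cycles:.2f} bytes/cycle")
```

Under that assumption both cases work out to roughly one byte of trace output per cycle, which would be consistent with the back-to-back loop being limited by how fast packets leave the core rather than by the instruction itself.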

 

However, when I run the same test on an Efficient core, the instruction takes only about 1 cycle, whether I output a 32-bit `%eax` or a 64-bit `%rax`. The downside is that the Efficient core's internal Intel PT buffer seems prone to overflow. It appears to drain at a rate similar to the Performance core's internal buffer, so whenever my throughput exceeds that rate, the exported data contains a significant number of overflows.

 

I would like to ask whether this behavior matches the design expectations, or whether I have missed a configuration option that would reduce the overhead. On the Performance core, a single PTWRITE costs more cycles than several memory access instructions combined, which makes me doubt whether I understand the intended purpose of PTWRITE.

 

I've followed the guidance in the Intel programming manual and enabled Intel PT by setting the `TRACEEN` and `PTW_EN` bits in the `MSR_IA32_RTIT_CTL` register. I'm using a contiguous 4 GB block of physical memory as the PT circular output buffer, so any influence from ToPA can be ruled out.
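For reference, the control value described above can be sketched as a set of bit masks. The bit positions and the MSR address below follow the IA32_RTIT_CTL layout as I understand it from the Intel SDM and should be checked against the manual for the target CPU; the function name `rtit_ctl_value` and its parameters are hypothetical. Leaving the ToPA bit clear corresponds to the single contiguous output region mentioned above.

```python
# Assumed layout of IA32_RTIT_CTL (MSR 0x570); verify against the Intel SDM.
IA32_RTIT_CTL = 0x570
TRACE_EN = 1 << 0     # TraceEn: master enable
OS = 1 << 2           # trace when CPL == 0
USER = 1 << 3         # trace when CPL > 0
FUP_ON_PTW = 1 << 5   # FUPonPTW: emit a FUP (with IP) after each PTW packet
TOPA = 1 << 8         # ToPA output scheme; clear means single-range output
PTW_EN = 1 << 12      # PTWEn: enable PTWRITE packets
BRANCH_EN = 1 << 13   # BranchEn: control-flow packets


def rtit_ctl_value(user=True, kernel=False, fup_on_ptw=False, branch_en=False):
    """Build a PTWRITE-oriented control value (hypothetical helper)."""
    value = TRACE_EN | PTW_EN
    if user:
        value |= USER
    if kernel:
        value |= OS
    if fup_on_ptw:
        value |= FUP_ON_PTW
    if branch_en:
        value |= BRANCH_EN
    return value


print(hex(rtit_ctl_value()))                 # PTWRITE only, user mode
print(hex(rtit_ctl_value(fup_on_ptw=True)))  # PTWRITE plus FUP per packet
```

Keeping `BRANCH_EN` and `FUP_ON_PTW` clear limits the trace to PTW packets alone, which is the lowest-volume configuration for measuring PTWRITE cost in isolation.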

 

The CPU models I'm working with are the i5-12500, i9-13900KS, and i9-14900K. The behavior described above is almost identical across these processors. The i5-12500 has only Performance cores, and it behaves like the Performance cores of the i9 models. The i9 systems have a memory frequency of up to 5600 MHz.


Thanks so much.

3 Replies
RamyerM_Intel
Moderator

Hello jerry15, 


Thank you for posting in the communities. Having read your post, we want to let you know that running PTWRITE in a tight loop, or issuing many consecutive PTWRITE instructions, is not an expected usage model and will cause slowdowns. PTWRITE in a loop can flood the internal PT buffers, which quickly leads to the core slowing itself down slightly to avoid a PT buffer overflow.


If the buffer is still full after a timeout of some number of cycles, the hardware issues an overflow packet and resumes without any slowdown. PTWRITE in a loop can trigger this behavior over and over again. We highly recommend that you profile PTWRITE in a realistic workload, i.e., PTWRITE followed by other instructions, conditional branches, etc.
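The stall behavior described above can be illustrated with a toy queue model. This is only a sketch: the buffer capacity, the drain rate of one byte per cycle, and the function name `simulate` are all hypothetical, and the model covers only the slowdown, not the timeout and overflow packet.

```python
def simulate(packet_bytes, gap_cycles, cycles=10_000,
             capacity=64, drain_per_cycle=1):
    """Toy model of a trace buffer; every default here is hypothetical.

    A PTWRITE is attempted every `gap_cycles` cycles. If its packet does not
    fit in the buffer, the core stalls until enough bytes have drained.
    Returns (packets_written, stall_cycles).
    """
    level = 0        # bytes currently queued in the buffer
    written = 0      # packets accepted so far
    stall_cycles = 0
    next_issue = 0   # earliest cycle at which the next PTWRITE may issue
    for cycle in range(cycles):
        level = max(0, level - drain_per_cycle)
        if cycle < next_issue:
            continue  # other instructions are running; no PTWRITE this cycle
        if level + packet_bytes <= capacity:
            level += packet_bytes
            written += 1
            next_issue = cycle + gap_cycles
        else:
            stall_cycles += 1  # buffer full: the core waits for it to drain
    return written, stall_cycles


# Back-to-back 10-byte packets versus one packet every 20 cycles.
print(simulate(packet_bytes=10, gap_cycles=1))
print(simulate(packet_bytes=10, gap_cycles=20))
```

In this model the back-to-back case settles at about one packet per ten cycles once the buffer fills, while the spaced-out case never stalls, which is the qualitative difference between a synthetic loop and PTWRITE interleaved with real work.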


I hope this is helpful to you. If you have further questions, feel free to let us know and we will answer them for you. 


Ramyer M.

Intel Customer Support Technician 



RamyerM_Intel
Moderator

Hello jerry15, 

 

I am just checking in to see whether you have any further questions. Feel free to post them in this thread, and we will be here to assist you.

 

Ramyer M.
Intel Customer Support Technician 

RamyerM_Intel
Moderator

Hello jerry15, 


I hope you are doing well. As we have not heard back from you in the past few days, we will proceed with closing this thread. If you need any additional information, please submit a new question, as this thread will no longer be monitored.


We wish you the best in your future endeavors! 


Ramyer M.

Intel Customer Support Technician 


