
VTune: Increase sampling frequency to profile Linux Kernel Driver

el_kihel__tarek
Beginner
539 Views

I am trying to profile a Linux kernel driver, and I would like to find out how many CPU cycles (or how much time) are spent in certain functions. My goal is not to identify the hotspots in the system but to profile a particular driver function. I have loaded the kernel debug symbols as explained in this article. I can see kernel functions under [vmlinux], but I can't see the functions that I'm interested in. I think the reason is that my functions finish much faster than the sampling interval. To test this theory, I forced the driver to execute very long operations, and the functions started to show up in the report, but this is not the condition I would like to test.

Is there any way to increase the sampling rate and/or to limit the trace to particular functions?

The system I'm using:

  • VTune Amplifier XE 2015
  • Distribution: Fedora 21
  • Kernel: 3.19.3-200.fc21.x86_64

Thanks in advance.

-Tarek

15 Replies
David_A_Intel1
Employee

Hi Tarek:

Short answer is, yes!  See the "sample-after value" for hardware events.

Lowering the sample-after value will increase the sample rate.  Just be warned, lowering the sample-after value too much can have negative effects on system performance.  So, do it incrementally.  For example, if the current value is 290000000, start by removing a zero and collecting data.  Also realize that increasing the sampling rate is going to increase the amount of data collected.  If you are collecting for a very short amount of time, you shouldn't have any problems.  (Reminder, there is a limit on amount of data that is collected.  It is configurable, but increasing it will increase the length of time it takes to process the collected data.  Change the limit cautiously.)

Bernard
Valued Contributor I

I am not sure how much of an issue it will be, but increasing the sampling frequency will probably increase the number of CPU cycles dedicated to servicing clock signal interrupts.

el_kihel__tarek
Beginner

Thanks MrAnderson for your reply. I'm using the Advanced Hotspots analysis to profile the kernel driver. In the Advanced Hotspots menu I cannot change the Sample After value: it is set to 30000 and the field is not editable. However, there is a field for the CPU sampling interval, but it is in milliseconds and the lowest value it accepts is 0.01 ms. I don't think that is fast enough to capture events that last only microseconds!

Do I need to use a different Analysis type or is there a way to force a lower Sample After value?

Thanks,

-Tarek

Bernard
Valued Contributor I

So you could insert some lengthy operations inside the profiled function so that it runs for at least 0.02 milliseconds.

el_kihel__tarek
Beginner

All the functions that I want to profile run in a sub-microsecond time frame. Can VTune go down to that level in the kernel?
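
A rough way to reason about this question: event-based sampling is statistical, so each sample stands for roughly one sample-after value's worth of events. The sketch below estimates how many samples a short function might receive in total. The helper name and all numbers are assumptions for illustration (a 0.5 us function on a 1.9 GHz core, so about 950 cycles per call), not measurements from this system.

```python
# Hypothetical helper, not part of VTune: estimates the number of samples
# attributed to a function when sampling on a cycle-like event.
def expected_samples(cycles_per_call, calls, sample_after_value):
    """Each sample represents `sample_after_value` events, so the expected
    sample count is the total cycles spent in the function divided by it."""
    return cycles_per_call * calls / sample_after_value


# Assumed workload: 950 cycles per call, 100,000 calls during the collection.
print(expected_samples(950, 100_000, 1_900_000))  # sample-after value of 1,900,000
print(expected_samples(950, 100_000, 190_000))    # ten times lower sample-after value
```

Under these assumptions, whether a fast function shows up depends on the total time spent in it across all calls, not on the duration of a single call, and lowering the sample-after value scales the expected count proportionally.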

Bernard
Valued Contributor I

I suppose you will then need to use the RDTSC machine instruction to measure the performance of such a very fast block of code.

David_A_Intel1
Employee

Hi Tarek:

I'm sorry.  You need to create a Custom Analysis based on the Advanced Hotspots type, then you can edit the values.  Right-click on Advanced-Hotspots in the tree on the left and select "Copy from current", then in the Custom Analysis dialog, double-click on the value to select it and change it.

Bernard
Valued Contributor I

@MrAnderson

What is the highest possible resolution of sampling rate or period in VTune?

David_A_Intel1
Employee

@iliyapolak,

Good question!  I don't find it explicitly documented, but you will notice that when you change the "CPU sampling interval" in the Advanced Hotspots configuration from 1 ms to 0.1 ms, the sample after values are automatically divided by 10.  Thus, my sample after values went from 1,900,000 to 190,000 when I set the sampling interval to 0.1 ms.  Then, if I set it to 0.01 ms, the sample after value changes to 19,000.  At that point, lowering the interval further does not change the sample after value, which limits the sampling interval to 0.01 ms, or 10 us.  I don't know whether that lower limit is system-dependent.  As a general rule, though, we recommend collecting 1000 samples per second.  This strikes a good balance between collecting a representative data set for the application and keeping overhead minimal.  Exceeding that sampling rate tends to start impacting system and application performance.  You really don't want the act of measuring performance to change the performance.
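
The scaling described above can be sketched as a simple proportion. This is only an illustration: the function name is hypothetical, and the event rate of 1.9 billion events per second is an assumption inferred from the 1,900,000 value at a 1 ms interval, not a documented VTune formula.

```python
# Hypothetical helper: sample-after value implied by a sampling interval,
# assuming the sampled event ticks at a constant rate.
def sample_after_value(event_rate_hz, interval_us):
    """Number of events counted between two consecutive samples."""
    return event_rate_hz * interval_us // 1_000_000


# Assumption: 1.9e9 events per second, inferred from 1,900,000 events per 1 ms.
EVENT_RATE_HZ = 1_900_000_000

for interval_us in (1000, 100, 10):  # 1 ms, 0.1 ms, 0.01 ms
    print(interval_us, "us ->", sample_after_value(EVENT_RATE_HZ, interval_us))
```

With these assumed inputs the three intervals give 1,900,000, 190,000 and 19,000, matching the values reported in the post.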

I know that VTune Amplifier now limits the sample after value so that a user doesn't set it so low that the system locks up (which happened in the past when we didn't limit it :( ).

For Basic Hotspots, the sampling interval is controlled by a spinner control and you can only set it as low as 1 ms (default is 10 ms).

Bernard
Valued Contributor I

Thanks for your response.

Now, regarding the lowest possible resolution value of 10 us, I suppose that this is based on the HPET clock signal somehow averaged to a 10 us period. A lower value could probably be obtained by dividing 10 000 000 samples by some sliding value such as 10, 100, or 1000.

Bernard
Valued Contributor I

Until now I thought that the highest possible time resolution of VTune sampling was around 1-2 milliseconds and that it was bound to the clock interrupt.

Peter_W_Intel
Employee

iliyapolak wrote:

Until now I thought that the highest possible time resolution of VTune sampling was around 1-2 milliseconds and that it was bound to the clock interrupt.

Please consider the sampling interval in two situations:

1. Basic Hotspots (user-mode level): the default sampling interval is 10 ms (predefined). You can change it, for example:

amplxe-cl -collect-with runss -knob sampling-interval=20 -knob cpu-samples-mode=stack -- program 

Note: a. possible values for sampling-interval are numbers between 1 and 1000 (in milliseconds); b. refer to this article for detailed usage.

That is, use user-defined hotspots.

2. Advanced Hotspots (system-mode level): the default sampling interval is 1 ms. If you don't change the SAV, VTune sets it automatically to CPU frequency / 1000, which gives 1000 samples per second. You can adjust the SAV if you want to collect more or fewer samples:

Please refer to this article.

Note: however many samples you expect to collect, the minimum SAV is 100000! If you set the SAV below 100000, VTune adjusts it to 100000.
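
To make the effect of that floor concrete, here is a minimal sketch. The function names are hypothetical, the 2 GHz CPU frequency is an assumed example value, and the 100000 floor is simply taken from the note above; the actual limit may depend on the VTune version and the event being sampled.

```python
# Floor quoted in the note above; treated here as a plain parameter.
MIN_SAV = 100_000


def effective_sav(requested_sav, min_sav=MIN_SAV):
    """SAV after clamping a requested value to the minimum."""
    return max(requested_sav, min_sav)


def samples_per_second(cpu_hz, sav):
    """Approximate sample rate when sampling on a cycle-like event."""
    return cpu_hz / sav


cpu_hz = 2_000_000_000        # assumption: a 2 GHz CPU
default_sav = cpu_hz // 1000  # the "CPU frequency / 1000" rule from the post

print(samples_per_second(cpu_hz, default_sav))  # default rate
sav = effective_sav(30_000)                     # request below the floor
print(sav, samples_per_second(cpu_hz, sav))     # clamped value and its rate
```

Under these assumptions, a request of 30000 is raised to 100000, which corresponds to about 20,000 samples per second, or one sample every 50 us.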

Bernard
Valued Contributor I

>>>Note: however many samples you expect to collect, the minimum SAV is 100000! If you set the SAV below 100000, VTune adjusts it to 100000.>>>

If I understood correctly, it is not possible to raise the sampling rate beyond that limit, for example to generate samples at a frequency of 1e+7.

Peter_W_Intel
Employee

iliyapolak wrote:

>>>Note: however many samples you expect to collect, the minimum SAV is 100000! If you set the SAV below 100000, VTune adjusts it to 100000.>>>

If I understood correctly, it is not possible to raise the sampling rate beyond that limit, for example to generate samples at a frequency of 1e+7.

You can increase the sampling rate, but NOT without limit :-)

Bernard
Valued Contributor I

OK, thanks for the answer.
