Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Strange behaviour of Intel Xeon E3-1220v2

Tommy_F_
Beginner
2,667 Views

Hi All

I have bought Intel Xeon E3-1220 v2 processor.

I am testing Linux on this processor. I created a test to see the time it takes for the OS to handle the clock tick.

My test is as following: I create a thread with highest priority ( can be preempted only by interrupts), whose task is just to do a small calculation in a loop of 45000 times. Here is the code:

long LoopCounter = 45000; 

while ( lLoopCounter > 0 )

   {

   Value =  (( Value * 3 ) + 5);

      lLoopCounter--;

   }

This task takes 102 µs to be finished.

If a clock tick happens while the task is executing, then this task is interrupted for some time until the interrupt is handled. ( aka: in normal case, the task takes 112 µs and if a tick happens, then the task takes 120 µs, which means that 8 µs are spent by the OS to handle the clock tick interrupt). I did this procedure around 100 times and I saw that each 1 ms I got a clock interrupt as the OS clock tick is running at 1000 Hz. so the normal values should be 102 µs if no tick happens, and 110µs if a clock tick happens.

This was perfect test on Intel core i7-930 @2.8GHz.

If I do the test on processor Xeon E3-1220 v2 @ 3.1 GHz, the task takes shorter time ( 102 µs) as it is faster processor. it takes 110 µs if a tick happens.

The strange thing I see on this processor that sometimes I get values around 120 µs, which means that some kind of interrupt happened and preempted the task for 18 µs. If it was clock tick, then it should take 110 µs.

so the processor is generating some kind of interrupts.

If I do the test with loop of 5000 instead of 45000, I do not see anymore such strange values.

 

Synthesis: if I do a task which is looping 45000 times on intel xeon, I start to get strange interrupts, which is not the case on i7-930.

If I do the test with loop of 5000, the interrupts disappear.

what could be the issue?

 

Regards

0 Kudos
31 Replies
Patrick_F_Intel1
Employee
2,036 Views

You can read /proc/interrupts and look to see, for the cpu you are running on, which interrupts occurred in the interval. And set the affinity of the thread to make sure it doesn't get moved to a different cpu.

That would be a good place to start. But be warned, trying to account for every microsecond on a modern machine can be a huge time sink. The OS & hardware does all kind of stuff that one doesn't usually notice (like kernel nmi_watchdog). And the Intel folks don't generally (well... at least I don't) have a lot of time to answer general curiousity questions.

Pat

0 Kudos
Tommy_F_
Beginner
2,036 Views

Hello Patrick

Thanks for yoru reply.

I already checked the interrupts and i just have 2 sources of interrupts ( timer and i8042).

I am also using the affinity. I even disabled the other cores and just kept one core running and nothing changed.

for getting the time, i am using the rdtsc() function, whose overhead is very limited.

The same kernel and same source code were used on intel core i7-930, and the results were perfect.

 

Regards

0 Kudos
Patrick_F_Intel1
Employee
2,036 Views

Does the number of interrupts in /proc/interrupts during the interval match your code's estimate of the number of interrupts?

0 Kudos
Tommy_F_
Beginner
2,036 Views

the thing that i can be sure of is that only the (timer and i8042) values are changing. No other interrupts are changed each time i execue cat /proc/interrups.

My program runs around 10 seconds, which means at least 10 000 ticks, and a tick is happening each 1 ms, so it will be difficult to find a good estimation.

The strange thing is why with large loop (45000) we see such interrupts while with value 5000 we do not see them. we thought it could be some managemnt interrupt by the proceossor ot due to L0 cache...

0 Kudos
Patrick_F_Intel1
Employee
2,036 Views

The OS also doesn't just do the same stuff in every interrupt. Linux can queue up interrupt work and do different stuff during an interrupt. Also Linux can (in more recent versions) coalesce interrupts (another way of combining them) but I'm not very familiar with this mechanism. Also it is possible that the assembly code generated for the 45000 iterations is different that the shorter loop (just a guess but possible). If the longer loop uses XMM registers, the the OS can do a trap on the first XMM instruction and then the OS sets up XMM save/restore mechanism for context switches... and the possibilities go on, and on, and on.

0 Kudos
Tommy_F_
Beginner
2,036 Views

with what i see in my results, Linux is handling its interrupt each 1 ms as expected. but during the 1 ms, several interrupts are happening.

Attached a word document with some images and expalantions. Maybe they can help a bit.

0 Kudos
Patrick_F_Intel1
Employee
2,036 Views

Do you have turbo mode disabled? Just to take that out of the picture...

0 Kudos
Tommy_F_
Beginner
2,036 Views

yes, turbo mode is disabled

0 Kudos
Patrick_F_Intel1
Employee
2,036 Views

Here is a paper on OS jitter that looks interesting and (hopefully) relevant. Or google "Exploratory Study on the Linux OS Jitter" and load the pdf.

http://www.google.com/url?sa=t&rct=j&q=last%20level%20cache%20jitter&source=web&cd=2&cad=rja&ved=0CDEQFjAB&url=http%3A%2F%2Fsbesc.lisha.ufsc.br%2Fsbesc2012%2Ftiki-download_file.php%3FfileId%3D106&ei=0ueSUa64HunqiwKdTA&usg=AFQjCNGgUAOw4mhlBKuHXvmQpmeCISYLXw&bvm=bv.46471029,d.cGE

If I were trying to figure out what is going on, I'd probably try running linux tracing. I use trace-cmd (front-end to ftrace) with a command like below. Hopefully the tracing doesn't screw up the behavior but you should see each interrupt, each timer, each context switch, etc.

trace-cmd record -s 1000 -b 10000 -e all -o trace_1.dat your_cmd_to_run

./trace-cmd report -i trace_1.dat > tmp_1.txt

You can also insert markers into by writing a string to /sys/kernel/debug/tracing/trace_marker . This lets you indicate say, the start and stop of your loop so you can narrow the trace down.

Pat

0 Kudos
SergeyKostrov
Valued Contributor II
2,036 Views
>>...If a clock tick happens while the task is executing, then this task is interrupted for some time until the interrupt is handled... Tommy, Could you try to add a critical section into your codes? What happens in that case? Also, there is an excellent article on Intel website about measuring precise time intervals on a Linux platform. Unfortunately, I even don't remember the article's name but I know it exists. I remember that an author used some Linux function to disable interrupts in order to make his tests as deterministic as possible. If you have a test case why wouldn't you upload it?
0 Kudos
Tommy_F_
Beginner
2,036 Views

@Sergey

The test case is described in a previous upload during the discussion. the results are obtained in ticks, as we are using the rdtsc() function which counts the number of ticks spent doing the task, and then we have another software that converts the results and show them as images.Our aim of this test is not to have deterministic results. we do such test to see the behaviour and performance of an OS on a specific platform. so for us, Linux on Xeon is behaving strangely due to the fact that some interrupts can happen in the system which may effect the predictability of this OS. The question was why "if the processor is doing a task 45000 times, such unexpected interrupts happen, and why with lower loop time (5000) the strange interrupts disappear.

The tracing function is accurate and stable, because we have already tested Linux on Intel i7, ATOM, Pentium, but none of these processors showed the strange interrupts.  we also tested QNX, Windows and Android using the same test case. so our test case is not OS dependent

Regards

0 Kudos
Bernard
Valued Contributor I
2,036 Views

>>>OS clock tick is running a 1000hz>>>

for precise measurement of clock frequency you can use Clockres tool.Usually OS has more than one timer.One of them is accessed by QueryPerformancCounter/QueryPerformanceFrequency functions pair and second is accessed by GetTickCount and its kernel counterpart.

0 Kudos
Bernard
Valued Contributor I
2,036 Views

Tommy what OS do you have?

If it is Windows for ISR and DPC routines you can use Xperf tool and for total time spent in kernel mode you can use kernrate.

>>>I already checked the interrupts and i just have 2 sources of interrupts ( timer and i8042)>>>

Do you mean interrupt which signalled its state to cpu?

0 Kudos
Bernard
Valued Contributor I
2,036 Views

>>>The OS also doesn't just do the same stuff in every interrupt. Linux can queue up interrupt work and do different stuff during an interrupt. Also Linux can (in more recent versions) coalesce interrupts (another way of combining them) but I'm not very familiar with this mechanism>>>

Also Windows mainly its recent version(7 and Server2008) can coalesce timers.When the timer's firing intervals are combined thus to reduce the interrupt handling overhead.

0 Kudos
Tommy_F_
Beginner
2,036 Views

@ iliyapolak

I am using RT-Linux. such version of Linux has the minimal components as it is used for real-time purposes.

I am getting the accurate measuremnt of the clock tick, so it is not the problem. The problem is that i am seeing other interrupts other than the clock tick interrupt. so each 1 ms i am seeing the clock tick interrupt, which is the normaml behaviour of RTOS. But at some moments, I am getting other type of Interrupts.

My test case is as follow:

start timing

do a calculation for 45000 time

stop timing

so I see that the work done requires 102 µs, and each 1 ms i see that the work takes 110 µs. this is because a clock interrupt happened and postponed the work 8µs. so i got what i wanted to test, which is the OS need 8 µs to handle its interrupt.

the strange thing is that at some moments i get values around 120 µs. so my task is interrupted by some kind of interrupt that postponed my work for 18 µs.

0 Kudos
Bernard
Valued Contributor I
2,036 Views

Hi Tommy,

what is the priority of your task?

I assume that here is similarity between Windows and Linux in giving more priority to hardware interrupts.

0 Kudos
Tommy_F_
Beginner
2,036 Views

I am using the highest priority of Linux. I used this to make sure that the only thing that can interrupt my task are interrupts, and not any other thread. Anyway, my program for this test is just one thread that is doing the calculation task.

0 Kudos
Bernard
Valued Contributor I
2,036 Views

 

I am talking about the other interrupts not about other tasks.Is there any system tool which can collect and show total cpu load and/or load breakdown by some running software(tasks,interrupt service routines and etc...)

Do you have any polling enabled? Is there any spurious interrupts on your system?

0 Kudos
Tommy_F_
Beginner
2,036 Views

looking at the cat /proc/interrupts shows what kind of interrupts happened during the system run-time. this showed that the clock tick interrupts are the only interrupts happening in the system.

0 Kudos
Tommy_F_
Beginner
1,901 Views

using the cat /proc/interrupts tool in Linux, i can see what kind of interrupts happen during the system run time. i see that the clock tick timer is the only changing value.

0 Kudos
Reply