Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Windows vs. Linux performance

pvonkaenel
New Contributor III

Hi,

We've started porting our video processing pipeline from Windows to Linux, and we're seeing that many of the Linux IPP routines are slower than their Windows versions, in particular the resizers such as ippiResizeFilter_8u_C1R() using the ippResizeFilterLanczos filter option.

General question: is it expected that the Linux IPP routines perform the same as the Windows equivalents?

Note that I've performed the timing test on two identical HW platforms which have dual Xeon X5680 3.33GHz CPUs.

Thanks,

Peter

SergeyKostrov
Valued Contributor II
>>... I've performed the timing test on two identical HW platforms which have dual Xeon X5680 3.33GHz CPUs.

Could you post your data, please? What about a test case? Thanks in advance.

pvonkaenel
New Contributor III
Hi Sergey, Thanks for responding. I can work on distilling an example based on our system code, but before I start, do you expect there to be performance differences between the Linux and Windows versions of IPP routines? Thanks, Peter

Bernard
Valued Contributor I
There should be some differences between the two OSs, because their architectures are implemented differently.

pvonkaenel
New Contributor III
How much of a difference would you expect? I've just finished running some loop timings on the Lanczos resizer I mentioned earlier: on Windows the loop runs in 3 minutes 36 seconds, and on Linux it takes 4 minutes 2 seconds. This seems like a big difference to me. Regardless of the OS, don't they both have the same asm instructions available? Do the asm implementations differ between Linux and Windows, or do they share the same low-level code? Thanks, Peter

Chuck_De_Sylva
Beginner
It would be great if we had a test case, as Sergey mentioned. It will help with debugging if there are any issues.

SergeyKostrov
Valued Contributor II
>>... I've just finished running some loop timings on the Lanczos resizer I mentioned earlier, and for Windows the loop runs in
>>3 minutes 36 seconds, and under Linux it's 4 minutes 2 seconds. This seems like a big difference to me...

I wouldn't expect absolutely identical numbers, and some of the difference in times could be attributed to:

- different C/C++ compilers
- different optimization options selected for the C/C++ compilers
- significant differences between the OSs ( as Iliya mentioned )
- different workloads on the OSs ( services, network support, etc. )

So there are many things that affect performance on both platforms. What compilers did you use?

I have lots of test cases, and tests compiled with the MS or Intel C/C++ compilers for Windows almost always outperform tests compiled with the MinGW C/C++ compiler for Windows. Many tests compiled with a legacy Borland C/C++ compiler ( 15+ year-old technology ) outperform all the modern C/C++ compilers (!) mentioned above.

In your case the test on Linux is ~10% slower than the test on Windows, and overall that matches my numbers.
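
For reference, the relative slowdown can be worked out directly from the two loop timings quoted above (3 min 36 s = 216 s and 4 min 2 s = 242 s). The helper below is only an illustrative sketch; the name `slowdown_percent` is made up for this example and is not part of IPP.

```c
#include <assert.h>

/* Relative slowdown of `other_seconds` versus `base_seconds`, in percent.
   A positive result means the second run took longer.
   Hypothetical helper for illustration only. */
double slowdown_percent(double base_seconds, double other_seconds) {
    assert(base_seconds > 0.0);
    return (other_seconds - base_seconds) / base_seconds * 100.0;
}
```

Calling `slowdown_percent(216.0, 242.0)` gives roughly 12, which is in the same range as the rough ~10% figure mentioned in the post.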

Bernard
Valued Contributor I
>>>How much of a difference would you expect. I've just finished running some loop timings on the Lanczos resizer I mentioned earlier, >>>

It is hard to say exactly how much of a difference you can expect. Such a difference could be described as a function of many variables which are tightly coupled to the internal architecture of Linux.

pvonkaenel
New Contributor III
Thanks for the feedback. I'm currently working on putting together an isolated sample which demonstrates what I'm seeing. In the meantime, I can answer some of the questions.

Under Windows I'm using the Visual Studio 2010 C++ compiler with default release build optimization settings. Under Linux we're using gcc with -O2. However, I would not expect the compiler to affect the speed much, since the majority of the work is performed within the IPP calls, which should resolve to hand-coded asm (correct?).

In both test cases there are 24 logical cores available, and I've made sure to have a minimal load from other processes. Only core OS services should be running besides the test.

Peter

SergeyKostrov
Valued Contributor II
>>...I would not expect the compiler to affect the speed much since a majority of the work is being performed within the IPP calls which
>>should resolve to hand coded asm (correct?).

Yes, that is correct. Here are a couple more comments:

- Try to boost the priority of your application to 'High' or 'Real-Time' on both platforms in order to preempt as many as possible of the processes and threads running at the same time. I always do this when measuring the performance of some piece(s) of code.
- Try to set ( force ) a process / thread affinity mask to one CPU

Note: For these two cases I could provide two small examples for Windows, but for Linux you'll need to work out how to do the same

- Try to use the same -O2 optimization option with Visual Studio 2010
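
On the Linux side, the two suggestions about priority and affinity might look roughly like the sketch below. It assumes glibc on Linux and uses `sched_getaffinity`/`sched_setaffinity` and `setpriority`; the function names `pin_to_first_allowed_cpu` and `raise_priority` are hypothetical helpers written for this example. Negative nice values normally require root or CAP_SYS_NICE, so the priority call may simply fail for an ordinary user.

```c
#define _GNU_SOURCE            /* needed for cpu_set_t and sched_setaffinity */
#include <assert.h>
#include <sched.h>
#include <sys/resource.h>

/* Pin the calling thread to the first CPU in its current affinity mask.
   Returns the CPU index on success, -1 on failure (errno is set). */
int pin_to_first_allowed_cpu(void) {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}

/* Request a nice value for the calling process (-20 = highest priority,
   19 = lowest). Returns 0 on success, -1 on failure (errno is set). */
int raise_priority(int nice_value) {
    assert(nice_value >= -20 && nice_value <= 19);
    return setpriority(PRIO_PROCESS, 0, nice_value);
}
```

A benchmark would call `pin_to_first_allowed_cpu()` and, for example, `raise_priority(-10)` once at startup, before any timing begins, and should check both return values rather than assume they succeeded.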

SergeyKostrov
Valued Contributor II
By the way, do you have the same BIOS settings on both computers? The most important settings are as follows:

- Intel Hyper-Threading Technology
- Intel TurboBoost Technology
- Intel SpeedStep Technology

Bernard
Valued Contributor I
>>> Try to boost a priority of your application to 'High' or 'Real-Time' on both platforms in order to preempt as many as possible processes and threads existing / working at the same time. I always do this when measuring performance of some piece(s) of codes.>>>

The exact implementation of the scheduler and dispatcher on Linux could differ from the one in Windows. Moreover, kernel activity, mostly interrupt-driven drivers, could also affect fine-grained time measurements on Linux. So I think that you cannot directly compare the two OSs. Even a few dozen asm instructions (compiled directly from the kernel source) that have no counterpart in Windows components could pollute the results of such a comparison.

Here you can see a detailed comparison between the two OSs: http://widefox.pbworks.com/w/page/8042290/Architecture

For example, look at the scheduler latency results: the Linux scheduler has much lower latency than its Windows counterpart.

pvonkaenel
New Contributor III
Thanks for all the input. At this point I think I need a better understanding of Linux to make sure I'm making a fair comparison, and then I can try different system-level tricks on Linux to match my Windows performance. I do, however, think you've answered my main question: yes, we should expect to see IPP performance differences between Windows and Linux. Thanks for the help, Peter

SergeyKostrov
Valued Contributor II
Here are my notes:

>>...Linux scheduler has much lower latency that its Windows counterpart...

A scheduling engine for the family of Windows NT based OSs was designed by one of the best experts in multi-processing from VAX in the 1990s. That design was initially introduced in the first versions of Windows NT ( 1.x, 2.x, 3.x, or so ).

Take a look at: http://widefox.pbworks.com/w/page/8042322/Scheduler

>>Kernel Comparison: Linux (2.6.28) versus Windows (Vista SP1)
>>...
>>Timeslice - Multiprocessor
>>Scheduler - Multiprocessor (timeslice)   Linux        Windows
>>timeslice - range                        10ms-200ms   15ms-180ms (Client), 180ms (Server)
>>...
>>timeslice - default                      100ms        30ms, 60ms, 90ms (Client), 180ms (Server)
>>...
>>Performance
>>Scheduler (performance)                  Linux        Windows
>>scheduling latency (average)             0.009ms      2ms
>>scheduling latency (worse)               0.3ms        16ms
>>...

- The report favours Linux by default, but I don't defend Windows
- The report compares the latest version of the Linux kernel with some older release(s) of Windows:

  ...
  Compared Version: Linux Q1 2009 vs Windows Q1 2008
  Initial Release: Q4 2008 vs Windows Q1 2007
  Latest Release: Q3 2011 vs Windows Q1 2011
  ...

- Latencies could NOT be deterministic. One of the IDZ users tested my test case on his computer with a latest Intel CPU and reported that a switch from thread A to thread B was completed in 38 nanoseconds.

Personally, I wouldn't worry about performance differences between OSs if the numbers for some test(s) differ by less than ~10%.

SergeyKostrov
Valued Contributor II
>>...One of IDZ users tested my test-case on his computer with a latest Intel CPU and reported that a switch from thread A to thread B
>>was completed in 38 nanoseconds

Forum topic: Synchronizing Time Stamp Counter
Web-link: http://software.intel.com/en-us/forums/topic/332570

Bernard
Valued Contributor I
>>>A scheduling engine for family of Windows NT based OSs was designed by one of the best expert in multi-processing from VAX in 1990th.>>>

Was it Dave Cutler?

>>>- The report compares latest version of kernel for Linux with some older release(s) of Windows:>>>

You are right, I forgot to mention it in my post.

Chuck_De_Sylva
Beginner
Another thing you can do is wrap timer calls around just the IPP code to narrow it down in both the Linux and Windows cases.

SergeyKostrov
Valued Contributor II
>>>A scheduling engine for family of Windows NT based OSs was designed by one of the best expert in multi-processing
>>from VAX in 1990th.
>>
>>Was it Dave Cutler?

That is possible. I don't remember his name, but I remember that I read it in a book about the history of Microsoft ( possibly written by Bill Gates or somebody else from Microsoft ).

SergeyKostrov
Valued Contributor II
>>...Windows vs. Linux...

By the way, in the middle of the 1990s everybody was comparing Windows vs. OS/2 Warp. Does anybody remember it? Unfortunately, OS/2 Warp has not survived.

PS: Some time in June 1995, two months before the release of Windows 95, I had a very exciting dialog about the Windows and OS/2 OSs with a very experienced system software developer, and I will reproduce our conversation later...

pvonkaenel
New Contributor III
Chuck De Sylva (Intel) wrote:

Another thing you can do is wrap timer calls around just the IPP code to narrow it down in both the Linux and Windows cases.

That's how I tend to time IPP routines. I put a timer just around the call, run it in a long loop, and then get a moving average. One thing I've noticed (at least under Windows) is that if you have a lot of other things happening in between the IPP calls, you can get some IPP slowdown. It's like the call needs to be warmed up and then kept active. I would guess this has to do with cache misses, but I've even seen this where the image content changes from call to call. Peter
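
The warm-up effect and the moving average described above could be handled along the lines of the sketch below. It is an assumption-laden illustration: the post does not say which kind of moving average is used, so an exponential moving average is chosen here, and the function name `smoothed_seconds` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Exponential moving average of per-call timings, skipping the first
   `warmup` samples so that cold-cache calls do not skew the result.
   `alpha` in (0, 1] controls how quickly the average follows new samples. */
double smoothed_seconds(const double *samples, size_t count,
                        size_t warmup, double alpha) {
    assert(samples != NULL && warmup < count);
    assert(alpha > 0.0 && alpha <= 1.0);
    double average = samples[warmup];            /* seed with first kept sample */
    for (size_t i = warmup + 1; i < count; ++i)
        average += alpha * (samples[i] - average);
    return average;
}
```

Comparing the result with `warmup` set to 0 and then to a few dozen calls gives a rough idea of how much the first, cold calls contribute to the measured time on each OS.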

pvonkaenel
New Contributor III
Sergey Kostrov wrote:

>>...Windows vs. Linux...

By the way, in the middle of the 1990s everybody was comparing Windows vs. OS/2 Warp. Does anybody remember it? Unfortunately, OS/2 Warp has not survived.

PS: Some time in June 1995, two months before a release of Windows 95, I had a very exciting dialog about Windows and OS/2 OSs with a very experienced system software developer and I will reproduce our conversation later...

I really liked OS/2 (even back to version 1.3). While everyone else at work was using Windows for Workgroups or Solaris, I used OS/2, since I could run Windows apps and use Hummingbird to access the SPARCstations. At the time there was a good book about the inner workings of OS/2, the name of which I forget.