IPP 7.0 2D Convolution Benchmarks - Win32 better than x64??

drcdr · ‎10-20-2011

Hello,

I'm evaluating the performance of IPP 7.0, which is giving us great improvements over our current (Blitz) baseline.

However, I'm seeing some strange IPP vs IPP benchmarking results (see below). Win32 is performing (substantially) better than x64.

Setup: VS2010, Core i7 2.0 GHz, MSVC optimization options enabled. Times are in seconds, for 2000 iterations.

What might cause this? Are the IPP 7.0 routines not optimized for x64???

Thanks,

- Chris

================================================================================

Conv. Type      Main     Filter  Type   WIN32  x64   Routines
2D              190x190  9x9     float  0.87   1.61  ippiConvFull_32f_C1R
                                 short  0.62   1.58  ippiConvFull_16s_C1R
Separable 2x1D  190x190  2 9x1   float  0.23   0.33  ippiFilterColumn_32f_C1R, ippiFilterRow_32f_C1R
                                 short  0.18   0.48  ippiFilterColumn_16u_C1R, ippiFilterRow_16u_C1R

jon_shadforth · ‎10-21-2011

Hi Chris - I've been involved in some similar testing recently. I've just run our timing test application, which includes a test of the 8u full convolution function (ippiConvFull_8u_C1R). My results were almost identical between 32-bit and 64-bit (same machine, dual-boot): with 640*480 and 9*9 input images, 6.094ms/call on 32-bit and 6.152ms/call on 64-bit.

But, some of the other tests show huge variations.

For example, one of our copy tests (using ippiCopy_8u_C1R) ran at 0.027ms/copy on 32-bit, but 0.046ms/copy on 64-bit. I have similar results for some thresholding functions.

Most of the tests are fairly consistent. In terms of IPPI calls only I don't see any 64-bit functions that seem to have the reverse behaviour (i.e. twice as fast as their 32-bit counterparts).

I wrote these originally because I'd presumed that my 64-bit build would run faster, but so far it's not the case.

(Environment: Studio 2010, .Net/C#, static-linking to IPPI 6.5).

So I'd also be interested to know whether Intel would expect specific variations.

Jon.

drcdr · ‎10-26-2011

Thanks for the feedback Jon.

Well, I had a long post written detailing how the timings were 3x-4x worse on x64 than Win32, but then I found the problem.
For Win32, I was linking with "default linking method".
For x64, I was linking with "single-threaded static library".

I changed x64 to "default linking method", and now it's essentially as fast as Win32.

For your copy test, you may want to look into _aligned_malloc(N, 32) (and _aligned_free), if you're not already doing so.
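
To illustrate the aligned-allocation suggestion, here is a minimal sketch. The helper names (alloc_aligned, free_aligned, is_aligned) are made up for this example and are not IPP functions. On MSVC it calls _aligned_malloc/_aligned_free as mentioned above; on other compilers it assumes C11 aligned_alloc is available as a stand-in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_MSC_VER)
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical helper: allocate `size` bytes whose start address is a
   multiple of `alignment` (which must be a power of two). */
static void *alloc_aligned(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    /* Assumption: C11 aligned_alloc; it wants size to be a multiple
       of alignment, so round the request up. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
#endif
}

/* Hypothetical helper: release memory obtained from alloc_aligned. */
static void free_aligned(void *p)
{
#if defined(_MSC_VER)
    _aligned_free(p);   /* must pair with _aligned_malloc */
#else
    free(p);
#endif
}

/* Returns 1 if the pointer is a multiple of `alignment`, else 0. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

For a 640*480 8-bit image, that would be something like alloc_aligned(640 * 480, 32) for both the source and destination buffers, released with free_aligned once the timing run is finished. Whether alignment explains the copy-test gap is only a guess until it is measured.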

- Chris

Ying_H_Intel · ‎10-27-2011

Hi Chris and Jon,

Thanks for the updates. The broader question of whether 32-bit or 64-bit is faster involves many factors: hardware, cache size, registers, memory, problem size, etc. From the perspective of IPPI performance results, we don't assume that 64-bit is always faster than 32-bit. Sometimes A is better than B, and sometimes B is better than A, but basically they are almost equally fast. If there is a big difference (i.e. above 20%), then it is worth investigating further.
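
As a rough illustration of the 20% rule of thumb, the check can be sketched as below. The function names are hypothetical, and the sample numbers in the usage note are Jon's timings from earlier in this thread.

```c
#include <assert.h>

/* Hypothetical helper: how much slower (positive) or faster (negative)
   the 64-bit timing is relative to the 32-bit timing, in percent. */
static double slowdown_percent(double t32, double t64)
{
    assert(t32 > 0.0);
    return (t64 - t32) / t32 * 100.0;
}

/* Hypothetical helper: returns 1 if the two timings differ by more than
   threshold_percent in either direction, else 0. */
static int worth_investigating(double t32, double t64, double threshold_percent)
{
    double d = slowdown_percent(t32, t64);
    if (d < 0.0) {
        d = -d;   /* a large speed-up is just as notable as a slowdown */
    }
    return d > threshold_percent;
}
```

With Jon's numbers, worth_investigating(6.094, 6.152, 20.0) returns 0 for ippiConvFull_8u_C1R (a difference of about 1%), while worth_investigating(0.027, 0.046, 20.0) returns 1 for the ippiCopy_8u_C1R test (about 70% slower on 64-bit).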

IPP provides binary test tools for most IPPI functions; here is a guide article for your reference:
Using the performance tool to measure Intel IPP Function performance

Best Regards,
Ying