I'm evaluating the performance of IPP 7.0, which is giving us great improvements over our current (Blitz) baseline.
However, I'm seeing some strange results when benchmarking IPP against itself (see below): the Win32 build performs substantially better than the x64 build.
Setup: VS2010, Core i7 2.0 GHz, MSVC optimization options enabled. Times are in seconds, for 2000 iterations.
What might cause this? Are the IPP 7.0 routines not optimized for x64?
Conv. type      Main image  Filter    Data type  Win32 (s)  x64 (s)  Routine
2D              190x190     9x9       float      0.87       1.61     ippiConvFull_32f_C1R
2D              190x190     9x9       short      0.62       1.58     ippiConvFull_16s_C1R
Separable 2x1D  190x190     2 x 9x1   float      0.23       0.33     ippiFilterColumn_32f_C1R
Separable 2x1D  190x190     2 x 9x1   short      0.18       0.48     ippiFilterColumn_16u_C1R
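For reference, here is a minimal, unoptimized sketch in plain C of what the 2D rows compute: a "full" convolution, whose output grows to (W + kw - 1) x (H + kh - 1), i.e. 198x198 for a 190x190 image and a 9x9 kernel. This is not IPP's implementation; the function name conv_full_2d and the tightly packed row-major layout (row stride equal to width, unlike IPP's byte-step parameters) are assumptions for illustration only. It also shows why the separable rows are cheaper: a 9x9 kernel costs 81 multiply-adds per output pixel, while two 9x1 passes over a separable kernel cost 18.

```c
#include <assert.h>

/* Naive "full" 2D convolution of an sw x sh float image with a kw x kh kernel.
   Output size is (sw + kw - 1) x (sh + kh - 1); pixels outside the source
   are treated as zero. All buffers are row-major with stride == width. */
void conv_full_2d(const float *src, int sw, int sh,
                  const float *ker, int kw, int kh, float *dst)
{
    assert(sw > 0 && sh > 0 && kw > 0 && kh > 0);
    int dw = sw + kw - 1, dh = sh + kh - 1;
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < kh; ++j) {
                int sy = y - j;                 /* kernel is flipped: true convolution */
                if (sy < 0 || sy >= sh) continue;
                for (int i = 0; i < kw; ++i) {
                    int sx = x - i;
                    if (sx < 0 || sx >= sw) continue;
                    acc += src[sy * sw + sx] * ker[j * kw + i];
                }
            }
            dst[y * dw + x] = acc;
        }
    }
}
```

To mirror the 2000-iteration timings above, call it in a loop from main and read a wall-clock timer before and after (for example QueryPerformanceCounter on Windows), making sure both builds are timed the same way.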
Well, I had a long post written detailing how the timings were 3x-4x worse on x64 than Win32, but then I found the problem.
For Win32, I was linking with "default linking method".
For x64, I was linking with "single-threaded static library".
I changed x64 to "default linking method", and now it's essentially as fast as Win32.
For your copy test, you may want to look into _aligned_malloc(N, 32) (and _aligned_free), if you're not already doing so.
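A minimal sketch of such an aligned allocation, assuming you want the same code to build with MSVC (using _aligned_malloc/_aligned_free) and on POSIX systems (using posix_memalign/free). The wrapper names aligned_malloc_portable and aligned_free_portable are hypothetical. IPP also ships its own allocators (ippsMalloc_* / ippiMalloc_*), which return aligned memory and may be the simpler choice inside an IPP-based benchmark.

```c
#if !defined(_MSC_VER) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L   /* expose posix_memalign in <stdlib.h> */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _MSC_VER
#include <malloc.h>               /* _aligned_malloc, _aligned_free */
#endif

/* Allocate `size` bytes aligned to `alignment` (must be a power of two).
   Returns NULL on failure. Release with aligned_free_portable(). */
void *aligned_malloc_portable(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef _MSC_VER
    return _aligned_malloc(size, alignment);
#else
    /* posix_memalign also requires a multiple of sizeof(void *) */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
#endif
}

/* Memory from _aligned_malloc must go to _aligned_free, never to free(). */
void aligned_free_portable(void *p)
{
#ifdef _MSC_VER
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Mixing the pairs (for example passing an _aligned_malloc pointer to free) is undefined behavior on MSVC, which is why the release path is wrapped too.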
Thanks for the updates. As for the broader question of which is faster, 32-bit or 64-bit, the answer can depend on many things: hardware, cache size, registers, memory, problem size, and so on. Looking at IPPI performance results, we don't assume that 64-bit is always faster than 32-bit: sometimes A is better than B, sometimes B is better than A, but overall the two are about equally fast. If there is a large difference (say, above 20%), it is worth investigating further.
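As a quick illustration of that 20% rule of thumb, the sketch below computes the relative gap between the Win32 and x64 timings originally posted in this thread (before the linking fix). The helper names relative_diff, worth_investigating and report are hypothetical; the gap is measured relative to the Win32 time.

```c
#include <assert.h>
#include <stdio.h>

/* Relative gap of x64 vs. Win32: (t_x64 - t_win32) / t_win32. */
double relative_diff(double t_win32, double t_x64)
{
    assert(t_win32 > 0.0);
    return (t_x64 - t_win32) / t_win32;
}

/* Nonzero when |gap| exceeds `threshold` (0.20 means 20%). */
int worth_investigating(double t_win32, double t_x64, double threshold)
{
    double d = relative_diff(t_win32, t_x64);
    if (d < 0.0)
        d = -d;
    return d > threshold;
}

/* Print the gap for each row of the original benchmark table. */
void report(void)
{
    struct { const char *routine; double win32, x64; } rows[] = {
        {"ippiConvFull_32f_C1R",     0.87, 1.61},
        {"ippiConvFull_16s_C1R",     0.62, 1.58},
        {"ippiFilterColumn_32f_C1R", 0.23, 0.33},
        {"ippiFilterColumn_16u_C1R", 0.18, 0.48},
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; ++i) {
        double d = relative_diff(rows[i].win32, rows[i].x64);
        printf("%-26s %+5.0f%%  %s\n", rows[i].routine, 100.0 * d,
               worth_investigating(rows[i].win32, rows[i].x64, 0.20)
                   ? "investigate" : "ok");
    }
}
```

For the posted numbers the x64 build is roughly 85%, 155%, 43% and 167% slower, so every row is well past the 20% threshold, which is consistent with a build or linking problem rather than normal 32-bit vs 64-bit variation.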
IPP provides binary test tools for most IPPI functions; here is a guide article for your reference:
Using the performance tool to measure Intel IPP Function performance