I was using ippiFilterColumn_32f_C1R earlier and am now upgrading to IPP 2017. I have replaced ippiFilterColumn_32f_C1R with the legacy90ippiFilterColumn_32f_C1R API, and the results do not match, while ippiFilterRow_32f_C1R works fine.
I have attached sample images, the kernel values, and the output results.
Please suggest a workaround for this issue and explain the reason for the mismatch.
In the latest IPP releases, ippiFilterColumn functionality is provided by the ippiFilterBorder function: just use your kernel with width=1, and the optimized code branch for column processing will be used. It is better to switch to the latest IPP releases, as they support modern Intel architectures, while the legacy libraries do not.
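To illustrate why a column filter is just a 2-D filter whose kernel is one pixel wide, here is a minimal plain-C sketch (not IPP code). filter2d_valid and filter_column_valid are hypothetical reference functions: they compute a correlation over the valid region only and deliberately ignore the border handling, anchor, and kernel-orientation conventions of the real IPP functions. With kw = 1 both perform the same multiplications and additions in the same order, so their results are bit-identical; an optimized library is free to use a different order.

```c
/* Hypothetical reference: generic 2-D correlation over the "valid" region
   (no border handling). src is srcW x srcH, kernel is kw x kh,
   dst is (srcW - kw + 1) x (srcH - kh + 1), all row-major. */
static void filter2d_valid(const float *src, int srcW, int srcH,
                           const float *kernel, int kw, int kh, float *dst)
{
    int dstW = srcW - kw + 1, dstH = srcH - kh + 1;
    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < dstW; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < kh; ++j)
                for (int i = 0; i < kw; ++i)
                    acc += src[(y + j) * srcW + (x + i)] * kernel[j * kw + i];
            dst[y * dstW + x] = acc;
        }
}

/* Hypothetical reference: dedicated column (vertical 1-D) filter with the
   same conventions; dst is srcW x (srcH - kh + 1). */
static void filter_column_valid(const float *src, int srcW, int srcH,
                                const float *kernel, int kh, float *dst)
{
    int dstH = srcH - kh + 1;
    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < srcW; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < kh; ++j)
                acc += src[(y + j) * srcW + x] * kernel[j];
            dst[y * srcW + x] = acc;
        }
}
```

Calling filter2d_valid with kw = 1 and kh = K gives the same output as filter_column_valid with the same K-tap kernel, because the inner loop over i runs exactly once per tap.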
For now we have to use the legacy library, because we need bit-exact backward compatibility.
I tried replacing it with the ippiFilterBorder API, and the results still do not match.
Please look into the legacy API, as backward compatibility is the priority here.
Bit-exact results are very hard to achieve for floating-point functions. Internally, IPP code has several branches depending on kernel size, data alignment, and image size. This code also differs between architectures (IA-32/Intel 64, SSE2, SSSE3, ..., AVX-512) and can differ between operating systems (Windows, Linux, Mac OS X). Which IPP version did you compare against? (That is, you compared the legacy library with some older IPP version; which one?)
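A minimal plain-C sketch (not IPP's actual code) of why different code branches can disagree: the same four values are summed once strictly left to right, as a scalar loop would, and once with two interleaved partial sums, roughly the way a two-lane SIMD loop might. The function names are hypothetical and the data are chosen so that rounding differs between the two orders.

```c
#include <stddef.h>

/* Scalar-style path: accumulate strictly left to right. */
static float sum_sequential(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}

/* Vector-style path: two interleaved partial sums (like two SIMD lanes),
   combined at the end. Same values, different order of additions. */
static float sum_two_lanes(const float *v, size_t n)
{
    float lane[2] = {0.0f, 0.0f};
    for (size_t i = 0; i < n; ++i)
        lane[i % 2] += v[i];
    return lane[0] + lane[1];
}
```

With v = {1e8f, 1.0f, -1e8f, 1.0f}, the sequential sum is 1.0f (1e8f + 1.0f rounds back to 1e8f) while the two-lane sum is 2.0f, on a platform that evaluates float arithmetic in float precision (FLT_EVAL_METHOD == 0, e.g. x86-64 with SSE).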
We are running on a Windows machine (Windows 7). For example, the ippiFilterRow API gives a bit-exact floating-point match, while in the same process ippiFilterColumn does not. All inputs (image, kernel, and all other parameters) are the same.
1) Please insert the following lines into your application to make sure that the same optimization path is used with both libraries:
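/* Requires #include <stdio.h> and #include <ipp.h> */
const IppLibraryVersion *lib = ippiGetLibVersion(); /* version info of the dispatched library */
printf("CPU       : %s\n", lib->targetCpu);  /* optimization code, e.g. e9, g9, h9, l9 */
printf("Name      : %s\n", lib->Name);
printf("Version   : %s\n", lib->Version);
printf("Build date: %s\n", lib->BuildDate);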
- The main difference between IPP 7.0.7 and 8.2 legacy is that 8.2 added AVX2 support. If you run your application on an AVX2 (or newer) machine, IPP 7.0.7 dispatches the e9/g9 code (AVX, the newest instruction set supported at that time), while 8.2 legacy dispatches h9/l9. This can be one of the reasons.
2) There is no difference in the e9/g9 code for ippiFilterColumn_32f_C1R between IPP 7.0.7 and 8.2 legacy, but the two releases were built with different compiler versions (7.0.7 with Compiler XE 12.1, 8.2 legacy with 14.0.2). We have to move to newer compilers in order to support new architectures and instruction sets. Because this function is written with intrinsics, different compilers may emit the CPU instructions in a different order (the same algorithm and logic, but a slightly different order of calculations). For AVX2 code an FMA instruction may be used, which rounds the intermediate result differently than a separate mul/add pair.
3) The IPP library does not provide (and does not claim to provide) bit-exact FP results across different code branches (different optimizations such as SSE2, SSSE3, SSE4.2, AVX, AVX2; different input/output data alignment, etc.). Equality for FP functions should be checked with an epsilon that depends on the number of FP operations per pixel. For example, the weight of the least significant bit for the 32f data type (normalized) is 1.19209289e-07f, so the epsilon cannot be tighter than 1.19209289e-07f * 0.5 * N, where N is the number of FP operations per pixel.
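A plain-C illustration (not IPP code) of the FMA point in item 2): it compares a separately rounded multiply-then-add with a single rounding of the exact a*b + c. The fused variant is emulated through double, since the product of two floats is exact in double; this matches a true FMA for the inputs used here but is not a general fmaf replacement. The function names are hypothetical.

```c
/* Separate multiply and add: the product is rounded to float first.
   The volatile temporary forces that rounding and keeps the compiler
   from contracting the expression into an FMA on its own. */
static float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;
    return p + c;
}

/* FMA-like single rounding, emulated in double: the float*float product
   is exact in double; for the inputs below the addition is exact too,
   so only the final conversion to float rounds. */
static float fused_via_double(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

With a = 1.0f + 0x1p-12f and c = -(1.0f + 0x1p-11f), the exact a*a is 1 + 2^-11 + 2^-24; rounding it to float first gives 1 + 2^-11 (a tie, rounded to even), so mul_then_add returns 0.0f, while the single-rounding version keeps the 2^-24 term. Two builds of the same algorithm, one using FMA and one not, can disagree in the last bits this way.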
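The tolerance check from item 3) could be sketched in plain C as follows. nearly_equal and count_mismatches are hypothetical helpers, not IPP functions. Scaling by the larger magnitude, with a floor of 1.0f (which makes it an absolute tolerance for data normalized to [0, 1]), is an assumption to adjust to your own data range.

```c
#include <float.h>
#include <stddef.h>

/* Absolute value without <math.h>. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Nonzero if a and b agree within 0.5 * FLT_EPSILON * n_ops, scaled by the
   larger magnitude but never by less than 1.0f (assumption, see above).
   FLT_EPSILON is 2^-23, about 1.19209290e-07f. */
static int nearly_equal(float a, float b, int n_ops)
{
    float tol = 0.5f * FLT_EPSILON * (float)n_ops;
    float scale = abs_f(a) > abs_f(b) ? abs_f(a) : abs_f(b);
    if (scale < 1.0f)
        scale = 1.0f;
    return abs_f(a - b) <= tol * scale;
}

/* Counts pixels whose values differ by more than the tolerance. */
static size_t count_mismatches(const float *ref, const float *out,
                               size_t n, int n_ops)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (!nearly_equal(ref[i], out[i], n_ops))
            ++bad;
    return bad;
}
```

For a K-tap column filter one might take N = 2K - 1 (K multiplications plus K - 1 additions per pixel); that count is an assumption, and an FMA-based code path performs fewer separately rounded operations.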
1) Please make sure that the same optimized code path is used in both cases. If it is not, you can switch it with the ippSetCpuFeatures() function.
2) If #1 does not solve your issue, see #2 and #3 above. In that case we cannot help, because this is not an IPP bug but a property of almost all FP code.