We are using the ippiDecodeExpGolombOne_H264_1u16s(...) method when parsing H.264 streams in our product. It seems that a recent update to Intel IPP 7.0 Update 4 breaks the behavior of this method on some CPUs.
We observed that, for example, the method behaves correctly on an Intel Xeon E5507 CPU, while the exact same code on the same stream returns a different value on both the Intel Core 2 X6800 and Quad Q6600 CPUs. The ippiDecodeExpGolombOne_H264_1u32s(...) method behaves correctly on all CPUs.
The Intel IPP libraries are linked statically, using the *_l.lib (single-threaded) versions of the libraries. The libraries are from 7.0 Update 4, with the installer being w_ipp_188.8.131.52.exe.
I have a sample project (source is attached) in VS2010 clearly illustrating the differences when compiled in 'Debug' mode and then run on the different machines/CPUs (result of the method call is different for the same input stream). You can view attached snapshots for details on the output of this sample program on different CPUs.
We noticed that when this sample project is compiled in 'Release', it works properly on all CPUs. However, calling the ippiDecodeExpGolombOne_H264_1u16s(...) method from our product, which is also compiled in 'Release', still does not work properly. We suspect that some linking/optimization flag used during the build could explain this discrepancy with the sample project, but it is hard to pin down which one.
Can you reproduce this issue? Do you have any additional information about what could be happening? We will probably resort to using the *_32s(...) version of the method, but we are wondering whether the same issue could occur with it in some cases.
I can provide additional information if needed - let me know.
* BUMP *
We would definitely expect these libraries to display the same behavior no matter what CPU platform the code is running on.
By the way, the _32s version of the method performs correctly on all CPUs tested. We suspect some sort of 'overflow' issue on some platforms...
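To make the comparison between CPUs easier to check, here is a minimal plain-C sketch of an Exp-Golomb decoder that can serve as a reference value for the same input bits. It is not the IPP implementation and does not call IPP; the names BitReader, read_bit, decode_ue, decode_se and fits_in_s16 are hypothetical helpers made up for this illustration. The fits_in_s16 check only shows where a 16-bit destination could not hold a decoded value; whether that is what actually goes wrong inside the _16s method is an assumption on our side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer (not an IPP type). */
typedef struct {
    const uint8_t *data;
    size_t size_bits;  /* total number of readable bits */
    size_t pos_bits;   /* current read position in bits */
} BitReader;

/* Reads one bit; returns 0 on success, -1 when the buffer is exhausted. */
int read_bit(BitReader *br, unsigned *bit)
{
    assert(br != NULL && bit != NULL);
    if (br->pos_bits >= br->size_bits)
        return -1;
    *bit = (br->data[br->pos_bits >> 3] >> (7 - (br->pos_bits & 7))) & 1u;
    br->pos_bits++;
    return 0;
}

/* Unsigned Exp-Golomb: count leading zeros, skip the 1 marker bit,
   read that many suffix bits; codeNum = 2^zeros - 1 + suffix. */
int decode_ue(BitReader *br, uint32_t *out)
{
    unsigned bit = 0, zeros = 0, i;
    uint32_t suffix = 0;

    for (;;) {
        if (read_bit(br, &bit) != 0)
            return -1;
        if (bit)
            break;
        if (++zeros > 31)  /* would not fit in 32 bits */
            return -1;
    }
    for (i = 0; i < zeros; i++) {
        if (read_bit(br, &bit) != 0)
            return -1;
        suffix = (suffix << 1) | bit;
    }
    *out = ((uint32_t)1 << zeros) - 1u + suffix;
    return 0;
}

/* Signed Exp-Golomb: odd codeNum k maps to +(k+1)/2, even k maps to -(k/2). */
int decode_se(BitReader *br, int32_t *out)
{
    uint32_t k = 0;
    if (decode_ue(br, &k) != 0)
        return -1;
    *out = (k & 1u) ? (int32_t)((k >> 1) + 1u) : -(int32_t)(k >> 1);
    return 0;
}

/* Returns 1 if the value is representable in a signed 16-bit destination. */
int fits_in_s16(int32_t value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

Feeding the same bytes to this sketch and to both IPP methods on each machine should show which CPU returns the unexpected value, and fits_in_s16 tells whether the expected value could be stored in 16 bits at all.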