topic Re: ippsDecodeLZ4_8u (clarification) in Intel® Integrated Performance Primitives

ippsDecodeLZ4_8u (clarification)

axelriet — Sat, 17 Jan 2026 03:54:39 GMT

The documentation for ippsDecodeLZ4_8u is straightforward, but the sample code on the page increases the output buffer size by 33 bytes.

The requirement to add 33 bytes beyond what is necessary to hold the decompressed data is not documented for this parameter.

Are the 33 bytes mandatory? If yes, that makes the function (and therefore LZ4) unusable in many scenarios; for example, I decode into a buffer I don't allocate, so I don't control its size. If I have to allocate an intermediate buffer, copy the data, and free the buffer, that certainly negates any performance gain you may achieve by having those 33 bytes.

My concise question is: are the extra 33 bytes necessary?

If so, you should provide a new ippsDecodeLZ4Safe_8u function that never overruns the destination buffer defined by the decompressed data size.

Thanks,

Axel

Re: ippsDecodeLZ4_8u (clarification)

Chao_Y_Intel — Thu, 07 May 2026 06:27:36 GMT

Hello,

Thank you for reporting this issue. You are correct that the extra bytes are required. This is because of internal SIMD optimizations in our implementation. The requirement is currently not documented in the API reference, which is a gap. We are tracking your feedback and will address the documentation in a future release.

thanks,
Chao

Re: ippsDecodeLZ4_8u (clarification)

axelriet — Thu, 07 May 2026 06:48:36 GMT

Thanks for clarifying, but I guess my point is that the function should never overrun the output buffer beyond the actual payload bytes.

Again, the +33-byte requirement makes the function unusable in many scenarios.

Allocating an intermediate buffer and copying the data is certain to negate whatever performance gain you get by having those 33 bytes.

Imagine you are decompressing 500 megabytes. How much performance do you gain by finishing with one last SIMD instruction that sometimes overruns, instead of decompressing the last few bytes with normal instructions? Nothing.
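To illustrate the idea (this is a generic copy loop, not the LZ4 decoder and not how IPP is implemented), a bounded copy can use wide fixed-size moves while a full chunk still fits and fall back to single bytes for the tail. `bounded_copy` and `CHUNK` are hypothetical names, and the 16-byte chunk size is an arbitrary assumption.

```c
#include <stddef.h>
#include <string.h>

/* Assumed vector width for illustration. */
#define CHUNK 16

/* Copy len bytes without ever writing at or past dst + len. */
void bounded_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i = 0;

    /* Wide path: taken only while a whole chunk still fits in the output.
     * Compilers typically lower a fixed-size memcpy to vector moves. */
    for (; i + CHUNK <= len; i += CHUNK)
        memcpy(dst + i, src + i, CHUNK);

    /* Scalar tail: at most CHUNK - 1 bytes, regardless of total size. */
    for (; i < len; ++i)
        dst[i] = src[i];
}
```

The scalar tail is bounded by the chunk size, so its cost does not grow with the amount of data decompressed.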

If the memory happens to be at the end of a page, the decompression function will fault, which is unacceptable, and again, allocating extra bytes just to accommodate this behavior is often not an option.
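As a rough illustration of the page-boundary concern, the check below reports whether writing some slack past the payload would touch a page other than the one holding the last payload byte. `overrun_leaves_last_page` is a hypothetical helper, the page size is supplied by the caller, and whether such a write actually faults depends on whether the following page is mapped and writable.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the bytes written past the payload reach a different page
 * than the last payload byte, 0 otherwise. dst is the buffer address as an
 * integer; page_size must be nonzero. */
int overrun_leaves_last_page(uintptr_t dst, size_t payload_len,
                             size_t slack, size_t page_size)
{
    if (payload_len == 0 || slack == 0)
        return 0;   /* nothing decoded, or nothing written past the payload */

    uintptr_t last_payload = dst + payload_len - 1;   /* last valid byte */
    uintptr_t last_written = last_payload + slack;    /* last overrun byte */

    return (last_payload / page_size) != (last_written / page_size);
}
```

With 4096-byte pages and 33 bytes of slack, any payload whose last byte sits within the final 33 bytes of its page is affected.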

The fix is to make sure the function never overruns, not to document that it does.