Hi. I think there is something not working correctly in the ippsDecodeLZ4_8u() function. For example, the following code works flawlessly:
#ifdef HAVE_IPP
  int outlen;
  int inlen = (int)compressed_length;
  IppStatus status;
  //status = ippsDecodeLZ4_8u((const Ipp8u*)input, inlen, (Ipp8u*)output, &outlen);
  status = ippsDecodeLZ4Dict_8u((const Ipp8u*)input, &inlen, (Ipp8u*)output, 0, &outlen, NULL, 1 << 16);
  cbytes = (status == ippStsNoErr) ? inlen : -inlen;
#else
  cbytes = LZ4_decompress_fast(input, output, (int)maxout);
#endif
However, if I comment out the ippsDecodeLZ4Dict_8u() line and uncomment the one with ippsDecodeLZ4_8u(), the code only works in debug mode (-O0 -g), not in Release mode (-O3 -g). I am using IPP 2019.1.144 on a Mac with macOS 10.14.2 and clang 1000.11.45.5 (Apple LLVM version 10.0.0).
Any hint will be appreciated.
Thank you for the notification. On one hand, the binary function code should not care about how the calling function is compiled, but on the other hand... we will check.
What is the issue? Improper decoding, or application crash?
Hi. While producing a minimal self-contained version of the code for your inspection, I realized that the issue was that I was not initializing the `outlen` parameter to the size of the `output` buffer before the call. After doing this, everything works fine here.
I found this thread and thought I'd share my findings for posterity. I encountered a crash in ippsDecodeLZ4_8u that turned out to be caused by something else.
I'm working on an application that compresses and stores multiple data packets of about 10 MB each to disk in parallel using IPP's LZ4 compression (non-HC, for speed). I'm using TBB pipelines for parallelizing compression and disk storage, and each pipeline is run in parallel using a TBB task group.
In an effort to optimize resource usage to the extreme, I tried sharing one hash table across all concurrent calls to ippsEncodeLZ4_8u. That seemed to work fine with ippsEncodeLZ4HC_8u (probably due to its slower speed, though in retrospect I think it could have crashed too), but with the non-HC version it caused ippsDecodeLZ4_8u to crash while decompressing some data packets that had apparently been corrupted by ippsEncodeLZ4_8u.
So in short, the solution I wanted to share is: use a separate hash table for each thread calling ippsEncodeLZ4_8u.
You are perfectly right! Hash tables can't be shared between different threads for data-parallel processing. This follows from the way hashes are used in data compression. Dictionary methods of data compression (LZ4 is one of them) eliminate redundancy in the source data by finding repeating sequences of bytes and substituting them with instructions like "copy N bytes from offset M behind the current output pointer; those N bytes are part of the already decompressed data".
The hash tables here are hints: pointers to candidate matches. While processing, the compression function tries to find equivalent substrings in the history of already processed data. A hash table can be an array of pointers to positions in the source data buffer, or of indexes of those positions. A particular hash table is therefore coupled to a particular source data buffer, with its specific content and addresses. As the source buffer is processed, the hash table is continuously updated to reflect the part already processed.
In a multi-threaded environment, if the hash table is shared by all threads, the threads will keep overwriting its entries with their own (thread- and buffer-specific) content. Once written, an entry is relevant to thread X but meaningless to every other thread. Besides the fact that a shared hash table is unusable for the team of threads as a whole, the compression result is unpredictable, because the threads execute in a different order on different runs; who knows which thread will update a particular hash entry first.