Solved: LZO decompression function

trekker99 · ‎10-02-2009

I have been using LZO with the IppLZO1XST method for a while now and it works great. However, I have noticed some unexpected behaviour during decompression.

When I initially used ippsDecodeLZO_8u, I did not initialise dstLen. This worked fine until I upgraded from Intel Compiler Suite Pro (11.0 Build 074) to 11.1 Build 046. Since then, a few data sets, under conditions I have been unable to pin down, cause my program to crash soon after they are decompressed. The problem went away when I initialised dstLen to some value (even 0 worked).

So my questions are these:

1) Is it important to initialise dstLen before calling ippsDecodeLZO_8u?
2) If it is not important, is there any reason why the above might have happened?
3) If it is important, should 0 work? I noticed that one of the statuses the decode function can return is ippStsDstSizeLessExpected. Shouldn't the function return this? I have never seen this status, so under what conditions will ippsDecodeLZO_8u return it?

If you need anything from me (code, data sets), please let me know.

Thanks,

Edward

Sergey_K_Intel · ‎10-05-2009

Hi Edward,

For speed reasons, the Decode function was never meant to track the remaining dst length. However, it turned out that in the optimized (non-PX) code, some pieces are shared between Decode and DecodeSafe (where tracking is on).

We will split the optimized code further to avoid any possible use of an uninitialized dstLen. Regarding ippStsDstSizeLessExpected, this error code is not used in LZO. We will update the documentation on this.
The only error code indicating that something went wrong in decoding is ippStsLzoBrokenStreamErr, which is returned by the DecodeSafe function when it sees that either the format of the compressed data is not valid or the output data doesn't fit into the output buffer. We will also update DecodeSafe's description in the documentation.

In the current situation, I would recommend that you initialize dstLen with the real length of the output data buffer you allocated.

P.S. We are tracking the problem that bronxzv is suffering from. It will be fixed in the ongoing release.

Best regards,
Sergey
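
To make the "tracking" idea above concrete, here is a toy sketch in Python. It is not IPP code and does not use the real LZO format: the stream layout (a length byte followed by that many literal bytes, terminated by a zero byte), the function name `decode_safe`, and the exception `BrokenStreamError` are all hypothetical. It only illustrates a decoder that checks the remaining destination capacity and reports one generic "broken stream" error both for malformed input and for output that does not fit.

```python
class BrokenStreamError(Exception):
    """Toy stand-in for a status such as ippStsLzoBrokenStreamErr."""


def decode_safe(src: bytes, dst_capacity: int) -> bytes:
    """Decode a toy stream of (length byte, literal bytes) records.

    A zero length byte ends the stream. The decoder tracks how much
    room is left in the destination and refuses to exceed it.
    """
    out = bytearray()
    pos = 0
    while True:
        if pos >= len(src):
            raise BrokenStreamError("missing end marker")
        n = src[pos]
        pos += 1
        if n == 0:
            return bytes(out)
        if pos + n > len(src):
            raise BrokenStreamError("record runs past the end of the input")
        if len(out) + n > dst_capacity:
            raise BrokenStreamError("output does not fit into the destination")
        out += src[pos:pos + n]
        pos += n
```

In this toy model the caller must pass the real capacity: `decode_safe(stream, 5)` succeeds for a five-byte payload, while a capacity of 4 (or 0) is rejected. A decoder that skips these checks is faster but relies entirely on the caller having allocated enough space.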

bronxzv · ‎10-02-2009

See my topic here:

http://software.intel.com/en-us/forums/showthread.php?t=68265

btw I have always set dstLen = 0 before calling ippsDecodeLZO_8u, but I still get corrupted data in roughly 5% of the tests with arrays larger than 100 KB

the bug isn't fixed in v11.1.46 according to my tests (exactly the same datasets trigger the bug), though they plan a fix for November, as you can see in the other thread

trekker99 · ‎10-04-2009

Hi bronxzv,

I have read through your topic, but I am not sure it is the same problem. Most of my data sets are smaller than 100 KB, and the ones I have trouble with are all 50 KB in size. Additionally, I have not encountered corrupted data, just a crash soon after calling ippsDecodeLZO_8u when dstLen was not initialized. If I set dstLen to some value (I have tried 0, 1 and the actual size), it works fine.

Thanks,

Edward

bronxzv · ‎10-05-2009

>P.S. We are tracking the problem that bronxzv is suffering from. It will be fixed in the ongoing release.

That's very good news. FYI, here is the workaround I use for production code:
I split the original array into multiple chunks (much like the example in the documentation), then compress them in sequence. At each step I compress to a buffer and decompress into an auxiliary buffer, just to compare its content with the original array. If the arrays are exactly the same, I save the LZO-compressed chunk; otherwise I save the uncompressed data, just for this chunk (all of this with a small header for each chunk to keep track of the compression method and inflated size).

This way I have a solid solution that is forward compatible with the fixed version of IPP. With the fixed version I'll be able to increase the size of the chunks for a better overall compression ratio, though.
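
The verify-then-fall-back workaround described above could look roughly like the following Python sketch. It is not the poster's code. `zlib` stands in for the LZO codec, and the chunk size, the names `pack_chunks` / `unpack_chunks`, and the header layout are assumptions for illustration. The header here also stores the payload size (in addition to the method and inflated size mentioned above) so the reader knows where each chunk ends.

```python
import struct
import zlib

CHUNK_SIZE = 64 * 1024                 # hypothetical chunk size
METHOD_RAW, METHOD_COMPRESSED = 0, 1
HEADER = struct.Struct("<BII")         # method, stored size, inflated size


def pack_chunks(data, chunk_size=CHUNK_SIZE,
                compress=zlib.compress, decompress=zlib.decompress):
    """Compress chunk by chunk, keeping a chunk raw if it fails to round-trip."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        packed = compress(chunk)
        # Round-trip check: decompress into a scratch buffer and compare.
        try:
            verified = decompress(packed) == chunk
        except Exception:
            verified = False
        if verified:
            method, payload = METHOD_COMPRESSED, packed
        else:
            method, payload = METHOD_RAW, chunk   # fall back for this chunk only
        out += HEADER.pack(method, len(payload), len(chunk))
        out += payload
    return bytes(out)


def unpack_chunks(blob, decompress=zlib.decompress):
    """Rebuild the original data from the chunk stream written by pack_chunks."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        method, stored, inflated = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        payload = blob[pos:pos + stored]
        pos += stored
        chunk = decompress(payload) if method == METHOD_COMPRESSED else payload
        if len(chunk) != inflated:
            raise ValueError("chunk size does not match its header")
        out += chunk
    return bytes(out)
```

Because the fallback is recorded per chunk, files written while the decoder is faulty stay readable later, and a fixed decoder simply results in fewer raw chunks.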

Sergey_K_Intel · ‎10-06-2009

Quoting - bronxzv

FYI here is the workaround I use for production code :

Hard to hear that :).
BTW, why do you use single-thread mode? a) Single-core CPU? b) Compatibility with native LZO libs? c) Real need for a good compression ratio?
I am asking because with multi-threading you will probably lose another couple of percent in compression ratio, but will gain 2x+ in compression/decompression speed. Moreover, MT-mode LZO doesn't have the problem you are fighting against.

Sergey

bronxzv · ‎10-10-2009

>why do you use single-thread mode?
>b) Compatibility with native LZO libs?

Yes, this is definitely the reason. I use LZO in the final stage of a function that saves serialized object databases for an application with 100s of users (more apps and users planned).
It's paramount to be able to read back the persistent data in the future, so even if IPP no longer supports LZO at some point, we will have a fallback solution in the Oberhumer LZO library.

IPP LZO is now so fast that the compression takes less than 10% of the total time to save a stream (tested in memory, so HD access isn't taken into account). The best way to improve performance in the future will therefore be to use multiple threads for the whole serialization process, including the LZO stage (so multiple single-threaded LZO functions will run in parallel). It will probably require a new file format, though. On a related note, most legacy file formats (JPG/MPGx/3DS/...) are definitely designed with a single thread in mind. That must evolve if we really want to use all the cores when saving or loading files; with SSDs replacing HDs, the hotspots are now clearly in the routines that load and save serialized files.
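
As a rough illustration of running several single-threaded compressors in parallel on independent chunks, here is a minimal Python sketch. `zlib` again stands in for LZO, and the function names, chunk size, and worker count are assumptions. Whether threads give a real speed-up depends on the codec binding in use.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_chunks_parallel(data: bytes, chunk_size: int = 256 * 1024,
                             workers: int = 4) -> List[bytes]:
    """Split data into chunks and compress each chunk on a worker thread."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so chunks stay in sequence.
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks_parallel(packed: List[bytes], workers: int = 4) -> bytes:
    """Decompress independently compressed chunks and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, packed))
```

Each chunk is a complete stream on its own, so any single-threaded decoder can still read it; the cost is that a file format must record the chunk boundaries.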