Re: The fastest way to decompress a jpeg-2000 codestream to an

Steve_Williams · ‎12-08-2008

Hi everyone,

I've got a fair amount of experience with jpeg-2000 using other APIs (OpenJPEG, j2k-codec, Luratech). I thought I'd take a look at Intel's solution to see how it compares. We need maximum performance on decode, but only a really basic API - memory buffer j2k to memory buffer RGB (and ideally, optionally direct to DXTn, but texture compression is currently done as a second pass). We also need support for j2k files incorporating an alpha channel and also greyscale (with and without alpha). We don't care about encode performance (within reason), but do care about encode quality. The application is real-time decode of real-time 3D application assets.

Here's what I've managed to come up with so far using the Intel API. It works, but could be a lot faster.

The code was originally based on decode.cpp from the Intel jpeg-2000 sample.

I welcome input from anyone who has suggestions as to how the performance of this code can be improved.

Thanks in advance.

Best regards,

Steve Williams

Director

Advance Software

PS. We also need the ability to decode to a lower resolution from the encoded resolution for scaling on low spec equipment.

// Intel IPP includes ...

#include "djp2file.h"
#include "edib.h"
#include "mdjp2.h"
#include "medib.h"
#include "memoryinput.h"
#include "memoryoutput.h"
#include "consdiagnoutput.h"
#include "transcodingexception.h"

#ifdef IPP_JPEG2000_TIMING
#include "timer.h"
#endif

typedef ByteInputSigned > MemoryInputBE;
typedef ByteOutputSigned > MemoryOutputBE;

class MemoryBoundedInputBE : public ByteInputBoundService
{
public:
    MemoryBoundedInputBE() {}
    virtual ~MemoryBoundedInputBE() {}

    using MemoryInput::Attach;
    using MemoryInput::Detach;
};

// -----------------------------------------------------------------------------------------------------------------------
//
// Function : Intel_IPP_J2K_Decode
//
// Converts a j2k codestream in memory to an RGB888 memory buffer.
//
// TODO: 1. Upgrade to enable specification of desired output pixel formats.
// Need to specify four output pixel formats : grey, grey with alpha, colour, colour with alpha.
//
// 2. Enable automatic jp2 decode when detected.
//
// -----------------------------------------------------------------------------------------------------------------------

void * __cdecl Intel_IPP_J2K_Decode(void *src_data, unsigned long src_len, unsigned long &dest_width, unsigned long &dest_height)
{
    if (!src_data || src_len==0)
        return NULL;

    ConsDiagnOutput diagnOutput;
    BDiagnOutputPtr diagnOutputPtr = diagnOutput;

    MRstrImage metaImage;

    MemoryBoundedInputBE jp2Stream;

    jp2Stream.Attach((const Ipp8u *)src_data);
    jp2Stream.PushSize(src_len);

#ifdef IPP_JPEG2000_TIMING
    Timer timer;
    timer.PriorityBoost();
    timer.Start();
#endif

    if (0) //jp2
    {
        DJP2File jp2File;

        jp2File.AttachDiagnOutput(diagnOutput);

        jp2File.Attach(jp2Stream);

        jp2File.ReadIntroBoxes();

        if(!jp2File.ReadNextCSMainHeader())
            throw DiagnDescrCT();

        ReAllocMetaImageComponents(jp2File.CSMainHeader(), metaImage);

        SetMRstrImageResolution(jp2File.HeaderBox().Resolution(), metaImage.CaptureResolution(), metaImage.DisplayResolution());
        SetMRstrImagePalette(jp2File.HeaderBox(), metaImage.Palette());

        while(jp2File.ReadCSNextTilePartHeader())
        {
            while(jp2File.ReadCSPacket());
        }

        FixedBuffer > interfaceImgRef(metaImage.NOfComponents());
        SelectOnlyImageCore(metaImage.ComponentImage(), interfaceImgRef, metaImage.NOfComponents());

        jp2File.UpdateCSImageComponents((ImageCoreC*)interfaceImgRef);

    }
    else // JPEG 2000 codestream
    {
        DJP2Codestream j2kCodestream;

        j2kCodestream.AttachDiagnOutput(diagnOutput);

        j2kCodestream.Attach(jp2Stream);

        j2kCodestream.ReadMainHeader();

        ReAllocMetaImageComponents(j2kCodestream.MainHeader(), metaImage);

        while(j2kCodestream.ReadNextTilePartHeader())
        {
            while(j2kCodestream.ReadPacket());
        }

        FixedBuffer > interfaceImgRef(metaImage.NOfComponents());
        SelectOnlyImageCore(metaImage.ComponentImage(), interfaceImgRef, metaImage.NOfComponents());

        j2kCodestream.UpdateImageComponents((ImageCoreC*)interfaceImgRef);
    }

    // Create output.
    DIBEncoder dib;
    SetBestDIBSizeAndResolution(metaImage, dib.Info(), true, diagnOutputPtr);
    SetBestDIBDepthAndPalette (metaImage, dib.Info(), diagnOutputPtr);

    ImagePn image_map;
    image_map.Alloc(dib.Info().Size().Width(), dib.Info().Size().Height(), NOfChannels(dib.Info().Depth()));

    ConvertImageChannelsToDIB(metaImage, dib.Info(), image_map.Channels(), false, diagnOutputPtr);

    // Intel codec returns 32 bits (signed) per channel. We want 8 bit unsigned, so convert ...

    // TODO [OPTIMIZATION] : Decode j2k codestream directly into an 8 bits per channel format to remove unnecessary conversion step.

    // Allocate an output buffer, and return image dimensions.
    dest_width = dib.Info().Size().Width();
    dest_height = dib.Info().Size().Height();
    Ipp8u *dest_buffer = new Ipp8u[dest_width*dest_height*3];

    MemoryOutputBE output;
    output.Attach(dest_buffer);

    // Convert ...
    WriteDIBImageDataDepth_24(output, dib.Info().Origin(), dib.Info().Size(), image_map.Channels()[0], image_map.Channels()[1], image_map.Channels()[2]);

    output.Detach();

#ifdef IPP_JPEG2000_TIMING
    timer.Stop();
    timer.PriorityRelease();
    printf("decoding time = %f msec\n", timer.GetTime(Timer::msec));
#endif

    return (void*) dest_buffer;
}
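
The conversion step flagged in the TODO above (32-bit signed samples down to 8-bit unsigned) can be pictured independently of the IPP sample classes. The following is only a sketch under assumptions: the three channels are planar, equally sized, and already level-shifted into the nominal 0..255 range, and the output byte order is R,G,B. The name PackPlanarToRGB888 is hypothetical and is not part of the Intel sample.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp a 32-bit signed sample to the 0..255 range of an 8-bit channel.
static std::uint8_t Clamp8(std::int32_t v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return static_cast<std::uint8_t>(v);
}

// Hypothetical helper (not part of the Intel sample): interleave three
// planar 32-bit signed channels into a tightly packed R,G,B byte buffer.
std::vector<std::uint8_t> PackPlanarToRGB888(const std::int32_t* r,
                                             const std::int32_t* g,
                                             const std::int32_t* b,
                                             std::size_t pixel_count)
{
    std::vector<std::uint8_t> out(pixel_count * 3);
    for (std::size_t i = 0; i < pixel_count; ++i)
    {
        out[i * 3 + 0] = Clamp8(r[i]);
        out[i * 3 + 1] = Clamp8(g[i]);
        out[i * 3 + 2] = Clamp8(b[i]);
    }
    return out;
}
```

Returning a std::vector rather than a raw new[] buffer also sidesteps the question of which module frees the memory.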

Vladimir_Dudnik · ‎12-08-2008

For a simple memory to memory interface I would recommend you take a look at the Unified Image Codec (UIC) API we introduced in IPP 6.0.

Note that the JPEG2000 codec in UIC also extends the j2kit sample with additional wavelet transform threading at the codec level.

Regards,

Vladimir

Steve_Williams · ‎12-09-2008

Hi Vladimir,

I tried UIC, but have found it is not yet ready for use.

Here's why :-

1. Picnic has a Qt dependency. A build of UIC fails with no obvious error message when Qt is not installed. I had to figure out how to bypass the error and build the transcoder, as I don't need a full application, just an API. I don't want or need to install Qt. What should happen is that the Picnic build fails with a message that Qt needs to be installed, ideally with a URL of where it can be downloaded from (for the interested), and then the remaining projects in the directory attempt to build.

I worked around this dependency by replacing the line :

call %NMAKE%
in build32.bat

with
call %NMAKE% /I

... to ignore errors and continue. It would be great if everyone else does not have to figure this out.

2. After this, I was able to build uic_transcoder_con.exe.

3. I tested this with an RGB j2k codestream like this :

uic_transcoder_con.exe -i c:test.j2k

The output was as follows :

image: c:test.j2k, 128x128x3, 8-bits, color: Grayscale, sampling: 444

This is incorrect. It's an RGB image, not a greyscale (and it can't be collapsed to a greyscale either - the channels are different).

Test image available on request.

I started debugging this with some printfs, then decided that this code is not yet ready for commercial use.

I decided to drop back to the jpeg-2000 sample, because it works.

4. UIC currently has no support for textures including alpha channels - this is required.

5. Straightforward (console) memory-memory decode & encode sample apps are required, just basic console apps, so everyone doesn't have to pull apart complex sample apps to figure this out. Also, if you guys do it, the code will be optimal.

6. How do I enable debug builds with your build system ?

Best regards,

Steve.

Vladimir_Dudnik · ‎12-09-2008

Hi Steve,

We will provide MSVC studio project files for the UIC sample in the next IPP release, so you will be able to build in debug configuration. For the current version you have to create a VC solution yourself.

Yes, the GUI part of the picnic application is based on the Qt library. This is why we additionally provide a simple command line application which does not depend on any third party GUI libraries.

Thanks for reporting the issue with wrong detection of the image color format. I think the reason is that the UIC JPEG2000 encoder expected the JP2 file format, where color information is specified explicitly. When it did not find color info in the J2K stream (which is a plain JPEG2000 bitstream), the application used some default value, which is wrong. Regardless of that, I believe the resulting decoded image was correct. You can check it by specifying the name of an output file (-o out.bmp for example).

It would also be nice if you could attach the problem file here, so we can investigate the issue in more detail.
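
The JP2 versus raw codestream distinction can be checked from the first bytes of the buffer. The sketch below relies only on the published signatures: a JP2 file starts with the 12-byte signature box, and a raw codestream starts with the SOC marker (0xFF 0x4F). The enum and function names are hypothetical and are not part of UIC or the IPP sample.

```cpp
#include <cstddef>
#include <cstring>

enum class J2KContainer { Unknown, RawCodestream, JP2File };

// Hypothetical helper: classify a memory buffer by its leading bytes.
J2KContainer DetectJ2KContainer(const unsigned char* data, std::size_t len)
{
    // JP2 signature box: length 12, type 'jP  ', content 0D 0A 87 0A.
    static const unsigned char kJP2Signature[12] = {
        0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A
    };

    if (!data)
        return J2KContainer::Unknown;

    if (len >= sizeof(kJP2Signature) &&
        std::memcmp(data, kJP2Signature, sizeof(kJP2Signature)) == 0)
        return J2KContainer::JP2File;

    // A raw codestream begins with the SOC marker.
    if (len >= 2 && data[0] == 0xFF && data[1] == 0x4F)
        return J2KContainer::RawCodestream;

    return J2KContainer::Unknown;
}
```

A check like this could drive the choice between the JP2 and codestream branches of a decoder instead of a hard-coded condition.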

Regards,
Vladimir

Steve_Williams · ‎12-09-2008

Hi Vladimir,

> we will provide MSVC studio project files for UIC sample in the next IPP release

Thanks. Is it possible to build with debug information using your current scripts/makefiles ?

> I think the reason is that UIC JPEG2000 encoder expected JP2 file format where color information is specified explicitly. When it did not find color info in J2K stream (which is simple JPEG2000 bitstream) application use some default value which is wrong. Regardless of that the resulting decoded image was correct I believe. You can check it by specifying the name of output file (-o out.bmp for example).

That sounds about right. The way other apps I've used/written process codestreams is as follows :

1 channel - greyscale

2 channels - greyscale with alpha

3 channels - rgb

4 channels - rgb + alpha

(though it's just a codestream, so the channels could in theory represent anything you like)
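
The convention listed above is easy to express in code. This is only a sketch of that heuristic; the enum and function names are hypothetical, and as noted the component count of a bare codestream does not actually guarantee what the channels mean.

```cpp
enum class PixelFormat { Unknown, Grey, GreyAlpha, RGB, RGBA };

// Hypothetical helper: guess a pixel format from the number of components
// in a raw codestream, following the 1/2/3/4 channel convention.
PixelFormat GuessPixelFormat(unsigned component_count)
{
    switch (component_count)
    {
        case 1:  return PixelFormat::Grey;
        case 2:  return PixelFormat::GreyAlpha;
        case 3:  return PixelFormat::RGB;
        case 4:  return PixelFormat::RGBA;
        default: return PixelFormat::Unknown;
    }
}
```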

I exported to a bitmap as you suggested, and it looks fine, so apologies for overreacting. I'll take another look.

Example codestream uploaded, as requested.

Example output here :

exe -i c:test.j2k -o c:test.bmp
image: c:test.j2k, 128x128x3, 8-bits, color: Grayscale, sampling: 444
decode time: 22.40 msec
encode time: 0.31 msec

Your decode is far slower than your encode. Is that right ? I would have expected the opposite. Is it simply an error in the console output (displayed wrong way round) ?

Cheers,

Steve.

Vladimir_Dudnik · ‎12-09-2008

The performance of the encoder and decoder should be close to each other, although not identical. To get a stable result I would recommend running the same command 2 or 3 times; this will warm up the OS disk cache.

Note, you may also control the number of threads launched internally by the JPEG2000 codec (-n NUM cmd line option).

Additional note: in the next IPP release we also plan to support DXT1, DXT3, DXT5 encoder and decoder IPP functions.
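
For in-process measurements the same warm-up idea can be applied around the decode call itself. The sketch below is generic C++ and does not use the IPP Timer class; MeanMilliseconds is a hypothetical name, and the callable stands in for whatever decode function is being measured.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` a few times untimed to warm caches,
// then return the mean wall-clock time in milliseconds over timed_runs.
double MeanMilliseconds(const std::function<void()>& work,
                        int warmup_runs, int timed_runs)
{
    if (timed_runs <= 0)
        return 0.0;

    for (int i = 0; i < warmup_runs; ++i)
        work();

    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < timed_runs; ++i)
        work();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

    return elapsed.count() / timed_runs;
}
```

Averaging over many runs matters most for small images, where a single decode or encode is short enough for timer resolution and scheduling noise to distort the figure.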

Vladimir

Steve_Williams · ‎12-09-2008

> The performance of encoder and decoder should be close to each other although not identical. To get stable result I would recommend you to run the same command 2..3 times this will kind of warm up OS disk cache.

That's not what I'm seeing. I ran a number of times, the first couple being slow for reasons stated above.

The figures I posted above are about average. I'm running on a Centrino Duo T2300 laptop (1.66 GHz).

The app was compiled using VC 2003.

> Note, you also may control number of threads to be launched by JPEG2000 codec internally (-n NUM cmd line option)

I didn't notice much difference, but then I've only got two cores on my old laptop.

> in the next IPP release we also plan to support DXT1, DXT3, DXT5 encoder and decoder IPP functions.

We don't need encoding from DXTn.

Decode direct to DXTn would be great. Decode then recompress is a major bottleneck for us, and other real-time 3D engines.

Would also be great if we can decode to a different resolution from the image resolution in the file for use on slower machines.

Stats for the same test image, using the latest Intel compiler :

decode time: 22.57 msec
encode time: 0.29 msec

... are you sure this is right ?!

Cheers,

Steve.

Vladimir_Dudnik · ‎12-09-2008

Thanks, let's try to understand what might be the reason for that slowdown... The uic_transcoder application should also print to the console some info about the IPP version and which processor-specific code was used. Could you please paste the whole output from the uic_transcoder application here?

By the way, decoding to DXTn means exactly decoding to RGB and then encoding into DXTn. There is no other way: even if we hide this work inside some consolidated codec, the actual operations are still there. That means you can just call DXT compression after the UIC codec output.

I've entered your request to provide reduced resolution in the UIC codec into our feature request database. In fact, the functionality is there, but it is not exposed through the codec interface yet.
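
To make the second pass concrete, here is a deliberately naive sketch of DXT1 compression for one 4x4 block of decoded RGB888 pixels. It is not the IPP implementation and makes no claim about quality or speed: it picks the per-channel bounding box of the block as the two endpoints and assigns each pixel the nearest of the four palette colours. Alpha is ignored, and the function names are hypothetical.

```cpp
#include <climits>
#include <cstdint>

// Quantise 8-bit R, G, B to a 16-bit RGB565 value.
static std::uint16_t PackRGB565(int r, int g, int b)
{
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Expand RGB565 back to 8 bits per channel by bit replication.
static void UnpackRGB565(std::uint16_t c, int rgb[3])
{
    const int r = (c >> 11) & 31, g = (c >> 5) & 63, b = c & 31;
    rgb[0] = (r << 3) | (r >> 2);
    rgb[1] = (g << 2) | (g >> 4);
    rgb[2] = (b << 3) | (b >> 2);
}

// Hypothetical helper: encode 16 row-major R,G,B pixels (48 bytes) into
// one 8-byte DXT1 block: two RGB565 endpoints, then 2-bit indices.
void EncodeDXT1Block(const std::uint8_t* pixels, std::uint8_t out[8])
{
    // Endpoints: per-channel maximum and minimum over the block.
    int lo[3] = {255, 255, 255}, hi[3] = {0, 0, 0};
    for (int i = 0; i < 16; ++i)
        for (int c = 0; c < 3; ++c)
        {
            const int v = pixels[i * 3 + c];
            if (v < lo[c]) lo[c] = v;
            if (v > hi[c]) hi[c] = v;
        }
    const std::uint16_t c0 = PackRGB565(hi[0], hi[1], hi[2]);
    const std::uint16_t c1 = PackRGB565(lo[0], lo[1], lo[2]);

    // Four-colour palette used when c0 > c1.
    int palette[4][3];
    UnpackRGB565(c0, palette[0]);
    UnpackRGB565(c1, palette[1]);
    for (int c = 0; c < 3; ++c)
    {
        palette[2][c] = (2 * palette[0][c] + palette[1][c]) / 3;
        palette[3][c] = (palette[0][c] + 2 * palette[1][c]) / 3;
    }

    // If c0 == c1 the block is a single colour: leave every index at 0.
    std::uint32_t indices = 0;
    if (c0 > c1)
    {
        for (int i = 0; i < 16; ++i)
        {
            int best = 0, best_err = INT_MAX;
            for (int p = 0; p < 4; ++p)
            {
                int err = 0;
                for (int c = 0; c < 3; ++c)
                {
                    const int d = pixels[i * 3 + c] - palette[p][c];
                    err += d * d;
                }
                if (err < best_err) { best_err = err; best = p; }
            }
            indices |= static_cast<std::uint32_t>(best) << (2 * i);
        }
    }

    // Little-endian layout: color0, color1, then the 32-bit index word.
    out[0] = static_cast<std::uint8_t>(c0 & 0xFF);
    out[1] = static_cast<std::uint8_t>(c0 >> 8);
    out[2] = static_cast<std::uint8_t>(c1 & 0xFF);
    out[3] = static_cast<std::uint8_t>(c1 >> 8);
    for (int k = 0; k < 4; ++k)
        out[4 + k] = static_cast<std::uint8_t>((indices >> (8 * k)) & 0xFF);
}
```

A real-time engine would call something like this once per 4x4 tile of the decoded buffer; production encoders choose endpoints far more carefully than a bounding box.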

Vladimir

Steve_Williams · ‎12-09-2008

> decoding to DXTn means exactly decoding to RGB and then encoding into DXTn. No other way even if we hide this work inside some consolidated codec the actual operations are still be there. That mean you can just call DXT compression after UIC codec output.

As I understand it, a j2k decode results in a YUV bitmap which is then remapped to RGB space. It *might* be possible to go direct from the YUV to DXTn, skipping an RGB conversion.

I haven't checked the maths, so this might be utter rubbish. I just thought I'd float the idea past you.

> I've entered your request to provide reduced resolution in UIC codec into our feature requests data base. In fact, functionality is there but it not provided with codec interface yet.

Thanks. The request is actually for a specific resolution. The API could provide a GetImageProperties call, which would return the image width, height, channel count and similar information.

We'd then make a second call with the desired output resolution, desired output pixel format, and in some circumstances provide a pointer into a buffer for the data to be written into.

(the latter can probably be achieved using your stream classes).
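
The GetImageProperties call is a proposal, not an existing UIC function. For a raw codestream, one way such a query could work without decoding anything is to read the SIZ marker segment, which the JPEG 2000 codestream syntax places immediately after the SOC marker. The sketch below assumes a raw codestream (not a JP2 file) and reports the bit depth of the first component only; the struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

struct J2KImageProperties
{
    std::uint32_t width;
    std::uint32_t height;
    std::uint16_t components;
    std::uint8_t  bits_per_sample;  // first component only
    bool          is_signed;        // first component only
};

static std::uint16_t ReadBE16(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

static std::uint32_t ReadBE32(const std::uint8_t* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: parse SOC + SIZ at the start of a raw codestream.
bool GetJ2KImageProperties(const std::uint8_t* data, std::size_t len,
                           J2KImageProperties& props)
{
    // SOC (2) + SIZ marker (2) + 38 fixed bytes + 3 bytes for one component.
    if (!data || len < 45)
        return false;
    if (data[0] != 0xFF || data[1] != 0x4F || data[2] != 0xFF || data[3] != 0x51)
        return false;

    const std::uint32_t xsiz  = ReadBE32(data + 8);   // reference grid width
    const std::uint32_t ysiz  = ReadBE32(data + 12);  // reference grid height
    const std::uint32_t xosiz = ReadBE32(data + 16);  // image origin x
    const std::uint32_t yosiz = ReadBE32(data + 20);  // image origin y
    const std::uint16_t csiz  = ReadBE16(data + 40);  // component count
    if (xsiz <= xosiz || ysiz <= yosiz || csiz == 0)
        return false;

    props.width           = xsiz - xosiz;
    props.height          = ysiz - yosiz;
    props.components      = csiz;
    props.bits_per_sample = static_cast<std::uint8_t>((data[42] & 0x7F) + 1);
    props.is_signed       = (data[42] & 0x80) != 0;
    return true;
}
```

The caller could use the result to size its own output buffer before making the second, decoding call.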

I tried turning timing mode on with -t=1 and got no additional information, but the timing changed !

Here are the results from the Intel compiler build with -t=1, same test file.

decode time: 14.58 msec
encode time: 0.06 msec

I've also uploaded an RGB j2k codestream file with an alpha channel which you might find useful for testing.

Edit: I added the files to the private section :

http://software.intel.com/file/8195

http://software.intel.com/file/8198

Best regards,

Steve.

Vladimir_Dudnik · ‎12-09-2008

I do not see any attached files. Could you please check that you attached them correctly? There is a link to instructions on how to attach files to this forum.

The fastest way to decode a JPEG2000 image at reduced resolution is to use native wavelet downscaling (but it only supports power-of-two factors). If this is not what you want, you can just use the ippiResizeSqrPixel function.

I do not think there is much potential performance gain in skipping the YUV-RGB color conversion step in the JPEG2000 codec. Color conversion is quite a simple operation which is implemented efficiently using the SSE instruction set. But the entropy coding part of JPEG2000 is complex and not SIMD friendly. It takes a significant part of the whole coding pipeline, so color conversion is only a few percent of the time required by the whole codec.

It makes sense in the JPEG case, where all operations are relatively simple and take more or less equal shares of the codec pipeline.
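
Since wavelet downscaling only halves the image at each step, a caller who wants a specific output size has to pick how many resolution levels to drop and then resize the remainder. The sketch below shows that arithmetic only, assuming the image origin is at (0,0) so that each dropped level gives ceil(size / 2); the function names are hypothetical and nothing here calls the codec.

```cpp
#include <cstdint>

// Size of one image dimension after dropping `levels` resolution levels:
// ceil(full / 2^levels), assuming the image origin is at 0.
std::uint32_t ReducedDimension(std::uint32_t full, unsigned levels)
{
    if (levels >= 32)
        return full ? 1u : 0u;
    const std::uint64_t step = std::uint64_t(1) << levels;
    return static_cast<std::uint32_t>((full + step - 1) / step);
}

// Hypothetical helper: the largest number of levels that can be dropped
// while the result stays at least target_w x target_h, capped at
// max_levels (the number of decomposition levels in the codestream).
unsigned LevelsToDiscard(std::uint32_t w, std::uint32_t h,
                         std::uint32_t target_w, std::uint32_t target_h,
                         unsigned max_levels)
{
    unsigned levels = 0;
    while (levels < max_levels &&
           ReducedDimension(w, levels + 1) >= target_w &&
           ReducedDimension(h, levels + 1) >= target_h)
        ++levels;
    return levels;
}
```

For a 2048x1536 source and a 500x300 target this picks two levels (512x384), leaving only a small final resize to reach the exact size.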
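
To illustrate how small the colour step is, here is the reversible component transform (RCT) defined in JPEG 2000 Part 1 for the lossless path, written as scalar code. It is a sketch, not the IPP implementation; the lossy path uses a different, irreversible transform with real-valued coefficients, and the function names are hypothetical.

```cpp
// floor(v / 4) for any sign of v (C++ integer division truncates toward zero).
static int FloorDiv4(int v)
{
    return (v >= 0) ? v / 4 : -((-v + 3) / 4);
}

// Forward RCT: Y = floor((R + 2G + B) / 4), Cb = B - G, Cr = R - G.
void ForwardRCT(int r, int g, int b, int& y, int& cb, int& cr)
{
    y  = FloorDiv4(r + 2 * g + b);
    cb = b - g;
    cr = r - g;
}

// Inverse RCT: G = Y - floor((Cb + Cr) / 4), R = Cr + G, B = Cb + G.
void InverseRCT(int y, int cb, int cr, int& r, int& g, int& b)
{
    g = y - FloorDiv4(cb + cr);
    r = cr + g;
    b = cb + g;
}
```

The inverse costs a handful of integer additions and one shift-like division per pixel, which is consistent with it being a small share of total decode time.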

Vladimir

Steve_Williams · ‎12-09-2008

Thanks Vladimir.

I've posted the link to the test images above.

There are some larger images I'm using for testing here :

http://advance-software.com/examples/slideshow/photos/1.j2k

through ... 20.j2k

These larger images decode faster than they encode, as I would expect.

Best regards,

Steve

Steve_Williams · ‎12-10-2008

Hi Vladimir,

I've taken another look at your UIC sample, and have run into problems.

I've put together a minimal reproducible test case here : http://advance-software.com/misc/adv_sw_intel_test.zip

Build the app in MSVC 2003, debug mode. Step into the program. You will note an exception is hit on exit from the test function.

I think this is because memory was allocated in one module, and released in another.

Hope this helps.

Best regards,

Steve.

Vladimir_Dudnik · ‎12-10-2008

Steve,

I've got the JPEG2000 test streams but was not able to download the test case.

Note, you may submit your issue report to Intel Premier Support and provide the test case through that channel. It will be routed to our team as well.

Regards,
Vladimir

Steve_Williams · ‎12-11-2008

Hi Vladimir,

Your less than excellent forum software screwed up the link.

It's fixed (above). I've reported this to Intel. I don't see any point doing so twice.

Best regards,

Steve

Vladimir_Dudnik · ‎12-11-2008

Thanks again, now I managed to download the test case. We will look at it.

Vladimir

Steve_Williams · ‎12-11-2008

Thanks, Vladimir.

The fastest way to decompress a jpeg-2000 codestream to an RGB888 memory buffer ...