Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Intel JPEGVIEW sample decoder soundly beaten by .NET 3.0?

xraygenfit
Beginner
960 Views
Hi,

I have extracted the decoder (and encoder) from the jpegview sample for IPP 5.2. I decode a JPEG, resize it, and reencode it to a much smaller JPEG. Now, I'm getting roundly beaten by a C# program using .NET 3.0. After looking closely at the code, it seems that I'm spending almost all of my time decoding, which was not unexpected. However, the .NET sample takes roughly 3/4 of the time to decode, resize, and reencode, and it produces a thumbnail that looks much better. If anyone has any idea why this is, and better yet how to fix it, I would greatly appreciate it. I'm attaching the code; it is a simple console program with everything happening in myjpeg.cpp. I've also attached the .NET program that is beating it (you'll need the .NET 3.0 runtime or you'll get an error). Thanks for any help
9 Replies
xraygenfit
Beginner
OK, I can't seem to upload. Let's try this


http://diehard2.googlepages.com/tumbtest2

Thanks for any help
tim_wilson
Beginner
Did no one address this post? I cannot tell you specifically what is wrong with Intel's JPEG decoder examples, but I can tell you for a fact that they are SLOW. Part of my job is seeking the fastest JPEG decoding algorithms/products, and I recently did a comparison of the latest Intel demo (C#) against our in-house decoder (which was actually built a while back in pure C on older Intel libraries and now uses 5.1 libs). Our older decoder beat the current Intel demo by about 100% (i.e., twice as fast).

I can attribute part of the difference to optimizations we did for cases where you are decoding several images of the same size, depth and palette, but I don't think that can account for all of it - I think that the demos are just very inefficient.

The first thing to do is be sure you are statically linking (if not using managed code) OR be sure you have the libguide.dll included in your dependencies so that the .dlls are loaded once efficiently.

I'm going to take a look at your .NET "fast" program - thanks.

-tim
Vladimir_Dudnik
Employee

Hello,

We will definitely investigate the difference in performance demonstrated in the original post in this thread. I think it has something to do with how and what is measured in the test.

Regarding your latest comment, what exactly do you mean by the Intel demo (C#)? And what did you measure? Was it your unmanaged C code that calls IPP against managed C# code that calls the unmanaged IPP DLLs through interop services?

Note that you need to call the ippStaticInit function at the beginning of your application if you link it with the IPP static libraries.

Regards,
Vladimir
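
One way to pin down how and what is measured is to time each stage on its own. The following is a minimal C++ sketch, not code from the jpegview sample: `StageTimer` and the stage names are hypothetical, and the callables passed to `run` stand in for the real decode, resize, and encode calls.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper (not part of IPP): accumulates wall-clock seconds
// per named stage so that decode, resize and encode can be compared.
class StageTimer {
public:
    // Runs `work` once and adds its elapsed time to the total for `stage`.
    template <typename Fn>
    void run(const std::string& stage, Fn&& work) {
        assert(!stage.empty());
        const auto start = std::chrono::steady_clock::now();
        work();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        totals_[stage] += elapsed.count();
    }

    // Total seconds recorded for `stage`, or 0 if it never ran.
    double seconds(const std::string& stage) const {
        const auto it = totals_.find(stage);
        return it == totals_.end() ? 0.0 : it->second;
    }

private:
    std::map<std::string, double> totals_;
};
```

Wrapping the decode, resize, and encode calls in separate `run` calls, and keeping file reads and writes in stages of their own, would show whether I/O, allocation, or the codec itself dominates the total.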

xraygenfit
Beginner
Hi Vladimir,

Thanks for the response. As far as the how and what is measured, this is pretty much a real world type of test. Our application would essentially be reading a thumb, decoding it, reencoding it, and then writing it to disk. I've used the IPP in the past for image processing, and I've been very impressed, which makes this all the more disappointing. Of greater concern than the speed, however, is that the .NET thumbnails look much better. Like I said, I have no idea how they're doing it, but it seems to be faster and better. If you (or the other engineers) have any improvement suggestions, I would love to hear them.

As an aside, it would be nice if something like the initial test app could be included as a sample. I would think most of the people interested in this sort of work would be mainly interested in how to decode and encode jpegs, which this does without any bells and whistles. The original program is a bit of overkill when you just want to do something simple with the IPP.
Vladimir_Dudnik
Employee

Hello,

I had a chance to briefly look through your code.

Regarding resize quality, I recommend trying ippiResizeSqrPixel instead of plain ippiResize. As you can see on this forum, several people have mentioned that ResizeSqrPixel gives better results.

Regarding the way you measure: you can build a more optimal processing pipeline with the IPP JPEG decoder if you want maximum performance. As far as I can see, your .NET code uses a kind of pipelining, so probably somewhere deep in the .NET internals memory buffers are reused, the decoder object is reused, and so on. On the IPP side, however, you built only the simplest case, allocating buffers and creating decoder objects again for every image. I think this is not optimal.

Regards,
Vladimir
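
The buffer reuse described above can be sketched in plain C++. This is only an illustration under stated assumptions: `ReusableBuffer` and `image_bytes` are hypothetical names, not IPP or jpegview APIs, and the real program would hand the returned pointer to its decoder and resize calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of IPP): a scratch buffer that only grows
// and is kept alive between images, so a run of same-sized images costs
// one allocation instead of one per image.
class ReusableBuffer {
public:
    // Returns a pointer to at least `bytes` bytes of storage.
    std::uint8_t* acquire(std::size_t bytes) {
        if (bytes > storage_.size()) {
            storage_.resize(bytes);  // grows only when a larger image arrives
            ++growths_;
        }
        return storage_.data();
    }

    std::size_t capacity() const { return storage_.size(); }
    int growths() const { return growths_; }

private:
    std::vector<std::uint8_t> storage_;
    int growths_ = 0;
};

// Bytes needed for a tightly packed 8-bit image (no row padding assumed).
inline std::size_t image_bytes(int width, int height, int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    return static_cast<std::size_t>(width) * height * channels;
}
```

Creating one such buffer per pipeline stage before the loop over images, and calling `acquire` inside the loop, keeps allocation out of the timed path once the largest image has been seen.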

xraygenfit
Beginner
Hello Vladimir,

So I reorganized the program so that the decoder object isn't destroyed and recreated, and the speedup is pretty negligible (~1-2%). If I go from 8x8 to 2x2 in the decoding and then resize, the speed improves to get close to the .NET version (maybe they go from 8x8 to 1x1 if that's possible). On my laptop, it is 32 seconds for the Intel decoder, and 28 seconds for the .NET version. So, I'm closer...

I really thought this was going to blow Microsoft out of the water from everything I had read about the Intel decoder. Oh well. Any further help would of course be appreciated. Thanks
Vladimir_Dudnik
Employee

Hello,

Yes, for DCT-based lossy JPEG compression it is possible to decode a JPEG image at several fixed scales: 1:1, 1:2, 1:4 and 1:8. The first is full scale (the original resolution) and the last is one eighth of the original size. This speeds up decoding dramatically, because the inverse DCT (IDCT) is one of the most time-consuming operations in JPEG decoding, so reducing its complexity speeds up the decoder (in the 1:8 case the IDCT is replaced with just the DC coefficient divided by 64).

This is a commonly used way to quickly decode a thumbnail from a full-scale image.

Regards,
Vladimir
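
The 1:8 shortcut can be illustrated with a small C++ sketch that is independent of IPP. It assumes the JPEG standard's DCT normalization, under which the DC term is one eighth of the sum of the 64 level-shifted samples, so the sketch divides by 8; if the DC value is instead taken as the plain sum of the 64 samples, the divisor becomes 64. The names `Block`, `dc_coefficient`, and `decode_block_1to8` are hypothetical, and quantization is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Block = std::array<int, 64>;  // one 8x8 block of 8-bit samples

// DC term of the forward DCT under the JPEG standard's normalization:
// (1/8) * sum of the 64 samples after the level shift by 128.
inline double dc_coefficient(const Block& samples) {
    long sum = 0;
    for (int s : samples) {
        assert(s >= 0 && s <= 255);
        sum += s - 128;
    }
    return sum / 8.0;
}

// 1:8 decode of one block: no IDCT at all, the whole 8x8 block collapses
// to a single pixel equal to the block average.
inline int decode_block_1to8(double dc) {
    const long v = std::lround(dc / 8.0) + 128;  // undo scaling and level shift
    return static_cast<int>(std::min(255L, std::max(0L, v)));
}
```

For a block whose samples average 100, `decode_block_1to8(dc_coefficient(block))` gives back 100, which is why the 1:8 path needs only the DC term.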

xraygenfit
Beginner
Is 1:16 possible?
Vladimir_Dudnik
Employee

Unfortunately, no. The IDCT operates on 8x8 blocks of coefficients, so the most it can reduce a block to is 1x1.

Vladimir
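
Since 1:8 is the largest reduction the decoder can do by itself, a bigger reduction such as 1:16 has to combine a scaled decode with an ordinary resize. A minimal C++ sketch of that choice follows; `ThumbnailPlan` and `plan_thumbnail` are hypothetical names, and rounding the decoded size up is an assumption about how the scaled decoder sizes its output.

```cpp
#include <cassert>

struct ThumbnailPlan {
    int denominator;  // decode scale 1:denominator, one of 1, 2, 4, 8
    int decoded_w;    // image size after the scaled decode
    int decoded_h;
};

// Picks the largest decode-time reduction whose output is still at least
// as large as the requested thumbnail; a normal resize covers the rest.
inline ThumbnailPlan plan_thumbnail(int src_w, int src_h, int dst_w, int dst_h) {
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    for (int d = 8; d > 1; d /= 2) {
        const int w = (src_w + d - 1) / d;  // assumed: size rounds up
        const int h = (src_h + d - 1) / d;
        if (w >= dst_w && h >= dst_h) {
            return {d, w, h};
        }
    }
    return {1, src_w, src_h};
}
```

For a 3200x2400 source and a 200x150 thumbnail (1:16 overall), the plan is a 1:8 decode to 400x300 followed by a 2x resize, so the resize step only touches 1/64 of the original pixels.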
