Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ijl15 vs ijl20 vs IPP jpeg decode performance

trebo
Beginner
4,572 Views

Hi,

We have been using ijl15 for decoding jpeg images for quite a while.
We have now upgraded to IPP 5.3 and ijl20 and we are noticing a performance slowdown in decoding jpeg images.

The versions of the ijl are:
ijl15 - 1.5.4.36
ijl20 - 2.0.18.50

What we do is basically:

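JPEG_CORE_PROPERTIES m_jcp;
BYTE* m_pData;
ijlInit(&m_jcp);
// Point the decoder at the compressed data before reading the header.
m_jcp.JPGBytes = pPicData;
m_jcp.JPGSizeBytes = dwPicDataSize;
ijlRead(&m_jcp, IJL_JBUFF_READPARAMS);      // parse the JPEG header
// dwSize is the output buffer size (computation not shown).
m_pData = (BYTE*)ippMalloc(dwSize);
m_jcp.DIBBytes = m_pData;
ijlRead(&m_jcp, IJL_JBUFF_READWHOLEIMAGE);  // decode into m_pData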

And this is the performance I get with exactly the same code for ijl15 and ijl20:

Using ijl15
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1125 ms
Average: 11.250000 ms per pic
Average: 1.255222 ms per 100,000 pixels
Average: 5.498104 ms per 100,000 bytes

Using ijl20
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1219 ms
Average: 12.190000 ms per pic
Average: 1.360102 ms per 100,000 pixels
Average: 5.957501 ms per 100,000 bytes
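
The averages in both listings follow from the total time, the picture count, the image dimensions, and the file size. A minimal C++ sketch of that arithmetic is shown here; `DecodeStats` and `summarize` are hypothetical names introduced for illustration (they are not part of IJL or IPP), and the file size is a parameter because the post only gives it rounded (0.20 MB).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of IJL or IPP): derives the averages
// printed in the listings from the raw measurements.
struct DecodeStats {
    double msPerPic;         // average wall-clock time per picture
    double msPer100kPixels;  // normalized by image area
    double msPer100kBytes;   // normalized by compressed file size
};

DecodeStats summarize(double totalMs, int pics,
                      int width, int height, std::size_t jpegBytes) {
    assert(pics > 0 && width > 0 && height > 0 && jpegBytes > 0);
    const double msPerPic = totalMs / pics;
    const double pixels = static_cast<double>(width) * height;
    return DecodeStats{
        msPerPic,
        msPerPic / pixels * 100000.0,
        msPerPic / static_cast<double>(jpegBytes) * 100000.0,
    };
}
```

For the 1152x778 picture, 1125 ms over 100 pictures gives 11.25 ms per picture and about 1.2552 ms per 100,000 pixels; the ijl20 run (1219 ms) works out to roughly 8% slower than ijl15.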


Is this a known issue? Can I do something to get better performance with ijl20?

Br,
Robert

30 Replies
trebo
Beginner

Hi Vladimir,

I got my hands on a Core 2 Duo system, and I still get the best performance with IJL15 and the worst with IPP.
Milliseconds to decode a 1152x778-pixel picture:
IJL15: 11.7
IJL20: 12.8
IPP: 14.5

Even if we disregard my own test program, I get the same result with your sample applications!
I have tested this application: Ipp5.3.2\ipp-samples\image-codecs\jpeg-ijl\bin\win32_cl8\jpgview.exe
which I suppose uses IJL20,
and this application: Ipp5.3.2\ipp-samples\image-codecs\jpegview\bin\win32_cl8\jpgview.exe
which I suppose uses the IPP decoder.
The test results on a Core 2 Duo system with Vista Business 6.0 build 6000 are as follows (milliseconds to decode a 1152x778-pixel picture, read from 'USEC' in the program's status bar):
IJL20: 12.8-15.8
IPP: 13.7-31.7

We clearly see that IJL20 is faster, at best 12.8 ms compared to 13.7 ms for IPP.
We also see that IPP has a much wider spread: its highest value is 31.7 ms compared to 15.8 ms for IJL20.
How come?
What are your test results if you compare both applications?

If IPP is still the fastest for you, can you please provide me with test applications where IPP is faster than IJL, so I can try them on my Core 2 Duo system?
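
When two decoders are compared by their best and worst readings, it can help to collect the individual timings and report the minimum, median, and maximum together, so that a single slow outlier does not dominate the comparison. The following is a rough C++ sketch under that assumption; `TimingSpan` and `spanOf` are hypothetical names, not part of any IPP sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: summarizes repeated decode timings (in ms).
struct TimingSpan {
    double minMs;
    double medianMs;
    double maxMs;
};

// Takes the samples by value so the caller's vector is left untouched.
TimingSpan spanOf(std::vector<double> samplesMs) {
    assert(!samplesMs.empty());
    std::sort(samplesMs.begin(), samplesMs.end());
    const std::size_t n = samplesMs.size();
    const double median = (n % 2 == 1)
        ? samplesMs[n / 2]
        : 0.5 * (samplesMs[n / 2 - 1] + samplesMs[n / 2]);
    return TimingSpan{samplesMs.front(), median, samplesMs.back()};
}
```

A median close to the minimum together with a large maximum would suggest occasional slow runs rather than a consistently slower decoder.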

Br,
Robert

Vladimir_Dudnik
Employee

Hi Robert,

I've attached the test program I used this time (to rebuild it you will need the old IJL library, which we no longer distribute). A precompiled executable is located in the Release folder. If you specify no parameters, a generated image is used for testing; otherwise you need to specify the name of a valid BMP file (24 bits per pixel).

Regards,
Vladimir

trebo
Beginner

Vladimir, thanks for the test program!

I have tested it now and, as you say, it shows better performance with IPP than with IJL15 on a Core 2 Duo system.
However, there is more to consider!

1.
You are using IPP 6.0.82.530.
We are using IPP 5.3.85.467, which is the latest version released to us.
IPP 5.3 and IJL15 have about the same performance on Core 2 Duo, and IJL15 is faster than IPP 5.3 on Pentium D!
Why are you comparing with 6.0 when the latest release is 5.3.2?
Why hasn't this been mentioned?

2.
CPU usage.
It's a fact that IPP 6.0 is faster than IJL15 and IPP 5.3 on Core 2 Duo. But it also doubles the total CPU usage from 50% to 100%. Since IPP 6.0 is twice as fast while using twice the CPU, it actually isn't faster at all if CPU usage is also considered!

3.
Pentium D.
On my Pentium D machine I get the following results with a 2880x1944 image:
IJL15: 116 ms
IPP 5.3: 145 ms
IPP 6.0: 140 ms
IJL15 is clearly the fastest.
IPP 5.3 is actually second best, since it is only slightly slower than IPP 6.0 while consuming only 50% CPU. IPP 6.0 has 100% CPU usage.


Considering these three facts, I really can't see any performance improvement with IPP compared to IJL15, with either IPP 5.3 or IPP 6.0, on either Pentium D or Core 2 Duo systems.
What are your comments on this? Is there more to be considered?

Br,
Robert

Vladimir_Dudnik
Employee

Hi Robert,

So, basically you were able to reproduce the results I have on my system (IPP JPEG is faster than the old IJL library). That's good.

1. The IPP 6.0 beta was just published; you can register and download it from the IPP main page. But just in case, I have also attached the same pre-built application linked with IPP 5.3. Please try it and let us know the results on your system. On our side it shows that IPP outperforms IJL, just as the IPP 6.0 beta did in the previous application.

2. "...it actually isn't faster at all if the CPU usage also is considered!" There is probably some discrepancy in terms here. We call something faster when it can do more work in the same amount of time. That says nothing about how computationally intensive it is to make things faster.

3. Unfortunately, I do not have a Pentium D system at hand, so I can't test it. By the way, one guess I just had: IJL was compiled with the Intel C/C++ compiler, whereas the application attached to my previous post was compiled with VC2005; that might be one of the reasons for the worse performance. The second reason, as I already said somewhere in this thread, is that we increased the arithmetic precision of the color conversion functions in IPP because many customers complained about relatively big rounding errors in IJL. That cost us some performance. You may compare the PSNR of the IJL and IPP JPEG codecs.
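
For the PSNR comparison suggested in point 3, one option is to decode the same JPEG with each codec and measure each result against a common reference image. A minimal C++ sketch of the standard 8-bit PSNR formula follows; `psnr8` is a hypothetical helper name, and the code assumes both buffers have the same dimensions and channel layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between two 8-bit images stored as flat byte buffers.
// Assumes both buffers use the same size and channel layout.
double psnr8(const std::vector<std::uint8_t>& ref,
             const std::vector<std::uint8_t>& test) {
    assert(!ref.empty() && ref.size() == test.size());
    double sumSq = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(ref[i]) - static_cast<double>(test[i]);
        sumSq += d * d;
    }
    if (sumSq == 0.0) {
        return std::numeric_limits<double>::infinity();  // identical images
    }
    const double mse = sumSq / static_cast<double>(ref.size());
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

A higher value means the decoded image is closer to the reference, so a codec with smaller rounding errors in color conversion should score higher on the same input.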

Taking all of that into account, I see that at least on a Core 2 system (where I can run this test) IPP does the work in 60 msec (compression) and 57 msec (decompression), while IJL takes 191 msec and 93 msec respectively. From my perspective, 60 msec to compress a 2Kx2K image is faster than 191 msec for the same job. I also expect that doing the work more than twice as fast will definitely require more processor resources.

Please find attached a precompiled test application built with the Intel C/C++ compiler and linked with IPP 5.3.

Regards,
Vladimir

trebo
Beginner

Hi Vladimir,

Yes, it's nice I managed to reproduce the results.

1.
OK, we'll stick to 5.3 until 6.0 is official.
Your 5.3 app gave the same result as the 6.0.

2.
OK, I was a bit unclear. It is faster, as you say. However, we often decode several motion JPEG streams at once, and whenever we decode more than one stream we will not see the performance improvement, since the CPU load doubles.
Anyway, we prefer IPP since it fully utilizes the CPU even when we decode only one stream, and also since the color conversion is improved.

3.
The performance on the Pentium D system is the same with the 5.3 application I got from you...
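
The multi-stream argument in point 2 can be made concrete with an idealized model: if a threaded decoder halves the wall-clock time by using both cores, a single stream gets twice the throughput, but two concurrent streams already keep both cores busy with a single-threaded decoder. The C++ sketch below encodes only that assumption (perfect scaling, no overhead); `framesPerSecond` is a hypothetical name and the numbers it produces are a model, not measurements.

```cpp
#include <algorithm>
#include <cassert>

// Idealized model (an assumption, not a measurement): each frame needs
// cpuMsPerFrame of CPU time in total, split evenly across
// threadsPerDecode threads, and at most `cores` threads run at once.
double framesPerSecond(int streams, int threadsPerDecode, int cores,
                       double cpuMsPerFrame) {
    assert(streams > 0 && threadsPerDecode > 0 && cores > 0);
    assert(cpuMsPerFrame > 0.0);
    const int busyCores = std::min(streams * threadsPerDecode, cores);
    return busyCores * 1000.0 / cpuMsPerFrame;
}
```

On a two-core machine with 100 ms of CPU time per frame, the model gives 10 frames per second for one single-threaded stream and 20 for one two-thread stream, but 20 in both cases once two streams are decoded at the same time.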


I have one (perhaps last) problem!
I haven't been able to get the performance improvement with the 5.3 application I rebuilt from your 6.0 application. I'm not sure why, but one guess is that it's because I don't have "libiomp5mt.lib" and had to remove it from Additional Dependencies for the linker. Could this be the case? If so, where can I get the library?
If not, what else could be the problem?
The application runs OK, but it only uses 50% CPU, so the performance is of course half that of your application, on both Pentium D and Core 2 Duo systems.

Br,
Robert

Vladimir_Dudnik
Employee

Hi Robert,

libiomp5 is the new Intel OpenMP runtime library. It comes with the Intel compiler starting from version 10.1. If you have a previous version of the Intel compiler, you can replace it with the libguide library. If you do not have the Intel compiler at all but do have MS VC2005, you can modify the Makefile in such a way as to enable OpenMP threading in the JPEG codec (basically, you need to add the compiler option -openmp).
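
One way to check whether a given source file was actually compiled with OpenMP enabled is the standard `_OPENMP` macro, which compilers define only when their OpenMP option is on. The C++ sketch below is generic and does not reproduce the JPEG codec's own code; `builtWithOpenMP` and `sumRows` are hypothetical names, and the row loop merely stands in for per-row image work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True only if this file was compiled with the OpenMP option enabled.
bool builtWithOpenMP() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Sums a width x height byte buffer row by row. Rows are split across
// threads when OpenMP is enabled; otherwise the loop runs on one thread
// and yields the same result.
long long sumRows(const std::vector<unsigned char>& pixels,
                  int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    long long total = 0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
    for (int y = 0; y < height; ++y) {
        long long rowSum = 0;
        for (int x = 0; x < width; ++x) {
            rowSum += pixels[static_cast<std::size_t>(y) * width + x];
        }
        total += rowSum;
    }
    return total;
}
```

If `builtWithOpenMP()` returns false for the library that does the heavy work, linking an OpenMP runtime into the application alone would not be expected to make that library use more than one core.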

Regards,
Vladimir

trebo
Beginner

Hi,

I have tried both with libiomp5 and the -openmp option, but neither helps.

Are there any differences between 5.3 and 6.0 projects? (I only have the 6.0 project, which I tried to make 5.3...).
Or is it easiest if I could have your 5.3 project as well?

Br,
Robert

Vladimir_Dudnik
Employee

Robert,

To build the jpeg.lib library I used the IPP JPEGView sample. When you build it with build32.bat icl101, it enables OpenMP threading automatically in the IPP 5.3 version. In the IPP 6.0 beta version, it also enables OpenMP threading when you build it with VC2005.

Regards,
Vladimir

trebo
Beginner

Ok, I rebuilt jpeg.lib and now it works fine!

Thank you very much for your patience in this matter Vladimir!

Br,
Robert

Vladimir_Dudnik
Employee

You are welcome! Please let us know how you find the functionality and interface of the IPP JPEG codec and any inconsistencies you may find in it; we will be glad to improve it further.

I need to inform you that there was a threading issue in the IPP 5.3 JPEG encoder, which was fixed in IPP 6.0. The issue can lead to generation of a corrupted JPEG stream when you repeatedly encode frame by frame with the threaded version of the JPEG encoder (there was a lack of synchronization between threads).

Vladimir
