Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

UIC Encoding/Decoding JPEG images - very slow

georgiswan
Beginner
Hello,
I am using the UIC samples to decode JPEG images and it seems to be too slow. It takes about 3 sec to decode my test image and I can decode it in about 0.8 sec using FreeImage ( http://freeimage.sourceforge.net/ -- which itself is using libjpeg I believe).
Here is the output of the UIC test program. Note that I changed the code a bit to get the 'real' elapsed time (the "using ftime" lines below). The low-level routine reports a decode time of 209 msec, but measured around the high-level call in the demo program it really takes 3 sec:
$ ./uic_transcoder_con -t 1 -i test.jpg -o out.jpg
Intel Integrated Performance Primitives
version: 7.0 build 205.105, [7.0.1077.205]
name: libippjy8.so.7.0+
date: Apr 8 2012
Decode using ftime : 2968 msec
image: test.jpg, 3646x5470x3, 8-bits unsigned, color: RGB, sampling: 444
decode time: 209.44 msec
Encode using ftime : 3262 msec
encode time: 465.57 msec
Any idea of what I could be doing wrong? I expect that decoding with UIC would be at least as fast as FreeImage.
I am using :
composer : Composer_2011.11.339 with IPP 7.0.7
ipp_samples: l_ipp-samples_p_7.0.7.049
Thanks!
Gilbert
Sergey_K_Intel
Employee
Hi Gilbert,
Could you try your operation in a single-thread mode (i.e. use "-n 1" option)?
As far as I remember, there was an issue on Linux with measuring time using the standard "time.h" functions. These functions return the so-called "process time", which is the sum of the CPU times of all threads. For example, if a piece of code is executed in 2 parallel threads and each of them takes 0.5 sec (i.e. the whole piece finishes in 0.5 sec by the wall clock), the Linux time-measuring functions will return a value of 1 sec.
If my assumption is correct, then with "-n 1" both times (the one returned by uic_transcoder and the one from your measurements) should be about the same. If not, we'll investigate the issue. I hope you are using a multi-core CPU :)
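The effect described above can be reproduced with a small sketch. It assumes Linux with glibc, where std::clock() reports CPU time summed over all threads of the process, and it uses std::chrono::steady_clock for wall-clock time. The names Timing, spin_for and measure_two_threads are made up for this illustration and are not part of UIC or IPP.

```cpp
// Build with something like: g++ -std=c++11 -pthread
#include <chrono>
#include <ctime>
#include <thread>

struct Timing {
    double wall_ms;  // elapsed (wall-clock) time
    double cpu_ms;   // process CPU time as reported by std::clock()
};

// Busy-wait for roughly `ms` milliseconds of wall-clock time.
static void spin_for(int ms) {
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(ms);
    while (std::chrono::steady_clock::now() < deadline) {
        // burn CPU on purpose
    }
}

// Run two busy threads in parallel and measure the work with both clocks.
Timing measure_two_threads(int ms) {
    const std::clock_t cpu_begin = std::clock();
    const auto wall_begin = std::chrono::steady_clock::now();

    std::thread a(spin_for, ms);
    std::thread b(spin_for, ms);
    a.join();
    b.join();

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    Timing t;
    t.wall_ms =
        std::chrono::duration<double, std::milli>(wall_end - wall_begin).count();
    t.cpu_ms =
        1000.0 * static_cast<double>(cpu_end - cpu_begin) / CLOCKS_PER_SEC;
    return t;
}
```

On a machine with at least two idle cores, measure_two_threads(500) would be expected to give a wall_ms close to 500 and a cpu_ms close to 1000, which is the 0.5 sec versus 1 sec situation from the example. On a single core the two values would be much closer together.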
Regards,
Sergey
georgiswan
Beginner
Hi Sergey,
I tried -n 1 and it does not change anything. The time reported is still ~240 msec but the real time is ~3 sec.
The real time is 3 sec, as confirmed with external calls to ftime() before and after the high-level decode routine.
The computer I run this on has 8 CPUs, and I compiled the samples with the script build_intel64.sh.
Can I do something to speed it up? FreeImage can decode the image in about 0.8 sec (real elapsed time), so I expect that I should be able to get at least that with UIC. Being almost 4 times slower, I must be doing something wrong, but I cannot find it.
$ ldd ./uic_transcoder_con
/usr/java/jdk1.6.0_16/jre/lib/amd64/libjsig.so (0x00002adfcf199000)
libuic_core.so => ./libuic_core.so (0x00002adfcf29c000)
libuic_io.so => ./libuic_io.so (0x00002adfcf4a2000)
libuic_bmp.so => ./libuic_bmp.so (0x00002adfcf6aa000)
libuic_pnm.so => ./libuic_pnm.so (0x00002adfcf8af000)
libuic_jpeg.so => ./libuic_jpeg.so (0x00002adfcfac1000)
libuic_jpeg2000.so => ./libuic_jpeg2000.so (0x00002adfcfe40000)
libuic_dds.so => ./libuic_dds.so (0x00002adfd00fc000)
libuic_png.so => ./libuic_png.so (0x00002adfd030f000)
libuic_tiff.so => ./libuic_tiff.so (0x00002adfd0552000)
libuic_jpegxr.so => ./libuic_jpegxr.so (0x00002adfd0759000)
libippch.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippch.so.7.0 (0x00002adfd09b1000)
libippdc.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippdc.so.7.0 (0x00002adfd0ab8000)
libippcc.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippcc.so.7.0 (0x00002adfd0bc3000)
libippcv.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippcv.so.7.0 (0x00002adfd0ce0000)
libippj.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippj.so.7.0 (0x00002adfd0e05000)
libippi.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippi.so.7.0 (0x00002adfd0f19000)
libipps.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libipps.so.7.0 (0x00002adfd10ca000)
libippcore.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippcore.so.7.0 (0x00002adfd1233000)
libiomp5.so => /opt/intel/composerxe/lib/intel64/libiomp5.so (0x00002adfd134c000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e29600000)
libm.so.6 => /lib64/libm.so.6 (0x0000003e28600000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003e6fe00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003e6e200000)
libc.so.6 => /lib64/libc.so.6 (0x0000003e27e00000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003e28200000)
libtbb.so.2 => /usr/lib64/libtbb.so.2 (0x00002adfd1648000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e27a00000)
librt.so.1 => /lib64/librt.so.1 (0x0000003e29e00000)
$ cat /proc/version
Linux version 2.6.18-128.7.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 08:21:56 EDT 2009
Thanks
Gilbert
georgiswan
Beginner
Here is an update: before UIC, I was using a wrapper around the old IPP samples to decode JPEG images. Those files are now more or less in ipp-samples\realistic-rendering\3d-viewer\jpegcodec\*.
With the code in these files, I do get a significant improvement in decoding: about 200 msec compared to 800 msec with FreeImage.
Any idea how to get that with UIC? UIC is nicer because it supports more formats, as well as JPEG in CMYK.
Thanks
Gilbert
georgiswan
Beginner
I just found that the CTimer object at the beginning of the DecodeImage() routine takes a long time to create. I have removed it, and now the DecodeImage() routine is blazing fast as expected, about 4 times faster than FreeImage.
I am not familiar with CTimer and I don't know whether this is a known problem or not, but I can certainly live without it (ftime() works great with no overhead).
So, if like me you want to wrap the UIC library into your own project and use the DecodeImage() and EncodeImage() high level functions, you should remove the CTimer object from these routines (unless this problem is specific to my installation).
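For anyone wrapping these routines, here is a minimal sketch of a low-overhead wall-clock measurement that could stand in for the removed timer. It swaps ftime() for std::chrono::steady_clock, which is a substitution made here for portability and is not what the UIC samples use. The helper name elapsed_ms is hypothetical.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
// Nothing is calibrated up front, so the helper itself adds almost no cost.
template <typename F>
double elapsed_ms(F&& work) {
    const auto begin = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

It would be used by passing the call to be timed as a lambda, for example elapsed_ms([&] { /* call your decode routine here */ }), with whatever arguments your own wrapper needs.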
My problem is solved.
Gilbert
Sergey_K_Intel
Employee
Hi Gilbert,
Nice to hear that the problem is resolved.
Yes, you're right about CTimer::Init.
It is not a "known problem" as such, but a specific behavior we should have documented. On Linux, CTimer::Init calls ippGetCpuFreqMhz (you can see this in the timer.cpp source file). This function measures the CPU frequency directly by counting CPU clocks, and the measurement takes about 3 seconds. On Windows there is no such problem, because the Init function reads the frequency from the system performance counters.
I will ask the information development team to add a note about this to the ippGetCpuFreqMhz description, will add a note to the uic_transcoder_con description (and its console output), and will probably modify the uic_transcoder_con source code to avoid timer operations when no timing is requested on the command line.
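One way to avoid paying for the frequency measurement when no timing is requested is to defer it until the first time a tick count actually has to be converted. The sketch below only illustrates that idea and is not the CTimer implementation: LazyFrequency and ticks_to_ms are hypothetical names, and the slow measurement is passed in as a callback so that no particular IPP call is assumed.

```cpp
#include <functional>
#include <utility>

// Holds a CPU frequency that is measured only on first use.
class LazyFrequency {
public:
    // `measure` is the expensive call that returns the frequency in MHz.
    explicit LazyFrequency(std::function<double()> measure)
        : measure_(std::move(measure)) {}

    // Returns the frequency in MHz, running the measurement at most once.
    double mhz() {
        if (!known_) {
            mhz_ = measure_();
            known_ = true;
        }
        return mhz_;
    }

private:
    std::function<double()> measure_;
    double mhz_ = 0.0;
    bool known_ = false;
};

// Convert a CPU tick count to milliseconds (1 MHz = 1000 ticks per ms).
inline double ticks_to_ms(unsigned long long ticks, LazyFrequency& freq) {
    return static_cast<double>(ticks) / (freq.mhz() * 1000.0);
}
```

With this arrangement, constructing the object is free, and a program that never prints timing never triggers the measurement. The sketch is not thread-safe; a real version shared between threads would need a guard such as std::call_once.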
Thank you again,
Sergey
georgiswan
Beginner
Hi Sergey,
In my own code, I have always used ftime() directly to get the real elapsed time. It has no overhead, and it is what I now use in my copy of the IPP samples. I am not sure about the advantage of CTimer: is it better at measuring the CPU time in a specific thread?
Thanks
Gilbert
Sergey_K_Intel
Employee
Gilbert,
ftime() should be OK, since it shows astronomical (absolute, wall-clock) time, though it may not be precise enough. In our measurements we try to use CPU clocks, because a) the time intervals we need to measure are usually shorter and b) these values are to some extent frequency-invariant (depending mostly on the CPU architecture).
There's another function, clock(), which on Linux returns the total CPU time of the process, summed over all of its threads. This function must be used carefully.
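If millisecond resolution is not enough, or if the CPU time of one specific thread is wanted, POSIX clock_gettime() is one option on Linux. This is an addition to what the samples use, shown as a minimal sketch. The clock IDs CLOCK_MONOTONIC and CLOCK_THREAD_CPUTIME_ID are standard POSIX, while the helper names are made up, and very old glibc versions may also need -lrt at link time.

```cpp
#include <time.h>

// Convert a timespec to milliseconds as a double.
static double to_ms(const timespec& ts) {
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

// Wall-clock style reading from a monotonic clock with nanosecond fields.
// Returns -1.0 if the clock is not available.
double wall_now_ms() {
    timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return -1.0;
    return to_ms(ts);
}

// CPU time consumed so far by the calling thread only.
// Returns -1.0 if the clock is not available.
double thread_cpu_now_ms() {
    timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) return -1.0;
    return to_ms(ts);
}
```

Taking a reading from each function before and after a call gives both the elapsed time and the CPU time of the calling thread, without mixing in the other threads of the process as clock() does.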
Regards,
Sergey
levicki
Valued Contributor I
If you are hoping to support CMYK images using UIC with no effort on your part, then get ready for a (not so pleasant) surprise. In order to correctly display colors from a CMYK JPEG, you will have to use color management, which UIC does not provide. That means you must use an external library (such as LittleCMS), which will significantly slow down your decoding.