First, please let me apologize for the delay in this response.
If you're still interested, we may be able to help improve your performance with multiple threads. However, the bottom line is that we're not going to be able to push our free sample to this level of performance with different parameters, compile options, or simple changes to the code.
Thank you very much for your feedback on this issue, including your analysis of ippiDecodeCBProgrSetPassCounter_JPEG2K. This looks like a great place to start for improving decode performance. I have passed this recommendation on to product planning so it can be considered for future releases.
Best Regards,
Jeff
What maximum throughput can be achieved with the IPP 7.0 JPEG/JPEG2000 codecs?
This is an important question and we can't find the answer. Please advise.
As far as we know, the latest white paper on IPP performance benchmarks is dated 2003.
http://software.intel.com/en-us/articles/intel-ipp/#details
What this page shows is that JPEG2000 performance improved between the 6.1 and 7.0 versions of IPP.
Your requirements are always important. Since we don't have this data ready for publication, perhaps the best way to assist with your decision would be for you to let us know where your expectations have been set by alternative solutions. We may be able to help you set up some quick tests to determine whether UIC comes close to those requirements. In any case, we can pass this data on to project planning to be considered for future releases.
Would this help?
Best regards,
Jeff
Thanks for your reply. Unfortunately your link doesn't help. We don't need to know the relative speed-up; we need to know real performance under standard conditions. Here is an example for JPEG: baseline JPEG, standard images from the Kodak set (grayscale or color), quality setting 50%, static tables for quantization and Huffman coding, a good PC with a Core i7, and your recommended parameters for multithreading. We need results for compression ratio, compression time, and decompression time (excluding the time to load the image from the HDD).
It should be like this: uic_transcoder_con.exe -o test.jpg -i lenna.bmp -j b -q 50 -t 1 -n 8
with details on how to get maximum performance for encoding and decoding. And we need a table with the achieved results. Please have a look at this example:
http://www.accusoft.com/picphotofeatures.htm#comparison_table
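For what it's worth, a practical way to collect numbers like these independently of the sample's built-in timer is to time whole transcoder runs from the outside and average them. Below is a minimal sketch, assuming uic_transcoder_con.exe is on the PATH and the input file exists; note that it times the entire process, including startup and file I/O, so it gives an upper bound rather than the pure codec time.

// bench_transcode.cpp: time repeated runs of the UIC transcoder sample
// and report the average wall-clock time per run.
// Build: g++ -O2 -std=c++11 bench_transcode.cpp -o bench_transcode
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>

int main() {
    // Command under test; the flags are the ones discussed in this thread,
    // the file names are placeholders for your own test data.
    const std::string cmd =
        "uic_transcoder_con.exe -o test.jpg -i lenna.bmp -j b -q 50 -t 1 -n 8";
    const int runs = 10;

    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        if (std::system(cmd.c_str()) != 0) {
            std::fprintf(stderr, "command failed\n");
            return 1;
        }
        auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("run %2d: %8.1f ms\n", i, ms);
        total_ms += ms;
    }
    std::printf("average over %d runs: %.1f ms\n", runs, total_ms / runs);
    return 0;
}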
Hi Sergey,
We don't have this data now. However, your suggestion makes a lot of sense. I've escalated your feedback to our developers and project planners. No guarantees though. I can't promise if or when we will be able to produce this kind of report. A lot of effort and review needs to go into official performance comparisons, and this needs to be prioritized against other tasks.
Until then, my ability to provide this data is limited. In the interest of being helpful, since you mentioned that 4096x4096 decode performance is your most important requirement, here is a snapshot of what I'm seeing on my machine. Please note that these results have not been reviewed and are not authoritative benchmark results. There are a lot of factors affecting performance, so your results may differ. The intent is simply to give you some indication of the results we're getting here.
Test platform: Intel Core i7-2600K CPU @ 3.40 GHz, Windows 7, IPP + samples 7.0.6 (using the pre-compiled executables for easy reproducibility; if you have not already, you may want to give these a try).
library: ippje9-7.0.dll
Input image: 4096x4096 resize of standard lenna, sRGB colorspace, 50% quality, jp2 format.
./uic_transcoder_con.exe -i c:/videos/lenna4096x4096_2.jp2 -o test.bmp -n 1 -t 1 -m 20 : decode time: 548.71 msec
./uic_transcoder_con.exe -i c:/videos/lenna4096x4096_2.jp2 -o test.bmp -n 2 -t 1 -m 20 : decode time: 436.84 msec
./uic_transcoder_con.exe -i c:/videos/lenna4096x4096_2.jp2 -o test.bmp -n 4 -t 1 -m 20 : decode time: 381.77 msec
./uic_transcoder_con.exe -i c:/videos/lenna4096x4096_2.jp2 -o test.bmp -n 8 -t 1 -m 20 : decode time: 381.97 msec
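As a sanity check on the scaling: the i7-2600K has 4 physical cores and 8 hardware threads, which is consistent with the times flattening out between -n 4 and -n 8. Here is a small sketch that simply turns the four measurements quoted above into speedup and parallel-efficiency figures:

// speedup.cpp: speedup and parallel efficiency from the decode times
// quoted above (milliseconds, for -n 1/2/4/8).
#include <cstdio>

int main() {
    const int    threads[4] = {1, 2, 4, 8};
    const double time_ms[4] = {548.71, 436.84, 381.77, 381.97};

    for (int i = 0; i < 4; ++i) {
        const double speedup    = time_ms[0] / time_ms[i];
        const double efficiency = speedup / threads[i];
        std::printf("-n %d: %7.2f ms  speedup %.2fx  efficiency %3.0f%%\n",
                    threads[i], time_ms[i], speedup, efficiency * 100.0);
    }
    return 0;
}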
Sorry we can't get everything today, but hopefully this is at least getting closer to the data you need.
Best regards,
Jeff
Thanks a lot for your reply. That's exactly what we are looking for. Actually, we are more interested in baseline JPEG compression of grayscale images, so let's try to get a better understanding of the matter.
We can get the grayscale test image "big_building.bmp" from here:
http://www.imagecompression.info/test_images/
Then we try to compress it to JPEG on a Core i7 920 with the following command lines:
1) uic_transcoder_con.exe -i big_building.bmp -o test.jpg -n 8 -q 50 -t 1 -m 1
Intel Integrated Performance Primitives
version: 7.0 build 205.85, [7.0.1058.205]
name: ippjy8-7.0.dll+
date: Nov 27 2011
image: big_building.bmp, 7216x5408x1, 8-bits unsigned, color: Grayscale, sampling: 444
decode time: 19.25 msec
encode time: 112.50 msec
2) uic_transcoder_con.exe -i big_building.bmp -o test.jpg -n 8 -q 50 -t 1 -m 20
We do the same BMP-to-JPEG compression, this time with the parameter -m 20. It is almost the same command; we just ask it to repeat the compression 20 times.
Intel Integrated Performance Primitives
version: 7.0 build 205.85, [7.0.1058.205]
name: ippjy8-7.0.dll+
date: Nov 27 2011
image: big_building.bmp, 7216x5408x1, 8-bits unsigned, color: Grayscale, sampling: 444
decode time: 8.78 msec
encode time: 50.33 msec
We have the same image and the same settings, with the only difference being the repeat count. What is the main reason for such a variation in performance (a factor of 2)? Can your software decode a 37 MB image in 8.78 msec? Does your software do decoding at the same time as encoding?
Now we can do the same thing for decoding (we decode the image which we got in the previous test):
3) uic_transcoder_con.exe -i test.jpg -o test.bmp -n 8 -t 1 -m 1
Intel Integrated Performance Primitives
version: 7.0 build 205.85, [7.0.1058.205]
name: ippjy8-7.0.dll+
date: Nov 27 2011
image: test.jpg, 7216x5408x1, 8-bits unsigned, color: Grayscale, sampling: 444
decode time: 66.13 msec
encode time: 14.28 msec
Then we try to see what we get with -m 20:
4) uic_transcoder_con.exe -i test.jpg -o test.bmp -n 8 -t 1 -m 20
Intel Integrated Performance Primitives
version: 7.0 build 205.85, [7.0.1058.205]
name: ippjy8-7.0.dll+
date: Nov 27 2011
image: test.jpg, 7216x5408x1, 8-bits unsigned, color: Grayscale, sampling: 444
decode time: 37.63 msec
encode time: 9.00 msec
The decoding time again varies by a factor of 2, and the meaning of "encode time" here is not clear either.
What is the accuracy of the time measurements? Could you explain the above results?
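One plausible explanation, illustrated by the generic sketch below rather than by anything specific to the IPP sample, is that a single run pays one-time costs: demand paging, loading the input through the OS file cache, CPU frequency ramp-up, and cold caches. With -m those costs are paid once and amortized over all iterations, so the reported per-iteration average drops. In the sketch, the first pass over a large buffer is visibly slower than the following ones:

// warmup.cpp: show that the first pass over a buffer is slower than
// repeated passes (page faults and cold caches on pass 0).
#include <chrono>
#include <cstddef>
#include <cstdio>

int main() {
    const size_t n = 64 * 1024 * 1024;          // 64 MB working set
    unsigned char* buf = new unsigned char[n];  // pages not yet touched

    for (int pass = 0; pass < 5; ++pass) {
        auto t0 = std::chrono::steady_clock::now();
        unsigned long long sum = 0;
        for (size_t i = 0; i < n; ++i) {
            buf[i] = (unsigned char)(i ^ pass); // write every byte
            sum += buf[i];                      // and read it back
        }
        auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("pass %d: %7.2f ms (checksum %llu)\n", pass, ms, sum);
    }
    delete[] buf;
    return 0;
}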
- The -m option defines the number of loops (repetitions of the same encoding and decoding operations) to perform. Usually in performance measurements it is helpful to do the same thing several times and divide the overall time by the number of iterations. You'll get better accuracy.
No, we haven't got better accuracy. Actually, we've got a significant "speed up" (from 112 ms to 50 ms for JPEG encoding) just from the -m option. We think that this is not better accuracy; it is a lack of accuracy.
Let's talk about real accuracy rather than "better" accuracy. It's not clear how you measure execution time, and this is very important. If you show two digits after the decimal point, do they mean anything? If you run the same encoding several times (without the -m option), what error in terms of MSE will you get? How can we find out the average time for encoding or decoding?
- But there's a side effect related to cache "temperature". When you repeat execution of the same algorithm, it can happen that the data you try to read is already in the CPU cache line(s). So you spend less time waiting for instructions to complete than if the required data had to be loaded from main memory (RAM) into the cache first and only then consumed by CPU instructions.
Thanks, we know about the hot cache. As far as your approach with the -m option is concerned, it looks really strange precisely because of the hot cache. It is far from a real measurement, and one could say it is not fair. We want to estimate encoding performance and we see something strange. Please advise. As we understand it, the -m option is not worth using because of the hot cache; we think we'd better use big images to increase the accuracy of the time measurements.
- Regarding JPEG decoding (your last example), encode time is just the time to repack the JPEG-decoded image into a BMP container.
This is also very strange: we see the phrase "encode time", though it means the time to repack the JPEG-decoded image into a BMP container.
- Then, I see the "-n" option, which means multi-threaded execution. It also speeds up the execution.
Thanks, we know that. We are trying to find out how Intel recommends doing JPEG encoding and decoding in the fastest way. What would you recommend?
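On the hot-cache point: if a cold-cache figure is what's wanted, a common trick, sketched generically below (it is not a feature of the UIC samples), is to evict the working set between timed iterations by streaming through a scratch buffer larger than the last-level cache. Every iteration then starts cold, and averaging over iterations stays meaningful:

// coldcache.cpp: time an operation with the caches deliberately evicted
// between iterations, approximating cold-cache measurements.
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

// Streaming through a buffer much larger than the last-level cache
// pushes the previous working set out of all cache levels.
static void evict_caches(std::vector<unsigned char>& scratch) {
    for (size_t i = 0; i < scratch.size(); ++i) scratch[i] += 1;
}

int main() {
    const size_t n = 8 * 1024 * 1024;                     // 8 MB working set
    std::vector<unsigned char> src(n, 1), dst(n);
    std::vector<unsigned char> scratch(64 * 1024 * 1024); // larger than LLC

    for (int i = 0; i < 5; ++i) {
        evict_caches(scratch);                            // start cold
        auto t0 = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), n);           // operation under test
        auto t1 = std::chrono::steady_clock::now();
        std::printf("cold iteration %d: %.2f ms\n", i,
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return 0;
}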
- No, we haven't got better accuracy. Actually, we've got a significant "speed up" (from 112 ms to 50 ms for JPEG encoding) just from the -m option. We think that this is not better accuracy; it is a lack of accuracy.
- OK. Let's consider this option to get more stable results. If you run "-m 1" several times, you will probably see quite a big fluctuation of the results. "-m" makes the timing more predictable.
Unfortunately, we can't explain the difference between 112 ms and 50 ms the way you do, so we still consider the "-m" option unsuitable for real performance measurements.
- It's not clear how you measure execution time, and this is very important. If you show two digits after the decimal point, do they mean anything?
- This is why we distribute the samples in source code form. Two digits might be helpful if the overall time is several msecs.
It could be a good idea to round the output and show only the necessary digits. If the two digits after the decimal point are right, then your timing accuracy is 0.01 ms, which is difficult to believe.
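On the question of meaningful digits: a generic way to make printed precision honest (again a sketch, not how the UIC sample itself measures) is to repeat the measurement, then report the mean together with the standard deviation and print only the digits the observed spread supports:

// stats.cpp: report mean and standard deviation of repeated timings,
// so the printed digits reflect actual measurement noise.
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

// Operation under test: a stand-in for one encode/decode call.
static volatile unsigned long long sink;
static void work() {
    unsigned long long s = 0;
    for (int i = 0; i < 10 * 1000 * 1000; ++i) s += i;
    sink = s;
}

int main() {
    const int runs = 20;
    std::vector<double> ms(runs);
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        ms[i] = std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    double mean = 0.0;
    for (double v : ms) mean += v;
    mean /= runs;
    double var = 0.0;
    for (double v : ms) var += (v - mean) * (v - mean);
    const double stdev = std::sqrt(var / (runs - 1));
    // Print the mean with a precision justified by the observed spread.
    std::printf("mean %.1f ms, stdev %.1f ms over %d runs\n", mean, stdev, runs);
    return 0;
}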
- This is also very strange: we see the phrase "encode time", though it means the time to repack the JPEG-decoded image into a BMP container.
- In terms of UIC, BMP is also a "codec", so it has an Encode method, which in reality repacks the raw image array into the BMP format (no actual encoding is performed). Nevertheless, since the time marks are placed around the EncodeImage function, the results go to the "encode time" section. The situation is the same with other container codecs such as TIFF and PNM: there is no encoding there either.
If you indicate on the command line a BMP file with option -i (input) and a JPEG file with option -o (output), it means there will be a conversion from BMP to JPEG. This is sufficient indication that the process should be called encoding; there is no decoding here. In this case it would be better not to show a decoding time at all.
- We are trying to find out how Intel recommends doing JPEG encoding and decoding in the fastest way. What would you recommend?
- The general recommendations are to use the appropriate library (with binary code matching the CPU in use). Multi-threading can also help if the image is not too small and the number of hardware cores is not too big (in the latter case the overhead of multi-thread support becomes significant).
What would you recommend for a Core i7 920 and Windows 7 (32/64)?
We do the following for JPEG encoding with 50% quality: uic_transcoder_con.exe -i lenna.bmp -o test.jpg -j b -q 50 -t 1 -n 8
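For completeness, here is a minimal sketch of how an application can select the CPU-specific code path and cap the library's internal threading when linking against the threaded IPP 7.0 libraries. ippInit, ippSetNumThreads, and ippGetNumThreads are IPP core functions; the thread count of 4 is only a plausible starting point for a 4-core i7 920, not an official recommendation:

// ipp_threads.cpp: pick the CPU-specific IPP code path and control the
// library's internal threading (threaded IPP 7.0 libraries assumed).
// Link against the IPP core and threaded domain libraries.
#include <cstdio>
#include <ipp.h>

int main() {
    // Dispatch to the binary code matching this CPU (SSE2/SSSE3/AVX, ...).
    ippInit();

    // Report which imaging library was actually loaded.
    const IppLibraryVersion* v = ippiGetLibVersion();
    std::printf("using %s %s\n", v->Name, v->Version);

    // Cap IPP's internal threading; on a Core i7 920 (4 cores, 8 hardware
    // threads) the number of physical cores is a reasonable first guess.
    ippSetNumThreads(4);

    int n = 0;
    ippGetNumThreads(&n);
    std::printf("IPP internal threads: %d\n", n);
    return 0;
}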