I am trying to decode JP2 images for a DirectShow filter. I'd like to decompress about 270 256x256 frames per second. The reason for the odd number is that I have to extract tiles from a much larger image and then stitch them together creatively, and I don't want to decode full images. I am currently getting decode times of 34 ms for each tile, totaling 9.45 seconds, and so I'm in need of help!
I had three questions:
1) What is expected? I see various performance metrics posted around, but is it reasonable to expect decoding 256^2 images at 3 ms? I'm willing to implement multithreaded decoding of the component tiles to boost parallelization, or get better hardware, or anything really. I might even have to shrink my image size, but I'm just curious what kind of numbers people have achieved.
2) How do I turn on the OpenMP parallelization? I'm using the Intel compiler, but I don't see diagnostic messages like "OpenMP DEFINED REGION WAS PARALLELIZED". The #pragma omp isn't recognized.
3) How do I boost the number of threads? I have set OMP_NUM_THREADS in my WinXP environment variables. Is there a better way to do this?
Details:
Imagery is planar RGB, 24 bpp.
Laptop is a Core 2 Duo T7300, 2 GHz, with 2 GB RAM.
Compiler is Intel v11.074.
There have been no significant performance improvements in the JPEG2000 codec since IPP 6.1, but there were some bug fixes in UIC, so I would recommend migrating to the latest version, IPP 7.0, or to the 7.0.2 update, which is coming soon. Performance depends heavily on a number of factors, for example the encoding options (lossy or lossless, progression order, code block size and so on). Note that threading at the UIC codec level is already integrated in the UIC DLLs (if you use the precompiled UIC DLLs), or should be enabled automatically when you rebuild the UIC codecs using the build scripts that come with the sample (and compile with the Intel compiler). You can check how many threads the UIC codec uses with the NOfThreads() method of the JPEG2000Decoder class.
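If you are unsure whether OpenMP is actually switched on in your own build, a small check like the one below may help. It is only a sketch and not part of UIC: the helper names openmpEnabled and maxOpenMPThreads are made up for illustration. It relies on the standard _OPENMP macro, which compilers define only when OpenMP support is enabled (with the Intel compiler on Windows that is the /Qopenmp switch), and on omp_get_max_threads(), which reflects OMP_NUM_THREADS.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>   // only available when the compiler's OpenMP support is on
#endif

// Hypothetical helper: true if this translation unit was built with OpenMP.
bool openmpEnabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: number of threads an OpenMP parallel region may use.
// Falls back to 1 when OpenMP is disabled, since the pragmas are then ignored.
int maxOpenMPThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();  // honours OMP_NUM_THREADS
#else
    return 1;
#endif
}
```

If openmpEnabled() returns false, the #pragma omp lines are being ignored and setting OMP_NUM_THREADS will have no effect.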
Since you have access to completely independent image tiles in JPEG2000 compressed format, I think it might be more efficient not to use internal threading at the codec level and instead to decode several tiles in parallel (you will need several copies of the JPEG2000 decoder, one for every thread you launch). The reason is that in this case the amount of work done in parallel is much bigger than with internal threading. Of course, internal threading might still help you maximize performance on many-core systems (12 or 16 threads, say), where it is possible to decode 6 or 8 tiles simultaneously and let each decoder use 2 threads internally, which will use 100% of the processor's capabilities.
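A minimal sketch of the tile-parallel approach could look like the following. Everything in it is an assumption made for illustration: TileDecoder is a hypothetical stand-in for a wrapper around one JPEG2000 decoder instance (its decode() here just fills a buffer and does no real decoding), and decodeTilesParallel is a made-up helper name. The point it shows is only the structure: each worker thread owns its own decoder and pulls the next tile index from a shared counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread decoder object. A real version
// would wrap one JPEG2000 decoder instance; this one only fills a buffer.
struct TileDecoder {
    std::vector<std::uint8_t> decode(const std::vector<std::uint8_t>& compressed) {
        std::vector<std::uint8_t> pixels(256 * 256 * 3, 0);  // 256x256, 24 bpp
        if (!compressed.empty())
            std::fill(pixels.begin(), pixels.end(), compressed[0]);
        return pixels;
    }
};

// Decode all tiles using numThreads workers, one decoder per worker.
std::vector<std::vector<std::uint8_t>> decodeTilesParallel(
    const std::vector<std::vector<std::uint8_t>>& tiles, unsigned numThreads)
{
    assert(numThreads > 0);
    std::vector<std::vector<std::uint8_t>> out(tiles.size());
    std::atomic<std::size_t> next{0};  // index of the next tile to decode

    auto worker = [&]() {
        TileDecoder decoder;  // owned by this thread, never shared
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= tiles.size()) break;
            out[i] = decoder.decode(tiles[i]);  // each thread writes distinct slots
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < numThreads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return out;
}
```

On a dual-core machine you would call it with numThreads set to 2 and leave the codec's internal threading off, so the two levels of threading do not compete for the same cores.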