Intel® Integrated Performance Primitives
Community support and discussions relating to developing high-performance vision, signal, security, and storage applications.

Multithreading and JPEG2000 decoding

I read that in the new IPP v7, jpeg codec multicore performance has been improved compared to IPP v6.
Question 1: Is this improvement for JPEG2000 also?
I'm using the UIC JPEG2000 decoder and I'm not seeing much improvement (at most 10%) with two cores compared to one. (I disabled one core to test the difference.) I have enabled OpenMP and I'm using 16-bit precision instead of 32-bit.
I know that ippiDecodeCBProgrStep_JPEG2K is not a threaded function, which is too bad since most of the processing time is spent there.
Question 2: Do you have any suggestions on how I can improve the performance of my decoder further?
Question 3: I need to convert the decoder output format from YCbCr422 to RGB. Is there a better way to do it than this? (A single combined function in place of the separate IPP calls would be better, I guess.)
//Convert to 8u
ippiConvert_16s8u_C1R(pImg_16s_P3[0], srcW1, m_pImg_8u_P3[0], m_nWidth, m_ImgSize);
ippiConvert_16s8u_C1R(pImg_16s_P3[1], srcW1, m_pImg_8u_P3[1], m_nWidth, m_ImgSize);
ippiConvert_16s8u_C1R(pImg_16s_P3[2], srcW1, m_pImg_8u_P3[2], m_nWidth, m_ImgSize);
Ipp8u *pTemp = ippiMalloc_8u_C1(640*2, 480, &stepB);
//Convert to interleaved (use the step returned by ippiMalloc, which may include row padding)
ippiYCbCr422_8u_P3C2R(const_cast<const Ipp8u**>(m_pImg_8u_P3), w, pTemp, stepB, m_ImgSize);
//Convert to RGB
ippiYCbCr422ToRGB_JPEG_8u_C2C3R(pTemp, stepB, pOut, step, m_ImgSize);


A new threading model was developed for the UIC JPEG codec only. Threading in the UIC JPEG2000 codec is still the same as in IPP 6.x. It is strange that you do not see an improvement from JPEG2000 threading; it should actually be more than 10% on two threads. Could you please attach your JP2 file so that we can check whether anything in the bitstream makes threading less efficient?

I do not have any additional suggestions on how to further improve the performance of the JPEG2000 codec. Unfortunately, the ippiDecodeCBProgrStep_JPEG2K function processes the bitstream, which is a sequential operation by nature, so it can't be threaded.

I do not think that a combined function will give you a significant performance improvement, as it still has to do the same operations. I would suggest reordering the computations on a row-by-row basis to increase data locality (which makes better use of the CPU cache).
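
As a rough illustration of the row-by-row idea, the sketch below runs all conversion stages on one row before moving to the next, so each row is still in cache when it is converted. It is plain C++, not IPP code, and all function names are hypothetical. It assumes the 16-bit samples are already in the 0..255 range with chroma centred on 128, that the width is even, and that the standard JFIF (full-range) YCbCr to RGB equations apply. With IPP, the same structure could be tried by calling the existing primitives on a one-row (or few-row) ROI inside the loop.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Saturate a value to the 0..255 range of an 8-bit sample.
static inline std::uint8_t sat8(long v)
{
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, v)));
}

// Convert one row of planar 4:2:2 YCbCr (16-bit signed samples) to
// interleaved 8-bit RGB. cb and cr hold width/2 samples each.
void convertRowYCbCr422ToRGB(const std::int16_t* y, const std::int16_t* cb,
                             const std::int16_t* cr, std::uint8_t* rgb,
                             int width)
{
    for (int x = 0; x < width; ++x) {
        const long Y  = sat8(y[x]);                // 16s -> 8u with saturation
        const long Cb = sat8(cb[x / 2]) - 128L;    // two pixels share one Cb
        const long Cr = sat8(cr[x / 2]) - 128L;    // and one Cr sample
        rgb[3 * x + 0] = sat8(Y + std::lround(1.402 * Cr));
        rgb[3 * x + 1] = sat8(Y + std::lround(-0.34414 * Cb - 0.71414 * Cr));
        rgb[3 * x + 2] = sat8(Y + std::lround(1.772 * Cb));
    }
}

// Convert a whole image one row at a time. Source strides are counted in
// 16-bit samples, the destination stride in bytes.
void convertImageRowByRow(const std::int16_t* const planes[3],
                          int lumaStride, int chromaStride,
                          std::uint8_t* rgb, int rgbStride,
                          int width, int height)
{
    for (int row = 0; row < height; ++row) {
        convertRowYCbCr422ToRGB(planes[0] + row * lumaStride,
                                planes[1] + row * chromaStride,
                                planes[2] + row * chromaStride,
                                rgb + row * rgbStride, width);
    }
}
```

Whether this beats three passes of optimized IPP primitives over a 640x480 image is something to measure rather than assume; the sketch only shows the loop structure.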


I also think 10% is too small an increase.
Can you test by decoding a large JPEG2000 file while keeping the Task Manager CPU performance graph open for both cores? Check that both are at 100%.
Thanks for a quick reply,
I have files in j2k format, which have been extracted from a "motion jpeg2000 file" encoded with an ADV202 chip. The images are encoded in YCbCr 422 format. I had to modify djp2codestream.h to be able to decode some of my files. In the function:
bool ReadNextTilePartHeader(){
if(!m_isNextTilePartExist) return false;
ByteInput &stream = *m_stream;
return false; //I added this to ignore the following check.
I will try the row by row idea you suggested.
I have monitored both cores and they are both at 100%.
I will try a larger image to see if there is any difference in performance.
I'm using the Visual Studio compiler now, but maybe the Intel C++ compiler would be a better choice?
Do not take it as an advertisement, but my many years of experience give me the feeling that the Intel C/C++ Compiler is the best optimizing compiler in the industry. Of course, it is just a tool, and developers have to apply tools in the most efficient way and know and use their features. But even without that, on average, it generates code which provides better run-time performance in comparison with other similar tools.

Please note that this might not directly apply to your question. There are many factors which affect overall application performance. A good optimizing compiler is only one of them.