The new threading model was developed for the UIC JPEG codec only. Threading in the UIC JPEG2000 codec is still the same as in IPP 6.x. It is strange that you do not see any improvement from JPEG2000 threading; the gain should actually be more than 10% on two threads. Could you please attach your JP2 file so we can check whether anything in the bitstream makes threading less efficient?
I do not have any additional suggestions on how to further improve the performance of the JPEG2000 codec. Unfortunately, the ippiDecodeCBProgrStep_JPEG2K function processes the bitstream, which is a sequential operation by nature, so it can't be threaded.
I do not think that a combined function would give you a significant performance improvement, as it still has to do the same operations. I would suggest trying to reorder the computations on a row-by-row basis to increase data locality (which makes better use of the CPU cache).
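To illustrate what I mean by row-by-row reordering, here is a minimal sketch in plain C. It is not IPP or UIC code: scale_row, clamp_row, convert_by_stage and convert_by_row are hypothetical names, and the two stages (scaling coefficients, then level shift and clamp to 8 bits) are only stand-ins for whatever per-pixel steps your pipeline runs. The first variant runs each stage over the whole image and needs a full-size temporary buffer; the second runs both stages on one row before moving to the next, so the intermediate data is a single row that is likely to still be in cache when the second stage reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage 1: scale one row of integer coefficients to float. */
static void scale_row(const int32_t *src, float *dst, size_t width, float step)
{
    for (size_t x = 0; x < width; ++x)
        dst[x] = (float)src[x] * step;
}

/* Hypothetical stage 2: add a level shift of 128, clamp to 0..255,
 * round to nearest and store as 8-bit samples. */
static void clamp_row(const float *src, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; ++x) {
        float v = src[x] + 128.0f;
        if (v < 0.0f)   v = 0.0f;
        if (v > 255.0f) v = 255.0f;
        dst[x] = (uint8_t)(v + 0.5f);
    }
}

/* Stage-by-stage order: each stage sweeps the whole image.
 * tmp must hold width * height floats. For a large image the rows
 * written by stage 1 are evicted from cache before stage 2 reads them. */
void convert_by_stage(const int32_t *src, float *tmp, uint8_t *dst,
                      size_t width, size_t height, float step)
{
    assert(src && tmp && dst);
    for (size_t y = 0; y < height; ++y)
        scale_row(src + y * width, tmp + y * width, width, step);
    for (size_t y = 0; y < height; ++y)
        clamp_row(tmp + y * width, dst + y * width, width);
}

/* Row-by-row order: both stages run on one row before the next row
 * is touched. row_tmp only needs to hold width floats. */
void convert_by_row(const int32_t *src, float *row_tmp, uint8_t *dst,
                    size_t width, size_t height, float step)
{
    assert(src && row_tmp && dst);
    for (size_t y = 0; y < height; ++y) {
        scale_row(src + y * width, row_tmp, width, step);
        clamp_row(row_tmp, dst + y * width, width);
    }
}
```

Both variants do the same arithmetic and produce the same output; only the order of memory accesses differs. Stages that need neighbouring rows (a vertical filter, for example) would have to keep a small window of rows instead of a single one, and whether the reordering pays off in your case is something you would need to measure.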