How to decrease the unused CPU time (in the results obtained from Intel Concurrency Checker)?

I checked my codec's performance with and without threading (I employed slice-level parallelism), and I am still seeing no improvement in execution time. Previously, when I applied threading to one individual module of the codec, I saw its performance double. Is there something we can do to utilize the unused CPU time?
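For reference, slice-level parallelism in its simplest form looks roughly like the sketch below. It is a minimal illustration, not the codec described above: it assumes the frame is an 8-bit luma plane split into horizontal bands of rows, and `process_slice` and `process_frame_sliced` are hypothetical names, with `process_slice` standing in for real per-slice encoding work (here it only sums pixel values). The key property is that each slice is independent, so every thread writes only to its own result slot and no locking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-slice codec work (e.g. encoding one slice).
// Here it simply sums the pixel values of rows [row_begin, row_end).
std::uint64_t process_slice(const std::vector<std::uint8_t>& frame,
                            std::size_t width,
                            std::size_t row_begin,
                            std::size_t row_end) {
    std::uint64_t acc = 0;
    for (std::size_t r = row_begin; r < row_end; ++r)
        for (std::size_t c = 0; c < width; ++c)
            acc += frame[r * width + c];
    return acc;
}

// Splits the frame into `num_slices` horizontal bands of rows and processes
// each band on its own std::thread, then waits for all of them.
std::vector<std::uint64_t> process_frame_sliced(const std::vector<std::uint8_t>& frame,
                                                std::size_t width,
                                                std::size_t height,
                                                std::size_t num_slices) {
    assert(num_slices > 0);
    std::vector<std::uint64_t> results(num_slices, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_slices);

    // Distribute rows as evenly as possible so no thread is left
    // waiting on a much larger slice (load imbalance leaves cores idle).
    const std::size_t base = height / num_slices;
    const std::size_t extra = height % num_slices;
    std::size_t row = 0;
    for (std::size_t s = 0; s < num_slices; ++s) {
        const std::size_t rows = base + (s < extra ? 1 : 0);
        const std::size_t begin = row;
        const std::size_t end = row + rows;
        row = end;
        // Each thread writes only to results[s], so no mutex is required.
        workers.emplace_back([&frame, &results, width, begin, end, s] {
            results[s] = process_slice(frame, width, begin, end);
        });
    }
    for (auto& t : workers) t.join();  // frame is done only when every slice is done
    return results;
}
```

Note that this sketch creates and joins threads for every frame, which adds overhead; encoders commonly keep a persistent pool of worker threads instead. Any work that stays serial outside the per-slice part (for example entropy coding or in-loop filtering that crosses slice boundaries) also limits how much of the idle CPU time slice-level threading can recover.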
Please help me. The codec is required to run in real time, so I need to fix this issue.
Thanks in advance.
