I had some code using ippiLUT_16s_C1IR, but since my source is 32f, I converted 32f to 16s, called LUT, and then converted back.
I wanted to optimize by replacing the 16s flavor with the 32f flavor ippiLUT_32f_C1IR.
However, ippiLUT_32f_C1IR is so slow I cannot use it!
Where ippiLUT_16s_C1IR takes milliseconds, ippiLUT_32f_C1IR takes seconds to execute.
Is this slowness inherent to the 32f flavor of the function?
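The 16s data type allows a so-called "table optimization": a 16-bit value can take only 65,536 distinct values, so the complete lookup table is just 128 KB. 32f data cannot be indexed that way, so the result has to be computed directly for each pixel instead of being read from a table. That is why the 32f flavor is slower than the 16s one.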
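To illustrate the difference, here is a rough plain-C sketch. It is not IPP's actual implementation, and all function names in it are made up for the example. It assumes a stepwise LUT in which a pixel x with levels[k] <= x < levels[k+1] is replaced by values[k], and pixels outside the level range are left unchanged. The 16s path expands the levels once into a full 65,536-entry table and then does one indexed load per pixel; the 32f path has to search the levels for every pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per possible 16-bit value: 65536 * 2 bytes = 128 KB. */
#define FULL_TABLE_SIZE 65536

/* Hypothetical helper: expand a stepwise LUT (nLevels sorted levels,
 * nLevels - 1 values) into a full table indexed by (pixel - INT16_MIN). */
void build_full_table_16s(const int16_t *levels, const int16_t *values,
                          int nLevels, int16_t *table)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        int16_t out = (int16_t)v;            /* outside the range: unchanged */
        for (int k = 0; k < nLevels - 1; ++k) {
            if (v >= levels[k] && v < levels[k + 1]) {
                out = values[k];
                break;
            }
        }
        table[v - INT16_MIN] = out;
    }
}

/* 16s path: a single table lookup per pixel, no comparisons. */
void lut_16s_inplace(int16_t *px, size_t n, const int16_t *table)
{
    for (size_t i = 0; i < n; ++i)
        px[i] = table[px[i] - INT16_MIN];
}

/* 32f path: no full table is possible, so each pixel needs a search
 * over the levels (binary search here; assumes nLevels >= 2). */
void lut_32f_inplace(float *px, size_t n, const float *levels,
                     const float *values, int nLevels)
{
    for (size_t i = 0; i < n; ++i) {
        float x = px[i];
        /* Out of range (or NaN): leave the pixel unchanged. */
        if (!(x >= levels[0]) || x >= levels[nLevels - 1])
            continue;
        int lo = 0, hi = nLevels - 1;        /* levels[lo] <= x < levels[hi] */
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (x >= levels[mid])
                lo = mid;
            else
                hi = mid;
        }
        px[i] = values[lo];
    }
}
```

The table for the 16s path is built once and reused for the whole image, so its cost is amortized, whereas the 32f path pays for the search on every pixel.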
Thanks for your feedback.