I had some code using ippiLUT_16s_C1IR. Since my source data is 32f, I converted 32f to 16s, called the LUT function, and then converted back.
I wanted to optimize this by replacing the 16s flavor with the 32f flavor, ippiLUT_32f_C1IR.
However, ippiLUT_32f_C1IR is so slow that I cannot use it!
Where ippiLUT_16s_C1IR takes milliseconds, ippiLUT_32f_C1IR takes seconds to execute.
Is this slowness inherent to the 32f flavor of the function?
- Tags:
- Development Tools
- General Support
- Intel® Integrated Performance Primitives
- Parallel Computing
- Vectorization
1 Reply
Hi Thomas.
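The 16s data type allows a so-called "table optimization": there are only 65,536 possible 16-bit input values, so the complete lookup table is just 128 KB and each pixel becomes a single table read. 32f data cannot be handled this way, so the function has to calculate the result directly for every pixel instead of using the table. That is why the 32f flavor is slower than the 16s one.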
Thanks for your feedback.
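
To make the difference concrete, here is a minimal sketch in plain C. It is not IPP's actual implementation and it calls no IPP functions; all function names below are hypothetical. It assumes a step-function LUT where an input x with levels[k] <= x < levels[k+1] maps to values[k], inputs outside the level range are left unchanged, the levels are strictly increasing, and the 16s output values fit in 16 bits. The 16s path precomputes all 65,536 answers once and then does one array read per pixel. A full table for 32f would need an entry for each of the 2^32 bit patterns, so the 32f path in this sketch searches the levels for every pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct calculation (32f): binary search over the levels for one pixel.
   Assumed semantics: levels[k] <= x < levels[k+1] maps to values[k];
   anything outside [levels[0], levels[n_levels-1]) is returned unchanged. */
float lut_direct_32f(float x, const float *levels, const float *values,
                     int n_levels)
{
    assert(n_levels >= 2);
    if (!(x >= levels[0] && x < levels[n_levels - 1]))
        return x;                      /* out of range (or NaN): unchanged */
    int lo = 0, hi = n_levels - 1;     /* invariant: levels[lo] <= x < levels[hi] */
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (x >= levels[mid]) lo = mid; else hi = mid;
    }
    return values[lo];
}

/* Per-pixel search: cost grows with log2(n_levels) for every pixel. */
void lut_apply_32f(float *pixels, size_t count, const float *levels,
                   const float *values, int n_levels)
{
    for (size_t i = 0; i < count; ++i)
        pixels[i] = lut_direct_32f(pixels[i], levels, values, n_levels);
}

/* Table optimization (16s): precompute the answer for every possible
   16-bit input once. 65536 entries * 2 bytes = 128 KB. */
void lut_build_table_16s(int16_t table[65536], const int32_t *levels,
                         const int32_t *values, int n_levels)
{
    assert(n_levels >= 2);
    int k = 0;
    for (int32_t x = INT16_MIN; x <= INT16_MAX; ++x) {
        int16_t out = (int16_t)x;      /* default: value left unchanged */
        if (x >= levels[0] && x < levels[n_levels - 1]) {
            while (x >= levels[k + 1]) /* levels ascend, so k only advances */
                ++k;
            out = (int16_t)values[k];  /* assumes the value fits in 16 bits */
        }
        table[x - INT16_MIN] = out;    /* index 0..65535 */
    }
}

/* Per-pixel work is a single array read, independent of n_levels. */
void lut_apply_16s(int16_t *pixels, size_t count, const int16_t table[65536])
{
    for (size_t i = 0; i < count; ++i)
        pixels[i] = table[pixels[i] - INT16_MIN];
}
```

With many levels and a large image, the per-pixel search in the 32f path is where the time goes, while the 16s path pays the table-building cost once per call regardless of image size.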