When I need a sequence of ippi functions for a composite operation, I could of course call those ippi functions one after another on the whole image. However, for large images that would make rather poor use of the CPU's Ln caches, because each pass streams the entire image through the cache again.
Another solution is to create a composite "ippi" function that calls for each row the equivalent series of ipps functions. An added advantage is that this can be threaded easily, by giving each thread its own set of rows.
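To make the row-wise idea concrete, here is a minimal sketch in plain C. The names scale_offset_rows, row_mul_c and row_add_c are hypothetical; the two row helpers are only stand-ins for whatever in-place ipps calls the composite operation really needs (for a scale-and-offset on 32-bit floats that would presumably be something like ippsMulC_32f_I followed by ippsAddC_32f_I). The row range parameters are my assumption of how the work could be split between threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for in-place ipps row primitives. */
static void row_mul_c(float val, float *row, int len)
{
    for (int i = 0; i < len; ++i) row[i] *= val;
}

static void row_add_c(float val, float *row, int len)
{
    for (int i = 0; i < len; ++i) row[i] += val;
}

/* Composite in-place operation: pixel = pixel * scale + offset,
   applied to rows [row_begin, row_end). step_bytes is the distance
   between the starts of two consecutive rows, in bytes. */
void scale_offset_rows(float *image, size_t step_bytes, int width,
                       int row_begin, int row_end,
                       float scale, float offset)
{
    assert(step_bytes >= (size_t)width * sizeof(float));
    for (int y = row_begin; y < row_end; ++y) {
        float *row = (float *)((unsigned char *)image + (size_t)y * step_bytes);
        /* Both passes touch the same row while it is still in cache. */
        row_mul_c(scale, row, width);
        row_add_c(offset, row, width);
    }
}
```

Each thread would call scale_offset_rows with its own [row_begin, row_end) range; the ranges do not overlap, so no locking is needed for the image data.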
If the RowBytes (StepSize) of the image(s) equal(s) the PixelSize multiplied by the ImageWidth, we can work per chunk rather than per row. Let's assume that sysctlbyname with "hw.l1dcachesize" returns the L1 cache per core. The chunk size can be chosen to be that size for in-place operations or half that size for operations from a source to a target.
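A rough sketch of that chunk-size choice, again in plain C. The function names are hypothetical, the 32 KiB fallback is an assumption for systems where the sysctl is unavailable, and the two inner loops stand in for the ipps calls. Reading the value into a zero-initialised 64-bit integer assumes a little-endian machine in case the kernel returns only 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* L1 data cache size in bytes; falls back to an assumed 32 KiB. */
size_t l1d_cache_bytes(void)
{
#ifdef __APPLE__
    int64_t value = 0;
    size_t len = sizeof value;
    if (sysctlbyname("hw.l1dcachesize", &value, &len, NULL, 0) == 0 && value > 0)
        return (size_t)value;
#endif
    return 32u * 1024u; /* assumption, not a measured value */
}

/* Elements per chunk: the whole cache for in-place work,
   half of it when a source and a target chunk must both fit. */
size_t chunk_elems(size_t cache_bytes, size_t elem_size, int in_place)
{
    size_t budget = in_place ? cache_bytes : cache_bytes / 2;
    size_t n = budget / elem_size;
    return n > 0 ? n : 1;
}

/* dst = src * scale + offset over a contiguous image of `total`
   elements, processed one chunk at a time. */
void scale_offset_chunked(const float *src, float *dst, size_t total,
                          size_t chunk, float scale, float offset)
{
    assert(chunk > 0);
    for (size_t start = 0; start < total; start += chunk) {
        size_t n = (total - start < chunk) ? total - start : chunk;
        for (size_t i = 0; i < n; ++i)   /* pass 1: stand-in for an ipps multiply */
            dst[start + i] = src[start + i] * scale;
        for (size_t i = 0; i < n; ++i)   /* pass 2: stand-in for an ipps add */
            dst[start + i] += offset;
    }
}
```

A call would look like scale_offset_chunked(src, dst, width * height, chunk_elems(l1d_cache_bytes(), sizeof(float), 0), scale, offset). Filling the L1 cache completely leaves no room for the stack or other data, so a somewhat smaller chunk may well perform better; I have not measured this.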
Is that optimal? Or what does Intel recommend?
Adriaan van Os
Hello Adriaan van Os,
Thanks for your proposal.
For the cache usage, does the link below meet your expectation?
Also, if possible, could you create a ticket for this feature request or enhancement in our online service center at https://supporttickets.intel.com/?lang=en-US