the question is does IPP optimized code uses special CPU instructions? The answer is - yes, of course. We always try to use CPU hw capabilities if they can help us to reach better performance.
Actually, this wasn't really my question. Since as you say you use special CPU instruction, it seems starnge that there isn't one atomic operation for getting the min(max) of 2 images(arrays).
The current implementation requires keeping an intermidiate result mask e.g for later assignment. Though convenient for some things, other operation can benefit from avoiding this second image pass.
By the way, have you seen ippsMinEvery/ippsMaxEvery API in ippSP library? Wesupport 16s/32s/32f data types for in-place operations, so you can use them for images at row-by-row basis.
Ok, I'm gladthat this info was useful somehow:)
Please do not forget to submit request for new functionality through the premier support service. Of course, it does not mean we automatically accept it, but we will consider it for the next versions of IPP. You understand, it always some compromise between what we want to do and what we can, at least at this moment:)