If you compile your code with Intel C compilerthen for the memcpy/ippsCopy you compare mostly the same code as Intel compiler use the same optimized kernel from ippsCopy.
Regarding your note on zlib - could you please provide more details, what you actually compare, what conditions, what is IPP version, what is data you work with.
Our testing shows performance benefits for IPP optimized zlib over standard zllib. That is why it was stated in IPP book. You may fall into some specific conditions where it is not so, we would like to understand what is actually happen within your test.
you do not have debug version of IPP (itnever was released). If you experience difference in performance in your application for debug vs release build you need to check what is wrong in your application because in both cases you use release version of IPP binaries.
I found this post because I searched for "ippsConvert_16s32f is slower".
I have found the same using ippIP AVX (e9) 2020.0.1
What is interesting is that I found ippsConvert_16s32f to be 3-4 times faster if m7_ippsConvert_16s32f is used instead of e9_ippsConvert_16s32f.
I set this by calling ippSetCpuFeatures(PX_FM). Other IPP procedures perform faster when set to E9 as expected, but it seems that the M7 short->float conversion has some advantage.
The CPU I am testing with is:
AMD A10-7800 Radeon R7