IPP DLLs do not differ from any other DLLs, except that the signal and image processing libraries are large. To minimize the load delay, you can call any single IPP function somewhere at the beginning of your application, so the DLL is loaded before the time-critical code runs.
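For illustration, here is a minimal sketch of such a warm-up call in C. The names `warmup_fn`, `time_call`, `warm_up_library` and `stand_in_ipp_call` are hypothetical helpers made up for this example, not part of IPP. The stand-in function only marks the place where a real IPP call on a small dummy buffer would go.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: wraps whichever library call you want to warm up. */
typedef void (*warmup_fn)(void);

/* Runs fn once and returns the elapsed time in seconds as reported by clock(). */
static double time_call(warmup_fn fn)
{
    assert(fn != NULL);
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Stand-in for a real IPP call (assumption: replace the body with any cheap
   IPP function call on a tiny buffer). It only counts its invocations here. */
static int g_calls = 0;
static void stand_in_ipp_call(void)
{
    g_calls++;
}

/* Call this once near the start of the application, before time-critical code.
   Returns how long the warm-up call took, which is useful for logging. */
static double warm_up_library(warmup_fn fn)
{
    return time_call(fn);
}
```

Calling `warm_up_library(stand_in_ipp_call)` once at startup and then timing the same call again with `time_call` gives a rough idea of how much of the delay is a one-time cost.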
You may also consider the custom DLL option, which allows you to build an IPP DLL containing only the functions your application uses. This technique significantly reduces the DLL size, as most applications do not need all of the functions.
I'm having the same problem as he has (see my post about IppsDivCRev_32f_I). However, I *am* using a custom build of the DLL (around 10MB), and the CPU usage I get on the first call of DivCRev depends on the number of threads the library is told to use, so I think the problem is somewhere else. If the problem were the DLL loading, I don't think the number of threads would matter.