Are there any performance issues to be aware of? Should we optimize the app so that each processor has its own memory space?
IPP works on multi-processor, multi-core systems. You will need to optimize your application specifically to get the most from a system with 8 or more cores (although your legacy code will also run). The right optimization technique depends heavily on the application and on the amount of data it has to process.
Yes, IPP itself should not cause problems. It was developed to be thread-safe (you can call the same functions in different threads on different data and they will not interfere with each other). It was also designed to use threading automatically on multi-core systems (when you link with the IPP DLLs or the IPP threaded static libraries). We also provide non-threaded static libraries, which can be used in OS kernel mode or when the application takes care of threading above IPP. For many-core systems (8 cores and more) it might be beneficial to implement threading on top of IPP, so that the application has full control over the threads. I would recommend taking a look at the Deferred Mode Image Processing (DMIP) layer we provide in the IPP 6.0 beta. It is a binary library built on top of IPP that helps you work efficiently with large images by keeping the application's memory working set small enough to fit in the processor's L2 cache. DMIP also implements threading on top of IPP.
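To make "threading on top of IPP" a bit more concrete, here is a minimal sketch of the pattern in plain C++. It does not call IPP at all: `add_f32` is a hypothetical stand-in for a single-threaded primitive, and `add_f32_parallel` and its `num_threads` parameter are made up for illustration. The assumption is that in a real application you would link the non-threaded static libraries and call the actual IPP function where the stand-in is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a single-threaded library primitive
// (NOT an IPP function): dst[i] = a[i] + b[i].
void add_f32(const float* a, const float* b, float* dst, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) dst[i] = a[i] + b[i];
}

// Application-level threading: split the buffers into contiguous chunks and
// let each worker call the same primitive on its own, non-overlapping chunk.
void add_f32_parallel(const float* a, const float* b, float* dst,
                      std::size_t len, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    // Chunk size rounded up so that all elements are covered.
    const std::size_t chunk = (len + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(len, t * chunk);
        const std::size_t end = std::min(len, begin + chunk);
        if (begin == end) break;  // no data left for further workers
        workers.emplace_back(add_f32, a + begin, b + begin, dst + begin,
                             end - begin);
    }
    for (auto& w : workers) w.join();  // wait until every chunk is done
}
```

Because every worker touches a different slice of the data, no locking is needed. The application decides how many threads to start and how the data is divided, which is the "full control" mentioned above.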
Another good example is the image-tiling IPP sample, which demonstrates how you can use IPP to process large images tile by tile.
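The general idea of tile-by-tile processing can be sketched as follows. This is not the code from the IPP sample: `invert_u8_roi` is a hypothetical stand-in for an image primitive, and `invert_u8_tiled`, `tile_w` and `tile_h` are names invented for this illustration. The only thing it borrows from IPP is the calling convention of its image functions, which address a rectangular region through a pointer, a row step in bytes, and a region size.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an in-place image primitive (NOT an IPP function).
// It works on a rectangle described by a pointer to its top-left pixel,
// the row step in bytes, and the rectangle's width and height.
void invert_u8_roi(std::uint8_t* p, std::size_t step, int width, int height) {
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = p + static_cast<std::size_t>(y) * step;
        for (int x = 0; x < width; ++x)
            row[x] = static_cast<std::uint8_t>(255 - row[x]);
    }
}

// Walk over the image in tiles of at most tile_w x tile_h pixels and call the
// primitive once per tile. Tiles on the right and bottom edges are clipped.
void invert_u8_tiled(std::uint8_t* img, std::size_t step, int width, int height,
                     int tile_w, int tile_h) {
    if (tile_w <= 0 || tile_h <= 0) return;  // guard against an endless loop
    for (int y0 = 0; y0 < height; y0 += tile_h) {
        const int h = std::min(tile_h, height - y0);
        for (int x0 = 0; x0 < width; x0 += tile_w) {
            const int w = std::min(tile_w, width - x0);
            // Top-left pixel of the tile; the row step stays that of the image.
            std::uint8_t* tile = img + static_cast<std::size_t>(y0) * step + x0;
            invert_u8_roi(tile, step, w, h);
        }
    }
}
```

With a single operation like this one, tiling by itself gains little. The cache benefit appears when several operations are applied to each tile before moving on to the next, so that intermediate results stay in cache. The tiles are also independent of each other, so they can be handed out to different threads.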