Does anybody know of a 2D convolution function that is well optimized for the Phi? We have 2-megapixel images and a 26x26 non-separable kernel. Also, how much speedup should we expect compared to a 6-core i7 or Xeon? My baseline is ippiFilter_8u_C4R on the CPU and nppiFilter_8u_C4R on the GPU.
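For context on the workload size, here is a back-of-envelope estimate of the arithmetic per frame. The helper function `macs_per_frame` is my own, purely for illustration: 2 MP x 4 channels x 26x26 taps works out to roughly 5.4 billion multiply-accumulates per frame, which is what any implementation (IPP, NPP, or hand-rolled) has to get through.

```c
#include <stdint.h>

/* Hypothetical helper: count the multiply-accumulate operations a direct
 * (non-separable) convolution performs on one frame.  Each output sample
 * needs kw*kh MACs, and there is one output sample per pixel per channel. */
long long macs_per_frame(long long pixels, int channels, int kw, int kh)
{
    return pixels * channels * (long long)kw * kh;
}

/* Example: macs_per_frame(2000000, 4, 26, 26) == 5,408,000,000,
 * i.e. ~5.4 GMAC per 2-megapixel 4-channel frame with a 26x26 kernel. */
```

At 30 fps that is on the order of 160 GMAC/s sustained, so whether the Phi beats a 6-core Xeon will come down to how well the inner loop vectorizes and how well the tiling keeps the working set in cache.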
My understanding is that most computer vision convolutions don’t use floating point, and so might not be best suited for the coprocessor. Currently, there are no MIC implementations of ippiFilter_8u_C4R available. Still, if you can make a case for its optimization, we’ll forward it to the IPP development team. The more well-justified requests for MIC-optimized IPP functionality we receive, the more likely it will be recognized as a priority.
Currently, NVIDIA's nppiFilter is much faster on a mid-range GPU than ippiFilter on a mid-range CPU when used with moderately large kernels. I would think Intel would want to do even better with the Phi. Evidently, NVIDIA considers it important enough to keep shipping updates to NPP, which I find useful. Despite the lack of extreme integer support (which GeForce also lacks), the Phi should do well, especially at high resolutions, because tiled execution can exploit its well-equipped memory hierarchy.
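Since there is no MIC build of ippiFilter today, one fallback is to roll your own and let the compiler vectorize it. Below is a minimal, untuned reference sketch of a direct non-separable filter over a single 8-bit channel; it is my own code, not IPP's. I have assumed an integer kernel with a divisor and clamping to [0, 255], loosely mirroring the ippiFilter_8u_C4R convention; it computes a correlation (no kernel flip) and only writes the valid interior, leaving borders untouched. The row loop is the natural place to tile and to hang an OpenMP `parallel for` on the Phi.

```c
#include <stdint.h>
#include <stddef.h>

/* Reference (untuned) direct 2D filter on one 8-bit channel.
 * src, dst: w*h images; kernel: kw*kh signed integer taps.
 * Each interior output = clamp(sum(kernel * neighborhood) / divisor).
 * Border pixels (within the kernel radius) are left unmodified. */
void conv2d_u8(const uint8_t *src, int w, int h,
               uint8_t *dst,
               const int32_t *kernel, int kw, int kh,
               int divisor)
{
    int rx = kw / 2, ry = kh / 2;
    /* On the Phi one would tile this loop nest and add
     * #pragma omp parallel for over y; omitted here for clarity. */
    for (int y = ry; y < h - ry; ++y) {
        for (int x = rx; x < w - rx; ++x) {
            int64_t acc = 0;
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx)
                    acc += (int64_t)kernel[ky * kw + kx] *
                           src[(y + ky - ry) * w + (x + kx - rx)];
            acc /= divisor;               /* integer kernel normalization */
            if (acc < 0)   acc = 0;       /* clamp to the 8-bit range */
            if (acc > 255) acc = 255;
            dst[y * w + x] = (uint8_t)acc;
        }
    }
}
```

For a 26x26 kernel this inner loop is 676 MACs per output pixel, so a tuned version would keep a tile of source rows resident in L2, vectorize across x, and spread tiles over cores; but even this naive form is a useful correctness baseline to compare any optimized or offloaded version against.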