I haven't installed the Distribution for Python yet. Does anyone know if offloading from python to Xeon Phi coprocessor card is possible?
Background: I want to calculate the FFT of a two-dimensional array of about 2**25 complex numbers, many times over. Then I have to multiply this array (in the frequency domain) elementwise with another array, FFT back to the time domain, multiply the result with yet another array... and repeat this about 1000 times.
I am not an expert in C, so I would like to do this in Python. Does anyone have a hint as to whether this is possible with this Python distribution?
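For reference, here is a minimal NumPy sketch of the loop described above, using a small demo array and made-up filter arrays (the real workload would use ~2**25 complex elements and ~1000 iterations):

```python
import numpy as np

# Small demo shape; the real problem would use ~2**25 complex elements
# (e.g. 4096 x 8192) and ~1000 iterations.
shape = (64, 128)
rng = np.random.default_rng(0)
signal = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
freq_filter = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
time_filter = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

for _ in range(10):                      # ~1000 in the real workload
    spectrum = np.fft.fft2(signal)       # forward 2-D FFT
    spectrum *= freq_filter              # elementwise multiply in frequency domain
    signal = np.fft.ifft2(spectrum)      # back to time domain
    signal *= time_filter                # elementwise multiply in time domain
```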
pyMIC will help you, as Artem R. suggests.
The only thing you have to take into account is that the latest version of pyMIC works well on Linux, but it is still experimental on Windows. So I suggest working on Linux if you want to use pyMIC. I've used it on Linux and it lets you offload Python code to the MIC.
You should definitely take a look at GT-Py, which is going to be launched in a few months. It has been recently announced and it can definitely help you in your future projects similar to the one you are mentioning. You can read about GT-Py here: http://software.intel.com/en-us/blogs/2016/03/22/gt-py-accelerating-numpy-programs-with-minimal-prog...
Intel Python's NumPy FFT is accelerated by MKL, which should do automatic offloading. We have a beta coming out very soon that will also accelerate the SciPy FFT. All of our testing has been focused on the beta with Knights Landing as host, so it is probably best to implement your algorithm with SciPy and wait for the beta.
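Switching the loop from NumPy's FFT to SciPy's is mostly a matter of routing the transforms through `scipy.fftpack`; a minimal sketch (the example array here is just a placeholder):

```python
import numpy as np
from scipy import fftpack

# Same forward/backward 2-D transform as numpy.fft, but routed through
# SciPy's FFT interface instead.
x = np.ones((8, 8), dtype=np.complex128)
spectrum = fftpack.fft2(x)    # forward 2-D FFT
restored = fftpack.ifft2(spectrum)  # inverse recovers the original array
```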
Yes, the upcoming beta I was referring to in this thread will have OS X support and is expected by the end of the week. Please note that we don't in general promise release dates, but it is still on track.
Thanks for your response. I'm downloading the new beta right now. I was looking forward to this release, as I was working on a Python project that has to run in OS X, Linux, Windows and will benefit from the performance improvements in this distribution.
Just a clarifying note. The beta we just released does not have an MKL that does automatic offloading and may not be providing the most optimized code for Phi. We hope to have better support by the product release. Feel free to email me if you have specific questions.
I have one specific question based on the comments in this thread. Is GT-Py going to be the way to offload, or will there also be the option of MKL offloading from the Intel® Distribution for Python, without GT-Py?
Intel Distribution for Python will have MKL offloading without GT-Py. Just to clarify, GT-Py is an experimental project right now. Feedback on your experiences will be appreciated.
How/when is the MKL offloading done? In my case, as described above, I have to perform many FFTs one after the other on an array of about 4 GB.
If the offloading is done per FFT, then it is time-consuming to transfer the data to the Phi and transfer it back afterwards. It would be better to offload the whole Python computation and execute it there.
Feedback by any means you want. A phone call might be best so we can ask questions.
Arne: I am looking for someone who understands the details better to answer your question.
Automatic offload is not available for all the functions. You can have a look at the list of functions here: https://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors
There is more information about how this works here: https://software.intel.com/en-us/articles/math-kernel-library-automatic-offload-for-intel-xeon-phi-coprocessor Basically, when using automatic offload you can specify what percentage of the work is performed on the host and what percentage on the MIC. You specify this via environment variables.
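As a sketch, the automatic-offload environment variables (names taken from the MKL documentation linked above) can be set from Python before MKL is loaded; note that only the AO-enabled functions in the list above are affected, and the exact work-division values here are arbitrary examples:

```python
import os

# These must be set before MKL is loaded, i.e. before importing
# numpy/scipy in an MKL-backed Python distribution.
os.environ["MKL_MIC_ENABLE"] = "1"            # turn on MKL automatic offload
os.environ["MKL_HOST_WORKDIVISION"] = "0.2"   # ~20% of the work on the host...
os.environ["MKL_MIC_WORKDIVISION"] = "0.8"    # ...~80% on the coprocessor
```

Setting the same variables in the shell before launching Python works equally well.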