I followed the Intel Python + Theano setup instructions found here, yet when I compile Theano functions (using icpc), they still link to the libm math functions instead of SVML. Is this expected? When I compile simple C code with `#include <math.h>` and a call to `tanh()` using icpc, it compiles with SVML symbols automatically, which is great because SVML is so much faster! Is there a way to get Theano to use SVML?
To extend the discussion a bit: in some speed tests I ran recently on single cores of both an i7-6900K and a KNL, I found that, regardless of vector size, SVML is about as fast as the full MKL VML on the i7, so SVML is all that is needed to max out i7 performance. On the KNL, however, VML performance far exceeds SVML. (This is all using a freshly downloaded Parallel Studio XE 2017 installation.) Getting Theano to compile with VML is more challenging because it requires slightly different source code, but this seems like the sort of thing that ought to happen in an Intel-optimized version ;) Unlike on the i7, when calling the standard `libm` math functions on the KNL, `tanh()` and `expm1()` each take more time than the matrix multiply for some NN functions! (Sure, I could use ReLU, but that's beside the point.)
Any help appreciated.
Thank you for bringing this to our attention. I believe you would have to add the appropriate linker flag (`-lsvml`) to `.theanorc` in order to link with SVML. You can also try adding the vectorization report to the compiler flags in `.theanorc`, as described here, to make sure your loop is vectorizable. If this doesn't work, feel free to send me a reproducer of what you are trying to accomplish.
Theano generates C/C++ code for compilation, and when I dig into an example of that, I can see the call to `tanh`, but it is not in the context of a loop. I think this is why the Intel compiler does not recognize it as a candidate for SVML. I'm no expert on how Theano's self-generated source code is set up, so I'll take this question over to a Theano-specific site. If a good answer comes back, I'll update here.
In the meantime, if there is anyone at Intel on the Python/Theano team who takes up this cause, it would be great to know that it is being worked on!