According to Baidu's DeepBench benchmark (https://github.com/baidu-research/DeepBench), MKL's deep learning convolution (not GEMM) is much slower in the backward pass than in the forward pass.
For example (https://github.com/baidu-research/DeepBench/tree/master/results), for W=341, H=79, C=32, N=4, K=32, R=5, S=10 on the KNL7250 platform, forward takes 0.91 ms, backward with respect to the input takes 68.79 ms, and backward with respect to the weights takes 74.98 ms! So backward is roughly 75 to 82 times slower than forward.
As a comparison, on TitanX, forward takes 0.74 ms, backward with respect to the input takes 3.09 ms, and backward with respect to the weights takes 0.76 ms. For forward, TitanX is only a little faster than KNL7250, but for backward, KNL7250 is much slower. The pattern is similar for other W, H, C configurations.
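To make the comparison concrete, here is a small sketch that computes the backward-to-forward ratios from the timings quoted above. The dictionary layout and the helper name `slowdown_vs_forward` are just my own choices for illustration; only the numbers come from the DeepBench results.

```python
# Timings in milliseconds, copied from the figures quoted above.
timings_ms = {
    "KNL7250": {"forward": 0.91, "backward_input": 68.79, "backward_weight": 74.98},
    "TitanX": {"forward": 0.74, "backward_input": 3.09, "backward_weight": 0.76},
}


def slowdown_vs_forward(times):
    """Return each backward time divided by the forward time (hypothetical helper)."""
    fwd = times["forward"]
    return {name: t / fwd for name, t in times.items() if name != "forward"}


# Ratio of each backward pass to the forward pass, per platform.
ratios = {platform: slowdown_vs_forward(t) for platform, t in timings_ms.items()}

for platform, per_pass in ratios.items():
    for name, ratio in per_pass.items():
        print(f"{platform} {name}: {ratio:.1f}x forward")
```

On KNL7250 both backward ratios come out well above 70, while on TitanX they stay in the single digits.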
Can anyone explain the reason? Is it because MKL has not been optimized much for the backward pass yet? It seems mkl-dnn (https://github.com/01org/mkl-dnn) only supports forward operations for now.