Zhang__Hao
Beginner

Sum along specific matrix axis

I am working on a project where I want to accelerate NumPy element-wise multiplication and summation.

My approach is to pass the NumPy array to C as a pointer and call MKL functions on it (through Cython).

For element-wise multiplication I have found the vdMul function. For summation, however, I cannot find a suitable MKL function that sums a matrix along a specific axis and returns a smaller matrix.

Example:

input: matrix A, shape is [100,200,300]

B = sum(A, axis = 0)

B shape is [200,300]
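
For reference, the reduction in this example can be sketched in plain NumPy as follows. The random input and the names `rng` and `B_flat` are only illustrative assumptions, not part of the original project. The second form shows that, for a C-contiguous array, the same sum can be viewed as a column sum over a 2-D array of shape [100, 60000].

```python
import numpy as np

# Hypothetical input with the shape from the example above.
rng = np.random.default_rng(0)
A = rng.random((100, 200, 300))

# Sum over the first axis: 100 slabs of shape (200, 300) collapse into one.
B = A.sum(axis=0)

# Same reduction on a 2-D view. For a C-contiguous array the reshape
# does not copy, and each of the 100 rows is one contiguous slab.
B_flat = A.reshape(100, -1).sum(axis=0).reshape(200, 300)

print(B.shape)
```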

Could anyone give some advice? Thank you very much!

3 Replies
Gennady_F_Intel
Moderator

It may make sense to try the Intel Distribution for Python (IDP), which will probably let you see performance benefits without changing the original Python code.

Zhang__Hao
Beginner

Gennady F. (Intel) wrote:

It may make sense to try the Intel Distribution for Python (IDP), which will probably let you see performance benefits without changing the original Python code.

I have tested IDP and found that numpy sum runs at almost the same speed as in stock Python. In my tests, both are single-threaded. By contrast, for numpy multiply (element-wise matrix multiplication), the IDP version uses 4 threads on my PC (i7-6700HQ), while stock Python uses only 1 thread.

Since numpy sum is single-threaded, my goal is to fully optimise it with multithreading. Do you have any other recommendations? Thanks very much!
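
One way to experiment with a multithreaded sum from Python is sketched below, under some assumptions: the helper name `threaded_sum_axis0` and its `n_threads` parameter are hypothetical, the input is a numeric array with at least two dimensions, and NumPy is assumed to release the GIL during the reduction (which, as far as I know, it does for numeric dtypes). Each thread reduces a disjoint block of the second axis, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_sum_axis0(a, n_threads=4):
    """Sum `a` over axis 0, splitting the work across threads along axis 1."""
    out = np.empty(a.shape[1:], dtype=a.dtype)
    # Boundaries of near-equal chunks along axis 1, one chunk per thread.
    bounds = np.linspace(0, a.shape[1], n_threads + 1, dtype=int)

    def work(lo, hi):
        # Each thread reduces its own block and writes to a disjoint
        # slice of `out`, so the threads never touch the same memory.
        np.sum(a[:, lo:hi], axis=0, out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(work, lo, hi)
                   for lo, hi in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out
```

Whether this is actually faster than a plain `A.sum(axis=0)` depends on the machine, since the reduction may be limited by memory bandwidth rather than by the number of cores, so it is worth timing both on the target hardware.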

TimP
Black Belt

MKL doesn't include plain sum functions because, in the usual cases, there is no way to improve on the performance of optimized compiled C or Fortran code. Multi-threading would improve performance only where you have multiple memory controllers (a multi-CPU platform) and have taken care to avoid remote memory access, by summing only along the stride-1 extent of the matrix and keeping the largest-stride extents consistently local to a single memory controller (CPU). This is probably not a practical enough use case to justify support in MKL, and it would be no more difficult to implement in your own compiled C or Fortran code than it would be with an MKL function.
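
As a rough illustration of the plain-loop approach, the sketch below accumulates one contiguous slab at a time, so every inner traversal runs over stride-1 memory. It is written in NumPy only to keep the thread in one language. The helper name `sum_axis0_slabs` is hypothetical, and a compiled C or Fortran version would use the same loop order (outer loop over the summed axis, inner loop over the contiguous elements).

```python
import numpy as np


def sum_axis0_slabs(a):
    """Sum over axis 0 by accumulating one contiguous slab at a time."""
    a = np.ascontiguousarray(a)  # C order, so each a[i] is contiguous
    acc = np.zeros(a.shape[1:], dtype=a.dtype)
    for i in range(a.shape[0]):
        # The element-wise add walks a[i] and acc in memory order.
        acc += a[i]
    return acc
```

For the [100,200,300] example this makes 100 passes, each reading one slab of 60000 values and updating the same 60000-element accumulator.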
