How to multiply vectors in order to get a matrix

apolo74 · ‎07-28-2010

Hi there,

I need to multiply two vectors of dimensions aa=Mx1 and bb=1xN to generate a matrix cc=MxN. A typical IPP vector is 1xM, so I guess I have to transpose it first. In Matlab it is something like this (note that vector aa is being transposed):

aa = [1 2 3 4];
bb = [5 8 6];
cc = aa' * bb

cc =
5 8 6
10 16 12
15 24 18
20 32 24

It seems that ippm has this kind of functionality, but only for small matrices. I need to work with vectors of between 100 and 400 elements, so the final matrix will be around 400x400. I then tried ipps and ippi, but I couldn't get them to transpose and then multiply to produce the matrix for this kind of vector multiplication. Any help and suggestions will be greatly appreciated.

Boris

apolo74 · ‎07-28-2010

Sorry for that, it was a stupid question... but just in case someone else needs to see it:

Ipp32f src1[4] = {1.0f, 2.0f, 3.0f, 4.0f};
Ipp32f src2[3] = {5.0f, 8.0f, 6.0f};
Ipp32f dst[4*3] = {0.0f};

for( int i=0; i<4; i++ )
    ippsMulC_32f( src2, src1[i], dst+3*i, 3 );  // row i of dst = src2 scaled by src1[i]

I wonder if there is a better way of doing it... I just don't like FOR loops, they seem a waste of computation time. Hope you guys have a better solution.

Boris
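
The loop above builds the matrix one row at a time: row i of the result is src2 scaled by the i-th element of src1. As a minimal sketch of that idea that compiles without IPP installed, the code below uses `mulc_32f`, a hypothetical stand-in written for this note that only mirrors the argument order of ippsMulC_32f (source, scalar, destination, length). The name `outer_by_rows` is likewise an assumption, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in mirroring the argument order of ippsMulC_32f:
   dst[k] = src[k] * val for k in [0, len). This is NOT the IPP function. */
void mulc_32f(const float *src, float val, float *dst, int len)
{
    assert(src != NULL && dst != NULL && len >= 0);
    for (int k = 0; k < len; k++)
        dst[k] = src[k] * val;
}

/* Outer product c = a' * b, stored row-major: c[i*n + j] = a[i] * b[j]. */
void outer_by_rows(const float *a, int m, const float *b, int n, float *c)
{
    for (int i = 0; i < m; i++)
        mulc_32f(b, a[i], c + (size_t)n * i, n);  /* row i is b scaled by a[i] */
}
```

Calling `outer_by_rows(src1, 4, src2, 3, dst)` with the arrays from the post fills dst with the same 4x3 values as the Matlab example, in row-major order.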

PaulF_IntelCorp · ‎07-28-2010

Hello Boris,

Did you review the matrix multiply functions? After all, a vector is simply a matrix with one side equal to one. See this link to the documentation:

http://software.intel.com/sites/products/documentation/hpc/ipp/ippm/index.htm

Paul
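
To illustrate the point that a vector is a matrix with one side equal to one, here is a plain C sketch of a generic row-major matrix product. It does not use any ippm call, and `matmul_32f` is a hypothetical helper named for this note only. With an inner dimension of 1, the product of an Mx1 matrix and a 1xN matrix reduces to the outer product asked about in the first post.

```c
#include <assert.h>

/* Hypothetical helper (not an ippm function): C = A * B, all row-major.
   A is m x k, B is k x n, C is m x n. */
void matmul_32f(const float *A, const float *B, float *C, int m, int k, int n)
{
    assert(A && B && C && m >= 0 && k >= 0 && n >= 0);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++)          /* k == 1 for an outer product */
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

For the example in the question, `matmul_32f(aa, bb, cc, 4, 1, 3)` treats aa as a 4x1 matrix and bb as a 1x3 matrix, so no explicit transpose step is needed: a contiguous array of M floats can be read as either 1xM or Mx1.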

apolo74 · ‎07-29-2010

Hi Paul,

Yes, I checked that library, but according to the documentation ippm is optimized for working with small vectors and matrices (3x3, 4x4, 5x5 and 6x6; vectors of length up to 6), and I need to work with matrices and vectors of length up to 360.

Boris

PaulF_IntelCorp · ‎07-29-2010

Boris,

Those functions will still work on larger matrices as well. "Optimized for small vectors" simply means that performance falls off as the matrices get larger. If you are using the Intel compiler, you may find that the compiler generates better results, as it can optimize some of those operations and you won't incur the overhead of managing parameters and calls to the IPP functions.

Paul
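
As a rough sketch of the compiler-only route mentioned above, the outer product can be written as two plain C loops and left to the compiler's auto-vectorizer. The function name `outer_plain` is an assumption for illustration, and whether this beats a library call depends on the compiler, flags, and vector sizes, so it is worth measuring rather than assuming.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C99 outer product, row-major: c[i*n + j] = a[i] * b[j].
   `restrict` tells the compiler the three arrays do not overlap, which
   generally makes it easier to vectorize the inner loop. */
void outer_plain(const float *restrict a, int m,
                 const float *restrict b, int n,
                 float *restrict c)
{
    assert(a && b && c && m >= 0 && n >= 0);
    for (int i = 0; i < m; i++) {
        const float ai = a[i];            /* scalar for this row */
        float *row = c + (size_t)i * n;   /* start of row i */
        for (int j = 0; j < n; j++)
            row[j] = ai * b[j];
    }
}
```

The inner loop is a contiguous scale-and-store with no dependency between iterations, which is the pattern auto-vectorizers tend to handle well.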

Chao_Y_Intel · ‎07-29-2010

Boris,

ippsMulC looks like a good choice for this functionality if N is large.

for( int i=0; i<M; i++ ) ippsMulC_32f( src2, src1[i], dst+N*i, N );

Using ippsMulC means each row is computed with vectorized code. Another optimization opportunity is to thread the "for" loop with Intel Cilk, TBB, or OpenMP* (if needed, and only if the application has no higher-level threading).

Thanks,
Chao
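
The threading suggestion can be sketched with OpenMP as follows. This is an illustration under assumptions, not a tuned implementation: `outer_omp` is a hypothetical name, and the inner loop is written in plain C so the example builds without IPP (in the thread's version it would be the ippsMulC_32f call). For a matrix of around 400x400, the thread start-up cost may outweigh the gain, so timing both variants is advisable.

```c
#include <assert.h>
#include <stddef.h>

/* Outer product with the row loop shared among OpenMP threads.
   Each iteration writes a distinct row of c, so no synchronization is needed.
   If OpenMP is not enabled at compile time (e.g. no -fopenmp with GCC/Clang),
   the pragma is ignored and the loop simply runs serially. */
void outer_omp(const float *a, int m, const float *b, int n, float *c)
{
    assert(a && b && c && m >= 0 && n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < m; i++) {
        float *row = c + (size_t)i * n;   /* start of row i */
        for (int j = 0; j < n; j++)
            row[j] = a[i] * b[j];
    }
}
```

Parallelizing over rows rather than columns keeps each thread writing to its own contiguous block of memory.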