Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

transposed matrix array x constant array?

norppa75
Beginner
Hi there!

In our FEM-based application we have used several IPP matrix/vector array operations for small matrices, and the results have been rewarding. Unfortunately, I noticed that some operations, e.g. "transposed matrix array x constant array", are missing from release 5.2.
I know that "matrix array x constant array" can be handled with "vector array x constant array", since the column stride (= stride1) is not needed in such a (trivial) case. My question is how to accomplish the original task when (single) matrix transpositions are involved. For the time being, I have contented myself with the following routine:

case 1.

-- "this" represents the constant array
-- "ma" represents the matrix array to be transposed
-- "pDst" is the resulting matrix array

//----------------------------------------------------------------
template <class T>
IPPMatrixArray<T>& IPPVectorArray<T>::operator* (const IPPMatrixArray<T>& ma)
{
    Subscript count = this->NoOfVecs(), isize = ma.Size()/count, rows = ma.NoOfRows(), icols = ma.Size()/(rows*ma.NoOfSmallMatrices());
    if ((this->Size() == count) && (count == ma.NoOfSmallMatrices())){
        IPPMatrixArray<T> *pDst = new IPPMatrixArray<T>(icols, rows, count);
        // Standard description for source matrices
        int src1Stride2 = sizeof(T);
        int src1Stride0 = (icols*rows)*sizeof(T);
        int src1Stride1 = icols*sizeof(T);
        // Standard description for destination matrices
        int dstStride2 = sizeof(T);
        int dstStride1 = rows*sizeof(T);
        int dstStride0 = (rows*icols)*sizeof(T);
        IppStatus status;
        T ival;
        //
        // NOTE. The following for-loop should be replaced by "transposed matrix array - constant array" operation
        //
        for (Subscript i = 0; i < count; i++){
            ival = this->ptr[i];
            // multiply (i+1)-th transposed single matrix by (i+1)-th constant
#ifdef ETYPE_FLOAT
            status = ippmMul_tc_32f(
                ma.GetPtr()+i*isize, src1Stride1, src1Stride2, ival,
                pDst->GetPtr()+i*isize, dstStride1, dstStride2, rows, icols );
#elif defined(ETYPE_DOUBLE)
            status = ippmMul_tc_64f(
                ma.GetPtr()+i*isize, src1Stride1, src1Stride2, ival,
                pDst->GetPtr()+i*isize, dstStride1, dstStride2, rows, icols );
#endif
            // if( status < ippStsNoErr ) {
            //     printf( "-- error %d, %s ", status, ippGetStatusString( status ));
            // }
        }
        return *pDst;
    }
    else{
        if (count != ma.NoOfSmallMatrices())
            cerr << " Fatal error in IPPVectorArray::operator* (const IPPMatrixArray&): Source array dimensions do not match! ";
        if (this->Size() != count)
            cerr << " Fatal error in IPPVectorArray::operator* (const IPPMatrixArray&): Left-hand side should be constant array! ";
        this->size = 0;
        this->no_of_vecs = 0;
        return (IPPMatrixArray<T>&)*this;
    }
};
//----------------------------------------------------------------
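
For readers without IPP at hand, the operation computed by the loop in case 1 can be mirrored in plain C++. This is only a sketch under stated assumptions: `scaleTransposed` and `scaleTransposedArray` are hypothetical helpers, not IPP functions, and they assume dense row-major matrices stored back to back, with strides given in bytes in the same stride1/stride2 spirit as the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of IPP): dst = val * transpose(src).
// src is rows x cols, dst is cols x rows. Strides are in bytes:
// stride1 = step between rows, stride2 = step between elements of a row.
template <class T>
void scaleTransposed(const T* src, int srcStride1, int srcStride2, T val,
                     T* dst, int dstStride1, int dstStride2,
                     int rows, int cols)
{
    const char* s = reinterpret_cast<const char*>(src);
    char* d = reinterpret_cast<char*>(dst);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Read element (r, c) of the source, write element (c, r) of the destination.
            const T x = *reinterpret_cast<const T*>(s + r * srcStride1 + c * srcStride2);
            *reinterpret_cast<T*>(d + c * dstStride1 + r * dstStride2) = val * x;
        }
    }
}

// Hypothetical helper: applies scaleTransposed to each matrix of a contiguous
// array, using the i-th constant for the i-th matrix.
template <class T>
std::vector<T> scaleTransposedArray(const std::vector<T>& ma,
                                    const std::vector<T>& consts,
                                    int rows, int cols)
{
    const std::size_t isize = static_cast<std::size_t>(rows) * cols;
    assert(ma.size() == consts.size() * isize);
    std::vector<T> out(ma.size());
    for (std::size_t i = 0; i < consts.size(); ++i) {
        scaleTransposed<T>(ma.data() + i * isize,
                           static_cast<int>(cols * sizeof(T)), static_cast<int>(sizeof(T)),
                           consts[i],
                           out.data() + i * isize,
                           static_cast<int>(rows * sizeof(T)), static_cast<int>(sizeof(T)),
                           rows, cols);
    }
    return out;
}
```

A reference of this kind is mainly useful for checking the results of an IPP-based routine on small inputs; it makes no attempt at vectorization.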

Even so, case 1 is considerably faster than the following simple two-step approach:

case 2.
-- vector array x constant array, w/ "ippmMul_vaca" ( = matrix array x constant array)
-- matrix array transposition, w/ "ippmTranspose_ma"

The slow-down in case 2 was significant; I tested it with 3-by-4 matrices and an array size of 1000000. I think that even case 1 could be sped up if the proper function were available. Have I missed something? Perhaps something stride-related?
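
One plausible reason for the gap between the two cases is memory traffic: a two-step approach writes and then re-reads an intermediate array of the full size, whereas a fused operation touches each element once. The sketch below expresses both variants in plain C++ so that they can be checked for equal results and timed. The function names `twoStep` and `fused` are hypothetical, no IPP calls are used, and nothing here claims to describe how `ippmMul_vaca` or `ippmTranspose_ma` work internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Analogue of case 2: scale every matrix first, then transpose in a second pass.
// ma holds k.size() row-major matrices of rows x cols elements, stored back to back.
std::vector<double> twoStep(const std::vector<double>& ma,
                            const std::vector<double>& k,
                            std::size_t rows, std::size_t cols)
{
    const std::size_t isize = rows * cols;
    assert(ma.size() == k.size() * isize);

    // Pass 1: matrix array x constant array (each matrix treated as a flat vector).
    std::vector<double> scaled(ma.size());
    for (std::size_t i = 0; i < k.size(); ++i)
        for (std::size_t e = 0; e < isize; ++e)
            scaled[i * isize + e] = k[i] * ma[i * isize + e];

    // Pass 2: transpose each matrix of the intermediate array.
    std::vector<double> out(ma.size());
    for (std::size_t i = 0; i < k.size(); ++i)
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                out[i * isize + c * rows + r] = scaled[i * isize + r * cols + c];
    return out;
}

// Fused variant: scale and transpose in a single pass, with no intermediate array.
std::vector<double> fused(const std::vector<double>& ma,
                          const std::vector<double>& k,
                          std::size_t rows, std::size_t cols)
{
    const std::size_t isize = rows * cols;
    assert(ma.size() == k.size() * isize);

    std::vector<double> out(ma.size());
    for (std::size_t i = 0; i < k.size(); ++i)
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                out[i * isize + c * rows + r] = k[i] * ma[i * isize + r * cols + c];
    return out;
}
```

Any timing difference between the two functions will depend on the compiler, the optimization level, and the hardware, so it should be measured rather than assumed.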

Regards,

Villesamuli
Vladimir_Dudnik
Employee

Hello Villesamuli,

I'm sorry for the delay in answering; our expert was on vacation. Here is his answer:

Both cases are correct, but not efficient.

The IPP small matrix domain does not provide such functionality yet.

You can submit a feature request through the IPP Technical Support channel; it will then be reviewed at the next version planning stage.

Regards,
Vladimir
