Konovalov__Pavel · 11-27-2017

I have a tensor: a batch of matrices with dimensions [10 x 6 x 52], i.e. 10 matrices of 6 x 52, stored row-major. I can change the batch size as I want. The data type is single-precision float. I need to normalize every matrix in the tensor by its column sums (so the sum is a vector of length 52): compute the column-wise sum and divide every row of the matrix element-wise by it. This is a pretty typical task in many areas. Currently, I am doing something like this:

#include <cstring>  // memset
#include <mkl.h>    // vsAdd, vsDiv (MKL Vector Mathematics)

// [10 x 6 x 52] = [batch x actions x cards_count], row-major.
// node.regrets is both the source and the target tensor;
// node.regrets_sum holds the cards_count column sums of one matrix.
const size_t actionsCount = node.ActionsCount();

for (long long b = 0; b < _batch_size; b++)
{
    // Reset the column sums for matrix b.
    memset(node.regrets_sum.data(), 0, node.regrets_sum.size() * sizeof(float));

    // Pass 1: regrets_sum += row a, for every row of matrix b.
    for (size_t a = 0; a < actionsCount; a++)
    {
        const size_t regretsOffset = (b * actionsCount + a) * cards_count;
        vsAdd(cards_count, node.regrets_sum.data(), node.regrets.data() + regretsOffset, node.regrets_sum.data());
    }

    // Pass 2: divide every row of matrix b element-wise by the column sums.
    for (size_t a = 0; a < actionsCount; a++)
    {
        const size_t regretsOffset = (b * actionsCount + a) * cards_count;
        vsDiv(cards_count, node.regrets.data() + regretsOffset, node.regrets_sum.data(), node.regrets.data() + regretsOffset);
    }
}

This is the hottest spot in my app. I am pretty sure its performance can be improved, because profiling shows that a GEMM on this same tensor runs faster than this normalization. Any ideas how to optimize this with MKL and the Intel compiler? Maybe I have missed a ready-to-use routine for this case. Thank you in advance!
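For reference, the loop above computes the following for every batch index b, action a and card c. The symbols are introduced only for this illustration: R stands for node.regrets, B for the batch size, A for actionsCount and C for cards_count, with element (b, a, c) stored row-major at offset (b·A + a)·C + c, matching regretsOffset in the code.

$$
s_{b,c} = \sum_{a=0}^{A-1} R_{b,a,c}, \qquad
R_{b,a,c} \leftarrow \frac{R_{b,a,c}}{s_{b,c}}, \qquad
0 \le b < B,\; 0 \le a < A,\; 0 \le c < C.
$$

Each s_{b,c} is a sum down a strided column, but accumulating whole rows, as the loop does, keeps every memory access contiguous.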

Ying_H_Intel · 12-28-2017

Hi Pavel,
there are some normalization functions in Intel IPP and MKL, for example ippsNormalize and the LRN (local response normalization) tensor primitive in MKL-DNN (please see their developer reference manuals). However, there does not seem to be an exact column-based normalization routine. Given your tensor size (10 x 6 x 52), you can optimize your C code with the Intel compiler, for example with OpenMP vectorization (which can generate FMA code directly) and multithreading.
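A minimal sketch of that suggestion in plain C++, assuming a contiguous row-major float buffer. The function name normalize_columns and its signature are hypothetical, not an MKL or IPP API. It keeps both passes per matrix, computes each reciprocal 1/sum once and multiplies instead of dividing, marks the inner loops with `#pragma omp simd` and the batch loop with `#pragma omp parallel for`. Compile with -qopenmp (Intel compiler) or -fopenmp (GCC/Clang); without it the pragmas are ignored and the code runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: normalize each [actions x cards] matrix of a row-major
// [batch x actions x cards] tensor in place so every column sums to 1.
void normalize_columns(float* data, std::size_t batch,
                       std::size_t actions, std::size_t cards)
{
    // Batch elements are independent, so OpenMP may spread them over threads.
    #pragma omp parallel for
    for (long long b = 0; b < static_cast<long long>(batch); ++b) {
        float* m = data + static_cast<std::size_t>(b) * actions * cards;
        std::vector<float> inv(cards, 0.0f);  // column sums, then 1/sum

        // Pass 1: accumulate column sums row by row (contiguous, vectorizable).
        for (std::size_t a = 0; a < actions; ++a) {
            const float* row = m + a * cards;
            #pragma omp simd
            for (std::size_t c = 0; c < cards; ++c)
                inv[c] += row[c];
        }

        // Turn each sum into its reciprocal once, so pass 2 can multiply.
        // A zero column sum yields inf here, as a plain division would.
        #pragma omp simd
        for (std::size_t c = 0; c < cards; ++c)
            inv[c] = 1.0f / inv[c];

        // Pass 2: scale every row by the reciprocal column sums.
        for (std::size_t a = 0; a < actions; ++a) {
            float* row = m + a * cards;
            #pragma omp simd
            for (std::size_t c = 0; c < cards; ++c)
                row[c] *= inv[c];
        }
    }
}
```

Multiplying by a reciprocal can differ from true division in the last bit. The std::vector is allocated once per matrix for simplicity; a per-thread scratch buffer would avoid that. With only 10 matrices of 6 x 52 floats, thread start-up may cost more than it saves, so it is worth timing the serial and threaded builds separately.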
Best Regards,
Ying

Ying_H_Intel · 01-04-2018

Hi Pavel,

One more comment: the DAAL library has a normalization function, z-score, which computes (x_ij - m_j) / theta_j column by column, where m_j is the mean and theta_j the standard deviation of column j.

https://software.intel.com/en-us/blogs/daal

Z-score
Z-score normalization is an algorithm to normalize the observations by each feature (column).

C++: ./examples/cpp/source/normalization/zscore_dense_batch.cpp
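Note that z-score centers and scales each column by its mean and standard deviation, which is a different normalization from dividing by the column sums. To make the z-score formula concrete, here is a plain C++ sketch for one row-major matrix. zscore_columns is a hypothetical helper, not DAAL's API, and it assumes the population standard deviation (dividing by the number of rows), which may not match the estimator DAAL uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of DAAL: z-score each column of a
// row-major [rows x cols] matrix in place, x_ij <- (x_ij - m_j) / theta_j.
// Assumes the population standard deviation; a column with zero spread
// is only centered (its scale is treated as 1).
void zscore_columns(float* x, std::size_t rows, std::size_t cols)
{
    std::vector<double> mean(cols, 0.0), var(cols, 0.0);

    // Column means m_j.
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            mean[j] += x[i * cols + j];
    for (std::size_t j = 0; j < cols; ++j)
        mean[j] /= static_cast<double>(rows);

    // Sum of squared deviations per column.
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j) {
            const double d = x[i * cols + j] - mean[j];
            var[j] += d * d;
        }

    // Reuse var[] to hold 1 / theta_j.
    for (std::size_t j = 0; j < cols; ++j) {
        const double sd = std::sqrt(var[j] / static_cast<double>(rows));
        var[j] = sd > 0.0 ? 1.0 / sd : 1.0;
    }

    // Center and scale in place.
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            x[i * cols + j] = static_cast<float>((x[i * cols + j] - mean[j]) * var[j]);
}
```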

Best Regards,

Ying

Normalize matrix by sum of columns