MdAD, MAD, MdAPE, MAPE

tennican · 07-06-2011

Can MKL compute the summary stats below?

MdAD => median absolute deviation
MAD => mean absolute deviation
MdAPE => median absolute percentage error
MAPE => mean absolute percentage error
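For reference, here are the usual textbook definitions of the four statistics. The thread does not define them formally, so these are assumed. The sample is $x_1, \dots, x_n$. For the percentage errors, $A_t$ are actual values and $F_t$ are forecasts, with every $A_t \neq 0$:

```latex
\begin{aligned}
\mathrm{MdAD}  &= \operatorname{median}_{i} \, \lvert x_i - \operatorname{median}(x) \rvert \\
\mathrm{MAD}   &= \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert \\
\mathrm{MdAPE} &= \operatorname{median}_{t} \left( 100 \left\lvert \frac{A_t - F_t}{A_t} \right\rvert \right) \\
\mathrm{MAPE}  &= \frac{100}{n} \sum_{t=1}^{n} \left\lvert \frac{A_t - F_t}{A_t} \right\rvert
\end{aligned}
```

MAD is written here about the mean. Some texts center it on the median instead.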

thanks, Scott

Andrey_N_Intel · 07-07-2011

Hi Scott,
The present version of Intel MKL Summary Stats does not provide functionality for median/mean absolute deviation or median/mean absolute percentage error. At the same time, the library has "elementary" building blocks for those estimates, including mean and median. Could you please briefly clarify:

- are those estimates a "bottleneck" in your code (that is, can your application spend a significant amount of time computing them, depending on the problem size)?

- what are the typical dimensions you work with (dimension of the random vector, number of observations, etc.)?
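Here is a minimal sketch of how the four estimates compose from mean and median primitives. It is not MKL code: Python's standard-library `statistics.mean` and `statistics.median` stand in for the library's building blocks, and the function names are hypothetical. MAD is taken about the mean, and the percentage errors assume no actual value is zero.

```python
from statistics import mean, median


def mdad(x):
    """Median absolute deviation: median of |x_i - median(x)|."""
    m = median(x)
    return median([abs(v - m) for v in x])


def mad(x):
    """Mean absolute deviation about the mean: mean of |x_i - mean(x)|."""
    mu = mean(x)
    return mean([abs(v - mu) for v in x])


def abs_pct_errors(actual, forecast):
    """100 * |(A_t - F_t) / A_t| for each pair; assumes every A_t != 0."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return [100.0 * abs((a - f) / a) for a, f in zip(actual, forecast)]


def mdape(actual, forecast):
    """Median absolute percentage error."""
    return median(abs_pct_errors(actual, forecast))


def mape(actual, forecast):
    """Mean absolute percentage error."""
    return mean(abs_pct_errors(actual, forecast))
```

MdAD needs two median computations: one for the center and one for the derived deviations.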

Thanks in advance,
Andrey

tennican · 07-07-2011

Hi Andrey,

Actually, the only one I really care about is the MdAD, from which I can easily get MdAPE.
It would probably be more efficient to compute it at the same time as other statistics.
But I can always calculate it in another step.
As a robust measure of spread, I consider MdAD to be the next most important summary stat after standard deviation.
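This small Python sketch makes the robustness argument concrete. The data are made up for illustration. Replacing one observation with a gross outlier inflates the sample standard deviation by orders of magnitude but leaves MdAD unchanged. The factor 1.4826 (approximately 1/Φ⁻¹(3/4)) is the standard constant that rescales MdAD to estimate σ for normally distributed data.

```python
import statistics


def mdad(x):
    """Median absolute deviation: median of |x_i - median(x)|."""
    m = statistics.median(x)
    return statistics.median([abs(v - m) for v in x])


# 1 / Phi^{-1}(3/4) ~= 1.4826: scaling MdAD by this makes it a consistent
# estimator of the standard deviation when the data are normally distributed.
NORMAL_CONSISTENCY = 1.4826

clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]  # illustrative data
dirty = clean[:-1] + [1000.0]  # one observation replaced by a gross outlier

for name, data in (("clean", clean), ("dirty", dirty)):
    sd = statistics.stdev(data)
    robust_sd = NORMAL_CONSISTENCY * mdad(data)
    print(f"{name}: stdev={sd:.3f}  1.4826*MdAD={robust_sd:.3f}")
```

The median of the deviations is identical for both samples because the outlier only replaces one of the largest deviations.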

Our software is applied to an incredibly broad array of datasets, most of which are small or filtered. But I'm planning for the future.

thanks, Scott

Andrey_N_Intel · 07-07-2011

Hi Scott,
Thanks for the additional details.
We will analyze your request to understand what could be done.
Best,
Andrey

Gennady_F_Intel · 07-07-2011

Hello Scott, this issue has been submitted to our internal development tracking database for further investigation; we will inform you once a new update becomes available. Here is a bug tracking number for your reference: 200220845.

mecej4 · 07-07-2011

>It would probably be more efficient to compute it at the same time as other statistics.

I doubt that this is so except for small arrays. The median and other measures of rank require different types of algorithms than the other statistics. The mean, variance, etc. all require that one keep running sums of expressions involving only the current data item and, possibly, other previously computed statistics.
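The "running sums" point can be illustrated with a single-pass mean and variance accumulator. This Python sketch uses Welford's update, a numerically stabler variant of keeping raw sums. Choosing Welford is our assumption; it is not something MKL is stated to use. Each update touches only the new observation and the statistics computed so far. No comparable constant-memory update exists for the exact median, which in general needs access to the whole sample.

```python
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Single-pass mean and variance accumulator (Welford's update).

    Each push uses only the new value and the statistics accumulated so far.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the updated mean

    def variance(self) -> float:
        """Sample variance with the n - 1 denominator."""
        if self.n < 2:
            raise ValueError("need at least two observations")
        return self.m2 / (self.n - 1)


acc = RunningMoments()
for v in [2, 4, 4, 4, 5, 5, 7, 9]:
    acc.push(v)
```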

The selection algorithm for computing the median, for example, has O(N) complexity on average but, with a naive pivot choice, can degenerate to O(N²) when the array is already sorted; yet had we known the array to be sorted, we could simply have picked the N/2-th element as the median.
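Here is a sketch of Hoare's selection algorithm (quickselect) in Python. The function name and the `pivot` switch are illustrative and not taken from MKL. With `pivot="last"`, the last element of the range is always the pivot, which reproduces the degenerate sorted-input case. A random pivot is one common remedy.

```python
import random


def quickselect(a, k, pivot="random"):
    """Return the k-th smallest element (0-based) of list `a`.

    Hoare's selection algorithm with a Lomuto partition; `a` is partially
    reordered in place. With pivot="last" the last element of the current
    range is always the pivot, so on already-sorted input each partition
    removes only one element and the work grows as O(N^2). With
    pivot="random" the expected work is O(N).
    """
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    lo, hi = 0, len(a) - 1
    while lo < hi:
        if pivot == "random":
            r = random.randint(lo, hi)      # move a random pivot to the end
            a[r], a[hi] = a[hi], a[r]
        p = a[hi]
        store = lo
        for i in range(lo, hi):             # elements < p go to the left
            if a[i] < p:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # pivot lands in its final slot
        if k == store:
            return a[k]
        if k < store:
            hi = store - 1
        else:
            lo = store + 1
    return a[k]
```

For odd N, the median is `quickselect(data, len(data) // 2)`. If the array is known to be sorted, the same element is simply `data[len(data) // 2]`.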

It would be fairly cheap for the library to compute an estimate of the median along with the other moments. For example, the median of 3 medians of 3 sampled triplets could provide such an estimate. Or, the median of a randomly chosen sample from the input array. In many cases, such an estimate may be all that is needed.
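Both cheap estimators suggested above are sketched here in Python:

- `ninther` takes the median of three medians of three randomly sampled triplets, the construction usually credited to Tukey as the "ninther".
- `sample_median` takes the exact median of a random subsample whose size `m` is a free parameter.

The function names and the use of Python's `random.sample` are our assumptions. Each estimate inspects only a fixed number of elements (9, or `m`), so its accuracy depends on the data and is not guaranteed.

```python
import random
import statistics


def med3(a, b, c):
    """Median of three values without sorting."""
    return max(min(a, b), min(max(a, b), c))


def ninther(x, rng=random):
    """Median of 3 medians of 3 randomly sampled triplets (Tukey's ninther).

    Falls back to the exact median when x has fewer than 9 elements.
    """
    if len(x) < 9:
        return statistics.median(x)
    s = rng.sample(x, 9)  # 9 elements from distinct positions
    return med3(med3(*s[0:3]), med3(*s[3:6]), med3(*s[6:9]))


def sample_median(x, m, rng=random):
    """Median of a random subsample of up to m elements (no replacement)."""
    return statistics.median(rng.sample(x, min(m, len(x))))
```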

A user who wanted the true median could make a second call to the library with this estimated median, which could be used efficiently with the selection algorithm (or another algorithm that benefits from having an estimate of the median) to find the exact median. Normal users who have no interest in the median would not experience a noticeable loss of performance.
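One way to use such an estimate is sketched below in Python. The function name and the choice of the lower median for even N are illustrative assumptions. Linear scans split the data around the estimate, and only the side containing the median rank is searched further. With a good estimate, the median sits near the edge of that side. `heapq.nlargest` or `heapq.nsmallest` then keeps a heap whose size equals the rank distance between the estimate and the true median.

```python
import heapq


def refine_median(x, estimate):
    """Exact lower median of x (0-based rank (n - 1) // 2), given an estimate.

    Linear scans split x around `estimate`. If the estimate is close to the
    true median, the median lies near the edge of one side, so only a small
    heap has to be maintained. Assumes totally ordered values (no NaN).
    """
    n = len(x)
    if n == 0:
        raise ValueError("empty data")
    k = (n - 1) // 2
    below = [v for v in x if v < estimate]
    above = [v for v in x if v > estimate]
    n_equal = n - len(below) - len(above)

    if k < len(below):
        # the k-th smallest of `below` is its (len(below) - k)-th largest
        return heapq.nlargest(len(below) - k, below)[-1]
    if k < len(below) + n_equal:
        return estimate  # the estimate itself is the median
    j = k - len(below) - n_equal
    return heapq.nsmallest(j + 1, above)[-1]
```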

tennican · 07-08-2011

You misunderstand.
I said that computing the Median Absolute Deviation at the same time that you compute the Median might be more efficient.

I was NOT referring to the computation of the mean or any other moment.
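Here is a Python sketch of the idea that MdAD can piggyback on the median computation. For simplicity, this version sorts once; a selection-based routine would do less work for the median itself. It then reads MdAD off the same sorted array with one linear merge. The function name is hypothetical, and nothing here reflects how MKL computes its median.

```python
def median_and_mdad(x):
    """Return (median, MdAD) of x using a single sort.

    After sorting, the distances |s[i] - m| grow as we move outward from the
    middle in either direction, so the left and right halves form two
    already-sorted runs of distances. Merging them from the middle outward
    until the middle rank is reached gives the median of the distances in
    O(N) extra time, with no second sort or selection pass.
    """
    s = sorted(x)
    n = len(s)
    if n == 0:
        raise ValueError("empty data")
    mid = n // 2
    m = s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

    i, j = mid - 1, mid   # i walks left from the middle, j walks right
    need = n // 2 + 1     # merged distance ranks 0 .. n // 2 are enough
    dist = []
    while len(dist) < need:
        left = m - s[i] if i >= 0 else None
        right = s[j] - m if j < n else None
        if right is None or (left is not None and left <= right):
            dist.append(left)
            i -= 1
        else:
            dist.append(right)
            j += 1

    mdad = dist[-1] if n % 2 else (dist[-2] + dist[-1]) / 2
    return m, mdad
```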

mecej4 · 07-09-2011

Good to know; there is no misunderstanding, then, as to the features desired from the proposed enhancements.