Hi Scott, the present version of Intel MKL Summary Stats does not provide functionality for median/mean absolute deviation/percentage error. At the same time, the library has "elementary" building blocks for those estimates, including mean and median. Can you please briefly clarify:
- Are those estimates a "bottleneck" in your code (that is, can your application spend a significant amount of time computing those estimates, depending on the problem size)?
- What are the typical dimensions you work with (dimension of random vector/number of observations/etc)?
Actually, the only one I really care about is the MdAD from which I can easily get MdAPE. It would probably be more efficient to compute it at the same time as other statistics. But, I can always calculate it in another step. As a robust measure of spread, I consider MdAD to be the next most important summary stat after standard deviation.
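For reference, a minimal sketch of the "separate step" approach in plain Python (not MKL). The function names are made up for illustration, and the MdAPE shown is one common definition (median of absolute percentage errors between actual and forecast values), which is an assumption about what is meant here.

```python
import statistics

def median_abs_deviation(xs):
    """MdAD: median of absolute deviations from the median (two median passes)."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)

def median_abs_pct_error(actual, forecast):
    """MdAPE in percent, assuming the usual forecast-error definition.

    Hypothetical helper; requires every actual value to be non-zero.
    """
    return statistics.median(
        100 * abs(a - f) / abs(a) for a, f in zip(actual, forecast)
    )

data = [1, 2, 3, 4, 100]
print(median_abs_deviation(data))
print(median_abs_pct_error([100, 200, 400], [110, 180, 400]))
```

Note that the outlier 100 barely affects the result, which is the point of using MdAD as a robust measure of spread.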
Our software is applied to an incredibly broad array of datasets of which most are small or filtered. But, I'm planning for the future.
Hello Scott, this issue has been submitted to our internal development tracking
database for further investigation, we will inform you once a new update
becomes available. Here is a bug tracking number for your reference: 200220845.
>It would probably be more efficient to compute it at the same time as other statistics.
I doubt that this is so except for small arrays. The median and other measures of rank require different types of algorithms than the other statistics. The mean, variance, etc. all require that one keep running sums of expressions involving only the current data item and, possibly, other previously computed statistics.
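To illustrate what "running sums" means here, a sketch of Welford's single-pass update for mean and variance in plain Python. This is not how MKL implements it (I don't know that); it only shows that each update needs just the current item and the previously computed statistics, with no sorting or ranking.

```python
def running_mean_variance(xs):
    """Single pass over xs; returns (count, mean, sample variance)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance

print(running_mean_variance([2, 4, 4, 4, 5, 5, 7, 9]))
```

A median cannot be maintained this way: it depends on the rank of every item, not on a fixed-size summary of the items seen so far.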
The selection algorithm for computing the median, for example, has O(N) complexity on average, but a simple implementation can degenerate to O(N^2) when the array is already sorted, even though, had we known it to be sorted, we could have simply picked the N/2-th element of the array as the median.
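A small sketch of the degenerate case, assuming the simplest possible quickselect that always takes the first element as the pivot (real implementations usually choose pivots more carefully). The comparison counter is only there for illustration.

```python
import random

def quickselect_first_pivot(xs, k):
    """Return (k-th smallest with 0-based k, number of pivot comparisons)."""
    items = list(xs)
    comparisons = 0
    while True:
        pivot = items[0]
        rest = items[1:]
        lows = [x for x in rest if x < pivot]
        highs = [x for x in rest if x >= pivot]
        comparisons += len(rest)  # count one comparison per remaining item
        if k < len(lows):
            items = lows
        elif k == len(lows):
            return pivot, comparisons
        else:
            k -= len(lows) + 1
            items = highs

n = 1000
sorted_input = list(range(n))
shuffled_input = sorted_input[:]
random.Random(0).shuffle(shuffled_input)

print(quickselect_first_pivot(sorted_input, n // 2))    # pivot removes one item per pass
print(quickselect_first_pivot(shuffled_input, n // 2))  # far fewer comparisons
```

On the sorted input each pass discards only the pivot itself, so the work grows quadratically with N.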
It would be fairly cheap for the library to compute an estimate of the median along with the other moments. For example, the median of 3 medians of 3 sampled triplets could provide such an estimate. Or, the median of a randomly chosen sample from the input array. In many cases, such an estimate may be all that is needed.
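Both estimates are easy to sketch in plain Python. The function names and the default sample size are my own choices for illustration, not anything the library provides.

```python
import random
import statistics

def median_of_three(a, b, c):
    """Middle value of three items."""
    return sorted((a, b, c))[1]

def ninther_estimate(xs, rng=random):
    """Median of the medians of 3 randomly sampled triplets (9 items total)."""
    if len(xs) < 9:
        return statistics.median_low(xs)  # too small to sample; take an exact median
    s = rng.sample(xs, 9)
    return median_of_three(
        median_of_three(s[0], s[1], s[2]),
        median_of_three(s[3], s[4], s[5]),
        median_of_three(s[6], s[7], s[8]),
    )

def sample_median_estimate(xs, sample_size=101, rng=random):
    """Median of a random sample drawn without replacement."""
    m = min(sample_size, len(xs))
    return statistics.median_low(rng.sample(xs, m))

data = list(range(10001))
rng = random.Random(42)
print(ninther_estimate(data, rng), sample_median_estimate(data, rng=rng))
```

Either estimate costs a fixed number of operations regardless of N, so it would add essentially nothing to the single pass that computes the moments. The larger sample gives a tighter estimate at a slightly higher cost.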
A user who wanted the true median could make a second call to the library with this estimated median, which could be used efficiently with the selection algorithm (or another algorithm that benefits from having an estimate of the median) to find the exact median. Normal users who have no interest in the median would not experience a noticeable loss of performance.
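One way such a second call could work, as a sketch only: partition the data around the supplied estimate, then pick out the few items between the estimate and the true median with a bounded heap. The function name is hypothetical, and the median is taken to be the N/2-th element (0-based rank N // 2), so for even N this returns the upper of the two middle values.

```python
import heapq

def exact_median_from_estimate(xs, estimate):
    """Exact median (rank len(xs) // 2) given a rough estimate of it."""
    if not xs:
        raise ValueError("median of empty data")
    k = len(xs) // 2
    below = [x for x in xs if x < estimate]
    equal_count = sum(1 for x in xs if x == estimate)
    if k < len(below):
        # Median is below the estimate: it is the (len(below) - k)-th largest of `below`.
        return heapq.nlargest(len(below) - k, below)[-1]
    if k < len(below) + equal_count:
        return estimate
    # Median is above the estimate: it is the (j + 1)-th smallest of `above`.
    above = [x for x in xs if x > estimate]
    j = k - len(below) - equal_count
    return heapq.nsmallest(j + 1, above)[-1]

data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
print(exact_median_from_estimate(data, 6))
```

The heap size equals the rank distance between the estimate and the true median, so a good estimate keeps the second pass close to linear, while a poor one still gives the right answer, only more slowly.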