I have written a simple serial implementation of KNN to calculate the distance from vector "a" for every vector in matrix "b", where
"a" is 1 by n
"b" is m by n
The distance is recorded for every i'th vector in "b" in result[i].distance. At the end, the structs are sorted by distance.
I have been trying to map my inputs and outputs to DAAL's KD-Tree KNN, but no luck so far. I seem to be having difficulty passing "a" and "b" in the data format expected by the function. Also, the example that comes with DAAL only shows how to train the model and invoke prediction on the testing data; it is not clear how to retrieve the distances and indices from the model. I would greatly appreciate any help, as the KNN function is the performance bottleneck in my program.
void knn_serial(double* a, double* b, int m, int n, knn_output* result)
{
    // Norm level used for distance calculation (L = 2 gives Euclidean distance)
    double L = 2;
    for (int i = 0; i < m; i++)
    {
        result[i].distance = 0;
        result[i].index = i;
        for (int j = 0; j < n; j++)
            result[i].distance = result[i].distance + pow(fabs(a[j] - b[i * n + j]), L);
        result[i].distance = pow(result[i].distance, 1 / L);
    }
    qsort(result, m, sizeof(knn_output), compare_knn);
}
int compare_knn(const void* a, const void* b)
{
    const knn_output* a_knn = (const knn_output*)a;
    const knn_output* b_knn = (const knn_output*)b;
    if (a_knn->distance < b_knn->distance) return -1;
    else if (a_knn->distance > b_knn->distance) return 1;
    else return 0;
}
Intel® Data Analytics Acceleration Library does not provide indices and distances to the user at the moment.
Could you please provide some additional details about your use case and the reason for sorting the feature vectors?
Given an out-of-sample feature vector "a", I am trying to find the k nearest feature vectors (sorted by distance) in the in-sample feature matrix "b". Then, by applying a statistical learning algorithm, I can learn the relationship between the k nearest feature vectors and their corresponding labels in order to predict the label of "a". Looking forward to seeing updates in the future. Also, please let me know if there is a current workaround. DAAL's KD-Tree KNN must internally calculate the distances and sort the feature vectors to produce the final prediction.
So, is k-nearest neighbors a step in some algorithm that predicts the label of "a" using a different approach than k-nearest neighbors itself? Or do you just want to use a specific approach to estimate the label based on distances (such as weighting the classes of neighbors by their distances)?
Yes, exactly. I use KNN as a setup step for my own statistical learning algorithm, to identify the indices (row positions) of the k nearest feature vectors in "b". I apply learning only on the k nearest feature vectors of "b" and their corresponding labels, and then apply the learned model to the out-of-sample feature vector "a" to predict its unknown label. Sorry if I'm repeating myself. Please let me know if we can still do this using the current Intel libraries.
Please let me know if there is any workaround to retrieve the indices and distances from KD-Tree KNN. If not, I would appreciate it if you could pass my request to the DAAL development team. Either way, please let me know.
Thank you for the request and for providing the details about your use case. At the moment, the only option is to use the open-source DAAL (https://github.com/intel/daal) and manually add an interface to access the data you need. Please note that the distances are calculated internally in the algorithm, as you mentioned. Your request will be taken into account by the DAAL team so that the API can be expanded accordingly in future releases.
I think I found where the sorted indices are located: they are stored in "indexes" after "inSortValues.idx" is sorted. This happens in "algorithms\kernel\k_nearest_neighbors\kdtree_knn_impl.i" (file is attached). However, I am struggling to understand how to access "indexes" after executing the model.