How to calculate distance when we have sparse dataset in K nearest neighbour
By : user5306
Date : March 29 2020, 07:55 AM
To make sure I'm understanding the problem correctly: each sample forms a very sparsely filled vector. The missing data differs between samples, so it is hard to apply Euclidean or any other distance metric to gauge how similar two samples are. If that is the scenario, I have seen this problem before in machine learning, in the Netflix Prize contest, though not applied specifically to KNN. The scenario there was quite similar: each user profile had ratings for some movies, but almost no user had seen all 17,000 movies. The average user profile was quite sparse.
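To make the difficulty concrete, here is a minimal Python sketch of one common workaround: compare two samples only on the features both of them actually have. This is an illustration under assumptions, not necessarily the approach the answer goes on to recommend. The function name `sparse_distance`, the `min_overlap` parameter, and the sample data are all hypothetical.

```python
import math

def sparse_distance(a, b, min_overlap=1):
    """Root-mean-square difference over the features present in BOTH samples.

    `a` and `b` map feature id -> value; an absent key means "missing",
    not zero. Dividing by the overlap size keeps pairs with many shared
    features comparable to pairs with few.
    """
    shared = a.keys() & b.keys()
    if len(shared) < min_overlap:
        # Assumption: too little overlap is treated as "infinitely far".
        return math.inf
    total = sum((a[f] - b[f]) ** 2 for f in shared)
    return math.sqrt(total / len(shared))

# Hypothetical user profiles: movie id -> rating.
alice = {"m1": 5.0, "m2": 3.0, "m7": 4.0}
bob = {"m2": 3.0, "m7": 2.0, "m9": 1.0}
carol = {"m4": 2.0}

print(sparse_distance(alice, bob))    # compared on m2 and m7 only
print(sparse_distance(alice, carol))  # no shared movies
```

A distance based on very few shared features is unreliable, which is why the sketch exposes `min_overlap`; raising it trades coverage for stability.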
|
Efficient implementation of the Nearest Neighbour Search
By : Václav Tůma
Date : March 29 2020, 07:55 AM
|
Efficient nearest neighbour search in Scala
By : Hasni Aamir
Date : March 29 2020, 07:55 AM
I'm not sure if this is helpful, but I thought of this: you use a sort function to sort ALL elements in the grid and then pick the first k elements. If you consider a sorting algorithm like recursive merge sort, you can stop each merge step as soon as k elements have been emitted, so no more than k elements are ever carried upward. You have something like this: code :
import scala.annotation.tailrec

// `coord` is the asker's point type; it is assumed to expose `dist(other: coord)`.
def minK(seq: IndexedSeq[coord], x: coord, k: Int) = {
  val dist = (c: coord) => c.dist(x)

  // Merge sort in which every merge keeps at most k elements.
  def sort(seq: IndexedSeq[coord]): IndexedSeq[coord] = seq.size match {
    case 0 | 1 => seq
    case size => {
      val (left, right) = seq.splitAt(size / 2)
      merge(sort(left), sort(right))
    }
  }

  def merge(left: IndexedSeq[coord], right: IndexedSeq[coord]) = {
    val leftF = left.lift
    val rightF = right.lift
    val builder = IndexedSeq.newBuilder[coord]

    @tailrec
    def loop(leftIndex: Int = 0, rightIndex: Int = 0): Unit = {
      // Stop once k elements have been emitted.
      if (leftIndex + rightIndex < k) {
        (leftF(leftIndex), rightF(rightIndex)) match {
          case (Some(leftCoord), Some(rightCoord)) => {
            if (dist(leftCoord) < dist(rightCoord)) {
              builder += leftCoord
              loop(leftIndex + 1, rightIndex)
            } else {
              builder += rightCoord
              loop(leftIndex, rightIndex + 1)
            }
          }
          case (Some(leftCoord), None) => {
            builder += leftCoord
            loop(leftIndex + 1, rightIndex)
          }
          case (None, Some(rightCoord)) => {
            builder += rightCoord
            loop(leftIndex, rightIndex + 1)
          }
          case _ => // both halves exhausted
        }
      }
    }

    loop()
    builder.result()
  }

  sort(seq)
}
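For readers who do not use Scala, the same idea (a merge sort whose merge step stops after k elements) can be sketched in Python. This is a rough equivalent written for illustration, not a port endorsed by the answer's author; the names `min_k` and `euclid` and the sample points are assumptions.

```python
import math

def euclid(p, q):
    """Euclidean distance between two 2-D points given as tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def min_k(points, target, k, dist=euclid):
    """Return the k points closest to `target`, nearest first."""

    def merge(left, right):
        # Standard merge, except that it stops once k items were emitted.
        out, i, j = [], 0, 0
        while len(out) < k and (i < len(left) or j < len(right)):
            take_left = j >= len(right) or (
                i < len(left)
                and dist(left[i], target) < dist(right[j], target)
            )
            if take_left:
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        return out

    def sort(seq):
        if len(seq) <= 1:
            return list(seq)
        mid = len(seq) // 2
        return merge(sort(seq[:mid]), sort(seq[mid:]))

    # The final slice covers the edge case of a single point with k == 0.
    return sort(points)[:k]

grid = [(0, 0), (5, 5), (1, 1), (2, 2), (9, 9)]
print(min_k(grid, (0, 0), 3))
```

Because every intermediate list is capped at k elements, the upper levels of the recursion merge short lists instead of full halves, which is where the saving over a full sort comes from.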
|
How can I calculate the nearest neighbour points of two different size matrices in R?
By : Mohammed Suhail
Date : March 29 2020, 07:55 AM
For the cl argument, you need a vector with one element per row of train; you're passing a matrix, which is converted to a vector twice as long. Try: code :
library(class)  # provides knn()
dim.knn = knn(train = x.train, test = x.test, cl = seq_len(nrow(x.train)), k = 1)
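Using the row indices of the training matrix as class labels means that a 1-nearest-neighbour classifier returns, for each test row, the index of its closest training row. The Python sketch below spells out that idea without any library; `nearest_rows` and the sample matrices are hypothetical, and the indices are 0-based, unlike R's 1-based `seq_len`.

```python
def nearest_rows(train, test):
    """For each row of `test`, return the index of the closest row of `train`.

    The two matrices may have different numbers of rows, but every row
    must have the same number of columns.
    """
    def sq_dist(p, q):
        # Squared Euclidean distance; the square root is not needed
        # because it does not change which row is nearest.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    result = []
    for row in test:
        best = min(range(len(train)), key=lambda i: sq_dist(train[i], row))
        result.append(best)
    return result

x_train = [[0, 0], [10, 10], [5, 0]]  # 3 rows
x_test = [[1, 1], [9, 8]]             # 2 rows

print(nearest_rows(x_train, x_test))
```

The result has one entry per test row, which mirrors why `cl` must have exactly one label per training row.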
|
SQL efficient nearest neighbour query
By : user3764309
Date : March 29 2020, 07:55 AM
Could you verify that I got the question right? Your table represents vectors identified by the groupId. Every vector has a dimension somewhere between 100 and 50,000, but there is no order defined on the dimensions. That is, a vector from the table is actually a representative of an equivalence class.
|