# Distance of points from cluster centers after K means clustering

By : user2956769
Date : November 22 2020, 03:03 PM
Hope this helps. It happens that you capture only the cluster element of the return value of kmeans, which also returns the centers of the clusters. Try this:
code :
``````
# generate some data
traindata<-matrix(rnorm(400),ncol=2)
traindata=scale(traindata,center = T,scale=T) # Feature Scaling
#get the full kmeans
km.cluster = kmeans(traindata, 2,iter.max=20,nstart=25)
#define a (euclidean) distance function between two matrices with two columns
myDist<-function(p1,p2) sqrt((p1[,1]-p2[,1])^2+(p1[,2]-p2[,2])^2)
#gets the distances
myDist(traindata[km.cluster$cluster==1,],km.cluster$centers[1,,drop=FALSE])
myDist(traindata[km.cluster$cluster==2,],km.cluster$centers[2,,drop=FALSE])
``````

## Clusters centers for Distance-based clustering

By : user3526113
Date : March 29 2020, 07:55 AM
Hope this fixes your issue. Density-based clusters can be of arbitrary shape.
For non-convex clusters, the center can lie outside of the cluster.

## K-means Clustering in Opencv & python: Is there any option to cluster in mahalanobis distance?

By : JohnS
Date : March 29 2020, 07:55 AM
The documentation shows no argument, in the constructor or otherwise, that can change the distance metric. In fact, visiting the kmeans.cpp source on git, you can see from lines like this that Euclidean distance (i.e., normL2Sqr) is hardcoded:
code :
``````
const double dist = normL2Sqr(sample, center, dims);
``````

## Get Cluster Centers when using HDBSCAN Clustering

By : Anakin
Date : March 29 2020, 07:55 AM
The clusters may be non-convex, and if you compute the average of all points (and your data don't even need to be points), the result may lie outside of the cluster.

## sklearn's KMeans: Cluster centers and cluster means differ. Numerical Imprecision?

By : MD Omarfaruk Shekh R
Date : March 29 2020, 07:55 AM
This may be related to the tolerance of KMeans. The default value is 1e-4, so setting a lower value, e.g. tol=1e-8, gives:
code :
``````
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

np.random.seed(0)
x = np.random.normal(size=5000)
x_z = ((x - x.mean()) / x.std()).reshape(5000, 1)  # z-score the data
cluster = KMeans(n_clusters=2, tol=1e-8).fit(x_z)

df = pd.DataFrame(x_z)
df['label'] =  cluster.labels_

difference = np.abs(df.groupby('label').mean() - cluster.cluster_centers_)
print(difference)

#                   0
# label
# 0    9.99200722e-16
# 1    1.11022302e-16
``````

## K-Mean Clustering: Evaluating new Cluster centers

By : SingularMan
Date : March 29 2020, 07:55 AM
Hope this helps. There are, more or less, two main approaches:

- Lloyd's approach: iterate over all data points, assign each point to its nearest cluster center, then move all centers accordingly; repeat.
- Hartigan's approach: iterate over each data point and check whether it is better to move it to another cluster (does it minimize the energy / make the cluster more "dense"); repeat until no change is possible.
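The first (Lloyd-style) loop can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the optimized implementation used by R's kmeans, OpenCV, or sklearn, and the function name `lloyd_kmeans` is just for this example:

``````
import numpy as np

def lloyd_kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd-style k-means: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance, computed by broadcasting).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated blobs around (0, 0) and (5, 5).
np.random.seed(0)
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, labels = lloyd_kmeans(X, 2)
``````

Hartigan's variant differs in that it moves one point at a time and recomputes the affected centers immediately, which can escape some fixed points Lloyd's batch update gets stuck in.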