Introduction to the Algorithm
Example

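Here is some building data for a number of large cities. The numeric columns are, in order: population, area, number of high-rise buildings, and a points score for the high-rise buildings.
First, read the data in: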
w = read.csv("/Users/caichicong/Downloads/cities1.txt")
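Then run the k-means algorithm with 4 clusters. The call passes w[,-1], that is, every column of w except the first, which leaves the four columns Population, Area, Buildings, and Points that appear in the output: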
a = kmeans(w[,-1], 4)
print(a)
K-means clustering with 4 clusters of sizes 11, 35, 14, 18
Cluster means:
  Population     Area Buildings    Points
1  3514328.2 2406.091  937.5455  4334.636
2   494480.8  572.800  207.0286  1172.686
3  8900659.9 2610.929 1981.5714 15726.571
4  1809637.9  558.000  430.2222  2260.833
Clustering vector:
[1] 3 3 3 1 1 3 3 3 3 4 4 1 4 4 3 4 1 2 1 3 4 3 2 4 2 3 1 3 2 4 3 2 4 1 2 2 2 4
[39] 2 2 4 1 2 2 2 2 3 4 2 2 2 2 1 2 2 1 2 4 2 4 2 4 2 2 2 4 4 2 2 2 1 2 2 2 4 2
[77] 2 2
Within cluster sum of squares by cluster:
[1] 7.550325e+12 1.948017e+12 3.474143e+13 3.623025e+12
(between_SS / total_SS = 93.8 %)
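One caveat about the kmeans call above: kmeans picks its starting centers at random, so the cluster numbering, and sometimes the cluster sizes, can differ from run to run. The sketch below shows one common way to make a run repeatable, using set.seed together with the nstart argument. The matrix x is synthetic stand-in data (an assumption, not the cities file), and the names x, fit1, and fit2 are purely illustrative.

```r
# Hypothetical stand-in data: 60 rows, 2 numeric columns, three loose groups.
set.seed(1)
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 4), ncol = 2),
  matrix(rnorm(40, mean = 8), ncol = 2)
)

# Fixing the seed makes the random starting centers repeatable.
# nstart = 25 tries 25 random starts and keeps the solution with the
# smallest total within-cluster sum of squares.
set.seed(123)
fit1 <- kmeans(x, centers = 3, nstart = 25)

set.seed(123)
fit2 <- kmeans(x, centers = 3, nstart = 25)

# With the same seed and the same data, both runs give the same assignment.
print(identical(fit1$cluster, fit2$cluster))
```

Applied to the example here, this would mean calling set.seed before kmeans(w[,-1], 4) and, optionally, passing nstart.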
Now let's go through the output.
K-means split the data into four clusters, containing 11, 35, 14, and 18 cities respectively:
K-means clustering with 4 clusters of sizes 11, 35, 14, 18
The center of each cluster, i.e. the mean of each variable over the cities in that cluster:
Cluster means:
  Population     Area Buildings    Points
1  3514328.2 2406.091  937.5455  4334.636
2   494480.8  572.800  207.0286  1172.686
3  8900659.9 2610.929 1981.5714 15726.571
4  1809637.9  558.000  430.2222  2260.833
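The sizes and centers printed above are also available as components of the object returned by kmeans. The following is a minimal sketch on synthetic stand-in data (the data frame x and its columns a and b are hypothetical, not the cities file). It reads those components and checks that each center is simply the per-cluster mean of the rows assigned to it.

```r
# Hypothetical stand-in data: two well-separated groups of 20 rows each.
set.seed(7)
x <- data.frame(
  a = c(rnorm(20, mean = 0), rnorm(20, mean = 10)),
  b = c(rnorm(20, mean = 0), rnorm(20, mean = 10))
)
fit <- kmeans(x, centers = 2, nstart = 10)

print(fit$size)           # number of rows in each cluster
print(fit$centers)        # one row per cluster, one column per variable
print(head(fit$cluster))  # cluster label assigned to each of the first rows

# Recompute the centers by hand: column means of the rows in each cluster.
manual <- aggregate(x, by = list(cluster = fit$cluster), FUN = mean)
centers_match <- isTRUE(all.equal(
  as.matrix(manual[, c("a", "b")]),
  fit$centers,
  check.attributes = FALSE
))
print(centers_match)
```

For the object a in this example, the same components would be a$size, a$centers, and a$cluster.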
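The line (between_SS / total_SS = 93.8 %) is an important result. It is the share of the total sum of squares that is accounted for by the separation between the clusters. The closer it is to 100 %, the more tightly the points sit around their own cluster centers relative to the overall spread of the data.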
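The between_SS / total_SS figure can be recomputed from components of the object that kmeans returns: betweenss, totss, and tot.withinss. The sketch below uses synthetic stand-in data (the matrix x is an assumption, not the cities file) and also checks the identity total_SS = within_SS + between_SS.

```r
# Hypothetical stand-in data: two groups of 50 rows, 2 columns.
set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
fit <- kmeans(x, centers = 2, nstart = 10)

# Share of the total sum of squares explained by the clustering.
ratio <- fit$betweenss / fit$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))

# The total sum of squares splits into within-cluster and between-cluster parts.
decomposition_holds <- isTRUE(all.equal(
  fit$totss,
  fit$tot.withinss + fit$betweenss
))
print(decomposition_holds)
```

For the object a in this example, a$betweenss / a$totss should give the 93.8 % shown in the printed output. Note that the ratio always rises as the number of clusters grows, so it is best compared across runs with the same number of clusters.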