Friday, June 13, 2025

K-means algorithm

The K-means algorithm is a well-known unsupervised clustering algorithm that can be used for data analysis, image segmentation, semi-supervised learning, and more. K-means is an exclusive method: a data point can belong to only one cluster.

K-means is an iterative, centroid-based clustering algorithm that partitions a dataset into groups of similar points, based on the distance between each point and the cluster centroids. The centroid (or cluster center) is the mean of all points in the cluster; using the median instead gives the related k-medians variant.

Given a set of points and an integer k, the algorithm aims to divide the points into k groups, called clusters, that are homogeneous.

In this sample, we generate a set of random points in an image.


To process the data, we create a Red/Rebol object such as:

;--an object for storing values (points and clusters)
point: object [
    x: 0.0      ;--x position
    y: 0.0      ;--y position
    group: 0    ;--cluster number (label)
]
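
As a rough illustration of how such objects might hold the random points, here is a minimal sketch written for Red. The helper name make-points, the 512 x 512 image size, the point count and the fixed seed are all assumptions made for this example; they are not taken from the repository code.

```red
Red [title: "Random points sketch"]

;--same prototype as in the text
point: object [
    x: 0.0      ;--x position
    y: 0.0      ;--y position
    group: 0    ;--cluster number (0 = not labeled yet)
]

;--hypothetical helper: n random points inside a w x h image
make-points: func [n w h /local pts p][
    pts: copy []
    loop n [
        p: make point []        ;--fresh copy of the prototype
        p/x: random 1.0 * w     ;--random float up to w
        p/y: random 1.0 * h     ;--random float up to h
        append pts p
    ]
    pts
]

random/seed 42                  ;--fixed seed (assumption) for repeatable runs
points: make-points 100 512 512
print [length? points "points generated"]
```

Each point starts with group set to 0, which this sketch uses to mean "not assigned to any cluster yet".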
The first step is to randomly define k centroids and associate them with k labels. Then, for each point, we calculate the Euclidean distance from its x and y coordinates to each centroid, and associate the point with the closest centroid and its corresponding label. This labels our data.
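
The labeling step could be sketched in Red as follows. The names make-point and assign-points are hypothetical, and the two centroids are fixed by hand here (rather than drawn at random) so that the result is easy to check by eye.

```red
Red [title: "K-means assignment sketch"]

point: object [x: 0.0 y: 0.0 group: 0]

;--hypothetical helper: build a point from its coordinates
make-point: func [px py /local p][
    p: make point []
    p/x: px
    p/y: py
    p
]

;--label every point with the index of its closest centroid
assign-points: func [pts centroids /local p c i best dx dy d][
    foreach p pts [
        best: 1e300                     ;--larger than any real distance
        repeat i length? centroids [
            c: centroids/:i
            dx: p/x - c/x
            dy: p/y - c/y
            d: square-root (dx * dx) + (dy * dy)    ;--Euclidean distance
            if d < best [best: d  p/group: i]
        ]
    ]
]

pts: reduce [make-point 1.0 1.0  make-point 9.0 8.0  make-point 2.0 0.0]
centroids: reduce [make-point 0.0 0.0  make-point 10.0 10.0]
assign-points pts centroids
foreach p pts [print [p/x p/y "-> cluster" p/group]]
```

With these values, the first and third points are closest to the centroid at (0, 0) and receive label 1, while the second point is closest to (10, 10) and receives label 2.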

The second step is to recalculate the centroids, each of which becomes the center of gravity of its labeled cluster of points. We repeat these two steps until a convergence criterion is reached: the centroids no longer move from one iteration to the next.
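
Putting both steps together, the whole loop might look like the sketch below, again written for Red. This is only an illustration under stated assumptions, not the code from the repository: the function names, the 200 points, k = 3, the seed, the 100-iteration safety cap and the choice to leave an empty cluster's centroid where it is are all choices made for this example.

```red
Red [title: "K-means loop sketch"]

point: object [x: 0.0 y: 0.0 group: 0]

make-point: func [px py /local p][
    p: make point []
    p/x: px  p/y: py
    p
]

;--step 1: label each point with its closest centroid
assign-points: func [pts centroids /local p c i best dx dy d][
    foreach p pts [
        best: 1e300
        repeat i length? centroids [
            c: centroids/:i
            dx: p/x - c/x
            dy: p/y - c/y
            d: (dx * dx) + (dy * dy)    ;--squared distance gives the same ordering
            if d < best [best: d  p/group: i]
        ]
    ]
]

;--step 2: move each centroid to the mean of its points
;--returns true if at least one centroid moved
update-centroids: func [pts centroids /local p c i sx sy n nx ny moved][
    moved: false
    repeat i length? centroids [
        sx: 0.0  sy: 0.0  n: 0
        foreach p pts [
            if p/group = i [sx: sx + p/x  sy: sy + p/y  n: n + 1]
        ]
        if n > 0 [                      ;--an empty cluster keeps its centroid (assumption)
            c: centroids/:i
            nx: sx / n
            ny: sy / n
            if any [nx <> c/x  ny <> c/y][moved: true]
            c/x: nx  c/y: ny
        ]
    ]
    moved
]

random/seed 1
points: copy []
loop 200 [append points make-point random 512.0 random 512.0]
k: 3
centroids: copy []
loop k [append centroids make-point random 512.0 random 512.0]

iterations: 0
until [
    iterations: iterations + 1
    assign-points points centroids
    any [
        not update-centroids points centroids   ;--converged: nothing moved
        iterations >= 100                       ;--safety cap (assumption)
    ]
]
print ["stopped after" iterations "iterations"]
```

Comparing squared distances avoids a square root per comparison without changing which centroid is closest. The result depends on the initial random centroids, so different seeds can produce different clusters.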




You will find the documented code for Red and Rebol 3 here:

 https://github.com/ldci/R3_OpenCV_Samples/tree/main/image_kmeans

