Friday, June 13, 2025

K-means algorithm

K-means is a well-known unsupervised clustering algorithm that can be used for data analysis, image segmentation, semi-supervised learning, and more. It is an exclusive (hard) clustering method: each data point belongs to exactly one cluster.

K-means is an iterative, centroid-based clustering algorithm that partitions a dataset into groups of similar points, based on the distance between each point and the cluster centroids. The centroid (or cluster center) is the mean of all points in the cluster; using the median instead gives the related k-medians algorithm.

Given a set of points and an integer k, the algorithm aims to divide the points into k groups, called clusters, that are homogeneous.

In this sample, we generate a set of random points in an image.


To process the data, we create a Red/Rebol object such as:

;--an object for storing values (points and clusters)
point: object [
    x: 0.0      ;--x position
    y: 0.0      ;--y position
    group: 0    ;--cluster number (label)
]

The first step is to randomly define k centroids and associate each of them with one of k labels. Then, for each point, we calculate the Euclidean distance to every centroid and assign the point to the closest centroid and its corresponding label. This labels our data.

The second step is to recalculate the centroids: each one becomes the center of gravity (the mean) of the points that carry its label. We repeat both steps until a convergence criterion is reached: the centroids no longer move from one iteration to the next.
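
The two steps can be sketched in a few lines of Red/Rebol. This is only a minimal illustration built on the point object shown earlier, not the code from the repository: the function names (sq-dist, assign-points, update-centroids), the 512.0 coordinate range, the 200 points, k = 3 and the cap of 100 iterations are all assumptions chosen for the example.

```red
;--the point object from the text, repeated so the sketch is self-contained
point: object [
    x: 0.0      ;--x position
    y: 0.0      ;--y position
    group: 0    ;--cluster number (label)
]

;--squared Euclidean distance between a point and a centroid
;--(the square root is not needed to find the closest centroid)
sq-dist: func [p c /local dx dy][
    dx: p/x - c/x
    dy: p/y - c/y
    (dx * dx) + (dy * dy)
]

;--step 1: label each point with the index of its closest centroid
assign-points: func [points centroids /local p i d dmin][
    foreach p points [
        dmin: none
        repeat i length? centroids [
            d: sq-dist p centroids/:i
            if any [none? dmin  d < dmin][dmin: d  p/group: i]
        ]
    ]
]

;--step 2: move each centroid to the mean of its points
;--returns true if at least one centroid moved
update-centroids: func [points centroids /local p i c sx sy n moved][
    moved: false
    repeat i length? centroids [
        sx: 0.0  sy: 0.0  n: 0
        foreach p points [
            if p/group = i [sx: sx + p/x  sy: sy + p/y  n: n + 1]
        ]
        if n > 0 [                      ;--an empty cluster keeps its centroid
            c: centroids/:i
            sx: sx / n
            sy: sy / n
            if any [sx <> c/x  sy <> c/y][moved: true]
            c/x: sx
            c/y: sy
        ]
    ]
    moved
]

k: 3                                    ;--number of clusters (assumed value)
points: copy []
loop 200 [append points make point [x: random 512.0  y: random 512.0]]
centroids: copy []
loop k [append centroids make point [x: random 512.0  y: random 512.0]]

iter: 0
until [
    iter: iter + 1
    assign-points points centroids
    moved: update-centroids points centroids
    any [not moved  iter >= 100]        ;--stop on convergence or after 100 passes
]
print ["Stopped after" iter "iterations"]
repeat i k [print ["Centroid" i ":" centroids/:i/x centroids/:i/y]]
```

Comparing squared distances avoids a square root per point without changing which centroid is closest. The iteration cap is only a safety net; on small random data sets like this one, the loop normally stops because the centroids have stopped moving.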




You will find the documented code for Red and Rebol 3 here:

 https://github.com/ldci/R3_OpenCV_Samples/tree/main/image_kmeans


Saturday, June 7, 2025

Compress and Uncompress Images

A few years ago, I presented a way of compressing images with the Red zlib proposed by Bruno Anselme (https://redlcv.blogspot.com/2018/01/image-compression-with-red.html). Since then, Red and Oldes's Rebol 3 have implemented different compression methods that are faster and simpler to use. 

Both languages feature a compress function. Input data can be string or binary values, which is useful for RGB images. Returned values are binary. Both languages use lossless compression methods. 

Red and R3 share the following methods: 
deflate: A lossless data compression format that combines the LZ77 algorithm with Huffman coding.
zlib: Implements the deflate compression algorithm and can create files in gzip format. This library is widely used, due to its small size, efficiency and flexibility.
gzip: A file format that wraps deflate-compressed data with a header and a CRC-32 checksum.

R3 adds a few more algorithms: 
br: Brotli compression, a fast alternative to gzip proposed by Google.
crush: A lossless compression package developed by NASA.
lzma: The Lempel-Ziv-Markov chain algorithm, another lossless data compression algorithm.

As these methods are all variations on LZ77-style dictionary compression (deflate itself combines LZ77 with Huffman coding), the compression ratio doesn't vary much from one method to another. The main difference is the speed of compression.
 
Of course, both languages have a decompress function. Input data is binary, and the method used must be the same as that chosen for compression.   

Here's a minimal code example for Red and R3.

method: 'zlib ;--a word
img: load %../pictures/in.png         ;--use your own image
bin: img/rgb ;--image as RGB binary
print ["Method    :" form method]
print ["Image size:" img/size]
print ["Before compression:" nU: length? bin]
t: dt [cImg: compress bin method]         ;--R3/Red compress
print ["After  compression:" nC: length? cImg]
ratio: round/to 100.0 * (nU - nC) / nU 0.01                 ;--compression ratio in % (float math, so Red's integer division cannot truncate it)
print ["Compression :" form ratio "%"]
print ["Compress    :" third t * 1000  "ms"]                 ;--in msec
t: dt [uImg: decompress cImg method]         ;--R3/Red decompress
print ["Decompress  :" third t * 1000  "ms"]                 ;--in msec
print ["After decompression:" length? uImg]

The result:

Method    : zlib
Image size: 1920x1280
Before compression: 7372800
After  compression: 4011092
Compression : 45.6 %
Compress    : 46.298 ms
Decompress  : 26.706 ms
After decompression: 7372800


Fast and efficient!
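
To compare the methods shared by Red and R3 on the same input, a loop along the following lines could be used. It is a minimal sketch that relies only on the compress and decompress calls shown above; the sample data (a repeated string, which compresses far better than a real photo) and the variable names data, packed and unpacked are assumptions made for the example.

```red
;--sample data: a highly repetitive string, chosen only for the demo
data: to binary! append/dup copy "" "Red and Rebol 3 " 1000

foreach method [deflate zlib gzip][
    packed: compress data method            ;--same call as in the example above
    unpacked: decompress packed method      ;--must use the same method
    print [
        form method ":"
        length? data "->" length? packed "bytes,"
        "round trip ok:" unpacked = data
    ]
]
```

With R3, the word list can be extended with br, crush and lzma. Replacing data with img/rgb gives the comparison for an actual image.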