Wednesday, October 9, 2024

Using Z-score with Red or Rebol 3

Detecting anomalies (outliers) is a classic problem in statistics and computer science.

https://medium.com/@akashsri306/detecting-anomalies-with-z-scores-a-practical-approach-2f9a0f27458d 

The Z-score can help solve this kind of problem. It is calculated as follows: z = (x - mean) / SD, where x is an individual value in the distribution, mean is the average of all the values in the distribution, and SD is the standard deviation of the data.

When applied to a Gaussian distribution of data, the Z-score transformation produces a new distribution with mean = 0.0 and SD = 1.0. This is important when you need to compare data measured on different scales.
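As a quick illustration of the formula, here is a minimal sketch in Red. The function name zScores is hypothetical (it does not come from the code linked in this post), and it assumes the population standard deviation (division by n rather than n - 1).

```red
Red []

;-- Hypothetical helper: returns the z-score of every value in a block.
;-- Assumption: population standard deviation (divide by n).
zScores: function [values [block!]][
    n: length? values
    total: 0.0
    foreach v values [total: total + v]
    avg: total / n                          ;-- mean of the distribution
    sq: 0.0
    foreach v values [sq: sq + ((v - avg) ** 2)]
    sd: square-root (sq / n)                ;-- standard deviation
    result: copy []
    foreach v values [append result (v - avg) / sd]
    result
]

;-- mean is 5.0 and SD is 2.0 for this sample
probe zScores [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0]
;-- [-1.5 -0.5 -0.5 -0.5 0.0 0.0 1.0 2.0]
```

The returned values have a mean of 0.0 and an SD of 1.0, whatever the scale of the input.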

We can then use a threshold to identify outliers. A threshold is a cutoff point that determines what counts as an anomaly or outlier within the distribution of values. Many scientists use the Z-score to exclude values they consider to be outliers: values more than 2 SD above or below the mean are not retained.
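The exclusion rule can be sketched in a few lines of Red. The name removeOutliers and its limit parameter are hypothetical, and the population standard deviation is again an assumption.

```red
Red []

;-- Hypothetical helper: keeps the values whose absolute z-score
;-- does not exceed `limit` (2.0 for the usual 2 SD rule).
removeOutliers: function [values [block!] limit [number!]][
    n: length? values
    total: 0.0
    foreach v values [total: total + v]
    avg: total / n
    sq: 0.0
    foreach v values [sq: sq + ((v - avg) ** 2)]
    sd: square-root (sq / n)                ;-- population SD (assumption)
    kept: copy []
    foreach v values [
        z: (v - avg) / sd
        if limit >= absolute z [append kept v]
    ]
    kept
]

;-- 50.0 has a z-score of about 2.99, so it is dropped
probe removeOutliers [10.0 11.0 9.0 10.0 12.0 10.0 9.0 11.0 10.0 50.0] 2.0
;-- [10.0 11.0 9.0 10.0 12.0 10.0 9.0 11.0 10.0]
```

Note that a single extreme value inflates both the mean and the SD, which is one reason the threshold has to be chosen with the data in mind.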

But we can also use the Z-score to extract significant values from a noisy signal, under these general assumptions: the signal contains baseline noise, with an overall mean and SD across the whole time series, and some data points (peaks) deviate significantly from that noise.

I found a good explanation of how to deal with this problem here:

https://stackoverflow.com/questions/22583391/peak-signal-detection-in-realtime-timeseries-data

The basic idea is simple: if a data point in the series lies more than a given number of standard deviations away from a moving mean, the algorithm emits a signal (equal to 1), which means that the data point stands out from the noisy signal.

Here is a Red/Rebol 3 function that illustrates how to do it.

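zThresholding: function [
    data      [block! vector!]  ;--input signal
    output    [block! vector!]  ;--receives 1, -1 or 0 for each value of data
    lag       [integer!]        ;--size of the moving window
    threshold [decimal!]        ;--number of SDs that triggers a signal
    influence [decimal!]        ;--weight of a signalling value in the filtered series
    ;--decimal! is the Rebol 3 type name; Red calls it float!
][
    sLength: length? data
    filteredY: copy data
    ;--keep only the pair of lines that matches your interpreter
    ;--Red
    avgFilter: make vector! reduce ['float! 64 sLength]
    stdFilter: make vector! reduce ['float! 64 sLength]
    ;--R3
    ;avgFilter: make vector! reduce ['decimal! 64 sLength]
    ;stdFilter: make vector! reduce ['decimal! 64 sLength]

    ;--mean and stdDev are helper functions (not shown here)
    ;--that take a series and a number of values
    avgFilter/:lag: mean data lag
    stdFilter/:lag: stdDev data lag
    i: lag
    while [i < sLength][
        n:   i + 1          ;--index of the next value
        y:   data/:n
        avg: avgFilter/:i
        std: stdFilter/:i
        v1: absolute (y - avg)  ;--distance from the moving mean
        v2: threshold * std
        either v1 > v2 [
            output/:n: pick [1 -1] y > avg  ;--1 above the mean, -1 below
            ;--damp the value so it weighs less in the moving statistics
            filteredY/:n: (influence * y) + ((1 - influence) * filteredY/:i)
        ][
            output/:n: 0
        ]
        avgFilter/:n: mean   (at filteredY i - lag) lag
        stdFilter/:n: stdDev (at filteredY i - lag) lag
        i: i + 1
    ]
    filteredY
]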
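
The function calls two helpers, mean and stdDev, each with a series and a number of values. Their definitions are not shown in this post, so what follows is only a sketch of what they might look like. It assumes that they work on the first n values from the current position of the series and that stdDev is the population standard deviation.

```red
Red []

;-- Assumed helper: arithmetic mean of the first n values of a series,
;-- counted from its current position (so `at series k` gives a window).
mean: function [series [block! vector!] n [integer!]][
    total: 0.0
    repeat i n [total: total + series/:i]
    total / n
]

;-- Assumed helper: population standard deviation of the same n values.
stdDev: function [series [block! vector!] n [integer!]][
    avg: mean series n
    sq: 0.0
    repeat i n [sq: sq + ((series/:i - avg) ** 2)]
    square-root (sq / n)
]

data: [1.0 1.0 3.0 5.0 9.0 9.0]
print mean data 4              ;-- mean of 1 1 3 5, prints 2.5
print stdDev (at data 3) 2     ;-- SD of 3 5, prints 1.0
```

With helpers like these loaded first, a call could look like zThresholding data output 5 3.5 0.5, where output is a block that already holds one 0 per value of data. The three numeric arguments are illustrative only.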

And the result:




See the code for Red and Rebol:

