Detecting anomalies (outliers) is a classical problem in statistics and computer science.
https://medium.com/@akashsri306/detecting-anomalies-with-z-scores-a-practical-approach-2f9a0f27458d
The Z-score can help solve this kind of problem. It is calculated as z = (x - mean) / SD, where x is an individual value in the distribution, mean is the average of all values in the distribution, and SD is the standard deviation of the data.
When applied to Gaussian-distributed data, the Z-score produces a new distribution with mean = 0.0 and SD = 1.0 (the standard normal distribution). This is important when you need to compare data measured on different scales.
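To make the formula concrete, here is a minimal sketch in Red. The function name zScores is hypothetical and not part of the original post, and the sketch assumes a population SD (dividing by the number of values) and a block of float values with a non-zero SD.

```red
Red [title: "z-score sketch"]

;--zScores is a hypothetical helper, not taken from the original post
;--assumption: population SD (divides by the number of values)
zScores: func [values [block!] /local n m sd acc v out][
    n: length? values
    acc: 0.0
    foreach v values [acc: acc + v]
    m: acc / n                              ;--mean of the values
    acc: 0.0
    foreach v values [acc: acc + ((v - m) * (v - m))]
    sd: square-root (acc / n)               ;--population standard deviation
    out: copy []
    foreach v values [append out (v - m) / sd]
    out
]

probe zScores [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0]
```

For this sample the mean is 5 and the SD is 2, so the Z-scores are -1.5, -0.5, -0.5, -0.5, 0, 0, 1 and 2, which themselves have a mean of 0 and an SD of 1.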
We can then use a threshold to identify outliers. A threshold is a cutoff point that determines what counts as an anomaly or outlier within the distribution of values. Many scientists use the Z-score to exclude values they consider to be outliers from the data: values with a Z-score greater than 2 or less than -2 (more than 2 SD away from the mean) are not retained.
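As an illustration of the thresholding step, the sketch below returns the values that lie more than a given number of SDs from the mean. The name findOutliers is hypothetical, and a population SD is assumed. Comparing the absolute deviation with threshold * SD, rather than dividing by the SD, avoids a division by zero when all values are equal.

```red
Red [title: "z-score outlier sketch"]

;--findOutliers is a hypothetical helper, not taken from the original post
;--returns the values whose absolute deviation from the mean exceeds threshold * SD
findOutliers: func [values [block!] threshold [float!] /local n m sd acc v out][
    n: length? values
    acc: 0.0
    foreach v values [acc: acc + v]
    m: acc / n                              ;--mean of the values
    acc: 0.0
    foreach v values [acc: acc + ((v - m) * (v - m))]
    sd: square-root (acc / n)               ;--population standard deviation (assumption)
    out: copy []
    foreach v values [
        if (absolute (v - m)) > (threshold * sd) [append out v]
    ]
    out
]

probe findOutliers [10.0 11.0 9.0 10.0 12.0 10.0 9.0 11.0 10.0 30.0] 2.0
```

With these sample values the mean is 12.2 and the SD is about 6.0, so only 30.0 lies more than 2 SD from the mean.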
But we can also use the Z-score to extract significant values from a noisy signal, under two general assumptions: the signal contains background noise with an overall mean and SD across the whole time series, and some data points deviate significantly from that noise (peaks).
I've found a good explanation of how we can deal with this problem here:
https://stackoverflow.com/questions/22583391/peak-signal-detection-in-realtime-timeseries-data
The basic idea is simple: if a datapoint in the series is more than a given number of standard deviations away from a moving mean, the algorithm emits a signal (1 for a point above the moving mean, -1 for a point below it), which means that the datapoint stands out from the noisy signal. Otherwise the signal is 0.
Here is a Red/Rebol 3 function which illustrates how to do it. It calls two helper functions, mean and stdDev, that are defined elsewhere.
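zThresholding: function [ ;--header missing in the source: this name and the function constructor are assumptions
    data [block! vector!]       ;--input signal
    output [block! vector!]     ;--receives 1, -1 or 0; must already hold as many values as data
    lag [integer!]              ;--size of the moving window
    threshold [decimal!]        ;--number of SDs that triggers a signal
    influence [decimal!]        ;--weight (0.0 to 1.0) of signal points in the moving statistics
][
    sLength: length? data
    filteredY: copy data
    ;--Red: keep this pair
    avgFilter: make vector! reduce ['float! 64 sLength]
    stdFilter: make vector! reduce ['float! 64 sLength]
    ;--R3: use this pair instead
    ;avgFilter: make vector! reduce ['decimal! 64 sLength]
    ;stdFilter: make vector! reduce ['decimal! 64 sLength]
    avgFilter/:lag: mean data lag       ;--statistics of the first lag values
    stdFilter/:lag: stdDev data lag
    i: lag
    while [i < sLength][
        n: i + 1 ;--index of the next value
        y: data/:n
        avg: avgFilter/:i
        std: stdFilter/:i
        v1: absolute (y - avg)
        v2: threshold * std
        either v1 > v2 [
            output/:n: pick [1 -1] y > avg  ;--1 above the moving mean, -1 below
            filteredY/:n: (influence * y) + ((1 - influence) * filteredY/:i)
        ][
            output/:n: 0
        ]
        ;--moving statistics over the lag filtered values ending at n
        ;--(assumes mean and stdDev read lag values from the given position)
        avgFilter/:n: mean (at filteredY n - lag + 1) lag
        stdFilter/:n: stdDev (at filteredY n - lag + 1) lag
        i: i + 1
    ]
    filteredY
]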
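The function above calls mean and stdDev, which the post does not show. The sketch below is one possible pair of helpers, not the author's own code. It assumes that each takes a series and a count n, that it works on the first n values starting at the series' current position, and that the SD is the population SD.

```red
Red [title: "moving statistics helpers (sketch)"]

;--assumed signatures: mean series n / stdDev series n
;--both read n values starting at the current position of the series
mean: func [series [block! vector!] n [integer!] /local acc k][
    acc: 0.0
    repeat k n [acc: acc + series/:k]
    acc / n
]

stdDev: func [series [block! vector!] n [integer!] /local m acc k d][
    m: mean series n
    acc: 0.0
    repeat k n [
        d: series/:k - m
        acc: acc + (d * d)                  ;--sum of squared deviations
    ]
    square-root (acc / n)                   ;--population SD (assumption)
]

probe mean [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0] 8
probe stdDev [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0] 8
probe mean (at [1.0 2.0 3.0 4.0] 3) 2
```

The three calls should give 5.0, 2.0 and 3.5. The last one shows how at moves the start of the window, so only the values 3.0 and 4.0 are averaged.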