Much of the data we collect in hospital consists of time series, some of which vary unexpectedly over time. For example, in our work on how adults perceive babies' cries, we observed that most of the signals showed a linear temperature drift over the course of the experiment. This is probably linked to the electronics of our camera. For this reason, I've developed a few simple algorithms in Red and Rebol 3 that address some of these problems. I mainly use the vector! datatype, which is very efficient for numerical calculations in Red and Rebol 3.
One of the first steps is to remove the DC component (the constant offset) from the signal: simply subtract the mean of the signal from each of its values. Rebol and Red have a function (average) that calculates the average of a vector.
detrendSignal: func [v [vector!]
"Remove continuous component in signal"
][
;--basic (x - mean)
_v: copy v
_average: average _v ;--average is a native function in Red and Rebol 3
repeat i _v/length [_v/:i: _v/:i - _average]
_v
]
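As a rough illustration, the operation above can be written as a formula. The symbols x_i, y_i, x-bar and n are my own notation, not names from the code, and the sample values are made up:

```latex
% Mean removal: subtract the mean \bar{x} of the n samples from each sample
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad y_i = x_i - \bar{x}
\]
% Worked example with illustrative values
\[
x = (2, 4, 6, 8) \;\Rightarrow\; \bar{x} = 5 \;\Rightarrow\; y = (-3, -1, 1, 3)
\]
```

The result has a mean of 0. Note that this removes only a constant offset: a signal with a linear drift keeps its slope after this step.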
Now let's move on to signal normalization. Normalization means bringing signals into the same range or into a predefined range. A typical example is the statistical approach (z-score), which transforms the signal so that its mean is 0 and its standard deviation is 1. This is very useful when you want to compare signals with different amplitudes. We first need to calculate the standard deviation of the distribution, then normalize the signal.
stddev: func [v [vector!]
"Standard deviation"
][
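sigma: 0.0
_mean: average v ;--compute the mean once, outside the loop
foreach value v [sigma: sigma + (power (value - _mean) 2)]
sqrt sigma / ((v/length) - 1) ;--sample standard deviation (n - 1 denominator)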
]
normalizeSignal: func [v [vector!]
"Z-score algorithm"
][
;--use z-score algorithm: (x - mean) / standard deviation
_v: copy v
_average: average _v ;--average is a native function in Red and Rebol 3
_std: stddev _v ;--get standard deviation
repeat i _v/length [_v/:i: (_v/:i - _average) / _std]
_v
]
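For reference, here is a sketch of the two formulas behind the functions above, with a small worked example. The notation is mine: x-bar is the mean of the n samples, s the sample standard deviation (n - 1 denominator, as in the code) and z_i the normalized value. The sample values are made up:

```latex
% Sample standard deviation and z-score
\[
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
z_i = \frac{x_i - \bar{x}}{s}
\]
% Worked example with illustrative values
\[
x = (2, 4, 6, 8):\quad \bar{x} = 5,\quad
s = \sqrt{20/3} \approx 2.582,\quad
z \approx (-1.162,\; -0.387,\; 0.387,\; 1.162)
\]
```

The normalized values have a mean of 0 and a sample standard deviation of 1, whatever the amplitude of the original signal.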
Another way of normalizing data is to use the minimum and maximum values of each data series. With this algorithm, the values of each series lie in the range [0.0, 1.0].
minMaxNormalization: func [v [vector!]
"Min Max normalization"
][
;--use min-max algorithm: (x - xmin) / (xmax - xmin)
_v: copy v
xmin: _v/minimum xmax: _v/maximum
repeat i _v/length [_v/:i: (_v/:i - xmin) / (xmax - xmin)]
_v
]
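In formula form (my notation, with made-up sample values), min-max normalization reads:

```latex
% Min-max normalization: map the smallest sample to 0 and the largest to 1
\[
y_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}
\]
% Worked example with illustrative values
\[
x = (2, 4, 6, 8):\quad x_{\min} = 2,\quad x_{\max} = 8,\quad
y = \left(0,\; \tfrac{1}{3},\; \tfrac{2}{3},\; 1\right)
\]
```

The formula is undefined for a constant signal (xmax = xmin), a case the function above does not guard against.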
But these techniques aren't always effective, because they are sensitive to the anomalies (outliers) contained in the signal: a single extreme value shifts the mean, the standard deviation, the minimum or the maximum. For this reason, I often use an algorithm based on the median of the distribution, which is more robust and minimizes the effects of anomalies. Of course, we first need to calculate the median and the interquartile range of our signal.
median: func [
"Return the sample median"
sample [vector!]
][
data: sort to block! copy sample
n: length? data
case [
odd? n [pick data n + 1 / 2]
even? n [(pick data n / 2) + (pick data n / 2 + 1) / 2]
]
]
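The two branches of the function above correspond to the usual definition of the sample median. In the sketch below, x_(k) is my notation for the k-th smallest value and the sample values are made up:

```latex
% Sample median (requires amsmath for the cases environment)
\[
\mathrm{med}(x) =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd} \\[4pt]
\dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even}
\end{cases}
\]
% Worked examples with illustrative values
\[
\mathrm{med}(3, 1, 2) = 2, \qquad \mathrm{med}(4, 1, 3, 2) = \frac{2 + 3}{2} = 2.5
\]
```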
interquartileRange: func [
"Return the sample Interquartile Range"
sample [vector!]
][
data: sort to-block copy sample
n: length? data
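;--nearest-rank positions of the first and third quartiles in the sorted data
i1: to-integer round/ceiling 0.25 * n
i3: to-integer round/ceiling 0.75 * n
Q1: pick data i1 ;--value at the first quartile
Q3: pick data i3 ;--value at the third quartile
Q3 - Q1 ;--IQR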
]
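The interquartile range is the difference between the third and first quartiles of the sorted values, not between their positions. There are several conventions for locating the quartiles. The sketch below assumes the simple nearest-rank rule, where x_(k) is my notation for the k-th smallest value and the sample values are made up:

```latex
% Interquartile range with nearest-rank quartiles (an assumed convention)
\[
\mathrm{IQR} = Q_3 - Q_1, \qquad
Q_1 = x_{(\lceil 0.25\,n \rceil)}, \quad
Q_3 = x_{(\lceil 0.75\,n \rceil)}
\]
% Worked example with illustrative values, n = 8
\[
x = (1, 2, 3, 4, 5, 6, 7, 8):\quad
Q_1 = x_{(2)} = 2,\quad Q_3 = x_{(6)} = 6,\quad \mathrm{IQR} = 4
\]
```

Other conventions, such as linear interpolation between neighbouring values, give slightly different results on small samples.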
medianFilter: func [v [vector!]
"Median filter"
][
;--use median filter: (x - med) / IQR
_v: copy v
med: median _v
IQR: interquartileRange _v
repeat i _v/length [_v/:i: (_v/:i - med) / IQR]
_v
]
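To see why this scaling is more robust, here is a small worked example with one outlier. The values are made up, the notation is mine, and the quartiles are taken with the nearest-rank rule, which is an assumption:

```latex
% Median/IQR scaling
\[
y_i = \frac{x_i - \mathrm{med}(x)}{\mathrm{IQR}}
\]
% Worked example with one outlier (100), illustrative values, n = 5
\[
x = (1, 2, 3, 4, 100):\quad \mathrm{med} = 3,\quad
Q_1 = 2,\quad Q_3 = 4,\quad \mathrm{IQR} = 2
\]
\[
y = (-1,\; -0.5,\; 0,\; 0.5,\; 48.5)
\]
% Same data with min-max normalization, for comparison
\[
\frac{x_i - 1}{100 - 1} \approx (0,\; 0.010,\; 0.020,\; 0.030,\; 1)
\]
```

With the median and IQR, the four regular samples keep a usable spread and the outlier stands out clearly. With min-max normalization, the same four samples are squeezed into the bottom 3% of the range.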
A sample