Standard Deviation

Published: 2020-04-18T17:30:00.000Z

When looking at a dataset of ranked values, one question you might ask yourself is: how spread out are these values?

You might have a dataset with an average of 0 where all values are close to 0, like the set {1, 1, 2, -2, -1, -1}. But the set {100, 100, -100, -100} also has an average of 0, so by looking at the average alone we couldn't tell the two apart. How far the values spread around the average is the question that the "Standard Deviation" answers.

To get the standard deviation, you square the distance of each value to the average. Then you take the sum of those squares and multiply it by one divided by the length of the set, which gives you the average squared distance. Finally, take the square root of that and you're done.

The formula looks like this:

mean := sum(set) / length(set)
sqrt(
  sum(set.map(|e| abs(e - mean) ^ 2))
  * (1 / length(set))
)
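
As a rough, runnable version of the formula above, here is a minimal Python sketch. The function name `population_std_dev` is made up for this example. It divides by the length of the set, as described above, which is usually called the population standard deviation.

```python
import math

def population_std_dev(values):
    """Square root of the average squared distance to the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    # Square the distance of each value to the mean.
    squared_distances = [(v - mean) ** 2 for v in values]
    # Average the squares, then take the square root.
    return math.sqrt(sum(squared_distances) / n)

# The two example sets from above: same average, very different spread.
tight = [1, 1, 2, -2, -1, -1]
wide = [100, 100, -100, -100]
print(population_std_dev(tight))  # roughly 1.414
print(population_std_dev(wide))   # 100.0
```

Python's standard library also has `statistics.pstdev`, which computes the same quantity.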

The Standard Deviation changes quickly if there are a few outliers in the dataset. Because every distance is squared, a single value far from the average can dominate the sum. This is often not what you want. A more "robust" (less sensitive to outliers) alternative is the median absolute deviation from the median (MAD for short). I also found it a lot more intuitive than the standard deviation.

Some pseudo-code:

set_median := median(set)
median(set.map(|x| abs(x - set_median)))
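
The pseudo-code above could look roughly like this in Python. This is only a sketch: the function name `median_absolute_deviation` and the outlier value 1000 are made up for illustration, while `statistics.median` and `statistics.pstdev` come from Python's standard library.

```python
import statistics

def median_absolute_deviation(values):
    """Median of the absolute distances to the median."""
    set_median = statistics.median(values)
    distances = [abs(x - set_median) for x in values]
    return statistics.median(distances)

clean = [1, 1, 2, -2, -1, -1]
with_outlier = clean + [1000]  # one made-up outlier

# The MAD barely moves when the outlier is added ...
print(median_absolute_deviation(clean))         # 1.0
print(median_absolute_deviation(with_outlier))  # 2
# ... while the standard deviation grows by orders of magnitude.
print(statistics.pstdev(clean))
print(statistics.pstdev(with_outlier))
```

Adding a single extreme value moves the MAD from 1 to 2, while the standard deviation goes from about 1.4 to several hundred.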