Does This Look Weird to You?

An Introduction to Anomaly Detection

Kevin Feasel (@feaselkl)
https://csmore.info/on/anomalies

Who Am I? What Am I Doing Here?

Motivation

My goals in this talk:

  • Explain the concept of anomalies.
  • Review techniques for detecting anomalies.
  • Briefly demonstrate an anomaly detector in .NET.

Agenda

  1. What is an Anomaly?
  2. Techniques for Tracking Anomalies
  3. Demonstrating an Anomaly Detector

Outliers and Anomalies

In the academic literature, there is some ambiguity in the definitions of outliers and anomalies. Some authors treat the two terms as synonyms, while others differentiate them. I will follow the latter practice.

An outlier is something sufficiently different from the norm that we notice it.

An anomaly is an outlier of interest to humans.

Let's dive further into general concepts and technical definitions.

General Concepts

The non-technical definition of an anomaly is essentially “I’ll know it when I see it.” This gets muddled at the edges, but it works well in practice because humans are great at pattern matching and at picking out things that look dissimilar.

The Gestalt School

One of the best collections of information about how we process things visually is the Gestalt school of psychology. Their key insight is that our minds apply known and expected patterns to what our eyes see.

This leads to a few key Gestalt principles we can take advantage of.

Law of Closure

We naturally fill in gaps and turn partial shapes into whole shapes.

Law of Common Region

We group things together based on their being inside or outside of a region.

Figure / Ground

We prefer to see the foreground rather than the background. Exceptions do exist, such as Rubin's vase.

Law of Proximity

Things which are nearer to each other are considered part of the same grouping, and "abnormal" separation creates new groups in our minds.

Similarity

We group things together based on color, shape, and size.

Continuity

We want to follow the smoothest path when viewing lines.

Continuity

By contrast, this is a discomforting pattern because it breaks continuity.

Symmetry and Order

We perceive ambiguous shapes in as simple a manner as possible. What is this?

Symmetry and Order

Our minds put together that it's a mixture of multiple, slightly overlapping shapes.

Symmetry and Order

We do this because we've never seen a character looking like this, and so don't think of the complex shape as "one" thing.

Symmetry and Order

By contrast...

How This Applies

Because humans are pattern-matchers who try to apply fairly simple heuristics to visual inputs, we tend to see things that aren’t there. People can take advantage of this with optical illusions, but it also lets us make cogent observations.

How This Applies

Our eyes try to fit a line to the scatterplot and tell us direction and magnitude. And they also make us wonder about those two outliers dragging down our best-fit line.

Technical Definitions

A layman’s concept of anomalies is great, but it is ambiguous. Some things that look strange are not actually anomalous behavior, whereas some anomalies look reasonable at first glance.

Outliers on a Fitted Distribution

Outliers on a Box Plot

Agenda

  1. What is an Anomaly?
  2. Techniques for Tracking Anomalies
  3. Demonstrating an Anomaly Detector

Intuition on Techniques

There are dozens of anomaly detection techniques available to us. Some commonalities among techniques are:

  • Points clustered near each other are less likely to be anomalous
  • There tend to be few outliers, so we can isolate those
  • With time series data, point-to-point changes are usually not drastic (given some variance) -- predictability is possible
  • Trends and seasonality may affect analysis -- we need to remove those before performing checks

Standard Deviations

The standard deviation measures how spread out our data is around the mean. It is the square root of the variance.

Standard Deviations

For normal distributions, approximately:

  • 68% of values are within 1 standard deviation of the mean
  • 95% of values are within 2 standard deviations of the mean
  • 99.7% of values are within 3 standard deviations of the mean
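
As a rough illustration of how the rule above becomes an outlier test, here is a minimal Python sketch using only the standard library. It is not the .NET code from the demo; the function name `zscore_outliers`, the threshold of 3, and the sample readings are all assumptions made for this example.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # Every value is identical, so nothing can stand out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings: 19 ordinary values and one suspicious one.
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.9, 10.1, 25.0]

flagged = zscore_outliers(readings)
print(flagged)
```

With this data, only the 25.0 reading falls outside three standard deviations. The test assumes the data is roughly normal; on skewed data the 3-sigma cutoff means something different.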

The Downside to Standard Deviation

Standard deviation is sensitive to outliers. Each outlier we add to the data increases the standard deviation.

With a few outlier data points, we can raise the standard deviation so much that it loses most of its predictive value for catching outliers.
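
A small sketch can make this masking effect concrete. The Python below (standard library only) compares the same hypothetical readings with zero, one, and three outliers; the data and the helper name `z_scores` are assumptions for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Distance of each value from the mean, in sample standard deviations."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) / sigma for v in values]

# Hypothetical readings with no outliers.
baseline = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.9, 10.1]
one_outlier = baseline + [25.0]
three_outliers = baseline + [25.0, 26.0, 24.0]

for label, data in (("baseline", baseline),
                    ("one outlier", one_outlier),
                    ("three outliers", three_outliers)):
    print(f"{label}: stdev={stdev(data):.2f}, largest z={max(z_scores(data)):.2f}")
```

With one outlier, the 25.0 reading sits more than three standard deviations out and a 3-sigma test catches it. With three outliers, the standard deviation has grown enough that none of them crosses the same cutoff.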

Fixing Standard Deviation: MAD

Median Absolute Deviation is a robust statistic: it can handle a limited number of outliers without breaking down.

Even better, a high outlier and a low outlier offset each other's pull on the median, so the center we measure from barely moves.
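
One common way to turn MAD into a test is the modified z-score, which scales each point's distance from the median by the MAD. The sketch below is a minimal Python version under that assumption; the 0.6745 constant and 3.5 cutoff are the conventional choices for this score, not values taken from the talk, and `mad_outliers` is a hypothetical name.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median and MAD) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median; this simple test cannot scale.
        return []
    # 0.6745 makes the score comparable to a z-score when data is normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical readings with three outliers, enough to hide from a 3-sigma test.
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.9, 10.1,
            25.0, 26.0, 24.0]

flagged = mad_outliers(readings)
print(flagged)
```

Because the median and the MAD are computed from the middle of the data, the three large readings do not inflate the yardstick used to judge them, and all three are flagged.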

Differences from Trend

Suppose we have a trend with an anomalous jump. How do we separate the anomalous increase from the trend?

Differences from Trend

De-trend: fit the data with a line...

Differences from Trend

De-trend: fit the data with a line and track the difference from the line.
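
A minimal sketch of de-trending, assuming an ordinary least-squares line over evenly spaced points. The function names, the synthetic series, and the size of the injected jump are all made up for illustration.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def residuals(ys):
    """Difference between each observation and the fitted trend line."""
    slope, intercept = fit_line(ys)
    return [y - (intercept + slope * x) for x, y in enumerate(ys)]

# Hypothetical upward trend with one jump injected at position 12.
series = [2.0 * t + 5.0 for t in range(20)]
series[12] += 6.0

res = residuals(series)
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print(worst, round(res[worst], 2))
```

The raw value at position 12 is no larger than later points in the series, so a threshold on raw values would miss it. After subtracting the line, it has by far the largest residual. The residuals can then be fed to any of the tests above.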


Changepoint Detection

Changepoint detection looks for abrupt shifts in time series data.
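
There are many changepoint algorithms. As a minimal sketch of the idea, the Python below finds the single split that best divides a series into two segments with different means, by minimizing the total squared error around each segment's mean. This is a simplified stand-in chosen for illustration, not the method used in the demo, and the names and data are assumptions.

```python
def sse(values):
    """Sum of squared differences from the segment mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_changepoint(values, min_size=2):
    """Index where splitting into two mean-level segments gives the lowest total error."""
    best_idx, best_cost = None, float("inf")
    for i in range(min_size, len(values) - min_size + 1):
        cost = sse(values[:i]) + sse(values[i:])
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

# Hypothetical signal that shifts from around 5 to around 8.
signal = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1,
          8.0, 8.1, 7.9, 8.2, 8.0, 7.8, 8.1, 8.0]

cp = best_changepoint(signal)
print(cp)
```

Here the best split lands at index 8, the first point at the new level. Note that this sketch always returns some split, even when the series has no real shift, and it finds only one. Practical detectors add a penalty or significance test and handle multiple changepoints.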

Differences

Another common technique is to measure the difference between points and perform statistical analysis on those differences.

We can perform all of the same analyses on deltas that we do on raw values.
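
For instance, a MAD-style test can run on point-to-point deltas instead of raw values. The sketch below assumes the conventional modified z-score (0.6745 constant, 3.5 cutoff); the function names and the series are hypothetical.

```python
from statistics import median

def deltas(values):
    """Change from each point to the next."""
    return [b - a for a, b in zip(values, values[1:])]

def unusual_steps(values, threshold=3.5):
    """Indexes of points reached by an unusually large step, judged by MAD on the deltas."""
    d = deltas(values)
    med = median(d)
    mad = median([abs(x - med) for x in d])
    if mad == 0:
        return []
    # Delta i is the step from point i to point i + 1, so report i + 1.
    return [i + 1 for i, x in enumerate(d)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical steadily rising series with one sudden jump.
temps = [20.0, 20.5, 21.1, 21.5, 22.1, 22.7, 23.1, 29.1, 29.6, 30.2, 30.7]

steps = unusual_steps(temps)
print(steps)
```

Only the point where the jump lands (index 7) is reported. The points after it sit at a higher level, but the steps between them are ordinary, so they are not flagged.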

Agenda

  1. What is an Anomaly?
  2. Techniques for Tracking Anomalies
  3. Demonstrating an Anomaly Detector

Options

Here are a few examples of pre-written packages for anomaly detection:

Rolling Your Own in .NET

If you decide to build your own anomaly detection process, check out Math.NET.

Math.NET is a family of .NET libraries for numerical and statistical analysis.

This lets you choose which statistical tests to run and get results quickly in C# or F# code.

Example Tests

  • Standard deviations from the mean
  • Median Absolute Deviations (both directions)
  • Unidirectional MAD
  • Deviation from Interquartile Range

Many of these sorts of tests are one-liners with MathNet.Numerics.
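
To show the shape of one of these tests without guessing at specific MathNet.Numerics calls, here is a language-neutral sketch of the interquartile range check in Python. It assumes the common 1.5 × IQR fences; the function name and readings are hypothetical, and quartile conventions vary slightly between libraries.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values more than k interquartile ranges below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical readings: 19 ordinary values and one suspicious one.
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.9, 10.1, 25.0]

flagged = iqr_outliers(readings)
print(flagged)
```

Like MAD, the quartiles come from the middle of the data, so the 25.0 reading does not widen the fences that are used to judge it.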

ML.NET

Another alternative is to use anomaly detection within the ML.NET package.

ML.NET is an actively developed library for machine learning within .NET and supports both F# and C#.

ML.NET Setup

Prep steps in Visual Studio Code or at the command line:

Demo Time

Wrapping Up

Over the course of this talk, we have looked at the concept of anomalies, some techniques for detecting them, and .NET packages that make detection easier.

Wrapping Up

To learn more, go here:
https://csmore.info/on/anomalies


And for help, contact me:
feasel@catallaxyservices.com | @feaselkl


Catallaxy Services consulting:
https://CSmore.info/on/contact