Does This Look Weird to You?

An Introduction to Anomaly Detection

Kevin Feasel (@feaselkl)
https://csmore.info/on/anomalies

Who Am I? What Am I Doing Here?

Motivation

My goals in this talk:

  • Explain the concept of anomalies.
  • Review techniques for detecting anomalies.
  • Build an anomaly detector in .NET.
  • Use the Azure Cognitive Services anomaly detector.

Agenda

  1. What is an Anomaly?
  2. Data Set Considerations
  3. Techniques for Tracking Anomalies
  4. Building an Anomaly Detector
  5. Using the Azure Cognitive Services Anomaly Detector

Outliers and Anomalies

In the academic literature, there is some ambiguity in the definitions of outliers and anomalies. Some authors treat the two terms as synonyms, while others distinguish between them. I will follow the latter practice.

An outlier is something sufficiently different from the norm that we notice it.

An anomaly is an outlier of interest to humans.

Let's dive further into general concepts and technical definitions.

General Concepts

The non-technical definition of an anomaly is essentially “I’ll know it when I see it.” This can get muddled at the edges, but it works really well because humans are great at pattern matching and at picking out things which look dissimilar.

The Gestalt School

The Gestalt school of psychology offers one of the best bodies of work on how we process things visually. Its key insight is that our minds apply known and expected patterns to what our eyes see.

This leads to a few key Gestalt principles we can take advantage of.

Law of Closure

We naturally fill in gaps and turn partial shapes into whole shapes.

Law of Common Region

We group things together based on their being inside or outside of a region.

Figure / Ground

We prefer to see the foreground rather than the background. Exceptions do exist, such as Rubin's vase, which we can read either as a vase or as two faces in profile.

Law of Proximity

Things which are nearer to each other are considered part of the same grouping, and "abnormal" separation creates new groups in our minds.

Similarity

We group things together based on color, shape, and size.

Continuity

We want to follow the smoothest path when viewing lines.

Continuity

By contrast, this is a discomforting pattern because it breaks continuity.

Symmetry and Order

We perceive ambiguous shapes in as simple a manner as possible. What is this?

Symmetry and Order

Our minds put together that it's a mixture of multiple, slightly overlapping shapes.

Symmetry and Order

We do this because we've never seen a character looking like this, and so don't think of the complex shape as "one" thing.

Symmetry and Order

By contrast...

How This Applies

Because humans are pattern-matchers who try to apply fairly simple heuristics to visual inputs, we tend to see things that aren’t there. People can take advantage of this with optical illusions, but it also lets us make cogent observations.

How This Applies

Our eyes try to fit a line to the scatterplot and tell us direction and magnitude. And they also make us wonder about those two outliers dragging down our best-fit line.

Technical Definitions

A layman’s concept of anomalies is great, but it is ambiguous. Some things which look strange aren’t actually anomalous behavior, whereas some anomalies might look reasonable at first glance.

Outliers on a Fitted Distribution

Outliers on a Box Plot

Outliers on a Process Control Chart

A process control chart gives us an understanding of when a process is working within normal parameters ("in control") and when it escapes those confines and goes "out of control."
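A minimal sketch of the idea, assuming a Shewhart-style chart whose center line and control limits (mean +/- 3 standard deviations) are computed from a baseline period we trust to be in control. The readings are made-up numbers. Individuals charts in practice often estimate sigma from the average moving range; this sketch uses the sample standard deviation for simplicity.

```python
import statistics

def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits computed from an in-control baseline."""
    center = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation; a simplification
    return center - sigmas * sd, center, center + sigmas * sd

def out_of_control(new_points, limits):
    """Indexes of new points that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(new_points) if x < lower or x > upper]

# Hypothetical readings from a period we believe was in control.
baseline = [50.1, 49.8, 50.3, 49.9, 50.0, 50.2, 49.7, 50.0]
limits = control_limits(baseline)  # roughly (49.4, 50.0, 50.6)
print(out_of_control([50.1, 49.6, 51.2, 50.0], limits))
```

Only 51.2 (index 2) escapes the limits; 49.6 is low but still "in control."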

Agenda

  1. What is an Anomaly?
  2. Data Set Considerations
  3. Techniques for Tracking Anomalies
  4. Building an Anomaly Detector
  5. Using the Azure Cognitive Services Anomaly Detector

Characteristics of Good Data Sets

When hunting for anomalies, we want data sets which have the following properties:

  • One numeric feature to measure
  • At least 30 data points, and preferably 60-90
  • At least one full cycle of behavior, and preferably more

Is Time Series Data Required?

Betteridge’s Law of Headlines says no.

Time series data is used extremely frequently for tracking anomalies because anomalies tend to be temporal in nature. But you can use the same techniques when looking at cohorts within a given time frame.

Is Time Series Data Required?

This data was all collected in one time period, and yet we can envision a way to detect anomalies.

In this case, the assumption is that all members of a cohort should operate the same way.
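One way to apply this, as a sketch: take a single snapshot of the cohort, find the cohort median, and flag any member that strays too far from it. The server names, readings, and the 25% tolerance are all hypothetical; a robust spread measure could replace the fixed tolerance.

```python
import statistics

def cohort_outliers(readings, tolerance=0.25):
    """Flag members whose reading differs from the cohort median by more than
    `tolerance` (expressed as a fraction of the median)."""
    center = statistics.median(readings.values())
    return [name for name, value in readings.items()
            if abs(value - center) > tolerance * center]

# Hypothetical snapshot: requests/second for each server in one load-balanced
# pool, all measured during the same time window.
pool = {"web01": 410, "web02": 395, "web03": 402,
        "web04": 388, "web05": 120, "web06": 407}
print(cohort_outliers(pool))
```

No timestamps are involved: the "normal" behavior comes from the peers, not from history.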

Agenda

  1. What is an Anomaly?
  2. Data Set Considerations
  3. Techniques for Tracking Anomalies
  4. Building an Anomaly Detector
  5. Using the Azure Cognitive Services Anomaly Detector

Intuition on Techniques

There are dozens of anomaly detection techniques available to us. Some commonalities among techniques are:

  • Points clustered near each other are less likely to be anomalous
  • There tend to be few outliers, so we can isolate them
  • With time series data, point-to-point changes are usually not drastic (given some variance), so predictability is possible
  • Trends and seasonality may affect analysis, so we need to remove them before performing checks

Standard Deviations

The standard deviation is a measure of how spread out our data is: the square root of the variance.

Standard Deviations

For normal distributions:

  • About 68% of values are within 1 standard deviation of the mean
  • About 95% of values are within 2 standard deviations of the mean
  • About 99.7% of values are within 3 standard deviations of the mean
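These percentages come from the normal distribution itself: the share of values within k standard deviations of the mean is erf(k / sqrt(2)). A quick check with Python's standard library:

```python
import math

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Rounded to one decimal place this prints 68.3%, 95.4%, and 99.7%.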

The Downside to Standard Deviation

Standard deviation is sensitive to outliers.

stdev({7.3, 8.2, 8.4, 9.1, 9.3, 9.6}) = 0.85 (sample standard deviation).

Mean = 8.65

The Downside to Standard Deviation

stdev = 0.85, mean = 8.65.

Now let's add one more data point:

stdev({7.3, 8.2, 8.4, 9.1, 9.3, 9.6, 1.9}) = 2.67.

Mean = 7.69

One outlier increases standard deviation considerably.

The Downside to Standard Deviation

stdev = 2.67, mean = 7.69.

This also causes us to ignore otherwise-abnormal values like 5.1:

95% = mean +/- (2 * stdev)

Original 95% = 8.65 +/- 2*0.85 = [6.95, 10.35]

New 95% = 7.69 +/- 2*2.67 = [2.35, 13.03]

5.1 was caught by the original model, but the new model thinks it's just fine.
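The same comparison as a short Python sketch. It uses the sample standard deviation (n - 1 denominator), which is what produces the 0.85 and 2.67 figures.

```python
import statistics

def outside_two_sigma(values, x):
    """True if x lies outside mean +/- 2 sample standard deviations of values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return not (mean - 2 * sd <= x <= mean + 2 * sd)

original = [7.3, 8.2, 8.4, 9.1, 9.3, 9.6]
with_outlier = original + [1.9]

print(outside_two_sigma(original, 5.1))      # original model flags 5.1
print(outside_two_sigma(with_outlier, 5.1))  # after adding 1.9, it no longer does
```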

Fixing Standard Deviation: MAD

Median Absolute Deviation (MAD) is a robust statistic: it can handle a limited number of outliers without breaking down.

$MAD = median(|X_i - \widetilde X|)$, where $\widetilde X$ is the median of $X$.

Fixing Standard Deviation: MAD

Using the original dataset from before, let's calculate median and MAD.

Sort the values, then eliminate the extremes until you get to the center 1-2 elements. If two elements remain, the median is their average.

X = {7.3, 8.2, 8.4, 9.1, 9.3, 9.6}. Median = 8.75

$MAD = median(|X_i - \widetilde X|)$

MAD = med({1.45, 0.55, 0.35, 0.35, 0.55, 0.85}) = 0.55.

Fixing Standard Deviation: MAD

Median = 8.75, MAD = 0.55

Now let's add that outlier:

X2 = {7.3, 8.2, 8.4, 9.1, 9.3, 9.6, 1.9}. Median = 8.4

$MAD = median(|X_i - \widetilde X|)$

MAD = med({1.1, 0.2, 0, 0.7, 0.9, 1.2, 6.5}) = 0.9.

Fixing Standard Deviation: MAD

Old median = 8.75, old MAD = 0.55

New median = 8.4, new MAD = 0.9

3 * MAD is a good rule of thumb: treat values more than 3 MADs from the median as outliers. Both of these models would catch 5.1 as an outlier value:

8.75 - 3*0.55 = 7.1

8.4 - 3*0.9 = 5.7
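A minimal Python sketch of the median and MAD calculations above, using the 3 * MAD rule of thumb. It reproduces the 7.1 and 5.7 lower bounds for the two data sets.

```python
import statistics

def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    center = statistics.median(values)
    return statistics.median(abs(x - center) for x in values)

def mad_bounds(values, k=3.0):
    """Return (low, high) = median -/+ k * MAD."""
    center = statistics.median(values)
    spread = k * mad(values)
    return center - spread, center + spread

original = [7.3, 8.2, 8.4, 9.1, 9.3, 9.6]
with_outlier = original + [1.9]

for data in (original, with_outlier):
    low, high = mad_bounds(data)
    flagged = not (low <= 5.1 <= high)
    print(f"median={statistics.median(data):.2f} MAD={mad(data):.2f} "
          f"bounds=[{low:.2f}, {high:.2f}] 5.1 flagged: {flagged}")
```

Unlike the standard deviation, the MAD only moves from 0.55 to 0.9 when 1.9 is added, so 5.1 stays flagged.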

Differences from Trend

Suppose we have a trend with an anomalous jump. How do we separate the anomalous increase from the trend?

Differences from Trend

De-trend: fit the data with a line...

Differences from Trend

De-trend: fit the data with a line and track the difference from the line.


Changepoint Detection

Changepoint detection looks for abrupt shifts in time series data.
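To make the idea concrete, here is one of the simplest changepoint approaches, sketched in Python: try every split point and keep the one where modeling each side by its own mean leaves the least squared error. It finds only a single shift in the mean. ML.NET's changepoint detectors work differently, and the series is made up.

```python
def best_changepoint(values):
    """Return the index i minimizing squared error when values[:i] and values[i:]
    are each summarized by their own mean (a single mean-shift changepoint)."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    return min(range(1, len(values)),
               key=lambda i: sse(values[:i]) + sse(values[i:]))

# Hypothetical series whose level shifts from about 5 to about 8.
series = [5.1, 4.9, 5.0, 5.2, 4.8, 8.1, 7.9, 8.0, 8.2, 7.8]
print(best_changepoint(series))
```

The split lands at index 5, the first point of the new level.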

Differences

Another common technique is to measure the difference between points and perform statistical analysis on those differences.

We can perform all of the same analyses on deltas that we do on raw values.
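A minimal sketch, assuming a steadily increasing counter (made-up numbers). It takes first differences and then applies the same 3 * MAD rule of thumb to the deltas instead of the raw values.

```python
import statistics

def delta_outliers(values, k=3.0):
    """Flag points reached by an unusually large or small step from the previous point."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    center = statistics.median(deltas)
    spread = statistics.median(abs(d - center) for d in deltas)
    # Report the index of the point at the end of each suspicious change.
    return [i + 1 for i, d in enumerate(deltas) if abs(d - center) > k * spread]

# Hypothetical counter that normally grows by about 10 per interval.
counter = [100, 110, 121, 130, 141, 150, 190, 200, 211, 220]
print(delta_outliers(counter))
```

On the raw values, a 3 * MAD check flags nothing because the growth spreads the data out. On the deltas, only the jump from 150 to 190 (index 6) stands out.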

Agenda

  1. What is an Anomaly?
  2. Data Set Considerations
  3. Techniques for Tracking Anomalies
  4. Building an Anomaly Detector
  5. Using the Azure Cognitive Services Anomaly Detector

Options

Here are a few examples of pre-written packages for anomaly detection:

Rolling Your Own in .NET

If you decide to build your own anomaly detection process, check out MathNet.

MathNet is a series of .NET libraries for numerical and statistical analysis.

This allows you to customize the statistical tests to run and generate results very quickly in C# or F# code.

Example Tests

  • Standard deviations from the mean
  • Median Absolute Deviations (both directions)
  • Unidirectional MAD
  • Deviation from Inter-Quartile Range

Many of these sorts of tests are one-liners with MathNet.Numerics.
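To show the logic of two of these tests without assuming any particular MathNet.Numerics call, here is a plain-Python sketch. The IQR test uses Tukey's fences (1.5 * IQR beyond the quartiles), a common convention but an assumption here. Python's default `exclusive` quartile method may give slightly different quartiles than other tools. The unidirectional MAD test only flags values that fall too far below the median.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Tukey's fences: flag values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

def low_side_mad_outliers(values, k=3.0):
    """Unidirectional MAD: only flag values more than k MADs *below* the median."""
    center = statistics.median(values)
    spread = statistics.median(abs(v - center) for v in values)
    return [v for v in values if v < center - k * spread]

dip = [7.3, 8.2, 8.4, 9.1, 9.3, 9.6, 1.9]
spike = [7.3, 8.2, 8.4, 9.1, 9.3, 9.6, 15.0]
print(iqr_outliers(dip), low_side_mad_outliers(dip))
print(iqr_outliers(spike), low_side_mad_outliers(spike))
```

Both tests catch the dip to 1.9. Only the two-sided IQR test reports the spike to 15.0, which is the point of a one-directional check when only drops matter.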

ML.NET

Another alternative is to use anomaly detection within the ML.NET package.

ML.NET is an actively developed library for machine learning within .NET and supports both F# and C#.

ML.NET Setup

Prep steps in Visual Studio Code or at the command line:

Demo Time

Agenda

  1. What is an Anomaly?
  2. Data Set Considerations
  3. Techniques for Tracking Anomalies
  4. Building an Anomaly Detector
  5. Using the Azure Cognitive Services Anomaly Detector

Azure Cognitive Services Anomaly Detector API

The Azure Cognitive Services Anomaly Detector API allows you to perform anomaly detection from any language which supports hitting REST APIs.

Steps:

  1. Create an Anomaly Detector resource in Azure
  2. Save the Anomaly Detector key and endpoint somewhere like environment variables
  3. Create an application and work with the API
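For step 3, a minimal Python sketch of calling the REST API with only the standard library. It assumes the key and endpoint were saved in environment variables named ANOMALY_DETECTOR_KEY and ANOMALY_DETECTOR_ENDPOINT. These names are hypothetical. It targets the v1.0 batch path `/anomalydetector/v1.0/timeseries/entire/detect` with the `Ocp-Apim-Subscription-Key` header. Check the current Azure documentation for the API version your resource supports. The series values are made up.

```python
import json
import os
import urllib.request
from datetime import datetime, timedelta, timezone

def build_detect_request(endpoint, key, points, granularity="daily"):
    """Build a POST request for whole-series anomaly detection.

    Path and body shape follow the v1.0 Anomaly Detector REST API as I
    understand it; verify against the current Azure documentation."""
    url = endpoint.rstrip("/") + "/anomalydetector/v1.0/timeseries/entire/detect"
    body = {
        "granularity": granularity,
        "series": [{"timestamp": ts, "value": value} for ts, value in points],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Ocp-Apim-Subscription-Key": key},
        method="POST",
    )

# Hypothetical environment variable names; use whatever names you chose in step 2.
endpoint = os.environ.get("ANOMALY_DETECTOR_ENDPOINT")
key = os.environ.get("ANOMALY_DETECTOR_KEY")

if endpoint and key:
    start = datetime(2023, 1, 1, tzinfo=timezone.utc)
    # Made-up daily values with a spike on day 10; the API enforces a minimum series length.
    values = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 40, 11, 10, 12]
    points = [((start + timedelta(days=i)).strftime("%Y-%m-%dT%H:%M:%SZ"), v)
              for i, v in enumerate(values)]
    with urllib.request.urlopen(build_detect_request(endpoint, key, points)) as resp:
        result = json.load(resp)
    print(result["isAnomaly"])  # one boolean per input point
```

The request is built separately from the network call, so you can inspect or log it before sending anything to Azure.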

Anomaly Detector API Setup

Anomaly Detector API Setup

Demonstrating the Anomaly Detector API

Anomaly Detector demo

Demo Time

Working with the API in .NET

Although we have libraries like ML.NET which provide anomaly detection, you can also use the Anomaly Detector API in your C# or F# code.

Cognitive Services Setup

Prep steps in Visual Studio Code or at the command line:

Demo Time

Wrapping Up

Over the course of this talk, we have looked at the concept of anomalies, some techniques for detecting them, and .NET packages to make it easy.

Wrapping Up

To learn more, go here:
https://csmore.info/on/anomalies


And for help, contact me:
feasel@catallaxyservices.com | @feaselkl


Catallaxy Services consulting:
https://CSmore.info/on/contact