Genetics In Action: Evolutionary Algorithms

Kevin Feasel (@feaselkl)

http://CSmore.info/on/gia

Who Am I? What Am I Doing Here?

Catallaxy Services
@feaselkl
Curated SQL
We Speak Linux

Evolutionary Algorithms

There are four major evolutionary algorithms:

  • Genetic Algorithms
  • Genetic Programming
  • Evolutionary Programming
  • Gene Expression Programming

In today's talk, we will cover the first two.

What Are Evolutionary Algorithms?

Evolutionary algorithms are great at solving a particular type of problem:

  1. Gigantic search space
  2. Known way to score answers
  3. The solution is too complex to do by hand
  4. We expect that known tools can achieve a solution

Evolutionary algorithms also perform extremely well when the environment (and thus the best solution) changes regularly.

Good Applications

Good uses for Evolutionary Algorithms: NP-Hard problems.

  • Optimization problems: min/max, circuit layout, Traveling Salesman Problem, Knapsack Problem
  • Machine learning: building weights for neural nets, rules for classifiers, or training sensors
  • Economic models: bidding strategies (e.g., portfolio bidding), game theory
  • Ecological models: host-parasite co-evolution, symbiosis, resource flow

Evolutionary Algorithms help solve min-max problems and can avoid hill-climbing issues.

What Are Evolutionary Algorithms?

Evolutionary Algorithms tend to solve hill-climbing problems:

Many algorithms tend to get stuck at local maxima/minima; EAs often don't.

Heuristics For Good Solutions

There are several good solution heuristics:

  1. Correctness – We can find the a priori correct solution.
  2. Consistency – Repeat the problem, repeat the answer.
  3. Justifiability – Cause implies effect.
  4. Certainty – Chance should not be part of the solution.
  5. Orderliness – The process is a logical, step-by-step process in which each step leads to the next.
  6. Parsimony – Simpler is ceteris paribus better.
  7. Decisiveness – Once a solution is found, the problem is solved.

Evolutionary algorithms guarantee zero of these properties.

What Are The Limitations?

Going further, evolutionary algorithms have a number of limitations:

  • No guarantee of a solution
  • No guarantee of good performance in reasonable time
  • Answers will likely differ each run
  • Often hard to tell when to stop
  • Can still get stuck at a local min/max
  • Solutions are not parsimonious

But...they do tend to get the job done.

Agenda

  1. Genetics Crash Course
  2. Genetic Algorithms
  3. Genetic Programming

Genetics Crash Course

Evolutionary Algorithms are modeled after simplified forms of biological processes.

Creative Commons. Author: John Gould

Genetics Crash Course

Organisms have chromosomes, whose purpose is to carry genes. In Evolutionary Algorithms, "organisms" (candidate solutions) typically have one chromosome.

Creative Commons. Author: Merlin G. Butler, et al.

Genetics Crash Course

Genes make up DNA sequences called genotypes. Each gene has a number of alternative forms, called alleles.

Creative Commons. Courtesy: National Human Genome Research Institute.

Genotypes are possible genetic structures. Given 2 alleles for each of 32 genes, we would have 2^32 or 4,294,967,296 genotypes.

Genetics Crash Course

Phenotypes are observable physical characteristics based on specific genotypes. This might be coloration, height, size, or beak structure.

Creative Commons. Author: Madeline Price Ball

Genetics Crash Course

BEWARE: there is not a 1:1 correspondence between genotypes and phenotypes, as phenotypes often depend upon a specific combination of alleles. When even one allele is missing, the desired effect may not appear. Close doesn't count with genetics.

Environmental Niche

The environmental niche is a set of features which certain phenotypes might be able to exploit. Example: thicker fur for northern climes.

Creative Commons. Author: Julio Reis

Agenda

  1. Genetics Crash Course
  2. Genetic Algorithms
  3. Genetic Programming

Genetic Algorithms

Genetic algorithms were popularized with John Holland's work on the topic, particularly Adaptation in Natural and Artificial Systems.

Genetic Algorithms

Demo Time

Genetic Algorithms

Applying terms:

  • Organism = candidate solution
  • Chromosome = solution array
  • Gene = element of the solution array
  • Locus = position of element in the array
  • Allele = specific value for a gene
  • Environmental niche = fitness function

Genetic Algorithms

Most simple genetic algorithms are arrays of genes with binary alleles:

We build up a population of organisms, large enough to ensure genetic diversity.

Genetic Algorithms

After building the population, we score each chromosome using the same fitness function:

Now each chromosome has its own score and can compare to other chromosomes.

Genetic Algorithms

For each organism (or pair of organisms) in the next generation, determine two "parents" based on some algorithm. A simple algorithm is roulette wheel, where we pull based on percentage fitness.

Genetic Algorithms

We then apply crossover with some probability. If that RNG pull succeeds, we cross over at some starting and ending point (also RNG determined):

Typically, there is one crossover range per organism pair. Sometimes, the crossover check fails and we keep the two parents. Usually we perform crossover with p = [0.6, 0.8].

Genetic Algorithms

For each gene in each new organism, we apply some mutation operation:

Mutation is a tool to help us get out of local maxima, but too much mutation and you have random search. Usually we perform mutation with p = [0.0001, 0.01]

Genetic Algorithms

Now we have a new organism, which is hopefully better than the old organisms in terms of solving our fitness function:

Repeat for each organism in the next generation, and repeat the process.

Genetic Algorithms

GA In Action

Great problems for genetic algorithms include the knapsack problem, portfolio selection, the traveling salesman problem, and the Holyfield problem.

Genetic Algorithms

Holyfield Problem

The Holyfield Problem comes from Evander Holyfield's Real Deal Boxing for the Sega Genesis.

Genetic Algorithms

Holyfield Problem

You have several options to help make your boxer the best.

Genetic Algorithms

Demo - Holyfield Problem

Genetic Algorithms

Holyfield Problem

Result: genetic algorithms let us choose the best set of options, making our boxer the best there is.

Genetic Algorithms

Portfolio Selection

We want to maximize returns for a portfolio while minimizing risk.

Genetic Algorithms

Demo - Portfolio Selection

Genetic Algorithms

Traveling Salesman

We want to minimize total travel distance while hitting every city exactly once.

Genetic Algorithms

Demo - Traveling Salesman

Genetic Algorithms

Takeaways

  • Solutions exist in multiple languages, including R, .NET, and Java.
  • Genetic algorithms cut through the search space, tending to start off unfocused and honing in on a solution over the course of generations.
  • Genetic algorithms are powerful in part because they do not assume anything about what good genotypes look like, except for what the fitness function tells us.

Agenda

  1. Genetics Crash Course
  2. Genetic Algorithms
  3. Genetic Programming

Genetic Programming

John Koza popularized Genetic Programming with his eponymous series of books.

Genetic Programming

Demo Time

Genetic Programming

Genetic programming is an extension of genetic algorithm concepts. Genetic algorithms feature fixed-sized chromosomes with data-representational alleles.

Genetic programming takes the mechanisms of genetic algorithms and applies it to a higher-level problem.

Genetic Programming

Similarities with genetic algorithms:

  • Populations of candidate solutions
  • Fitness functions
  • New generations of candidate solutions
  • Crossover and mutation operations

But there's one major difference: the organisms/chromosomes are entire programs.

Genetic Programming

Suppose we want to solve a problem where the fitness function is (56 - x)^2. One candidate solution:

Genetic Programming

In practice, genetic programs are significantly less parsimonious.

Genetic Programming

Mathematical Formulas

Genetic programs are great for deriving mathematical relationships.

Genetic Programming

Demo - Mathematical Formula

Genetic Programming

Mathematical Formula

Genetic programs can help us derive non-linear function results, similar to a regression model. If you are regressing, beware that genetic programs will overtrain on the current data; they're much more useful for identifying formulas.

Genetic Programming

DBA Salaries

Another use for genetic programming is building regression trees, which use decision tree mechanics but solve the same problem as a regression model: predicting the dependent variable's value.

Genetic Programming

Demo - DBA Salaries

Genetic Programming

DBA Salaries

Our regression tree was not better than a linear regression based on the attributes we selected, but was more understandable to a layman. In other cases, we can end up with better results.

Genetic Programming

Time series prediction

Related to mathematical formulas, we can generate predictions for time series data.

Genetic Programming

Demo - Time series prediction

Genetic Programming

Time series prediction

How many points we have available and how many points we want to predict will determine how closely our genetic program's predictions can hew to actual data.

Genetic Programming

Takeaways

  • Genetic programs can find arbitrary-length solutions to problems.
  • The best library is ECJ (in Java); rgp and Accord.NET are okay but not great.
  • Genetic programming can also generate decision trees.

Wrapping Up

Evolutionary algorithms, including genetic algorithms and genetic programming, offer up an interesting approach to navigating large hill-climbing type search spaces with adequate efficiency and relatively high likelihood of avoiding local minima/maxima.

To learn more, go here: http://CSmore.info/on/gia

And for help, contact me: feasel@catallaxyservices.com | @feaselkl