Now You Have Two Problems

An Introduction to Regular Expressions in SQL Server 2025

Kevin Feasel (@feaselkl)
https://csmore.info/on/regex

Who Am I? What Am I Doing Here?

Motivation

SQL Server has historically had limited pattern matching capabilities, relying on LIKE and PATINDEX() for string matching.

SQL Server 2025 introduces native regular expression support, bringing SQL Server in line with other major database platforms like Oracle, PostgreSQL, and MySQL.

Regular expressions unlock powerful text processing capabilities that were previously difficult or impossible in T-SQL alone.

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

What Are Regular Expressions?

A regular expression (regex) is a sequence of characters that defines a search pattern.

Think of them as a super-powered version of the LIKE operator.

The Famous Quotation

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

-- Jamie Zawinski

Let's see if that's still true.

LIKE vs Regular Expressions

LIKE gives us two wildcards:

  • % -- Zero or more characters
  • _ -- Exactly one character

Regular expressions give us much more control over pattern matching, including character classes, quantifiers, alternation, grouping, anchors, and more.

Use LIKE when simple prefix/suffix/contains matching is enough. It can use indexes and is faster for simple patterns. Use regex when you need character classes, quantifiers, alternation, or complex pattern logic.

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

Literal Characters

The simplest regex pattern is a literal string.

The pattern cat matches the text "cat" inside a string. This is equivalent to LIKE '%cat%'.

Regular expressions are case-sensitive by default, though SQL Server's regex functions include a flag to change this.

Character Classes

Square brackets define a set of characters to match:

  • [abc] -- Matches a, b, or c
  • [a-z] -- Matches any lowercase letter
  • [0-9] -- Matches any digit
  • [^abc] -- Matches any character except a, b, or c

Shorthand Character Classes

  • \d -- Any digit (equivalent to [0-9])
  • \D -- Any non-digit
  • \w -- Any word character (letters, digits, underscore)
  • \W -- Any non-word character
  • \s -- Any whitespace character
  • \S -- Any non-whitespace character
  • . -- Any character (except newline by default)

Quantifiers

Quantifiers specify how many times a character or group should repeat:

  • * -- Zero or more times
  • + -- One or more times
  • ? -- Zero or one time
  • {n} -- Exactly n times
  • {n,} -- n or more times
  • {n,m} -- Between n and m times

Greedy vs. Lazy Quantifiers

By default, quantifiers are greedy: they match as much as possible.

Adding ? after a quantifier makes it lazy: it matches as little as possible.

  • <.+> on <b>bold</b> matches <b>bold</b> (greedy: everything between the first < and the last >)
  • <.+?> on the same string matches <b> (lazy: stops at the first >)

This is the #1 source of "my regex matches too much!" bugs.

Anchors and Boundaries

  • ^ -- Start of string
  • $ -- End of string
  • \b -- Word boundary

For example, ^[A-Z] matches a string that starts with a capital letter, while [0-9]$ matches a string that ends with a digit.

Grouping and Alternation

  • (abc) -- Captures the group "abc"
  • (?:abc) -- Groups without capturing (non-capturing group)
  • a|b -- Matches a or b (alternation)
  • (cat|dog) -- Matches "cat" or "dog"

Captured groups can be referenced in replacement patterns using \1, \2, etc. Use (?:...) when you need grouping but don't need to capture.

Escaping Special Characters

If you want to match a character that has special meaning in regex, escape it with a backslash:

\. matches a literal period, \* matches a literal asterisk, \\ matches a literal backslash.

Special characters that need escaping: . * + ? ^ $ { } [ ] ( ) | \

Demo Time

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

REGEXP_LIKE

REGEXP_LIKE tests whether a string matches a regular expression pattern. It returns 1 (match) or 0 (no match).


						REGEXP_LIKE(
							source_string,
							pattern
							[, flags]
						)
					

Think of this as a more powerful replacement for LIKE or PATINDEX() > 0.

REGEXP_LIKE Flags

The optional flags parameter modifies matching behavior:

  • 'i' -- Case-insensitive matching
  • 'm' -- Multi-line mode (^ and $ match line boundaries)

These flags are common to all SQL Server 2025 regex functions.

Demo Time

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

REGEXP_COUNT

REGEXP_COUNT returns the number of times a pattern matches within a string.


						REGEXP_COUNT(
							source_string,
							pattern
							[, start_position
							[, flags]]
						)
					

This is useful for counting occurrences of a pattern, such as the number of words or digits in a string.

Demo Time

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

REGEXP_REPLACE

REGEXP_REPLACE replaces occurrences of a pattern within a string.


						REGEXP_REPLACE(
							source_string,
							pattern,
							replacement
							[, start_position
							[, occurrence
							[, flags]]]
						)
					

This is a major upgrade from REPLACE(), which only handles literal string replacement.

Backreferences in Replacement

Use captured groups in your replacement string:

  • \1, \2, etc. refer to captured groups
  • Allows you to rearrange, reformat, or selectively keep parts of the matched text

For example, (\d{3})-(\d{4}) with replacement \2-\1 swaps the two groups.

Lookaheads and Lookbehinds

These check for patterns without consuming them:

  • (?=...) -- Positive lookahead: what follows must match
  • (?!...) -- Negative lookahead: what follows must NOT match
  • (?<=...) -- Positive lookbehind: what precedes must match
  • (?<!...) -- Negative lookbehind: what precedes must NOT match

We will demo the positive versions today. Useful for redacting data, conditional replacements, and context-sensitive matching.

Demo Time

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

REGEXP_SUBSTR

REGEXP_SUBSTR extracts a substring that matches a regular expression pattern.


						REGEXP_SUBSTR(
							source_string,
							pattern
							[, start_position
							[, occurrence
							[, flags
							[, group_number]]]]
						)
					

This is especially useful for extracting structured data from unstructured text.

Demo Time

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

REGEXP_INSTR

REGEXP_INSTR returns the starting position of a pattern match within a string. Think of it as a regex-powered PATINDEX().


						REGEXP_INSTR(
							source_string,
							pattern
							[, start_position
							[, occurrence
							[, return_option
							[, flags
							[, group_number]]]]]
						)
					

The return_option parameter controls whether to return the start (0) or end (1) of the match.

Demo Time

Agenda

  1. What Are Regular Expressions?
  2. Regex Fundamentals
  3. REGEXP_LIKE
  4. REGEXP_COUNT
  5. REGEXP_REPLACE
  6. REGEXP_SUBSTR
  7. REGEXP_INSTR
  8. Practical Examples

Unicode Property Escapes

SQL Server 2025 uses the ICU regex engine, which supports Unicode property escapes. This is something LIKE and PATINDEX cannot do.

  • \p{L} -- Any Unicode letter (Latin, Cyrillic, Chinese, Arabic, etc.)
  • \p{N} -- Any Unicode number
  • \p{Sc} -- Any currency symbol ($, €, £, ¥)
  • \p{P} -- Any punctuation character

Essential for working with multilingual or international data.

Practical Examples

Now let's put it all together with some real-world scenarios:

  • Data validation (e-mail addresses, phone numbers)
  • Parsing semi-structured log data
  • Data cleansing and standardization pipelines
  • Unicode-aware text processing
  • Extracting structured data from free text

Demo Time

Performance Considerations

Regular expressions are powerful but come with caveats:

  • Regex functions in a WHERE clause might not use indexes. Some require a full scan, just like LIKE '%pattern%'
  • For simple prefix matching, LIKE 'prefix%' is still faster and can use indexes
  • Filter with indexed columns first, then apply regex for validation or extraction
  • Complex patterns with excessive backtracking can be slow on large strings

Regex Prior to SQL Server 2025

If you want to use regular expressions in SQL Server before 2025, you would typically rely on CLR functions or third-party libraries.

One outstanding CLR-based library is SQL#, which provides a comprehensive set of regex functions for SQL Server.

The free version includes most Regex functionality and the paid edition adds capture groups.

Wrapping Up

SQL Server 2025 brings long-awaited regular expression support to the T-SQL language. The five new functions give database developers powerful text processing capabilities without leaving T-SQL.

Key takeaways:

  • SQL Server 2025 uses the ICU regex engine with Unicode property support
  • Watch out for greedy vs. lazy quantifier behavior
  • Use lookaheads/lookbehinds for context-sensitive matching
  • Use \p{...} for Unicode-aware international data

Wrapping Up

To learn more, go here:
https://csmore.info/on/regex


And for help, contact me:
feasel@catallaxyservices.com | @feaselkl


Catallaxy Services consulting:
https://CSmore.info/on/contact