T-SQL Anti-Patterns

Created by Kevin Feasel

Who Am I? What Am I Doing Here?

Catallaxy Services
@feaselkl
Curated SQL
We Speak Linux

Code Smells and Anti-Patterns

Code smells and anti-patterns are related topics and will make up our discussion today.

  • Code Smell
  • Anti-Pattern

A code smell is an intuitive feeling that a piece of code has fundamental problems. We develop a sense for code smells ("code nose") over time as we create and observe patterns of abuse.

Code Smells and Anti-Patterns

Code smells and anti-patterns are related topics and will make up our discussion today.

  • Code Smell
  • Anti-Pattern

An anti-pattern is a common technique we use to solve a recurring problem...incorrectly.

Anti-patterns often have a code smell. Sometimes, we can use unrelated code smells to dig out deeper anti-patterns.

General Themes

  1. Style
  2. Theory
  3. Performance

“There are more anti-patterns in heaven and earth, Horatio, than are dreamt of in your philosophy.” -- Hamlet, DBA.

General Caveats

Some anti-patterns fit in multiple categories, and no single presentation can cover them all.

Even better, an anti-pattern in one scenario might actually be a reasonable business decision in another scenario.

In case that weren't enough, some anti-patterns may appear to be contradictory; resolution requires work.

ANSI-89 Joins

Description:

This code smell is not necessarily a problem, but typically, when you see this syntax, the likelihood is great that the author will have all kinds of "I needed it in the '90s" hacks.

ANSI-89 Joins

Code-based example:


SELECT
    a.Col1,
    b.Col2
FROM
    dbo.TableA a,
    dbo.TableB b
WHERE
    a.JoinColumn = b.JoinColumn;
						

ANSI-89 Joins

Why this is an anti-pattern:

This code smell has three problems:

  1. When seen in "modern" code (no later than the beginning of this millennium), this indicates that there might be a lack of newer constructs to fix old SQL problems.
  2. Join criteria get combined with explicit filters
  3. Accidental Cartesian products are too easy

ANSI-89 Joins

Why this is an anti-pattern: Example 2


SELECT
    a.Col1,
    b.Col2
FROM
    dbo.TableA a,
    dbo.TableB b,
    dbo.TableC c
WHERE
    a.JoinColumn = b.JoinColumn
    AND a.ColX > 300
    AND c.ColY < 50;
						

ANSI-89 Joins

How to fix the anti-pattern:

Use ANSI-92 syntax. Replace the prior query with:


SELECT
    a.Col1,
    b.Col2
FROM
    dbo.TableA a
    INNER JOIN dbo.TableB b ON a.JoinColumn = b.JoinColumn
    INNER JOIN dbo.TableC c ON a.JoinColumn2 = c.JoinColumn2
WHERE
    a.ColX > 300
    AND c.ColY < 50;
						

ANSI-89 Joins

When it might not be an anti-pattern:

Never. At least not if you're supporting SQL Server 6.0 or later. And if you're still supporting SQL Server prior to 6.0, you're a curator, not a DBA.

SELECT *

Description:

The simplest SELECT query can also be the most dangerous.

SELECT *

Code-based example:


SELECT
    *
FROM
    dbo.TableA a
    INNER JOIN dbo.TableB b ON a.JoinColumn = b.JoinColumn
    INNER JOIN dbo.TableC c ON a.JoinColumn2 = c.JoinColumn2
WHERE
    a.ColX > 300
    AND c.ColY < 50;
						

SELECT *

Why this is an anti-pattern:

  1. Column order dependency. What happens if the underlying table changes?
  2. Likely returns unnecessary columns
  3. Covering indexes more difficult

SELECT *

How to fix the anti-pattern:

Explicitly name the columns you want.


SELECT
    a.Col1,
    b.Col2
FROM
    dbo.TableA a
    INNER JOIN dbo.TableB b ON a.JoinColumn = b.JoinColumn
    INNER JOIN dbo.TableC c ON a.JoinColumn2 = c.JoinColumn2
WHERE
    a.ColX > 300
    AND c.ColY < 50;
						

Too lazy and need most/all of the columns? Find a tool which does this for you. Or write an ad hoc query against sys.columns.

SELECT *

When it might not be an anti-pattern:

  1. Ad hoc queries (code spelunking or diagnostics)
  2. SELECT INTO to copy a table...maybe
  3. Grabbing all columns from a CTE, sub-query, or APPLY operation...maybe

Zombie Fields

Description:

Re-using old fields for some purpose other than their intention.

Typically, you see this with very old code or third-party platforms where you can't change the schema...or when the developer is just being lazy.

Zombie Fields

Code-based example:


INSERT INTO dbo.Address
(
	Line1,
	Line2,
	Line3,
	City,
	State,
	ZIPCode
)
VALUES
(
	@Line1,
	@Line2,
	@Region,	--Put region in Line 3
	@City,
	@State,
	@ZIPCode
);
						

Zombie Fields

Why this is an anti-pattern:

  1. Inherently counter-intuitive
  2. Makes data retrieval more difficult--usually not as cut-and-dried as a simple field replacement.
  3. Lose constraints

Zombie Fields

How to fix the anti-pattern:


ALTER TABLE dbo.Address ADD COLUMN Region VARCHAR(12) NULL;
UPDATE dbo.Address
SET Region = Line3,
	Line3 = NULL;
						

Use attributes for their intended purposes.

Keep up on changes. Database changes are more expensive than code changes, but this isn't the 1990s anymore--there are tools which can help you.

Zombie Fields

When it might not be an anti-pattern:

When you have no control over the schema: third-party software you use but the company went out of business.

Unconstrained Data

Description:

Tables with no:

  • Primary keys
  • Alternate keys
  • Foreign keys
  • Check constraints
  • Default constraints
  • Appropriate data types (i.e., VARCHAR(50) for everyone!)

Unconstrained Data

Code-based example:


CREATE TABLE A
(
    Column1 VARCHAR(50) NOT NULL,
    Column2 VARCHAR(50) NOT NULL,
    Column3 VARCHAR(50) NOT NULL,
    Column4 VARCHAR(50) NOT NULL
);
 
CREATE TABLE B
(
    Column2 VARCHAR(50) NOT NULL,
    ColumnA VARCHAR(50) NOT NULL,
    ColumnB VARCHAR(50) NOT NULL
);
 
CREATE TABLE C
(
    Column3 VARCHAR(50) NOT NULL,
    ColumnB VARCHAR(50) NOT NULL,
    ColumnZ VARCHAR(50) NOT NULL
);
						

Unconstrained Data

Why this is an anti-pattern:

Thinking about the code sample...

  • No primary keys → Could have duplicate rows, fouling up data integrity or analyses.
  • No alternate keys → Duplicate rows could creep in.
  • No foreign keys → Does A.Column2 join to B.Column2?
  • No check constraints → If B.ColumnB should be one of (1, 2, 3), we should control that.
  • No appropriate data types → B.ColumnB should be a tinyint.
  • No default constraints → App needs to pass in everything.

Unconstrained Data

How to fix the anti-pattern:


CREATE TABLE A
(
    Column1 INT IDENTITY(1,1) NOT NULL,
    Column2 TINYINT NOT NULL,
    Column3 DECIMAL(9,2) NOT NULL,
    Column4 CHAR(12) NOT NULL
);
 
ALTER TABLE A ADD CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED(Column1);
ALTER TABLE A ADD CONSTRAINT [UKC_A] UNIQUE(Column2, Column4);
ALTER TABLE A ADD CONSTRAINT [CK_Column3] CHECK (Column3 >= 0);
ALTER TABLE A ADD CONSTRAINT [CK_Column4] 
    CHECK (LEFT(Column4, 3) = 'RPT' AND RIGHT(Column4, 3) = 'USR');
 
CREATE TABLE B
(
    Column2 TINYINT NOT NULL,
    ColumnA SMALLINT NOT NULL,
    ColumnB VARCHAR(30) NOT NULL
);
 
ALTER TABLE B ADD CONSTRAINT [PK_B] 
    PRIMARY KEY CLUSTERED(Column2);
 
CREATE TABLE C
(
    Column3 VARCHAR(50) NOT NULL,
    ColumnB CHAR(2) NOT NULL,
    ColumnZ DATETIME2(0) NOT NULL
);
 
ALTER TABLE C ADD CONSTRAINT [PK_C] 
    PRIMARY KEY CLUSTERED(Column3, ColumnB);
ALTER TABLE C ADD CONSTRAINT [DF_ColumnZ] 
    DEFAULT(CURRENT_TIMESTAMP) FOR ColumnZ;
 
ALTER TABLE A ADD CONSTRAINT [FK_B_A] 
    FOREIGN KEY(Column2) REFERENCES [B](Column2);
						

Unconstrained Data

When it might not be an anti-pattern:

Throwing together a quick demo, perhaps...but probably not even then.

Lack Of Normalization

Description:

“Normalize until it hurts; denormalize until it works.”

The bad advice which launched (many more than) a thousand crappy databases.

This is the DBA equivalent of "Bleedings cure diseases."

Lack Of Normalization

Lack of normalization examples:

  1. Non-atomic attributes (e.g., comma-separated list of telephone numbers in one column)
  2. Multi-valued attributes (e.g., a table with ID, Name, PhoneNumber1, PhoneNumber2, PhoneNumber3, etc.)
  3. Repeated columns (e.g., a table with ID, Name, ChemistryScore, EnglishScore, MathScore, ReadingScore, etc.)

Lack Of Normalization

Why this is an anti-pattern:

  • Throw away a lot of the value of a relational database model
  • Make it easy to introduce bad data
  • Can degrade performance(!)
  • Can make data access more difficult(!)

Lack Of Normalization

How to fix the anti-pattern:

Grab a good book and refactor that database. Start with Pro SQL Server 2012 Relational Database Design and Implementation by Louis Davidson and Jessica Moss.

Lack Of Normalization

When it might not be an anti-pattern:

Appropriately-designed, selective denormalization to solve specific pressing issues can be okay. Document the deviation and explain why. Test first to make sure you really need this.

Leeching can be helpful sometimes; make sure you know when.

Entity-Attribute-Value

Description:

Looking for a "clockwork" system: create once and let end users use and (under the covers) maintain.

This is hard for databases: table changes are expensive and can be difficult to do right.

Stroke of genius: One True Lookup Table. If users want Color, they get Color; if they want Size, they get Size. No DBAs needed to slow down the process.

Entity-Attribute-Value

Code-based example:


CREATE TABLE dbo.ItemAttribute
(
    ItemAttributeID int identity(1,1) NOT NULL,
    ItemID int NOT NULL,
    Name varchar(100) NOT NULL,
    Value varchar(max) NOT NULL
);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [PK_ItemAttribute] 
    PRIMARY KEY CLUSTERED(ID);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [UKC_ItemAttribute] 
    UNIQUE (ItemID, Name);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [FK_ItemAttribute_Item] 
    FOREIGN KEY (ItemID) REFERENCES dbo.Item(ItemID);
						

Entity-Attribute-Value

Why this is an anti-pattern:

Inserting is easy; retrieval is a pain.


SELECT
    i.ID as ItemID,
    i.Name as ItemName,
    COALESCE(iam.Value, 'UNKNOWN') as ManufacturerName,
    COALESCE(iac.Value, 'UNKNOWN') as ItemColor,
    COALESCE(ias.Value, 'UNKNOWN') as ProductSize,
    COALESCE(iao.Value, 'UNKNOWN') as CountryOfOrigin
FROM
    dbo.Item i
    LEFT OUTER JOIN dbo.ItemAttribute iam
        ON i.ItemID = iam.ItemID
        AND iam.Name = 'Manufacturer'
    LEFT OUTER JOIN dbo.ItemAttribute iac
        ON i.ItemID = iac.ItemID
        AND iac.Name = 'Color'
    LEFT OUTER JOIN dbo.ItemAttribute ias
        ON i.ItemID = ias.ItemID
        AND ias.Name = 'Size'
    LEFT OUTER JOIN dbo.ItemAttribute iao
        ON i.ItemID = iao.ItemID
        AND iao.Name = 'Country Of Origin'
    INNER JOIN dbo.ItemAttribute iapt
        ON i.ItemID = iapt.ItemID
        AND iapt.Name = 'Product Type'
    INNER JOIN dbo.ItemAttribute iamc
        ON i.ItemID = iamc.ItemID
        AND iamc.Name = 'Unit Cost'
WHERE
    iapt.Value = 'Clothing'
    AND CAST(iamc.Value AS DECIMAL(8,2)) >= 14;
						

Entity-Attribute-Value

Why this is an anti-pattern:

  • Six joins to ItemAttribute for one simple query
  • No ability to constrain Value (see Decimal cast)
  • Indexes on Value are impossible: VARCHAR(MAX) column
  • Mixing metadata with data; potential fixes worse (e.g., IntValue, StringValue, DecimalValue)
  • Referential integrity problem: Color, Colour, and Colro

Entity-Attribute-Value

How to fix the anti-pattern:

Do the necessary data modeling and bite that bullet.


SELECT
    i.ID as ItemID,
    i.Name as ItemName,
    i.ManufacturerName,
    i.Color,
    i.ProductSize,
    i.CountryOfOrigin
FROM
    dbo.Item i
    INNER JOIN dbo.ProductType pt 
        ON i.ProductTypeID = pt.ProductTypeID
WHERE
    pt.Name = 'Clothing'
    AND i.UnitCost >= 14;
						

"Normal" tables can have indexes (ProductTypeID, UnitCost), appropriate data types (UnitCost is DECIMAL), appropriate constraints (UnitCost > 0), etc.

Entity-Attribute-Value

When it might not be an anti-pattern:

  • Mockups
  • Systems with few records
  • Multi-client storage systems

Nested Views

Description:

Don't Repeat Yourself (DRY) in SQL Server: Views and Functions for repetitive actions.

Then, combine with the concept of encapsulation: views and functions as interoperating modules.

Treating modules as building blocks leads to views inside views inside views inside views...it's turtles all the way down.

Nested Views

Code-based example:


CREATE VIEW vSalesOrderDetail_1 AS
SELECT
    soh.SalesOrderID,
    soh.OrderDate,
    soh.DueDate,
    soh.Status,
    st.Name as SalesTerritory,
    cc.CardType,
    cc.CardNumber,
    soh.SubTotal,
    soh.TotalDue,
    sod.SalesOrderDetailID,
    sod.OrderQty,
    sod.UnitPrice,
    sod.LineTotal
FROM
    Sales.SalesOrderHeader soh
    INNER JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
    INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
    INNER JOIN Sales.CreditCard cc ON soh.CreditCardID = cc.CreditCardID;
	
CREATE VIEW vEmail AS
SELECT
    v.SalesOrderID,
    v.SalesOrderDetailID,
    v.LineTotal,
    be.BusinessEntityID,
    p.FirstName,
    p.LastName,
    e.EmailAddress
FROM
    vSalesOrderDetail_1 v
    LEFT OUTER JOIN Sales.SalesOrderHeader soh ON v.SalesOrderID = soh.SalesOrderID
    LEFT OUTER JOIN Sales.SalesPerson sp ON soh.SalesPersonID = sp.BusinessEntityID
    LEFT OUTER JOIN Person.BusinessEntity be ON sp.BusinessEntityID = be.BusinessEntityID
    LEFT OUTER JOIN Person.Person p ON p.BusinessEntityID = be.BusinessEntityID
    LEFT OUTER JOIN Person.EmailAddress e ON e.BusinessEntityID = p.BusinessEntityID;
	
SELECT
    v.SalesOrderID,
    soh.AccountNumber,
    v.SalesOrderDetailID,
    v.LineTotal,
    sp.CommissionPct,
    v.BusinessEntityID
FROM
    vEmail v
    INNER JOIN Sales.SalesPerson sp ON v.BusinessEntityID = sp.BusinessEntityID
    INNER JOIN Sales.SalesOrderHeader soh ON soh.SalesOrderID = v.SalesOrderID
WHERE
    v.SalesOrderID IN (43659, 43660, 43661);
						

Nested Views

Demo:

Nested Views

Why this is an anti-pattern:

Modularization is a wonderful ideal, but the SQL Server query optimizer can get tricked with large plans.

Modularization hides a lot of complexity to the developer, but not to the engine.

With enough layers, the optimizer "gives up" and translates plans without finding optimal forms.

Nested Views

How to fix the anti-pattern:

Views join to tables, not other views.

It's okay to repeat yourself when you need to.

Nested Views

When it might not be an anti-pattern:

One layer above extremely simple views (e.g., SELECT [columns] FROM [Schema].[Table]--no joins)

When all views undergo thorough performance testing

User-Defined Functions

Description:

Modularization, part 2: using functions to encapsulate business logic in one location

User-Defined Functions

Code-based example:


SELECT
    p.*
FROM
    Production.Product p
WHERE
    dbo.ufnGetProductDealerPrice(p.ProductID, '2006-07-21') < 25;
					

User-Defined Functions

Why this is an anti-pattern:

  • Scalar functions operate once for each row in a result set
  • Scalar functions in WHERE clause run for all rows--not SARGable
  • Table-Valued Functions much more reasonable, but still have overhead cost

User-Defined Functions

How to fix the anti-pattern:

  • Re-write code not to use a function if one is unnecessary
  • Use the Common Language Runtime (CLR) when feasable
  • Use Inline Table-Valued functions when a TVF is necessary

User-Defined Functions

When it might not be an anti-pattern:

  • When using an Inline Table-Valued Function
  • When Scalar UDFs are fast calculations and only used in the SELECT clause

RBAR

Description:

Row-By-Agonizing-Row Queries

Cursors, loops, and "hidden RBAR"

Comes from being stuck in a procedural mindset

RBAR

Code-based example:


DECLARE @UserID INT;
DECLARE @LoginDate DATETIME;
DECLARE @LastLoginDate DATETIME;

DECLARE UserCursor CURSOR FOR
SELECT UserID
FROM dbo.[User];

OPEN UserCursor;
FETCH NEXT FROM UserCursor INTO @UserID;

WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE LoginCursor CURSOR FOR
    SELECT LoginDate
    FROM dbo.UserLogin
    WHERE UserID = @UserID;
    
    OPEN LoginCursor;
    FETCH NEXT FROM LoginCursor INTO @LoginDate;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF (ISNULL(@LastLoginDate, '1900-01-01') < @LoginDate)
        BEGIN
            SET @LastLoginDate = @LoginDate;
        END
    END
    
    CLOSE LoginCursor;
    DEALLOCATE LoginCursor;
    
    IF (@LastLoginDate < DATEADD(MONTH, -3, CURRENT_TIMESTAMP))
    BEGIN
        UPDATE dbo.[User]
        SET IsActive = 0
        WHERE UserID = @UserID;
    END
    
    SET @LastLoginDate = NULL;
END

CLOSE UserCursor;
DEALLOCATE UserCursor;
						

RBAR

Why this is an anti-pattern:

SQL is a set-based language. T-SQL has procedural constructs but is a poor choice for procedural programmers.

We access tables an unnecessary number of times, causing performance slowdowns. In an extreme case, we can the User table N times (once for each user) and the UserLogin table M*N times (once for each login date for each user)

RBAR

How to fix the anti-pattern:

Use set-based constructs.


UPDATE dbo.User
SET IsActive = 0
WHERE
    NOT EXISTS
    (
        SELECT *
        FROM dbo.UserLogin ul
        WHERE
            ul.UserID = UserID
            AND ul.LoginDate >= DATEADD(MONTH, -3, CURRENT_TIMESTAMP)
    );
						

With a good index, we scan User once and seek against UserLogin.

Almost every RBAR query can be re-written using set-based notation.

RBAR

When it might not be an anti-pattern:

  • Certain maintenance routines (e.g., using sp_msforeachdb/sp_ForEachDB or sp_msforeachtable)
  • "Batch" operations which write to the transaction log (especially deletes)--WHILE loop to run a set-based operation on a subset of records at a time
  • When you want poorly-performing queries

Even if it's not an anti-pattern, think about moving this code to a procedural language well-designed to handle it (Powershell, C#, Python, etc.)

QBAQ

Description:

Query-By-Agonizing-Query

RBAR moved up one step.

QBAQ

Code-based example:


UPDATE dbo.SomeTable
SET ColumnA = @Value
WHERE KeyColumn = @ID;
 
UPDATE dbo.SomeTable
SET ColumnB = @SomeOtherValue
WHERE KeyColumn = @ID;
 
UPDATE dbo.SomeTable
SET ColumnC = ColumnA + ColumnB
WHERE KeyColumn = @ID;
						

QBAQ

Code-based example with RBAR:


WHILE EXISTS(SELECT * FROM #RelevantValues)
BEGIN
    SELECT TOP 1 @ID = ID
    FROM #RelevantValues;
 
    UPDATE dbo.SomeTable
    SET ColumnA = @Value
    WHERE KeyColumn = @ID;
 
    UPDATE dbo.SomeTable
    SET ColumnB = @SomeOtherValue
    WHERE KeyColumn = @ID;
 
    UPDATE dbo.SomeTable
    SET ColumnC = ColumnA + ColumnB
    WHERE KeyColumn = @ID;
 
    DELETE FROM #RelevantValues
    WHERE ID = @ID;
END
						

QBAQ

Code-based example with UNION abuse:


SELECT A, B, C FROM dbo.SomeTable WHERE SomeValue = 1
UNION ALL
SELECT A, B, C FROM dbo.SomeTable WHERE SomeValue = 2
UNION ALL
SELECT A, B, C FROM dbo.SomeTable WHERE SomeValue = 3
UNION ALL
SELECT A, B, C FROM dbo.SomeTable WHERE SomeOtherValue = 'A'
						

QBAQ

Why this is an anti-pattern:

  • Unnecessary table access
  • Overly lengthy code is harder to understand
  • Easy to make mistakes (forgot to modify one of the 49 UPDATE statements)

QBAQ

How to fix the anti-pattern:

Like RBAR, fixing QBAQ is all about thinking in sets and understanding the full context of your actions.

Stop thinking like a procedural developer: "Do X, then do Y, then do Z." Figure out when X, Y, and Z are interrelated and handle them simultaneously.


UPDATE st
SET
    ColumnA = @Value,
    ColumnB = @SomeOtherValue,
    ColumnC = @Value + @SomeOtherValue
FROM dbo.SomeTable st
    INNER JOIN #RelevantValues rv ON st.KeyColumn = rv.ID;
						

QBAQ

How to fix UNION abuse:

Like RBAR, QBAQ is all about thinking in sets and understanding the full context of your actions.


SELECT
    A, 
    B, 
    C 
FROM dbo.SomeTable 
WHERE
    SomeValue IN (1, 2, 3) 
    OR SomeOtherValue = 'A';
						

QBAQ

When it might not be an anti-pattern:

  • When UNIONing data sets allows for better index coverage
  • When breaking apart a complex query improves performance
  • When your current query is non-SARGable

Non-SARGable ORs

Description:

SARGable: the SQL Server engine can optimize your query using indexes

Common reasons for non-SARGable queries:

  • WHERE clause fields inside functions
  • Operators which change WHERE clause values
  • LIKE '%Something%'

Non-SARGable ORs

Code-based example:


SELECT
    ACCT.[Id].
    ACCT.[TypeId].
    ACCT.[EntityId].
    ACCT.[Name].
    ACCT.[Purpose],
    TF.Amount As TransferAmount
FROM
    dbo.Account ACCT
    INNER JOIN dbo.Transfer TF
        ON ACCT.Id = TF.AccountToId
        OR ACCT.Id = TF.AccountFromId;
						

Join from Account to Transfer happens on two keys: AccountToID and AccountFromID

Non-SARGable ORs

Why this is an anti-pattern:

Indexes won't help us here:

  • Index on AccountToID → scan to get rows with matching AccountFromID
  • Index on AccountFromID → scan to get rows with matching AccountToID
  • Index on (AccountToID, AccountFromID) → AccountFromID portion useless for this query
  • Separate indexes on both tables: SQL Server can only use (at most) one...and probably will use neither

Non-SARGable ORs

How to fix the anti-pattern:

Put indexes on AccountToID and AccountFromID, and then:


SELECT
    ACCT.[Id].
    ACCT.[TypeId].
    ACCT.[EntityId].
    ACCT.[Name].
    ACCT.[Purpose],
    TF.Amount As TransferAmount
FROM
    dbo.Account ACCT
    INNER JOIN dbo.Transfer TF
        ON ACCT.Id = TF.AccountFromId
 
UNION ALL
 
SELECT
    ACCT.[Id].
    ACCT.[TypeId].
    ACCT.[EntityId].
    ACCT.[Name].
    ACCT.[Purpose],
    TF.Amount As TransferAmount
FROM
    dbo.Account ACCT
    INNER JOIN dbo.Transfer TF
        ON ACCT.Id = TF.AccountToId;
						

Non-SARGable ORs

ORs in joins are OK when:

  1. TableX x INNER JOIN TableY y ON (x.ColumnA = y.ColumnA OR x.ColumnB = y.ColumnB) — Two separate columns with two separate joins are OK.
  2. TableX x INNER JOIN TableY y ON (x.ColumnA = y.ColumnA OR x.ColumnA = 6) — Join on a match or when one table matches a constant.
  3. TableX x INNER JOIN TableY y ON (x.ColumnA = y.ColumnA OR z.ColumnA = y.ColumnA) — Join on another table match, as long as z was brought in on a join earlier in the query. Watch out for triangular joins or accidental Cartesian products here.

Non-SARGable ORs

When it might not be an anti-pattern:

  • When performance is not a consideration for your application
  • Very small row counts

Linked Server Joins

Description:

Using linked servers to join large/many local tables with large/many foreign tables

Linked Server Joins

Code-based example:


SELECT
    A1.ColumnA,
	A2.ColumnB
FROM
    dbo.TableA A1
	INNER JOIN [RemoteServer].[RemoteDatabase].dbo.RemoteTable A2
	    ON A1.JoinColumn = A2.JoinColumn
WHERE
    A1.ColX < 50;
						

Even with a great filter, this query is likely to be terrible if TableA and RemoteTable are both large.

Linked Server Joins

Why this is an anti-pattern:

Prior to SQL Server 2012 SP1, tables statistics not available for foreign servers unless linked server account is sysadmin, db_owner, or db_ddladmin

Even with stats, cross-server joins mean passing rowsets from one server to the other before doing any joins

Linked Server Joins

How to fix the anti-pattern:

  • Replace linked servers with Always-On Availability Groups or replication to push remote data onto local servers
  • For static data (e.g., codes which change once a year), keep non-replicated copies on both servers
  • If linked servers are necessary, avoid joining local tables to remote tables; use temporary tables to pull a filtered remote data set back

Linked Server Joins

When it might not be an anti-pattern:

  • Small data sets
  • All remote joins (though maybe use OPENQUERY/OPENROWSET to make remote server do all the work)

Clustered GUIDs

Description:

Using a UNIQUEIDENTIFIER type as a clustering key

Developers love GUID keys: easy to generate with Guid.NewGuid(), guaranteed* unique, and great for tagging objects which turn into multiple tables

Clustered GUIDs

Code-based example:


CREATE TABLE dbo.tblGUID
(
    Cluster uniqueidentifier NOT NULL,
    Filler char(250)
);

ALTER TABLE dbo.tblGUID ADD CONSTRAINT
    [PK_tblGUID] PRIMARY KEY(Cluster);
					

Clustered GUIDs

Why this is an anti-pattern:

Great clustering keys are NUSE: Narrow, Unique, Static, Ever-Increasing

GUIDS are: US -- neither Narrow nor Ever-Increasing

Clustering on GUIDs leads to page splits and table fragmentation → poor performance

Windows 7 RC download process failure

Clustered GUIDs

How to fix the anti-pattern:

  • Use identity integer columns; for multi-table insertion scenario, pass Table-Valued Parameters to the Insert stored procedure and save your object in one procedure
  • Use a sequence (2012 or later)
  • Use a non-clustered GUID as your primary key and an identity integer for your clustered index

Clustered GUIDs

When it might not be an anti-pattern:

  • Small data sets
  • Static data

Wrapping Up

  1. Style
    • ANSI-89 Joins
    • SELECT *
  2. Theory
    • Zombie Fields
    • Unconstrained Data
    • Lack of Normalization
    • EAV
  3. Performance
    • Nested views & UDFs
    • RBAR & QBAQ
    • Non-SARGable ORs in joins
    • Linked server joins
    • Clustered GUIDs

References

 

Slides and code are available at http://www.catallaxyservices.com/presentations/TSQL-Antipatterns

E-Mail: feasel@catallaxyservices.com

Twitter: @feaselkl