Created by Kevin Feasel
Catallaxy Services | @feaselkl | Curated SQL | We Speak Linux
Code smells and anti-patterns are related topics and will make up our discussion today.
A code smell is an intuitive feeling that a piece of code has fundamental problems. We develop a sense for code smells ("code nose") over time as we create and observe patterns of abuse.
An anti-pattern is a common technique we use to solve a recurring problem...incorrectly.
Anti-patterns often have a code smell. Sometimes, we can use unrelated code smells to dig out deeper anti-patterns.
“There are more anti-patterns in heaven and earth, Horatio, than are dreamt of in your philosophy.” -- Hamlet, DBA.
Some anti-patterns fit in multiple categories, and no single presentation can cover them all.
Even better, an anti-pattern in one scenario might actually be a reasonable business decision in another scenario.
In case that weren't enough, some anti-patterns may appear to be contradictory; resolution requires work.
This code smell is not necessarily a problem on its own, but when you see this syntax, the odds are good that the author has left behind all kinds of "I needed it in the '90s" hacks.
SELECT
a.Col1,
b.Col2
FROM
dbo.TableA a,
dbo.TableB b
WHERE
a.JoinColumn = b.JoinColumn;
This code smell has three problems. The following query demonstrates one of the worst: TableC has no join predicate, so we end up with an accidental cross join.
SELECT
a.Col1,
b.Col2
FROM
dbo.TableA a,
dbo.TableB b,
dbo.TableC c
WHERE
a.JoinColumn = b.JoinColumn
AND a.ColX > 300
AND c.ColY < 50;
Use ANSI-92 syntax. Replace the prior query with:
SELECT
a.Col1,
b.Col2
FROM
dbo.TableA a
INNER JOIN dbo.TableB b ON a.JoinColumn = b.JoinColumn
INNER JOIN dbo.TableC c ON a.JoinColumn2 = c.JoinColumn2
WHERE
a.ColX > 300
AND c.ColY < 50;
Never. At least not if you're supporting SQL Server 6.0 or later. And if you're still supporting SQL Server prior to 6.0, you're a curator, not a DBA.
The simplest SELECT query can also be the most dangerous.
SELECT
*
FROM
dbo.TableA a
INNER JOIN dbo.TableB b ON a.JoinColumn = b.JoinColumn
INNER JOIN dbo.TableC c ON a.JoinColumn2 = c.JoinColumn2
WHERE
a.ColX > 300
AND c.ColY < 50;
Explicitly name the columns you want.
SELECT
a.Col1,
b.Col2
FROM
dbo.TableA a
INNER JOIN dbo.TableB b ON a.JoinColumn = b.JoinColumn
INNER JOIN dbo.TableC c ON a.JoinColumn2 = c.JoinColumn2
WHERE
a.ColX > 300
AND c.ColY < 50;
Too lazy and need most/all of the columns? Find a tool which does this for you, or write an ad hoc query against sys.columns.
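For example, a quick ad hoc query against sys.columns (a sketch; substitute your own table name) will generate the column list for you:
SELECT
c.name
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.TableA')
ORDER BY c.column_id;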
Re-using existing fields for some purpose other than their original intent.
Typically, you see this with very old code or third-party platforms where you can't change the schema...or when the developer is just being lazy.
INSERT INTO dbo.Address
(
Line1,
Line2,
Line3,
City,
State,
ZIPCode
)
VALUES
(
@Line1,
@Line2,
@Region, --Put region in Line 3
@City,
@State,
@ZIPCode
);
ALTER TABLE dbo.Address ADD Region VARCHAR(12) NULL;
UPDATE dbo.Address
SET Region = Line3,
Line3 = NULL;
Use attributes for their intended purposes.
Keep up on changes. Database changes are more expensive than code changes, but this isn't the 1990s anymore--there are tools which can help you.
When you have no control over the schema--for example, third-party software whose vendor has gone out of business.
Tables with no primary keys, unique constraints, check constraints, foreign keys, defaults, or appropriate data types:
CREATE TABLE A
(
Column1 VARCHAR(50) NOT NULL,
Column2 VARCHAR(50) NOT NULL,
Column3 VARCHAR(50) NOT NULL,
Column4 VARCHAR(50) NOT NULL
);
CREATE TABLE B
(
Column2 VARCHAR(50) NOT NULL,
ColumnA VARCHAR(50) NOT NULL,
ColumnB VARCHAR(50) NOT NULL
);
CREATE TABLE C
(
Column3 VARCHAR(50) NOT NULL,
ColumnB VARCHAR(50) NOT NULL,
ColumnZ VARCHAR(50) NOT NULL
);
Thinking about the code sample...
CREATE TABLE A
(
Column1 INT IDENTITY(1,1) NOT NULL,
Column2 TINYINT NOT NULL,
Column3 DECIMAL(9,2) NOT NULL,
Column4 CHAR(12) NOT NULL
);
ALTER TABLE A ADD CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED(Column1);
ALTER TABLE A ADD CONSTRAINT [UKC_A] UNIQUE(Column2, Column4);
ALTER TABLE A ADD CONSTRAINT [CK_Column3] CHECK (Column3 >= 0);
ALTER TABLE A ADD CONSTRAINT [CK_Column4]
CHECK (LEFT(Column4, 3) = 'RPT' AND RIGHT(Column4, 3) = 'USR');
CREATE TABLE B
(
Column2 TINYINT NOT NULL,
ColumnA SMALLINT NOT NULL,
ColumnB VARCHAR(30) NOT NULL
);
ALTER TABLE B ADD CONSTRAINT [PK_B]
PRIMARY KEY CLUSTERED(Column2);
CREATE TABLE C
(
Column3 VARCHAR(50) NOT NULL,
ColumnB CHAR(2) NOT NULL,
ColumnZ DATETIME2(0) NOT NULL
);
ALTER TABLE C ADD CONSTRAINT [PK_C]
PRIMARY KEY CLUSTERED(Column3, ColumnB);
ALTER TABLE C ADD CONSTRAINT [DF_ColumnZ]
DEFAULT(CURRENT_TIMESTAMP) FOR ColumnZ;
ALTER TABLE A ADD CONSTRAINT [FK_B_A]
FOREIGN KEY(Column2) REFERENCES [B](Column2);
Throwing together a quick demo, perhaps...but probably not even then.
“Normalize until it hurts; denormalize until it works.”
The bad advice which launched (many more than) a thousand crappy databases.
This is the DBA equivalent of "Bleeding cures diseases."
Grab a good book and refactor that database. Start with Pro SQL Server 2012 Relational Database Design and Implementation by Louis Davidson and Jessica Moss.
Appropriately-designed, selective denormalization to solve specific pressing issues can be okay. Document the deviation and explain why. Test first to make sure you really need this.
Leeching can be helpful sometimes; make sure you know when.
Looking for a "clockwork" system: build it once and let end users both use it and (under the covers) maintain it.
This is hard for databases: table changes are expensive and can be difficult to do right.
Stroke of genius: One True Lookup Table. If users want Color, they get Color; if they want Size, they get Size. No DBAs needed to slow down the process.
CREATE TABLE dbo.ItemAttribute
(
ItemAttributeID int identity(1,1) NOT NULL,
ItemID int NOT NULL,
Name varchar(100) NOT NULL,
Value varchar(max) NOT NULL
);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [PK_ItemAttribute]
PRIMARY KEY CLUSTERED(ItemAttributeID);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [UKC_ItemAttribute]
UNIQUE (ItemID, Name);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [FK_ItemAttribute_Item]
FOREIGN KEY (ItemID) REFERENCES dbo.Item(ItemID);
Inserting is easy; retrieval is a pain.
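For instance, adding a new attribute value is a single statement (a sketch against the table above; the values are illustrative), while the retrieval below needs one join per attribute:
INSERT INTO dbo.ItemAttribute
(
ItemID,
Name,
Value
)
VALUES
(
@ItemID,
'Color',
'Red'
);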
SELECT
i.ItemID,
i.Name as ItemName,
COALESCE(iam.Value, 'UNKNOWN') as ManufacturerName,
COALESCE(iac.Value, 'UNKNOWN') as ItemColor,
COALESCE(ias.Value, 'UNKNOWN') as ProductSize,
COALESCE(iao.Value, 'UNKNOWN') as CountryOfOrigin
FROM
dbo.Item i
LEFT OUTER JOIN dbo.ItemAttribute iam
ON i.ItemID = iam.ItemID
AND iam.Name = 'Manufacturer'
LEFT OUTER JOIN dbo.ItemAttribute iac
ON i.ItemID = iac.ItemID
AND iac.Name = 'Color'
LEFT OUTER JOIN dbo.ItemAttribute ias
ON i.ItemID = ias.ItemID
AND ias.Name = 'Size'
LEFT OUTER JOIN dbo.ItemAttribute iao
ON i.ItemID = iao.ItemID
AND iao.Name = 'Country Of Origin'
INNER JOIN dbo.ItemAttribute iapt
ON i.ItemID = iapt.ItemID
AND iapt.Name = 'Product Type'
INNER JOIN dbo.ItemAttribute iamc
ON i.ItemID = iamc.ItemID
AND iamc.Name = 'Unit Cost'
WHERE
iapt.Value = 'Clothing'
AND CAST(iamc.Value AS DECIMAL(8,2)) >= 14;
Do the necessary data modeling and bite that bullet.
SELECT
i.ItemID,
i.Name as ItemName,
i.ManufacturerName,
i.Color,
i.ProductSize,
i.CountryOfOrigin
FROM
dbo.Item i
INNER JOIN dbo.ProductType pt
ON i.ProductTypeID = pt.ProductTypeID
WHERE
pt.Name = 'Clothing'
AND i.UnitCost >= 14;
"Normal" tables can have indexes (ProductTypeID, UnitCost), appropriate data types (UnitCost is DECIMAL), appropriate constraints (UnitCost > 0), etc.
Don't Repeat Yourself (DRY) in SQL Server: Views and Functions for repetitive actions.
Then, combine with the concept of encapsulation: views and functions as interoperating modules.
Treating modules as building blocks leads to views inside views inside views inside views...it's turtles all the way down.
CREATE VIEW vSalesOrderDetail_1 AS
SELECT
soh.SalesOrderID,
soh.OrderDate,
soh.DueDate,
soh.Status,
st.Name as SalesTerritory,
cc.CardType,
cc.CardNumber,
soh.SubTotal,
soh.TotalDue,
sod.SalesOrderDetailID,
sod.OrderQty,
sod.UnitPrice,
sod.LineTotal
FROM
Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
INNER JOIN Sales.CreditCard cc ON soh.CreditCardID = cc.CreditCardID;
CREATE VIEW vEmail AS
SELECT
v.SalesOrderID,
v.SalesOrderDetailID,
v.LineTotal,
be.BusinessEntityID,
p.FirstName,
p.LastName,
e.EmailAddress
FROM
vSalesOrderDetail_1 v
LEFT OUTER JOIN Sales.SalesOrderHeader soh ON v.SalesOrderID = soh.SalesOrderID
LEFT OUTER JOIN Sales.SalesPerson sp ON soh.SalesPersonID = sp.BusinessEntityID
LEFT OUTER JOIN Person.BusinessEntity be ON sp.BusinessEntityID = be.BusinessEntityID
LEFT OUTER JOIN Person.Person p ON p.BusinessEntityID = be.BusinessEntityID
LEFT OUTER JOIN Person.EmailAddress e ON e.BusinessEntityID = p.BusinessEntityID;
SELECT
v.SalesOrderID,
soh.AccountNumber,
v.SalesOrderDetailID,
v.LineTotal,
sp.CommissionPct,
v.BusinessEntityID
FROM
vEmail v
INNER JOIN Sales.SalesPerson sp ON v.BusinessEntityID = sp.BusinessEntityID
INNER JOIN Sales.SalesOrderHeader soh ON soh.SalesOrderID = v.SalesOrderID
WHERE
v.SalesOrderID IN (43659, 43660, 43661);
Modularization is a wonderful ideal, but the SQL Server query optimizer can struggle with the very large plans that nested views generate.
Modularization hides a lot of complexity from the developer, but not from the engine.
With enough layers, the optimizer "gives up" and settles on a plan without ever finding a more optimal form.
Views join to tables, not other views.
It's okay to repeat yourself when you need to.
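For example, the final report query above could hit the base tables directly (a sketch against AdventureWorks, accepting a little repetition instead of layered views):
SELECT
soh.SalesOrderID,
soh.AccountNumber,
sod.SalesOrderDetailID,
sod.LineTotal,
sp.CommissionPct,
sp.BusinessEntityID
FROM
Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Sales.SalesPerson sp ON soh.SalesPersonID = sp.BusinessEntityID
WHERE
soh.SalesOrderID IN (43659, 43660, 43661);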
One layer above extremely simple views (e.g., SELECT [columns] FROM [Schema].[Table]--no joins)
When all views undergo thorough performance testing
Modularization, part 2: using functions to encapsulate business logic in one location
SELECT
p.*
FROM
Production.Product p
WHERE
dbo.ufnGetProductDealerPrice(p.ProductID, '2006-07-21') < 25;
Row-By-Agonizing-Row Queries
Cursors, loops, and "hidden RBAR"
Comes from being stuck in a procedural mindset
DECLARE @UserID INT;
DECLARE @LoginDate DATETIME;
DECLARE @LastLoginDate DATETIME;
DECLARE UserCursor CURSOR FOR
SELECT UserID
FROM dbo.[User];
OPEN UserCursor;
FETCH NEXT FROM UserCursor INTO @UserID;
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE LoginCursor CURSOR FOR
SELECT LoginDate
FROM dbo.UserLogin
WHERE UserID = @UserID;
OPEN LoginCursor;
FETCH NEXT FROM LoginCursor INTO @LoginDate;
WHILE @@FETCH_STATUS = 0
BEGIN
IF (ISNULL(@LastLoginDate, '1900-01-01') < @LoginDate)
BEGIN
SET @LastLoginDate = @LoginDate;
END
FETCH NEXT FROM LoginCursor INTO @LoginDate;
END
CLOSE LoginCursor;
DEALLOCATE LoginCursor;
IF (@LastLoginDate < DATEADD(MONTH, -3, CURRENT_TIMESTAMP))
BEGIN
UPDATE dbo.[User]
SET IsActive = 0
WHERE UserID = @UserID;
END
SET @LastLoginDate = NULL;
FETCH NEXT FROM UserCursor INTO @UserID;
END
CLOSE UserCursor;
DEALLOCATE UserCursor;
SQL is a set-based language. T-SQL has procedural constructs, but it is a poor language for writing procedural code.
We hit tables an unnecessary number of times, causing performance slowdowns. In an extreme case, we hit the User table N times (once for each user) and the UserLogin table M*N times (once for each login date for each user).
Use set-based constructs.
UPDATE u
SET IsActive = 0
FROM dbo.[User] u
WHERE
NOT EXISTS
(
SELECT *
FROM dbo.UserLogin ul
WHERE
ul.UserID = u.UserID
AND ul.LoginDate >= DATEADD(MONTH, -3, CURRENT_TIMESTAMP)
);
With a good index, we scan User once and seek against UserLogin.
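A supporting index might look like this (a sketch; the index name is illustrative):
CREATE INDEX [IX_UserLogin_UserID_LoginDate] ON dbo.UserLogin(UserID, LoginDate);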
Almost every RBAR query can be re-written using set-based notation.
Even if it's not an anti-pattern, think about moving this kind of code to a procedural language better suited to it (PowerShell, C#, Python, etc.).
Query-By-Agonizing-Query
RBAR moved up one step.
UPDATE dbo.SomeTable
SET ColumnA = @Value
WHERE KeyColumn = @ID;
UPDATE dbo.SomeTable
SET ColumnB = @SomeOtherValue
WHERE KeyColumn = @ID;
UPDATE dbo.SomeTable
SET ColumnC = ColumnA + ColumnB
WHERE KeyColumn = @ID;
WHILE EXISTS(SELECT * FROM #RelevantValues)
BEGIN
SELECT TOP 1 @ID = ID
FROM #RelevantValues;
UPDATE dbo.SomeTable
SET ColumnA = @Value
WHERE KeyColumn = @ID;
UPDATE dbo.SomeTable
SET ColumnB = @SomeOtherValue
WHERE KeyColumn = @ID;
UPDATE dbo.SomeTable
SET ColumnC = ColumnA + ColumnB
WHERE KeyColumn = @ID;
DELETE FROM #RelevantValues
WHERE ID = @ID;
END
SELECT A, B, C FROM dbo.SomeTable WHERE SomeValue = 1
UNION ALL
SELECT A, B, C FROM dbo.SomeTable WHERE SomeValue = 2
UNION ALL
SELECT A, B, C FROM dbo.SomeTable WHERE SomeValue = 3
UNION ALL
SELECT A, B, C FROM dbo.SomeTable WHERE SomeOtherValue = 'A'
Like RBAR, fixing QBAQ is all about thinking in sets and understanding the full context of your actions.
Stop thinking like a procedural developer: "Do X, then do Y, then do Z." Figure out when X, Y, and Z are interrelated and handle them simultaneously.
UPDATE st
SET
ColumnA = @Value,
ColumnB = @SomeOtherValue,
ColumnC = @Value + @SomeOtherValue
FROM dbo.SomeTable st
INNER JOIN #RelevantValues rv ON st.KeyColumn = rv.ID;
Like RBAR, QBAQ is all about thinking in sets and understanding the full context of your actions.
SELECT
A,
B,
C
FROM dbo.SomeTable
WHERE
SomeValue IN (1, 2, 3)
OR SomeOtherValue = 'A';
SARGable (Search ARGument-able): written so that the SQL Server engine can use indexes to satisfy your query's predicates.
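As a quick illustration (not from the deck; assumes an index on OrderDate), wrapping the column in a function defeats an index seek, while the equivalent range predicate does not:
--Non-SARGable: the function on OrderDate forces a scan
SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE YEAR(OrderDate) = 2006;
--SARGable: a range predicate on the bare column can use an index seek
SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE OrderDate >= '2006-01-01' AND OrderDate < '2007-01-01';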
There are several common causes of non-SARGable queries. One of them--joining two tables on an OR condition--shows up in the example below:
SELECT
ACCT.[Id],
ACCT.[TypeId],
ACCT.[EntityId],
ACCT.[Name],
ACCT.[Purpose],
TF.Amount As TransferAmount
FROM
dbo.Account ACCT
INNER JOIN dbo.Transfer TF
ON ACCT.Id = TF.AccountToId
OR ACCT.Id = TF.AccountFromId;
Join from Account to Transfer happens on two keys: AccountToID and AccountFromID
Indexes won't help us here: with an OR in the join predicate, the optimizer can't seek efficiently on either column, so we end up scanning.
Put indexes on AccountToID and AccountFromID (see the sketch after this query), and then break the OR into a UNION ALL:
SELECT
ACCT.[Id],
ACCT.[TypeId],
ACCT.[EntityId],
ACCT.[Name],
ACCT.[Purpose],
TF.Amount As TransferAmount
FROM
dbo.Account ACCT
INNER JOIN dbo.Transfer TF
ON ACCT.Id = TF.AccountFromId
UNION ALL
SELECT
ACCT.[Id],
ACCT.[TypeId],
ACCT.[EntityId],
ACCT.[Name],
ACCT.[Purpose],
TF.Amount As TransferAmount
FROM
dbo.Account ACCT
INNER JOIN dbo.Transfer TF
ON ACCT.Id = TF.AccountToId;
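The supporting indexes might look like this (a sketch; the index names and INCLUDE column are assumptions):
CREATE INDEX [IX_Transfer_AccountFromId] ON dbo.Transfer(AccountFromId) INCLUDE(Amount);
CREATE INDEX [IX_Transfer_AccountToId] ON dbo.Transfer(AccountToId) INCLUDE(Amount);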
Using linked servers to join large (or many) local tables to large (or many) remote tables
SELECT
A1.ColumnA,
A2.ColumnB
FROM
dbo.TableA A1
INNER JOIN [RemoteServer].[RemoteDatabase].dbo.RemoteTable A2
ON A1.JoinColumn = A2.JoinColumn
WHERE
A1.ColX < 50;
Even with a great filter, this query is likely to be terrible if TableA and RemoteTable are both large.
Prior to SQL Server 2012 SP1, table statistics were not available from the remote server unless the linked server login mapped to sysadmin, db_owner, or db_ddladmin
Even with stats, cross-server joins mean passing rowsets from one server to the other before doing any joins
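One common mitigation (a sketch, not covered in the deck): pull only the remote columns you need into a local temp table, then join locally.
--Pull the remote rows across once
SELECT
A2.JoinColumn,
A2.ColumnB
INTO #RemoteRows
FROM [RemoteServer].[RemoteDatabase].dbo.RemoteTable A2;
--Join locally, where the optimizer has full statistics
SELECT
A1.ColumnA,
r.ColumnB
FROM
dbo.TableA A1
INNER JOIN #RemoteRows r ON A1.JoinColumn = r.JoinColumn
WHERE
A1.ColX < 50;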
Using a UNIQUEIDENTIFIER type as a clustering key
Developers love GUID keys: easy to generate with Guid.NewGuid(), guaranteed* unique, and great for tagging objects whose data spans multiple tables.
CREATE TABLE dbo.tblGUID
(
Cluster uniqueidentifier NOT NULL,
Filler char(250)
);
ALTER TABLE dbo.tblGUID ADD CONSTRAINT
[PK_tblGUID] PRIMARY KEY(Cluster);
Great clustering keys are NUSE: Narrow, Unique, Static, Ever-Increasing
GUIDs are only US: Unique and Static, but neither Narrow nor Ever-Increasing
Clustering on GUIDs leads to page splits and table fragmentation → poor performance
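One common alternative (a sketch; the column and constraint names are illustrative): cluster on a narrow, ever-increasing surrogate key and keep the GUID as a nonclustered unique key.
CREATE TABLE dbo.tblGUID
(
ID int IDENTITY(1,1) NOT NULL,
ExternalGUID uniqueidentifier NOT NULL,
Filler char(250) NOT NULL
);
ALTER TABLE dbo.tblGUID ADD CONSTRAINT
[PK_tblGUID] PRIMARY KEY CLUSTERED(ID);
ALTER TABLE dbo.tblGUID ADD CONSTRAINT
[UKC_tblGUID_ExternalGUID] UNIQUE(ExternalGUID);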
Slides and code are available at http://www.catallaxyservices.com/presentations/TSQL-Antipatterns
E-Mail: feasel@catallaxyservices.com
Twitter: @feaselkl