|We Speak Linux|
My working definition is "rotten" code: code which has not been maintained very well. This code often lacks tests, but the key characteristic is the amount of pain necessary to make changes.
Legacy SQL will de-motivate you over time. You will spend too much time debugging terrible procedures. In order to fulfill even simple business requests, you will change lots of difficult code, including code you don't even know is obsolete. Even after you leave, your successor will curse your name.
Older code doesn't need to be legacy code. Older code does need TLC, however.
You need source control for your SQL code. Source control gives you the ability safely to experiment. If your experiment fails, roll back your changes.
Object-oriented languages have fully-featured test frameworks like NUnit to create unit tests. Developers can break apart small sections of functionality and test those sections controlling for that code's depdendencies.
T-SQL has a very limited version of database unit and integration testing. The most popular tools for these kinds of tests are tSQLt and Visual Studio database test projects.
Formatting SQL is the lowest-risk form of refactoring, as you aren't making any logical code changes. If you format your SQL well, however, you can make your code more readable. Readable code is (potentially) understandable code.
Code is for humans as much as it is for compilers and interpreters!
LEFT OUTER JOIN) to offset keywords from attributes
Simple is better than complex.
Complex is better than complicated.
A few examples of complicated code:
Tests help us simplify code because we can modify code inside a procedure while knowing that we maintain the same contract: given a particular set of inputs, we generate a particular set of outputs.
We will now look at a series of methods to simplify code.
We have a fact table and we need to calculate measures based on the values in the fact table.
The results are the same and the execution plans are almost the same, but the version with the APPLY operator is easier to read and easier to maintain.
With good database tests in place, this kind of change is extremely low-risk but potentially high-reward.
Dynamic SQL is sometimes necessary, but using it when you don't need to complicates things needlessly.
The problem with dynamic SQL is that it's harder to read (red text), does not provide syntax checking or Intellisense, and is harder to format: you either format the dynamic SQL or the procedure's static SQL, but you can't get both.
Cursors and WHILE loops are examples of RBAR (Row By Agonizing Row). Not only do they have a negative performance impact, they also have the ability to make code needlessly complicated.
Ways in which RBAR complicates our code:
Cursors and WHILE loops may sometimes be necessary, but ditching them when they're unnecessary will simplify your code base and improve performance to boot.
Look for "filters" (in the WHERE or SELECT clauses) which are needlessly complicated.
In this scenario, our query became a little less readable for a complete neophyte, but much shorter and easier to read overall. Even a relatively new T-SQL developer can run this code and quickly learn a new trick.
Look for walls of text, duplicated statements, and logically equivalent values in your code. Another example: unnecessary nested CASE statements.
Here is a grab bag of procedure design tips:
Once we have solid tests in place, we can look at refactoring entire designs instead of simply refactoring code within a single procedure.
This type of scenario will be most common if you have nested stored procedures. Then, treat the internal procedures as the equivalent of private methods and refactor around a contract.
Retain your outward-facing procedures' input and output signatures, and focus mostly on refactoring those "private" procedures.
Summary of Issues:
Summary of Changes:
Code is a liability: every line of code requires somebody to maintain it; code takes time to parse, compile, interpret, or run; and more mental overhead for developers.
If that line of code is not solving a problem, get rid of it!
Example: eliminating 20% of the code base with zero effect on users
We tend to hoard code, saving things "just in case." Instead of that, use source control and let a server hoard behind the scenes. Be aggressive in cleaning up code in comments or which is logically impossible.
Finding obsolete code:
Don't forget to take advantage of new functionality in SQL Server: window functions instead of cursors for running totals, data types like DATE, and functions like CONCAT to put together strings.
Look back at previous designs and figure out when choices were made based on no-longer-existent constraints: too restrictive (or slow) parts of hardware, older versions of SQL Server that didn't have new features, or even developers and architects who had less knowledge and experience than they now have.
Microsoft introduced the TRY-CATCH formulation in SQL Server 2005. This offers you a central location for error-handling logic.
Once you have captured an error, we need to log that error.
Errors aren't enough. When users complain about slowly-performing queries, we need metrics, and the finer-grained, the better.
Transaction handling in SQL Server is a little tricky; nested transactions don't work right, so you want to avoid those whenever possible. How do you avoid them when you don't know the call chain?
The net result from these changes is:
Legacy code is painful. Get tests in place and refactor that code.
To learn more, go here: http://CSmore.info/on/legacysql