Sliced Bread in SQL

My second in a series of posts in response to Tim Ford's #EntryLevel Challenge.

A comparison to sliced bread is a rhetorical device evoking the wonder of some new invention or feature in comparison to what went before. Sliced bread revolutionized the making of lunches and toast for breakfast. It meant so much to harried families that a World War II attempt to ban the slicing of bread to save on resources was opposed by New York City Mayor LaGuardia and lasted a mere 49 days from January 18, 1943 until being rescinded on March 8 that same year. 

There is sliced bread in SQL.

You'll find it in the form of window functions, which are sometimes in the world of Oracle Database referred to as analytic functions. If you can write a join and compute sums and averages, then drop everything and make window functions the very next thing you learn in SQL. Yes, they are that important. 

Here's an example business question involving the Adventure Works database:

For which products does our most recent change in cost represent a decrease in cost?

Following is a painful query to answer this question. I wrote it without benefit of window functions, and the result is a doubly-nested set of correlated subqueries. 

SELECT pch.ProductID, pch.StartDate, pch.StandardCost
FROM Production.ProductCostHistory AS pch
  AND pch.StandardCost < (
    SELECT StandardCost 
    FROM Production.ProductCostHistory AS pch_prior
    WHERE pch.ProductID = pch_prior.ProductID
      AND pch_prior.EndDate = (
        SELECT MAX(EndDate)
        FROM Production.ProductListPriceHistory AS pch_prior_max
        WHERE pch_prior.ProductID = pch_prior_max.ProductID

I used to enjoy writing this type of query in the way that one might enjoy putting together a jigsaw puzzle. And so much the better if a few self-joins were involved! But today I'd rather just grab some sliced bread out of the bag and be done with it. Here's the rewrite using a window function:

SELECT pch.ProductID, pch.StartDate, pch.StandardCost 
    SELECT ProductId, StartDate, EndDate, StandardCost, 
       ISNULL(EndDate, GETDATE()) AS NoNullDate, 
       LAG(StandardCost, 1) OVER (
            PARTITION BY ProductID 
            ORDER BY ISNULL(EndDate, GETDATE())
       ) AS PastCost
    FROM Production.ProductCostHistory
) AS pch
  AND pch.StandardCost < pch.PastCost

Execute both queries against Adventure Works if you are able. Here are some things to try while you are doing that:

  • Execute the subqueries independently, like you might do when debugging a query. This is not so easily done in the first query, but from the window function query it's a simple copy/paste operation to execute the inner query by itself.
  • Execute the window function subquery and notice how it returns all the data needed to answer the business question at hand. Then the outer query filters that data for the answer. 
  • Add the previous cost to the final output. This is dead-simple to do in the window function solution, and not so easy at all in the double-nested-correlated-subquery solution. 

It's simple to show the previous cost in the output from the window function solution. Just add one column to the SELECT list and job done:

SELECT pch.ProductID, pch.StartDate, pch.StandardCost, pch.PastCost

I don't want to bother with how to get the same output from the other query. Window functions make the job so much easier that it's not cost-effective to spend time on alternative solutions. 

Recognizing when to use window functions is useful. Train yourself to look for the following indicators of when window functions can save you time and help create easier to understand and better performing queries:

Learning the execution model is helpful too. Following are a some resources to help you grasp the conceptual details of how window functions work and how to think about using them:

No one would run a restaurant or any other food operation today without readily available sliced bread. Likewise, no one should be running database queries without an understanding of window functions and when to use them. Learn them and use them. They are the best thing in SQL since sliced bread.