Pulling Rank for the Boss

New functions introduced in SQL Server 2008 Release 2 let you assign rankings to rows in a result set. These functions are ROW_NUMBER, RANK, and DENSE_RANK. Look to them anytime you are faced with a business question involving words or phrases such as “topmost” or “bottommost”, “top N” or “bottom N”, or that are otherwise answerable by ranking the rows in a result set according to some criteria that you can apply to one or more columns of data.

Example Data: This is the fourth in a series of articles exploring window functions and their enhanced implementation in SQL Server 2012. If you would like to follow along, execute this script to create the example table and data. All articles in this series use the same data set.

Simple Numbering

You're a data analyst for a large department store chain that is struggling for relevancy and teetering on bankruptcy. Your boss is about to be grilled in an upcoming Board of Directors meeting. While doing his homework to prepare, he phones you with the following request:

“Give me three examples of poor quarterly performance for each department. I need them right away–in thirty minutes.”

You can hear the strain in your boss's voice. You would like to ask some follow-up questions, but you wisely decide that your boss is in no mood to discuss the subtleties behind his request. You accept his request as stated and promise to get cracking.

Gulping down the last of your morning coffee, you decide to put the ROW_NUMBER function to use. You'll number quarterly sales records in order from poorest to best performance. Then you can peel off three of the worst examples by department. You begin by taking a quick look at the raw data:

/*------------------------
select department, year, quarter, sales
from quarterly_sales
------------------------*/

department         year quarter      sales
---------------- ------ ------- ----------
Appliances         2000       1  235956.23
Appliances         2000       2  129871.95
Appliances         2000       3  398213.39
Appliances         2000       4   54111.05
Appliances         2001       1   87230.33
...

Now it's time to apply the ROW_NUMBER function. Add the following column expression to your query:

row_number() OVER
   (partition by department
    order by sales) rank

Here's an explanation of what this function is doing for you: 1. Rows for each department are treated separately. (the `partition by department` clause) 2. Each row for a department is given a number, beginning at 1. (the `row_number()` invocation) 3. Row number 1 will be the row with lowest sales for a given department. Row numbers will increase as sales increase. (the `order by sales` clause) Quickly you type in the new query and view the results:

/*------------------------
select department, year, quarter, sales,
   row_number() OVER
      (partition by department
       order by sales) rank
from quarterly_sales
------------------------*/

department         year quarter      sales  rank
---------------- ------ ------- ---------- -----
Appliances         2000       4   54111.05     1
Appliances         2001       1   87230.33     2
Appliances         2002       1   95678.13     3
Appliances         2005       3   96078.73     4
Appliances         2001       4  100010.13     5
...
Electronics        2000       1  345873.14     1
Electronics        2000       2  401144.56     2
Electronics        2000       3  454987.33     3
Electronics        2001       1  467043.14     4
Electronics        2000       4  476539.23     5
...

These results are looking good. The rankings are going upwards as the sales go up. Each department is ranked separately. All that's left is to peel off the three worst rows by department. A simple, enclosing query will do the trick:

/*------------------------
select * from
(
select department, year, quarter, sales,
   row_number() OVER
      (partition by department
       order by sales) rank
from quarterly_sales
) bottom_three
where rank <= 3;
------------------------*/

department         year quarter      sales  rank
---------------- ------ ------- ---------- -----
Appliances         2000       4   54111.05     1
Appliances         2001       1   87230.33     2
Appliances         2002       1   95678.13     3
Electronics        2000       1  345873.14     1
Electronics        2000       2  401144.56     2
Electronics        2000       3  454987.33     3
Jewelry            2000       1  240939.33     1
Jewelry            2000       3  283993.33     2
Jewelry            2008       1  287454.44     3
Mattresses         2000       4   32437.42     1
Mattresses         2000       1   34873.33     2
Mattresses         2001       1   36747.44     3
Vacuums            2000       2   56983.43     1
Vacuums            2001       1   64341.54     2
Vacuums            2000       1   65873.32     3

(15 row(s) affected)

Perfect! You rush these results to your boss. Not only have you nailed his request, you've gotten him the results in just 15 minutes. Your boss is so pleased he doesn't seem to mind at all when you take an extra five minutes to explain about the possibility of ties in the data. You fill him in on that corner case and what it can mean, ask him to phone you immediately if there's anything else he needs, and return to your own office feeling just a wee bit more secure about your job.

Respecting Ties

And then the phone rings. Yep! It's the boss. At least he's happy. You're explanation about ties got him to thinking. He doesn't want the board to catch him out on those, so he amends his earlier request and now asks you to:

“Rank the sales by department. Round all the sales numbers to the nearest $10,000. Get me the worst three rankings for each department, ties included. Do it in 20 minutes, and I'll buy you lunch.”

Lunch sounds good. Rounding is easy. You promise your boss the results.

The RANK function is your secret weapon. RANK respects ties by assigning them the same rank number. Pull out your previous query and substitute RANK in place of ROW_NUMBER. Invoke ROUND as the boss asks. Done! Here is the new query with its results:

/*------------------------
select * from
(
select department, year, quarter, 
   round(sales,-4) sales, 
   rank() OVER
      (partition by department
       order by round(sales,-4)) rank
from quarterly_sales
) bottom_three
where rank <= 3;
------------------------*/

department         year quarter      sales  rank
---------------- ------ ------- ---------- -----
Appliances         2000       4   50000.00     1
Appliances         2001       1   90000.00     2
Appliances         2001       4  100000.00     3
Appliances         2002       1  100000.00     3
Appliances         2005       3  100000.00     3
Electronics        2000       1  350000.00     1
Electronics        2000       2  400000.00     2
Electronics        2000       3  450000.00     3
Jewelry            2000       1  240000.00     1
Jewelry            2000       3  280000.00     2
Jewelry            2008       1  290000.00     3
Jewelry            2008       2  290000.00     3
Mattresses         2000       1   30000.00     1
Mattresses         2000       4   30000.00     1
Mattresses         2001       1   40000.00     3
Mattresses         2001       2   40000.00     3
Mattresses         2001       3   40000.00     3
Mattresses         2001       4   40000.00     3
Mattresses         2002       4   40000.00     3
Mattresses         2000       3   40000.00     3
Vacuums            2000       2   60000.00     1
Vacuums            2001       1   60000.00     1
Vacuums            2000       3   70000.00     3
Vacuums            2000       1   70000.00     3

(24 row(s) affected)

That took all of five minutes. Your boss gave you 20. Not wanting to make your work look too easy, you sit back and catch up on Facebook for 10 minutes before taking the elevator to your boss's floor to deliver the results.

Going Dense

As before, you did need to do some explaining about the results. Your boss honed in pretty quickly on the following data from the Mattresses Department:

Mattresses         2000       1   30000.00     1
Mattresses         2000       4   30000.00     1
Mattresses         2001       1   40000.00     3

Why the jump from rank number 1 to rank number 3? What happened to rank number 2? You explain how the two rows ranked as 1 are properly indicated as tying for first place. The third row then, is properly indicated as being third. It is third, because it is preceded by two other rows. Your boss seems happy with that explanation. He remains happy long enough for you to return to your office.

Then the phone rings. Your boss has a request:

“Would you explain again about the missing number 2?”

Grimacing, you realize the gaps in ranking numbers will be a stumbling-block for your boss when he presents the data to the board. Instead of explaining again about the missing rank number, you offer to redo the results. This time, you'll take care to avoid gaps in the numbers. The boss likes that idea—A LOT.

DENSE_RANK is your friend here. You can use DENSE_RANK in the same manner as RANK, but the difference is that DENSE_RANK eliminates gaps in the numbering. That pesky jump from 1 to 3 will be gone, and that's one less detail for your boss to have to explain to a bunch of bored [sic] members who could care less.

You decide to modify the query as follows:

select department, year, quarter, sales, drank from
(
select department, year, quarter, 
   round(sales,-4) sales, 
   rank() OVER
      (partition by department
       order by round(sales,-4)) rank,
   dense_rank() OVER
      (partition by department
       order by round(sales, -4)) drank
from quarterly_sales
) bottom_three
where rank <= 3;

Here's an explanation of the changes:

You retain the call to RANK in the subquery. You do that, because your boss wants the same results as before. He wants the bottom three performing quarters by department, including ties.
For the same reason, you retain the rank <= 3 predicate in the WHERE clause.
You add a new column expression invoking DENSE_RANK. This gives you rankings for the board members that have no gaps.
You rewrite the SELECT list so the query reports the results from DENSE_RANK. Your boss gets the data he wants because of the WHERE clause. But the gaps are gone, giving him one less thing to explain to the board.

Following are the results from this new iteration of the query:

/*------------------------
select department, year, quarter, sales, drank from
(
select department, year, quarter, 
   round(sales,-4) sales, 
   rank() OVER
      (partition by department
       order by round(sales,-4)) rank,
   dense_rank() OVER
      (partition by department
       order by round(sales, -4)) drank
from quarterly_sales
) bottom_three
where rank <= 3;
------------------------*/

department         year quarter      sales  drank
---------------- ------ ------- ---------- ------
Appliances         2000       4   50000.00      1
Appliances         2001       1   90000.00      2
Appliances         2001       4  100000.00      3
Appliances         2002       1  100000.00      3
Appliances         2005       3  100000.00      3
Electronics        2000       1  350000.00      1
Electronics        2000       2  400000.00      2
Electronics        2000       3  450000.00      3
Jewelry            2000       1  240000.00      1
Jewelry            2000       3  280000.00      2
Jewelry            2008       1  290000.00      3
Jewelry            2008       2  290000.00      3
Mattresses         2000       1   30000.00      1
Mattresses         2000       4   30000.00      1
Mattresses         2001       1   40000.00      2
Mattresses         2001       2   40000.00      2
Mattresses         2001       3   40000.00      2
Mattresses         2001       4   40000.00      2
Mattresses         2002       4   40000.00      2
Mattresses         2000       3   40000.00      2
Vacuums            2000       2   60000.00      1
Vacuums            2001       1   60000.00      1
Vacuums            2000       3   70000.00      2
Vacuums            2000       1   70000.00      2

(24 row(s) affected)

You have the same 24 department/quarter combinations as before. The selection criteria is unchanged. What is changed is that the gaps in the rankings are gone. Those three rows for the Mattresses Department that bothered your boss last time now look as follows:

Mattresses         2000       1   30000.00      1
Mattresses         2000       4   30000.00      1
Mattresses         2001       1   40000.00      2

The ranking goes smoothly from 1 to 2. No gaps. Nothing to provoke questions from pesky board members during your boss's presentation.

You rush the results upstairs. The boss is ecstatic. He's got all he needs now for the Board of Directors meeting. He decides to call it a morning. You leave for an early lunch with the boss.

Note! The query results in this article just happen to have come out grouped by department and sorted by rank. That apparent grouping and sorting is a side-effect of how I created the table combined with the execution of the window functions. It is a coincidence, nothing more. Do not be lulled into thinking that invoking a window function is a way to control the order in which rows are returned. If ordering is important, you must include an ORDER BY clause.

Published on Jan 1, 2012