Jonathan Gennick

Jonathan Gennick

press my name for the menu

Writer * Book Editor * Oracle DBA * SQL & PL/SQL Developer

Father * Husband * Son * Mountain Biker * EMT

Understanding the WITH Clause

Recently I researched recursive SQL queries for an Oracle Technology Network (OTN) article on Oracle's CONNECT BY syntax. (By the way, that article is scheduled to appear on OTN sometime in July) While researching that article, someone at Oracle pointed me to the SELECT statement's WITH clause. WITH was introduced in the SQL:1999 standard, and made it's way into Oracle in Oracle9i Database Release 1. As defined in the standard, WITH enables you to do two things: I was only dimly aware of WITH, and hadn't given it much thought. That it could be used to write recursive queries was a surprise to me, probably because Oracle hasn't implemented that aspect of WITH. I decided to do some investigating to find out just what WITH was all about.

Using WITH to Issue Recursive Queries

Perhaps the driving force behind the introduction of WITH in the SQL standard was the need to issue recursive queries. Oracle has long (since version 2) supported such queries through its CONNECT BY syntax. For example, the following CONNECT BY query retrieves the definition of an automobile from a bill-of-mterials table:
SQL> SELECT assembly_id, assembly_name, parent_assembly
  2  FROM bill_of_materials
  3  START WITH assembly_id=100
  4  CONNECT BY parent_assembly = PRIOR assembly_id;

ASSEMBLY_ID ASSEMBLY_NAME           PARENT_ASSEMBLY
----------- ----------------------- ---------------
        100 Automobile
        110 Combustion Engine                   100
        111 Piston                              110
        112 Air Filter                          110
        113 Spark Plug                          110
        114 Block                               110
        115 Starter System                      110
        116 Alternator                          115
        117 Battery                             115
        118 Starter Motor                       115
        120 Body                                100
        121 Roof                                120
        122 Left Door                           120
        123 Right Door                          120
        130 Interior                            100

If you look carefully at this output, you can see that an automobile is made up of a combustion engine, which in turn is made up of a piston, air filter, spark plug, block, and starter system. The starter system is in turn composed of an alternator, battery, and starter motor, and so forth. CONNECT BY gives you the ability to query a table that is recursively related to itself, and the results from a CONNECT BY query are generated in the hierarchical order shown here. When you write a CONNECT BY query, you can think of the START WITH clause as defining the set of root nodes to be returned by the query. The CONNECT BY clause then defines the "join" between parent and child rows. I explain this in more detail in my OTN article. Partly for grins, and partly to expand my horizons a bit, I decided to look at how the preceding query could be implemented using WITH instead of CONNECT BY. To do that, I had to use a database supporting recursive WITH, so I installed a copy of IBM DB2. After a good bit of head-scratching and reading of IBM's example queries, I came up with the following WITH-based solution to my bill-of-materials query:
WITH recursiveBOM
   (assembly_id, assembly_name, parent_assembly) AS
(SELECT parent.assembly_id,
        parent.assembly_name,
        parent.parent_assembly
FROM bill_of_materials parent
WHERE parent.assembly_id=100
UNION ALL
SELECT child.assembly_id,
       child.assembly_name,
       child.parent_assembly
FROM recursiveBOM parent, bill_of_materials child
WHERE child.parent_assembly = parent.assembly_id)
SELECT assembly_id, parent_assembly, assembly_name
FROM recursiveBOM;
Wow! This is inscrutable. It took me a long time fathom this query. Let's take it a piece at a time:
WITH recursiveBOM
   (assembly_id, assembly_name, parent_assembly) AS
The WITH keyword defines the name recursiveBOM for the subquery that is to follow. Next comes a list of column aliases to use when referencing the results of the subquery. Oracle requires that you specify aliases in the subquery's SELECT, something I'll talk more about later. Next comes the first part of the named subquery:
(SELECT parent.assembly_id,
        parent.assembly_name,
        parent.parent_assembly
FROM bill_of_materials parent
WHERE parent.assembly_id=100
The named subquery is a UNION ALL of two queries. This, the first query, defines the starting point for the recursion. As in my CONNECT BY query, I want to know what makes up assembly 100, an automobile. Next up is the part that was most confusing to me:
UNION ALL
SELECT child.assembly_id,
       child.assembly_name,
       child.parent_assembly
FROM recursiveBOM parent, bill_of_materials child
WHERE child.parent_assembly = parent.assembly_id)
This is where things get recursive, and confusing. The second query in the union joins bill_of_materials to the results of the named subquery. Notice how the WHERE clause strikingly resembles the CONNECT BY clause used in the Oracle version of this recursive query. The final part of the query is the main SELECT:
SELECT assembly_id, parent_assembly, assembly_name
FROM recursiveBOM;
This SELECT does nothing more than to retrieve the results returned by named subquery. All the recursion happens in the subquery. The important thing to notice here is that the three column names in the main query's select-list match the three aliases specified near the beginning of the WITH clause. The aliases you specify for a named subquery are the names you must use when retrieving the results of that subquery. How does WITH's recursion work? It's important to form a clear mental-model of how a query such as this produces the results that it does.
For more on mental models, see my article at:
http://gennick.com/mental_models.html
My first attempt at understanding the preceding WITH query was to think in terms of replacing the query *text* recursively. When I came across the text "recursiveBOM", I replaced it with the subquery text. The resulting mess that I envisioned in my mind looked as follows, with ellipses (...) indicating where further query text expansion would occur:
WITH recursiveBOM
   (assembly_id, assembly_name, parent_assembly) AS
(SELECT parent.assembly_id,
        parent.assembly_name,
        parent.parent_assembly
FROM bill_of_materials parent
WHERE parent.assembly_id=100
UNION ALL
SELECT child.assembly_id,
       child.assembly_name,
       child.parent_assembly
FROM (...) parent, bill_of_materials child
WHERE child.parent_assembly = parent.assembly_id)
SELECT assembly_id, parent_assembly, assembly_name
FROM (SELECT parent.assembly_id,
             parent.assembly_name,
             parent.parent_assembly
FROM bill_of_materials parent
WHERE parent.assembly_id=100
UNION ALL
SELECT child.assembly_id,
       child.assembly_name,
       child.parent_assembly
FROM (...) parent, bill_of_materials child
WHERE child.parent_assembly = parent.assembly_id);
The above is not the way to understand a recursive WITH query. The recursion is not modeled in terms of the text. You have to think in terms of subquery *results*. Following is the mental model I'm currently using to understand this recursive WITH query: 1. The parent query executes:
   SELECT assembly_id, parent_assembly, assembly_name
   FROM recursiveBOM;
This triggers execution of the named subquery. 2 The first query in the subquery's union executes, giving us a seed row with which to begin the recursion:
   SELECT parent.assembly_id,
          parent.assembly_name,
          parent.parent_assembly
   FROM bill_of_materials parent
   WHERE parent.assembly_id=100
The seed row in this case will be for "Automobile". Let's refer to the seed row from here on out as the "new results", new in the sense that we haven't finished processing them yet. 3 The second query in the subquery's union executes:
   SELECT child.assembly_id,
          child.assembly_name,
          child.parent_assembly
   FROM recursiveBOM parent, bill_of_materials child
   WHERE child.parent_assembly = parent.assembly_id
This is where things get interesting. Notice the recursive reference to recursiveBOM. That plays into Step 4. 4. The recursive reference to recursiveBOM is replaced by the query results. And what are the query results? They are: