Something is wrong with the following query that joins three tables. Can you spot the problem?
SELECT S.SNO, S.SNAME, S.CITY, P.PNO, P.PNAME FROM SUPPLIER S, PART P, SUPPLIER_PART SP WHERE S.SNO = SP.SNO AND P.PNO = SP.SNO;
If you've been around the industry for awhile, you'll doubtless recognize this query as coming from Chris Date's classic, supplier-part database example. And the problem? Have you spotted it yet? The problem lies in the last line:
AND P.PNO = SP.SNO;
The problem is that I've bungled the join by comparing part number (PNO) to supplier number (SNO). Bummer! Worse yet, my database merrily went along executing the query, sending me back rows that made no sense, except that it looked like they made sense. Those rows printed on an inventory report that I handed in to my boss, who in turn took all the wrong actions, because, of course, the data was all muddled, resulting in a plethora of part shortages and angry suppliers, and, well, now I'm out of a job.
Ok. I'm not really out of a job. I was only kidding about that, but you can begin to see how a simple, little typo can lead to some majorly bad consequences. Shouldn't database systems protect us from such blunders? Perhaps they should. Chris Date has been arguing for years (decades?) that relational database systems should allow us to specify column datatypes in way that prevents such inadvertent comparisons of part numbers to supplier numbers. Such functionality has been in the relational model from the beginning in the form of domains.
The term domain is essentially synonymous with datatype. In a program, say a C++ program, you might declare separate types for supplier number and part number. Your compiler then, would throw an error in the event that you tried to use a value of one type in place of the other, unless you made an explicit cast from the one type to the other. Strict type checking in a programming language can prevent many errors when writing programs. It can prevent errors when working with a database too, and the intent of Ted Codd, Chris Date, and other early proponents of the relational model has always been to make such type checking available to practitioners.
As you can well guess by now, the reason the preceding query executed at all is that both PNO and SNO are defined as NUMBER columns. It's perfectly fine to compare NUMBER to NUMBER. The database system doesn't care, nor does it know, that the two numbers represent two, completely different things. If I told you it was 0 degrees outside and that I was 7 feet tall, you would never compare those two numbers to each other; it would be nonsense to do so. The meaning of 0 is that it's cold outside, while the meaning of 7 is that I might bang my head while walking through a doorway.
But what if the database system could know? In thinking about this issue recently, it dawned on me that Oracle's user-defined types (sometimes referred to as object types) can be put to use on this problem, and that it may be possible in Oracle to implement distinct types for PNO and SNO in a manner that provides the domain functionality of the relational model, that Chris has been advocating for so many years.
Please think of this article as an exploration. I applied user-defined types to the supplier-part example to see what sort of issues I would encounter. And that's all. I'm not at the point of recommending the approach yet.
The Conventional Approach
To begin, let's create the supplier-part tables in the conventional way, using the conventional, system-defined datatypes:
CREATE TABLE supplier ( sno NUMBER, sname VARCHAR2(10), status NUMBER, city VARCHAR2(10)); CREATE TABLE part ( pno NUMBER, pname VARCHAR2(10), color VARCHAR2(5), weight NUMBER, city VARCHAR2(10)); CREATE TABLE supplier_part ( sno NUMBER, pno NUMBER, qty NUMBER);
And now, let's insert some example data:
INSERT INTO supplier VALUES (1,'Smith',20,'London'); INSERT INTO supplier VALUES (2,'Jones',10,'Paris'); INSERT INTO part VALUES (1,'Nut','Red',12,'London'); INSERT INTO part VALUES (2,'Bolt','Green',17,'Paris'); INSERT INTO supplier_part VALUES (1,2,200); INSERT INTO supplier_part VALUES (2,1,300);
There. That should be enough data for this article. Following is the correct query (and results) to report on part inventories:
SELECT S.SNO, S.SNAME, S.CITY, P.PNO, P.PNAME FROM SUPPLIER S, PART P, SUPPLIER_PART SP WHERE S.SNO = SP.SNO AND P.PNO = SP.PNO; SNO SNAME CITY PNO PNAME ---------- ---------- ---------- ---------- ---------- 2 Jones Paris 1 Nut 1 Smith London 2 Bolt
Now, have a look at the query and results that got me fired:
SELECT S.SNO, S.SNAME, S.CITY, P.PNO, P.PNAME FROM SUPPLIER S, PART P, SUPPLIER_PART SP WHERE S.SNO = SP.SNO AND P.PNO = SP.SNO; SNO SNAME CITY PNO PNAME ---------- ---------- ---------- ---------- ---------- 1 Smith London 1 Nut 2 Jones Paris 2 Bolt
Don't these results just look fine? They're very plausible. Yet they're very wrong. In fact, in this particular case, everything is backwards. Smith doesn't have nuts, he's got bolts!
A New Way?
Now let's try creating a new set of tables, which I'll call SUPPLIER2, PART2, AND SUPPLIER_PART2, and this time let's first create some user-defined types for part numbers, supplier numbers, and, just to make things interesting, colors. By the way, this isn't really a "new" way, or at least it shouldn't be. The concept of user-defined datatypes has been part of the relational model from the very beginning.
CREATE TYPE part_number AS OBJECT ( pno NUMBER); / CREATE TYPE supplier_number AS OBJECT ( sno NUMBER); /
I had a bit of trouble with the color type. I wanted to use a constructor to ensure that only valid colors were used, like so:
CREATE TYPE color AS OBJECT ( valid_color VARCHAR2(5), CONSTRUCTOR FUNCTION color (color IN VARCHAR2) RETURN SELF AS RESULT); / CREATE TYPE BODY color AS CONSTRUCTOR FUNCTION color (color IN VARCHAR2) RETURN SELF AS RESULT IS BEGIN IF color IN ('Red','Green') THEN SELF.valid_color := color; RETURN; ELSE RAISE_APPLICATION_ERROR( -20000,'Bad Color!'); END IF; END; END; /
You can create this type and body, but you'll get an error when you attempt to invoke the color constructor, because there is the default constructor and also the one defined in the type body, and both constructor functions have the same name as well as the same number and type of arguments. The bottom-line is that you simply cannot enforce any integrity rules through constructors, so you might as well not even try. Instead, we'll define color along the same lines as supplier_number and part_number:
CREATE TYPE color AS OBJECT ( color VARCHAR2(5)); /
Next up is the task of creating a new set of supplier-part tables based, in part, on the three types just created:
CREATE TABLE supplier2 ( sno supplier_number, sname VARCHAR2(10), status NUMBER, city VARCHAR2(10)); CREATE TABLE part2 ( pno part_number, pname VARCHAR2(10), color color, weight NUMBER, city VARCHAR2(10)); CREATE TABLE supplier_part2 ( sno supplier_number, pno part_number, qty NUMBER);
In these tables, the sno and pno columns are of type supplier_number and part_number respectively. The color column is of type color.
Inserting Some Data
The interesting task now is to insert our data into this new set of tables. To that end, we have to invoke the constructor functions, which increases the complexity of the INSERT statements just a bit:
INSERT INTO supplier2 VALUES ( supplier_number(1),'Smith',20,'London'); INSERT INTO supplier2 VALUES ( supplier_number(2),'Jones',10,'Paris'); INSERT INTO part2 VALUES (part_number(1),'Nut', color('Red'),12,'London'); INSERT INTO part2 VALUES (part_number(2),'Bolt', color('Green'),17,'Paris'); INSERT INTO supplier_part2 VALUES ( supplier_number(1), part_number(2),200); INSERT INTO supplier_part2 VALUES ( supplier_number(2), part_number(1),300);
One way to look at constructor invocations such as supplier_number(1), in this context at least, is to see them as type conversion functions, for example converting the raw number 1 to the supplier number 1.
Querying the Data
Before going further, let's have a peek at the data in the supplier2 table, just to see what we have to work with:
SELECT sno, sname, status, city FROM supplier2; SNO(SNO) SNAME STATUS CITY -------------------- ---------- ---------- ---------- SUPPLIER_NUMBER(1) Smith 20 London SUPPLIER_NUMBER(2) Jones 10 Paris
Notice the notation used for values from SNO. The SNO values returned by this query are of type supplier_number. If we were querying the table from an application program, we might not want to deal with results that include user-defined types. One way to get at the underlying, numeric, supplier number values is to use dot-notation, as follows:
SELECT s2.sno.sno, sname FROM supplier2 s2; SNO.SNO SNAME ---------- ---------- 1 Smith 2 Jones
The use of a table alias, such as S2 in this example, is necessary when you wish to directly access attributes of a user-defined type. Accessing attributes directly is not at all the ideal situation. Better would be write a type body that included a TO_NUMBER conversion method allowing you to write, for example:
SELECT s2.sno.to_number, sname FROM supplier2 s2;
I'll come back to this issue of type conversion later, in my conclusion.
Type Safe Comparisons
Let's revisit that problem table join from the beginning of this article. First, let's try the correct version of that join, comparing supplier number to supplier number, and part number to part number:
SELECT S2.SNO, S2.SNAME, S2.CITY, P2.PNO, P2.PNAME FROM SUPPLIER2 S2, PART2 P2, SUPPLIER_PART2 SP2 WHERE S2.SNO = SP2.SNO AND P2.PNO = SP2.PNO; SNO(SNO) SNAME CITY PNO(PNO) PNAME -------------------- ---------- ---------- ---------------- ---------- SUPPLIER_NUMBER(2) Jones Paris PART_NUMBER(1) Nut SUPPLIER_NUMBER(1) Smith London PART_NUMBER(2) Bolt
These results are correct. The next test, however, should be more interesting. Let's try the query again, but this time with the typo that cost me my job. This time let's "accidentally" compare part number to supplier number in the last line of the query:
SELECT S2.SNO, S2.SNAME, S2.CITY, P2.PNO, P2.PNAME FROM SUPPLIER2 S2, PART2 P2, SUPPLIER_PART2 SP2 WHERE S2.SNO = SP2.SNO AND P2.PNO = SP2.SNO; AND P2.PNO = SP2.SNO * ERROR at line 4: ORA-00932: inconsistent datatypes: expected GENNICK.PART_NUMBER got GENNICK.SUPPLIER_NUMBER
Isn't this great? The database has saved us. It's taken a bit of extra work on our part, but, by creating those user-defined types we've prevented erroneous comparisons between part and supplier numbers. Maybe I can get my job back :-)
ORDER BY and Indexing
We need to be able to do more than just select and compare column values. Sorting and indexing are two, closely related tasks that are essential in almost all databases. To enable sorting by supplier number, we can create a MAP function that returns a number by which a given supplier_number value can be sorted. The easiest and most obvious implementation is to simply return the value of the supplier_number's sno attribute:
ALTER TYPE supplier_number ADD MAP MEMBER FUNCTION sno_map RETURN NUMBER CASCADE INCLUDING TABLE DATA; CREATE TYPE BODY supplier_number AS MAP MEMBER FUNCTION sno_map RETURN NUMBER IS BEGIN RETURN SELF.sno; END; END; /
This new MAP function enables the use of ORDER BY in our queries:
SELECT * FROM supplier2 ORDER BY SNO DESC; SNO(SNO) SNAME STATUS CITY -------------------- ---------- ---------- ---------- SUPPLIER_NUMBER(2) Jones 10 Paris SUPPLIER_NUMBER(1) Smith 20 London
I had hoped that MAP would enable the creation of an index on an object column:
CREATE INDEX sno_index ON supplier2 (sno);
Alas, this statement will fail with an ORA-02327 error. You cannot index object types. You can, however, index the attribute of a user-defined type:
CREATE INDEX sno_index ON supplier2 (sno.sno);
But this really does us little good, because to make use of this index we must give up the supplier_number abstraction and resort to writing WHERE clause predicates that access the underlying attribute values, which puts us right back to getting fired again.
Primary and Foreign Keys
Since primary keys in Oracle imply indexes, it follows that we cannot define primary keys that include user-defined types. And without primary keys, we can't have foreign keys. Yet we can create primary keys on attribute values:
ALTER TABLE supplier2 ADD ( CONSTRAINT supplier_pk PRIMARY KEY (sno.sno));
And now we are protected from entering two suppliers having the same number:
INSERT INTO supplier2 VALUES ( supplier_number(2), 'Jenny',30,'Cedar Lake'); INSERT INTO supplier2 VALUES ( * ERROR at line 1: ORA-00001: unique constraint (GENNICK.SUPPLIER_PK) violated
Why does it not bother me to define a primary key on an attribute while it does bother me to create an index on an attribute? I can accept attribute values in primary keys because such primary keys do not, in turn, force you to expose those same attributes in your SQL statements. In the INSERT statement that just failed, I was able to stay within the abstraction provided by supplier_type. I was able to insert a supplier_type value.
Foreign key constraints may be created in much the same manner as primary key constraints:
ALTER TABLE supplier_part2 ADD ( CONSTRAINT has_supplier FOREIGN KEY (sno.sno) REFERENCES supplier2);
This key definition requires that the sno attribute of the supplier_number in supplier_part2 must match an sno attribute of a supplier_number in supplier2.
The questions I've been exploring in this article are twofold:
Can user-defined types, as implemented in Oracle be used as a vehicle for implementing domains as described in the relational model?
Is such use of user-defined types practical?
Oracle is tantalizingly(!) close when it comes to creating your own types to use for simple, scalar columns such as for supplier and part numbers. Yet I'm not ready to recommend the technique for "prime time". Some of the issues encountered just in the simple experiment described in this article give me pause:
I'm bothered that attributes cannot be marked as private. How do you prevent an invalid color, say 'eeks', from being assigned to a color attribute? You cannot assign anything other than a number to a NUMBER column. Why should you be able to assign something other than a color to a color column?
Public attributes present a problem when going the other way too. We can add a to_number method to supplier_number, so that sno.to_number returns a raw number, but users can easily go around that method by accessing the underlying attribute. Thus, there would be two ways to get a raw number from a supplier_number value, and there should only ever be one.
There is no provision for defining implicit conversions. You may be opposed to those, but Oracle allows implicit conversions between many built-in types, and, for consistency's sake, one could argue that Oracle should allow for us to define implicit conversions for user-defined types. To that end, I can envision syntax like:
ALTER TYPE supplier_number ADD CONVERSION FUNCTION to_part_number RETURN part_number; /
In this syntax, which I've just dreamed up tonight, the keyword CONVERSION identifies the function as one that converts supplier_number to some other type. That target type is then identified in the RETURN clause. Now, when faced with a comparison such as sno = pno, the database can implicitly invoke sno.to_part_number on our behalf. (The irony here is that this particular example puts us right back into the problem that began this article).
Another need for conversion functions such as I describe above is for generic programs such as SQL*Plus that must be able to display any datatype.
The inability to index columns defined on a user-defined type is a big problem when it comes to using such types for simple, everyday values such as supplier number and part number. I had hoped that the MAP function would enable an index, with the MAP values being used as the basis for indexing.
Some optimizations are probably possible, to make the use of user-defined types easier. The ability to specify a MAP attribute, rather than having to write a MAP function, would certainly be convenient.
I hope you've found this little exploration as intriguing as I have. If you have any thoughts on what I've talked about in this article, and on whether there's any merit at all in the approach I've described, I'd be very interested in hearing from you. And if you happen to be using an approach such as I've described, then please write, because I'll want to know more.
Many thanks to Fabian Pascal for his early critique of this article. He suggested several changes that have improved its clarity. Fabian runs a website, http://www.dbdebunk.com, hosting a number of papers and short articles of interest to database practitioners.
Thanks too, to Chris Date and Anthony Molinaro who also reviewed drafts of this article.
Inspiration for this article came from reading an early draft chapter from Chris Date's book Database In Depth. I am most honored to have been the editor of said book.