Wednesday, April 09, 2008

Relational algebra: division in sql

There was a question in one of the Turkish Oracle mailing lists which got me interested. The question simply was:

I have a table holding the parts of a product and another table holding the suppliers of these parts. How can I find the suppliers which supply all the parts?

Someone suggested using the division operation of relational algebra but did not provide how to do it in Oracle. So I started with the Wikipedia link he provided to solve the problem in sql.

Here are the tables:


SQL> create table parts (pid number);

Table created.

SQL> create table catalog (sid number,pid number);

Table created.

SQL> insert into parts select rownum from all_objects where rownum<=5;

5 rows created.

SQL> insert into catalog values (10,1);

1 row created.

SQL> insert into catalog select 1,pid from parts;

5 rows created.

SQL> select * from catalog;

SID PID
---------- ----------
10 1
1 1
1 2
1 3
1 4
1 5


So, the supplier which has all the parts is 1. How do we find that?

The division in relational algebra is done by some steps which are explained in the link. Let's follow those:

The simulation of the division with the basic operations is as follows. We assume that a1,...,an are the attribute names unique to R and b1,...,bm are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:

T := πa1,...,an(R) × S

In our case we have the table CATALOG as R, the table PARTS as S. So if we write a sql to find out the above relation we need a cartesian join.

select sid,pid
from (select sid from catalog) ,parts
;


This give us all suppliers combined with all parts.

SID PID
---------- ----------
10 1
1 1
1 1
1 1
1 1
1 1
10 2
1 2
1 2
1 2
1 2

SID PID
---------- ----------
1 2
10 3
1 3
1 3
1 3
1 3
1 3
10 4
1 4
1 4
1 4

SID PID
---------- ----------
1 4
1 4
10 5
1 5
1 5
1 5
1 5
1 5

We now have all the possibilities for the supplier-part relation.

The second step is:

In the next step we subtract R from this relation:

U := T - R

To subtract the table CATALOG we need the MINUS operator.

select sid,pid
from (select sid from catalog) ,parts
minus
select sid,pid from catalog;

SID PID
---------- ----------
10 2
10 3
10 4
10 5





After we had all the possibilities we subtracted the ones which are already in the CATALOG table and we got the ones which are not present in the table.

On to the next step:

Note that in U we have the possible combinations that "could have" been in R, but weren't. So if we now take the projection on the attribute names unique to R then we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:

V := πa1,...,an(U)

This step just gets the supplier id's from the previous query.

select sid from (
select sid,pid
from (select sid from catalog) ,parts
minus
select sid,pid from catalog
);

SID
----------
10
10
10
10



Now we have the supplier id which does not supply all of the parts. The next step is obvious, to find the other suppliers (the remaining ones supply all the parts).

So what remains to be done is take the projection of R on its unique attribute names and subtract those in V:

W := πa1,...,an(R) - V

To do this we need to subtract the previous query from the table CATALOG.

select sid from catalog
minus
select sid from (
select sid,pid
from (select sid from catalog) ,parts
minus
select sid,pid from catalog
);

SID
----------
1


In some pages (like this one) the same operation is done using a different sql (obtained by using "not exists" instead of "minus").

select distinct sid from catalog c1
where not exists (
select null from parts p
where not exists (select null from catalog where pid=p.pid and c1.sid=sid));


This one is harder for me to understand in the first look, I am more comfortable following the steps above.

It is great to follow the steps of relational algebra to solve a problem in sql. It helps very much in understanding the solution. Relational algebra rocks!