During development cycle, I often come to a case where I need to compare two tables whether they are the same or not. For example, if you already have a running script that outputs to a database table in production that was written sub-optimally and you need to improve it. You want to make sure that your improvement version produces not only a faster and more efficient running time, but also provides correct result. One way to do this is to have the new script write into different table and then compare the tables.
What we are working towards
Depending on whether you need to do this often and on different tables, it may be a good idea to create a simple stored procedure that we can simply call passing 2 different table names and returns the data difference. I’ll attempt to have the stored procedure defined for MySQL on page 2 of this article and PostgreSQL on page 3.
For the purpose of this article, I’ll be using 2 tables structured as below with table names jajal and jajal_copy:
CREATE TABLE jajal ( user_id integer NOT NULL, first_name character varying(255), last_name character varying(255), grade character(1), CONSTRAINT jajal_pkey PRIMARY KEY (user_id) )
Contents of table jajal
user_id | first_name | last_name | grade |
---|---|---|---|
1 | Some | Dude | A |
2 | Other | Guy | B |
3 | You are | Welcome | B |
4 | What | Other | A |
5 | INeed | You | C |
6 | Mixed | Nuts | Z |
7 | Kirk | Land | B |
8 | Bit | Shooter | A |
9 | Sun | Microsystem | C |
10 | Extra | Fancy | B |
Contents of table jajal_copy
user_id | first_name | last_name | grade |
---|---|---|---|
1 | Some | Dude | A |
2 | Other | Guy | B |
3 | You are | Welcome | B |
4 | What | Other | A |
5 | INeed | You | C |
6 | Mixed | Nuts | C |
7 | Kirk | Land | B |
8 | Bit | Shooter | A |
9 | Sun | Microsystem | C |
10 | Extra | Fancy | B |
As you can see, Mixed Nuts has different data on jajal (grade is Z) and on jajal_copy (grade is C). The goal is to get this difference.
The concept
The whole idea of comparing something vs something else is, I think, to find the difference. In the case of comparing 2 database tables, we can use outer join to get the table difference. I really encourage you to visit the wikipedia link if you don’t understand outer join. It is a pre-requisite to understanding what we’re about to do.
Using FULL OUTER JOIN
So the nature of FULL OUTER JOIN is to return all records of both tables, even if there’s no match. We can then add a WHERE condition to ensure that we get only the non-matched rows. For our table example above, we want to match all columns for equality. Matching all columns ensures that all the data within the rows are matched. Here’s the example query for getting the non-matching rows from our table example.
SELECT * FROM jajal j FULL OUTER JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE j.user_id IS NULL OR jc.user_id IS NULL
Outer Join Result
user_id | first_name | last_name | grade | user_id | first_name | last_name | grade |
---|---|---|---|---|---|---|---|
[NULL] | [NULL] | [NULL] | [NULL] | 6 | Mixed | Nuts | C |
6 | Mixed | Nuts | Z | [NULL] | [NULL] | [NULL] | [NULL] |
Using NATURAL FULL OUTER JOIN
There’s also another type of SQL join called natural join. Do glance through the wikipedia link for understanding what it is.
So here’s the NATURAL FULL OUTER JOIN version of getting the table difference.
SELECT * FROM jajal j NATURAL FULL OUTER JOIN jajal_copy jc WHERE j.user_id IS NULL OR jc.user_id IS NULL
Natural outer join result
user_id | first_name | last_name | grade |
---|---|---|---|
6 | Mixed | Nuts | C |
6 | Mixed | Nuts | Z |
As you can see, NATURAL FULL OUTER JOIN version is much less typing, especially if you have a lot of columns. However, the result is also slightly different. On FULL OUTER JOIN version, you can tell which rows come from which tables. Depending on your needs, one of the solutions above may fit your needs.
On page 2, I’m going to talk on MySQL specific solution. On Page 3, I’ll discuss PostgreSQL specific solution.
Maatkit has mk-table-sync that can be used for diffing tables:
http://www.maatkit.org/doc/mk-table-sync.html
(SELECT * FROM table1 EXCEPT SELECT * FROM table2)
UNION
(SELECT * FROM table2 EXCEPT SELECT * FROM table1)
Thank you Ted. I didn’t know there’s EXCEPT clause on PostgreSQL SELECT statement.
for some reason this article has just appeared in the planetpostgres twitter feed… in Postgresql there is also the option of http://pgdba.net/pg51g … yes, shameless self-promotion this is, but anyone interested in your article may also be interested in pg51g. cheers
Easiest way is to use a free database table differencing tool such as CompareData
Using already packaged/developed tool you can also compare the data where the two tables live on different servers, even DBMSs, since in that case you will not be able to do a join.
minus or except operators are better than this
select * from jajal
minus
select * from jajal_copy –works in oracle