During development cycle, I often come to a case where I need to compare two tables whether they are the same or not. For example, if you already have a running script that outputs to a database table in production that was written sub-optimally and you need to improve it. You want to make sure that your improvement version produces not only a faster and more efficient running time, but also provides correct result. One way to do this is to have the new script write into different table and then compare the tables.
What we are working towards
Depending on whether you need to do this often and on different tables, it may be a good idea to create a simple stored procedure that we can simply call passing 2 different table names and returns the data difference. I’ll attempt to have the stored procedure defined for MySQL on page 2 of this article and PostgreSQL on page 3.
For the purpose of this article, I’ll be using 2 tables structured as below with table names jajal and jajal_copy:
CREATE TABLE jajal ( user_id integer NOT NULL, first_name character varying(255), last_name character varying(255), grade character(1), CONSTRAINT jajal_pkey PRIMARY KEY (user_id) )
Contents of table jajal
Contents of table jajal_copy
As you can see, Mixed Nuts has different data on jajal (grade is Z) and on jajal_copy (grade is C). The goal is to get this difference.
The whole idea of comparing something vs something else is, I think, to find the difference. In the case of comparing 2 database tables, we can use outer join to get the table difference. I really encourage you to visit the wikipedia link if you don’t understand outer join. It is a pre-requisite to understanding what we’re about to do.
Using FULL OUTER JOIN
So the nature of FULL OUTER JOIN is to return all records of both tables, even if there’s no match. We can then add a WHERE condition to ensure that we get only the non-matched rows. For our table example above, we want to match all columns for equality. Matching all columns ensures that all the data within the rows are matched. Here’s the example query for getting the non-matching rows from our table example.
SELECT * FROM jajal j FULL OUTER JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE j.user_id IS NULL OR jc.user_id IS NULL
Outer Join Result
Using NATURAL FULL OUTER JOIN
There’s also another type of SQL join called natural join. Do glance through the wikipedia link for understanding what it is.
So here’s the NATURAL FULL OUTER JOIN version of getting the table difference.
SELECT * FROM jajal j NATURAL FULL OUTER JOIN jajal_copy jc WHERE j.user_id IS NULL OR jc.user_id IS NULL
Natural outer join result
As you can see, NATURAL FULL OUTER JOIN version is much less typing, especially if you have a lot of columns. However, the result is also slightly different. On FULL OUTER JOIN version, you can tell which rows come from which tables. Depending on your needs, one of the solutions above may fit your needs.
On page 2, I’m going to talk on MySQL specific solution. On Page 3, I’ll discuss PostgreSQL specific solution.