Bookmark and Share

Comparing data from 2 database tables

Posted: Sunday, March 1st, 2009 at 2:41 amUpdated: Monday, May 25th, 2009 at 5:18 pm

During development cycle, I often come to a case where I need to compare two tables whether they are the same or not. For example, if you already have a running script that outputs to a database table in production that was written sub-optimally and you need to improve it. You want to make sure that your improvement version produces not only a faster and more efficient running time, but also provides correct result. One way to do this is to have the new script write into different table and then compare the tables.

What we are working towards

Depending on whether you need to do this often and on different tables, it may be a good idea to create a simple stored procedure that we can simply call passing 2 different table names and returns the data difference. I’ll attempt to have the stored procedure defined for MySQL on page 2 of this article and PostgreSQL on page 3.

For the purpose of this article, I’ll be using 2 tables structured as below with table names jajal and jajal_copy:

CREATE TABLE jajal
(
	user_id integer NOT NULL,
	first_name character varying(255),
	last_name character varying(255),
	grade character(1),
	CONSTRAINT jajal_pkey PRIMARY KEY (user_id)
) 

Contents of table jajal

user_id first_name last_name grade
1 Some Dude A
2 Other Guy B
3 You are Welcome B
4 What Other A
5 INeed You C
6 Mixed Nuts Z
7 Kirk Land B
8 Bit Shooter A
9 Sun Microsystem C
10 Extra Fancy B

Contents of table jajal_copy

user_id first_name last_name grade
1 Some Dude A
2 Other Guy B
3 You are Welcome B
4 What Other A
5 INeed You C
6 Mixed Nuts C
7 Kirk Land B
8 Bit Shooter A
9 Sun Microsystem C
10 Extra Fancy B

As you can see, Mixed Nuts has different data on jajal (grade is Z) and on jajal_copy (grade is C). The goal is to get this difference.

The concept

The whole idea of comparing something vs something else is, I think, to find the difference. In the case of comparing 2 database tables, we can use outer join to get the table difference. I really encourage you to visit the wikipedia link if you don’t understand outer join. It is a pre-requisite to understanding what we’re about to do.

Using FULL OUTER JOIN

So the nature of FULL OUTER JOIN is to return all records of both tables, even if there’s no match. We can then add a WHERE condition to ensure that we get only the non-matched rows. For our table example above, we want to match all columns for equality. Matching all columns ensures that all the data within the rows are matched. Here’s the example query for getting the non-matching rows from our table example.

SELECT
	*
FROM
	jajal j
	FULL OUTER JOIN jajal_copy jc ON jc.first_name = j.first_name
		AND jc.last_name = j.last_name
		AND jc.grade = j.grade
		AND jc.user_id = j.user_id
WHERE
	j.user_id IS NULL
	OR jc.user_id IS NULL

Outer Join Result

user_id first_name last_name grade user_id first_name last_name grade
[NULL] [NULL] [NULL] [NULL] 6 Mixed Nuts C
6 Mixed Nuts Z [NULL] [NULL] [NULL] [NULL]

Using NATURAL FULL OUTER JOIN

There’s also another type of SQL join called natural join. Do glance through the wikipedia link for understanding what it is.

So here’s the NATURAL FULL OUTER JOIN version of getting the table difference.

SELECT
	*
FROM
	jajal j
	NATURAL FULL OUTER JOIN jajal_copy jc
WHERE
	j.user_id IS NULL
	OR jc.user_id IS NULL

Natural outer join result

user_id first_name last_name grade
6 Mixed Nuts C
6 Mixed Nuts Z

As you can see, NATURAL FULL OUTER JOIN version is much less typing, especially if you have a lot of columns. However, the result is also slightly different. On FULL OUTER JOIN version, you can tell which rows come from which tables. Depending on your needs, one of the solutions above may fit your needs.

On page 2, I’m going to talk on MySQL specific solution. On Page 3, I’ll discuss PostgreSQL specific solution.

Pages: 1 2 3

6 Responses to “Comparing data from 2 database tables”

  1. Fernando Vezzosi Says:

    Maatkit has mk-table-sync that can be used for diffing tables:

    http://www.maatkit.org/doc/mk-table-sync.html

  2. Ted Says:

    (SELECT * FROM table1 EXCEPT SELECT * FROM table2)
    UNION
    (SELECT * FROM table2 EXCEPT SELECT * FROM table1)

  3. Maresa Says:

    Thank you Ted. I didn’t know there’s EXCEPT clause on PostgreSQL SELECT statement.

  4. Michael Says:

    for some reason this article has just appeared in the planetpostgres twitter feed… in Postgresql there is also the option of http://pgdba.net/pg51g … yes, shameless self-promotion this is, but anyone interested in your article may also be interested in pg51g. cheers

  5. Farid Z Says:

    Easiest way is to use a free database table differencing tool such as CompareData

    Using already packaged/developed tool you can also compare the data where the two tables live on different servers, even DBMSs, since in that case you will not be able to do a join.

  6. manoj Says:

    minus or except operators are better than this

    select * from jajal
    minus
    select * from jajal_copy –works in oracle

Leave a Reply