Bookmark and Share

Comparing data from 2 database tables

Posted: Sunday, March 1st, 2009 at 2:41 amUpdated: Monday, May 25th, 2009 at 5:18 pm

MySQL SQL Code for Comparing 2 Tables

MySQL doesn’t have native support for FULL OUTER JOIN statement. If you read the wikipedia page about full outer join, it is basically a combine of LEFT JOIN and RIGHT JOIN. In terms of data set, it is basically a UNION of LEFT JOIN and RIGHT JOIN.

Going back to our table example, here’s the equivalent solution for MySQL.

SELECT
	*
FROM
	jajal j
	LEFT JOIN jajal_copy jc ON jc.first_name = j.first_name
		AND jc.last_name = j.last_name
		AND jc.grade = j.grade
		AND jc.user_id = j.user_id
WHERE
	jc.user_id IS NULL
UNION ALL
SELECT
	*
FROM
	jajal j
	RIGHT JOIN jajal_copy jc ON jc.first_name = j.first_name
		AND jc.last_name = j.last_name
		AND jc.grade = j.grade
		AND jc.user_id = j.user_id
WHERE
	j.user_id IS NULL

Or if you prefer the NATURAL JOIN version,

SELECT
	*
FROM
	jajal j
	NATURAL LEFT JOIN jajal_copy jc
WHERE
	jc.user_id IS NULL
UNION ALL
SELECT
	*
FROM
	jajal j
	NATURAL RIGHT JOIN jajal_copy jc
WHERE
	j.user_id IS NULL

It may also be a good idea to have a stored procedure defined so instead of typing such a long query (especially since MySQL doesn’t support FULL OUTER JOIN), I can simply execute statement like this:

call tablediff('db_schema', 'jajal', 'jajal_copy')

FULL OUTER JOIN version of tablediff stored procedure

The challange of having FULL OUTER JOIN in a stored procedure is that you have to dynamically create a SQL statement to construct a join condition on all columns. Fortunately, MySQL has a special database called information_schema. We can actually query it and try to construct dynamically generated SQL statements within MySQL stored procedure.

So without further due, here’s what I come up with.

-- Written by Maresa Nirwan
-- www.microshell.com

DELIMITER $$

CREATE PROCEDURE `db_schema`.`tablediff`
	(schema_name VARCHAR(64), table1 VARCHAR(64), table2 VARCHAR(64))
BEGIN
	DECLARE done INT DEFAULT 0;
	DECLARE sql_statement TEXT DEFAULT '';
	DECLARE sql_statement_where TEXT DEFAULT '';
	DECLARE sql_statement_pk TEXT DEFAULT '';
	DECLARE col_name VARCHAR(64);
	DECLARE col_name_cur CURSOR FOR 
		SELECT
			COLUMN_NAME
		FROM
			information_schema.COLUMNS
		WHERE
			TABLE_SCHEMA = schema_name
			AND TABLE_NAME = table1
	;
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

	OPEN col_name_cur;
	traverse_columns: LOOP
		FETCH col_name_cur INTO col_name;

		IF done THEN
			CLOSE col_name_cur;
			LEAVE traverse_columns;
		END IF;

		SET sql_statement_where = CONCAT(sql_statement_where,
			' AND a.', col_name, ' = b.', col_name);
		SET sql_statement_pk = CONCAT(sql_statement_pk,
			'AND b.', col_name, ' IS NULL');
	END LOOP;

	SELECT
		COLUMN_NAME INTO col_name
	FROM
		information_schema.KEY_COLUMN_USAGE
	WHERE
		CONSTRAINT_SCHEMA = schema_name
		AND CONSTRAINT_NAME = 'PRIMARY'
		AND TABLE_NAME = table1
	LIMIT 1
	;
	IF col_name IS NOT NULL THEN
		SET sql_statement_pk = CONCAT('AND b.', col_name, ' IS NULL');
	END IF;

	SET sql_statement = CONCAT('SELECT * FROM ', schema_name, '.', table1, ' a LEFT JOIN ', schema_name, '.', table2, ' b ON TRUE');
	SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk);
	SET sql_statement = CONCAT(sql_statement, ' UNION ALL SELECT * FROM ', schema_name, '.', table1, ' b RIGHT JOIN ', schema_name, '.', table2, ' a ON TRUE');
	SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk);

	SET @s = sql_statement;
	PREPARE stmt1 FROM @s;
	EXECUTE stmt1;
	DEALLOCATE PREPARE stmt1; 

END$$
DELIMITER ;

Explaining FULL OUTER JOIN version of tablediff stored procedure

I think the code is pretty simple. It starts with stored procedure declaration that takes 3 parameters: database name and 2 tables to be compared. My assumption is that the 2 tables that you want to compare lives on the same database. One possible improvement to this stored procedure is that you may want to add a 4th parameter to specify where the other table lives.

Line 14 – 22 is where I query information_schema to get table meta data. Notice that I also assume that the 2nd table has exactly the same structure as the first table; therefore, I didn’t bother to check whether both tables have the same structure. If you want to make the function more robust, this is a possible improvement.

Line 25 – 38 is where I fetch data from information_schema using cursor. It may also be a good example for using cursor in MySQL stored procedure. There’s nothing crazy here. Simply iterate through the result set and construct the join condition string.

Line 40 – 49 is where I’m constructing string for checking NULL column values for the WHERE condition. I figure that primary key can never be null-able, so checking for null primary key is a pretty good indication that the row has no match. A table may have composite primary key. I don’t think it matters if you check on the composite primary key columns or just one of the columns. The basic stays: primary key can’t be null. Therefore, even if you have composite primary key, checking just one column that makes up the primary key should be sufficient. Right now, I assume that all tables have primary key defined. There’s a possible improvement where you can check for tables with no primary key and construct the WHERE condition on all columns as null.

Line 54 – 62 is basically composing a SQL statement string to be executed by means of prepared statement.

I did not have the need for NATURAL FULL OUTER JOIN version and thus I did not have that version. I’d imagine that it should be pretty easy (I think even simpler / less code) than the OUTER JOIN version above. Next, I’d like to have similar stored procedure defined in PostgreSQL.

Pages: 1 2 3

6 Responses to “Comparing data from 2 database tables”

  1. Fernando Vezzosi Says:

    Maatkit has mk-table-sync that can be used for diffing tables:

    http://www.maatkit.org/doc/mk-table-sync.html

  2. Ted Says:

    (SELECT * FROM table1 EXCEPT SELECT * FROM table2)
    UNION
    (SELECT * FROM table2 EXCEPT SELECT * FROM table1)

  3. Maresa Says:

    Thank you Ted. I didn’t know there’s EXCEPT clause on PostgreSQL SELECT statement.

  4. Michael Says:

    for some reason this article has just appeared in the planetpostgres twitter feed… in Postgresql there is also the option of http://pgdba.net/pg51g … yes, shameless self-promotion this is, but anyone interested in your article may also be interested in pg51g. cheers

  5. Farid Z Says:

    Easiest way is to use a free database table differencing tool such as CompareData

    Using already packaged/developed tool you can also compare the data where the two tables live on different servers, even DBMSs, since in that case you will not be able to do a join.

  6. manoj Says:

    minus or except operators are better than this

    select * from jajal
    minus
    select * from jajal_copy –works in oracle

Leave a Reply

Time limit is exhausted. Please reload the CAPTCHA.