MySQL SQL Code for Comparing 2 Tables
MySQL doesn’t have native support for FULL OUTER JOIN statement. If you read the wikipedia page about full outer join, it is basically a combine of LEFT JOIN and RIGHT JOIN. In terms of data set, it is basically a UNION of LEFT JOIN and RIGHT JOIN.
Going back to our table example, here’s the equivalent solution for MySQL.
SELECT * FROM jajal j LEFT JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE jc.user_id IS NULL UNION ALL SELECT * FROM jajal j RIGHT JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE j.user_id IS NULL
Or if you prefer the NATURAL JOIN version,
SELECT * FROM jajal j NATURAL LEFT JOIN jajal_copy jc WHERE jc.user_id IS NULL UNION ALL SELECT * FROM jajal j NATURAL RIGHT JOIN jajal_copy jc WHERE j.user_id IS NULL
It may also be a good idea to have a stored procedure defined so instead of typing such a long query (especially since MySQL doesn’t support FULL OUTER JOIN), I can simply execute statement like this:
call tablediff('db_schema', 'jajal', 'jajal_copy')
FULL OUTER JOIN version of tablediff stored procedure
The challange of having FULL OUTER JOIN in a stored procedure is that you have to dynamically create a SQL statement to construct a join condition on all columns. Fortunately, MySQL has a special database called information_schema. We can actually query it and try to construct dynamically generated SQL statements within MySQL stored procedure.
So without further due, here’s what I come up with.
-- Written by Maresa Nirwan
-- www.microshell.com
DELIMITER $$
CREATE PROCEDURE `db_schema`.`tablediff`
(schema_name VARCHAR(64), table1 VARCHAR(64), table2 VARCHAR(64))
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE sql_statement TEXT DEFAULT '';
DECLARE sql_statement_where TEXT DEFAULT '';
DECLARE sql_statement_pk TEXT DEFAULT '';
DECLARE col_name VARCHAR(64);
DECLARE col_name_cur CURSOR FOR
SELECT
COLUMN_NAME
FROM
information_schema.COLUMNS
WHERE
TABLE_SCHEMA = schema_name
AND TABLE_NAME = table1
;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN col_name_cur;
traverse_columns: LOOP
FETCH col_name_cur INTO col_name;
IF done THEN
CLOSE col_name_cur;
LEAVE traverse_columns;
END IF;
SET sql_statement_where = CONCAT(sql_statement_where,
' AND a.', col_name, ' = b.', col_name);
SET sql_statement_pk = CONCAT(sql_statement_pk,
'AND b.', col_name, ' IS NULL');
END LOOP;
SELECT
COLUMN_NAME INTO col_name
FROM
information_schema.KEY_COLUMN_USAGE
WHERE
CONSTRAINT_SCHEMA = schema_name
AND CONSTRAINT_NAME = 'PRIMARY'
AND TABLE_NAME = table1
LIMIT 1
;
IF col_name IS NOT NULL THEN
SET sql_statement_pk = CONCAT('AND b.', col_name, ' IS NULL');
END IF;
SET sql_statement = CONCAT('SELECT * FROM ', schema_name, '.', table1, ' a LEFT JOIN ', schema_name, '.', table2, ' b ON TRUE');
SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk);
SET sql_statement = CONCAT(sql_statement, ' UNION ALL SELECT * FROM ', schema_name, '.', table1, ' b RIGHT JOIN ', schema_name, '.', table2, ' a ON TRUE');
SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk);
SET @s = sql_statement;
PREPARE stmt1 FROM @s;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
END$$
DELIMITER ;
Explaining FULL OUTER JOIN version of tablediff stored procedure
I think the code is pretty simple. It starts with stored procedure declaration that takes 3 parameters: database name and 2 tables to be compared. My assumption is that the 2 tables that you want to compare lives on the same database. One possible improvement to this stored procedure is that you may want to add a 4th parameter to specify where the other table lives.
Line 14 – 22 is where I query information_schema to get table meta data. Notice that I also assume that the 2nd table has exactly the same structure as the first table; therefore, I didn’t bother to check whether both tables have the same structure. If you want to make the function more robust, this is a possible improvement.
Line 25 – 38 is where I fetch data from information_schema using cursor. It may also be a good example for using cursor in MySQL stored procedure. There’s nothing crazy here. Simply iterate through the result set and construct the join condition string.
Line 40 – 49 is where I’m constructing string for checking NULL column values for the WHERE condition. I figure that primary key can never be null-able, so checking for null primary key is a pretty good indication that the row has no match. A table may have composite primary key. I don’t think it matters if you check on the composite primary key columns or just one of the columns. The basic stays: primary key can’t be null. Therefore, even if you have composite primary key, checking just one column that makes up the primary key should be sufficient. Right now, I assume that all tables have primary key defined. There’s a possible improvement where you can check for tables with no primary key and construct the WHERE condition on all columns as null.
Line 54 – 62 is basically composing a SQL statement string to be executed by means of prepared statement.
I did not have the need for NATURAL FULL OUTER JOIN version and thus I did not have that version. I’d imagine that it should be pretty easy (I think even simpler / less code) than the OUTER JOIN version above. Next, I’d like to have similar stored procedure defined in PostgreSQL.





Maatkit has mk-table-sync that can be used for diffing tables:
http://www.maatkit.org/doc/mk-table-sync.html
(SELECT * FROM table1 EXCEPT SELECT * FROM table2)
UNION
(SELECT * FROM table2 EXCEPT SELECT * FROM table1)
Thank you Ted. I didn’t know there’s EXCEPT clause on PostgreSQL SELECT statement.
for some reason this article has just appeared in the planetpostgres twitter feed… in Postgresql there is also the option of http://pgdba.net/pg51g … yes, shameless self-promotion this is, but anyone interested in your article may also be interested in pg51g. cheers
Easiest way is to use a free database table differencing tool such as CompareData
Using already packaged/developed tool you can also compare the data where the two tables live on different servers, even DBMSs, since in that case you will not be able to do a join.