PostgreSQL Code for Comparing 2 Tables
If you go back to page 1 of this article, the example SQL statements there should work with PostgreSQL database. I won’t repeat myself (go back to page 1 if you need to). Rather, I’ll dive into the stored procedure definition of tablediff function for PostgreSQL.
-- Written by Maresa Nirwan -- www.microshell.com CREATE OR REPLACE FUNCTION tablediff ( IN schema_name VARCHAR(64), IN table1 VARCHAR(64), IN table2 VARCHAR(64) ) RETURNS BIGINT AS $BODY$ DECLARE the_result BIGINT DEFAULT 0; sql_statement TEXT DEFAULT ''; sql_statement_where TEXT DEFAULT ''; sql_statement_pk TEXT DEFAULT ''; col_name VARCHAR(64); col_name_cur CURSOR FOR SELECT column_name FROM information_schema.columns WHERE table_catalog = schema_name AND table_schema = 'public' AND table_name = table1 ; BEGIN OPEN col_name_cur; LOOP FETCH col_name_cur INTO col_name; IF NOT FOUND THEN EXIT; END IF; sql_statement_where := sql_statement_where || ' AND a.' || col_name || ' = b.' || col_name; END LOOP; SELECT column_name INTO col_name FROM information_schema.table_constraints tc JOIN information_schema.constraint_column_usage ccu ON ccu.constraint_name = tc.constraint_name WHERE tc.table_catalog = schema_name AND tc.table_schema = 'public' AND tc.table_name = table1 LIMIT 1 ; IF col_name IS NOT NULL THEN sql_statement_pk := ' a.' || col_name || ' IS NULL'; sql_statement_pk := sql_statement_pk || ' OR b.' || col_name || ' IS NULL'; END IF; sql_statement := 'SELECT COUNT(*) FROM ' || schema_name || '.public.' || table1 || ' a FULL OUTER JOIN ' || schema_name || '.public.' || table2 || ' b ON TRUE'; sql_statement := sql_statement || sql_statement_where || ' WHERE ' || sql_statement_pk; EXECUTE sql_statement INTO the_result; RETURN the_result; END;$BODY$ LANGUAGE 'plpgsql' STABLE;
Explaining FULL OUTER JOIN version of tablediff stored procedure
Line 8: There’s one bummer in PostgreSQL that I could not get around yet. Unlike MySQL, PostgreSQL’s function can return a set of records without knowing the specifics about the records. If I change this line to RETURNS SETOF RECORD and instead of count(*) I loop through the result and RETURN NEXT, I would have to call the function by specifying the columns. This defeats the whole purpose of having a simple function call to quickly find out if the two tables are the same. So I had to settle on having it returns the number of rows that’s not the same. That means, if the function returns 0, the tables are the same.
-- As is, I would call the function as follows:
SELECT tablediff('testing', 'jajal', 'jajal_copy');
-- If I insist on returning SETOF RECORD, I would have to call
-- the function as follows:
SELECT * FROM tablediff('testing', 'jajal', 'jajal_copy')
AS (
afirst_name character varying(255),
alast_name character varying(255),
agrade character(1),
auser_id integer,
bfirst_name character varying(255),
blast_name character varying(255),
bgrade character(1),
buser_id integer
);
The rest of the function definition, I think, is pretty much have similar logic to MySQL version (see page 2). The only difference is the generation of dynamic SQL to be executed. Since PostgreSQL supports FULL OUTER JOIN, we have a simpler SQL statement to be constructed.
There you have it … I hope you enjoyed this article. Please leave comments / suggestions / questions if you have. I’m looking forward to improving my solution with your comments / suggestions / questions.







Maatkit has mk-table-sync that can be used for diffing tables:
http://www.maatkit.org/doc/mk-table-sync.html
(SELECT * FROM table1 EXCEPT SELECT * FROM table2)
UNION
(SELECT * FROM table2 EXCEPT SELECT * FROM table1)
Thank you Ted. I didn’t know there’s EXCEPT clause on PostgreSQL SELECT statement.
for some reason this article has just appeared in the planetpostgres twitter feed… in Postgresql there is also the option of http://pgdba.net/pg51g … yes, shameless self-promotion this is, but anyone interested in your article may also be interested in pg51g. cheers
Easiest way is to use a free database table differencing tool such as CompareData
Using already packaged/developed tool you can also compare the data where the two tables live on different servers, even DBMSs, since in that case you will not be able to do a join.