Bookmark and Share

Comparing data from 2 database tables

Posted: Sunday, March 1st, 2009 at 2:41 amUpdated: Monday, May 25th, 2009 at 5:18 pm

PostgreSQL Code for Comparing 2 Tables

If you go back to page 1 of this article, the example SQL statements there should work with PostgreSQL database. I won’t repeat myself (go back to page 1 if you need to). Rather, I’ll dive into the stored procedure definition of tablediff function for PostgreSQL.

-- Written by Maresa Nirwan
-- www.microshell.com

CREATE OR REPLACE FUNCTION tablediff (
	IN schema_name VARCHAR(64),
	IN table1 VARCHAR(64),
	IN table2 VARCHAR(64)
) RETURNS BIGINT AS
$BODY$
DECLARE
	the_result BIGINT DEFAULT 0;
	sql_statement TEXT DEFAULT '';
	sql_statement_where TEXT DEFAULT '';
	sql_statement_pk TEXT DEFAULT '';
	col_name VARCHAR(64);
	col_name_cur CURSOR FOR 
		SELECT
			column_name
		FROM
			information_schema.columns
		WHERE
			table_catalog = schema_name
			AND table_schema = 'public'
			AND table_name = table1
	;
BEGIN
	OPEN col_name_cur;

	LOOP
		FETCH col_name_cur INTO col_name;
		IF NOT FOUND THEN
			EXIT;
		END IF;

		sql_statement_where := sql_statement_where || ' AND a.' || col_name || ' = b.' || col_name;
	END LOOP;

	SELECT
		column_name INTO col_name
	FROM
		information_schema.table_constraints tc
		JOIN information_schema.constraint_column_usage ccu ON
			ccu.constraint_name = tc.constraint_name
	WHERE
		tc.table_catalog = schema_name
		AND tc.table_schema = 'public'
		AND tc.table_name = table1
	LIMIT 1
	;

	IF col_name IS NOT NULL THEN
		sql_statement_pk := ' a.' || col_name || ' IS NULL';
		sql_statement_pk := sql_statement_pk || ' OR b.' || col_name || ' IS NULL';
	END IF;

	sql_statement := 'SELECT COUNT(*) FROM ' || schema_name || '.public.' || table1 || ' a FULL OUTER JOIN ' || schema_name || '.public.' || table2 || ' b ON TRUE';
	sql_statement := sql_statement || sql_statement_where || ' WHERE ' || sql_statement_pk;

	EXECUTE sql_statement INTO the_result;
	
	RETURN the_result;
END;$BODY$
LANGUAGE 'plpgsql' STABLE;

Explaining FULL OUTER JOIN version of tablediff stored procedure

Line 8: There’s one bummer in PostgreSQL that I could not get around yet. Unlike MySQL, PostgreSQL’s function can return a set of records without knowing the specifics about the records. If I change this line to RETURNS SETOF RECORD and instead of count(*) I loop through the result and RETURN NEXT, I would have to call the function by specifying the columns. This defeats the whole purpose of having a simple function call to quickly find out if the two tables are the same. So I had to settle on having it returns the number of rows that’s not the same. That means, if the function returns 0, the tables are the same.

-- As is, I would call the function as follows:
SELECT tablediff('testing', 'jajal', 'jajal_copy');

-- If I insist on returning SETOF RECORD, I would have to call
-- the function as follows:
SELECT * FROM tablediff('testing', 'jajal', 'jajal_copy')
AS (
  afirst_name character varying(255),
  alast_name character varying(255),
  agrade character(1),
  auser_id integer,
  bfirst_name character varying(255),
  blast_name character varying(255),
  bgrade character(1),
  buser_id integer
);

The rest of the function definition, I think, is pretty much have similar logic to MySQL version (see page 2). The only difference is the generation of dynamic SQL to be executed. Since PostgreSQL supports FULL OUTER JOIN, we have a simpler SQL statement to be constructed.

There you have it … I hope you enjoyed this article. Please leave comments / suggestions / questions if you have. I’m looking forward to improving my solution with your comments / suggestions / questions.

Pages: 1 2 3

6 Responses to “Comparing data from 2 database tables”

  1. Fernando Vezzosi Says:

    Maatkit has mk-table-sync that can be used for diffing tables:

    http://www.maatkit.org/doc/mk-table-sync.html

  2. Ted Says:

    (SELECT * FROM table1 EXCEPT SELECT * FROM table2)
    UNION
    (SELECT * FROM table2 EXCEPT SELECT * FROM table1)

  3. Maresa Says:

    Thank you Ted. I didn’t know there’s EXCEPT clause on PostgreSQL SELECT statement.

  4. Michael Says:

    for some reason this article has just appeared in the planetpostgres twitter feed… in Postgresql there is also the option of http://pgdba.net/pg51g … yes, shameless self-promotion this is, but anyone interested in your article may also be interested in pg51g. cheers

  5. Farid Z Says:

    Easiest way is to use a free database table differencing tool such as CompareData

    Using already packaged/developed tool you can also compare the data where the two tables live on different servers, even DBMSs, since in that case you will not be able to do a join.

  6. manoj Says:

    minus or except operators are better than this

    select * from jajal
    minus
    select * from jajal_copy –works in oracle

Leave a Reply

Time limit is exhausted. Please reload the CAPTCHA.