This article is an extension of my previous article Computing degrees of separation in social networking. This time, however, I will write the algorithm completely in PostgreSQL’s PL/pgSQL. For background information and to understand the premise of this post, please glance through my previous post.
Ok now that you’re familiar with what we’re working towards, here’s my assumption of your database structure:
- You have a table that holds users information. Let’s call this table user
- For defining relationship, you have a cross table with user_id and friend_id
- Your database is PostgreSQL database (duh)
Now, based on the assumption above, here’s the graphical view of the schema
Here’s the table structure definition I used for the purpose of this tutorial. You can add foreign key constraint to it. However, for the purpose of creating this tutorial, I only have relationship table.
CREATE TABLE relationship ( user_id integer NOT NULL, friend_id integer NOT NULL, weight integer, CONSTRAINT friendship_pk PRIMARY KEY (user_id, friend_id) ) WITHOUT OIDS ;
What we are working towards
The idea is to define a PosgreSQL function where we’ll be able to execute it and have the relationship table resembles the result matrix similar to the example problem on Computing degrees of separation in social networking post like below. So let’s get going.
Defining Stored Procedure Language
By default PostgreSQL’s database only has C and SQL as languages you can use inside stored procedure. I’m not ready to fire up programming in C and I think SQL won’t be sufficient for my purpose. What I need is to use PL/pgSQL as the language of choice for this time. We need to add this language to our PostgreSQL database. If you haven’t added PL/pgSQL as a languange in your database, execute the following SQL statement to do so.
CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql' HANDLER plpgsql_call_handler VALIDATOR plpgsql_validator;
Populating relationship table with our example data
At this point, I’ll just worry about relationship table. My purpose is to have the relationship table well defined so that I can tell what the degree of relationship between any users on the system is. Note, I’ve assigned user A as having user_id 0 going on sequentially until user E having user_id 4.
INSERT INTO relationship VALUES (0, 3, 1), (0, 2, 1), (1, 3, 1), (1, 4, 1), (1, 2, 1), (2, 4, 1), (2, 3, 1) ;
As you can see, our initial data doesn’t fully define relationship both ways; e.g. relationship between user 0 and 3 is only defined one way 0 -> 3 with weight 1. In a friend relationship, that implies 3 -> 1 with weight 1 as well. So one of the first thing we need to do is to “complete” the relationship table so that it defines relationship both ways.