Bookmark and Share

Computing degrees of separation in social networking.

Posted: Friday, January 9th, 2009 at 6:31 amUpdated: Friday, January 30th, 2009 at 10:08 pm

Dijkstra’s Shortest Path Algorithm.

If you remember from your computer science class, there’s this algorithm called Dijkstra. Check out this wikipedia link for more information about it. In short, the algorithm solves the shortest path for every point. You can think of user 0 to 99 each as a point. Then you need to calculate the path from user 0 to 1 all the way to 99 using the shortest path. Going back to our example of 5 users above (user A to E), the path for A – E is via C. So the path is A – C – E for path value 2. In computer science, the users (A to E) is called vertex. The path among the users (vertex) is called graph. So in computer science speak, Dijkstra’s algorithm is to calculate the shortest path among vertex.

Ok enough with the introduction of Dijkstra’s algorithm. How does it work? (Copied from Wikipedia page linked above)

  1. Create a distance list, a previous vertex list, a visited list, and a current vertex.
  2. All the values in the distance list are set to infinity except the starting vertex which is set to zero.
  3. All values in visited list are set to false.
  4. All values in the previous vertex list are set to a special value signifying that they are undefined, such as null.
  5. Current vertex is set as the starting vertex.
  6. Mark the current vertex as visited.
  7. Update distance (from starting vertex) and previous lists based on those vertices which can be immediately reached from the current vertex.
  8. Update the current vertex to the unvisited vertex that can be reached by the shortest path from the starting vertex.
  9. Repeat (from step 6) until all nodes are visited.

Floyd–Warshall Algorithm.

Dijkstra’s algorithm is not the only algorithm to calculate shortest path. There are other algorithms that can also be used for our problem. One of them is Floyd–Warshall algorithm. Here’s the wikipedia link for the algorithm. Basically, if you do the algorithm, you’ll come up with a table (matrix) like the result table (matrix) used to solve our example of 5 users. To save you from clicking back (and saving me bandwith of you reloading my article ;-) ) I’m pasting the table again below.

  A B C D E
A 0 2 1 1 2
B 2 0 1 1 1
C 1 1 0 1 1
D 1 1 1 0 2
E 2 1 1 2 0

So I choose using Floyd-Warshall algorithm as it is faster and easier to code, I think. Below is my solution for the problem on the previous page written in PHP. Note for the data definition. You need to define both directions of the relationship. For example, if you define relationship for user 0 – 70, define relationship for user 70 – 0 as well.

<?php
// Written by Maresa Nirwan
// http://www.microshell.com

// Include the data definition. It's basically defining 2 dimensional arrays with
// the array indexes represents the users. For example, relationship for 0 - 70 is
// $relationship[0][70] = 1;
// $relationship[70][0] = 1;
include_once('data_definition.php');

/**
  * This function is based on Floyd-Warshall algorithm for solving all users
  * connections with each other. The function expects 2 global variables named
  * - (int) $max for number of people in the network.
  * - (Array) $relationship as a 2 dimensional array representing the matrix of
  *   relationship between users. 1 means there is direct relationship
  *   not defined means there's no direct relationship.
  *
  * The end result of calling this function is that the global variable $relationship
  * will be fully filled with values of 2nd degree or more friends relationship.
  *
  * Possible improvements:
  * 1. Since friends are non-directional, the matrix will always be a symmetric one.
  *    Thus, we only need to calculate half of the matrix and therefore, reducing the
  *    iteration by 1/2.
  *
  * References used for Floyd-Warshall algorithm was from the following websites
  * 1. http://en.wikipedia.org/wiki/Floyd-Warshall_algorithm
  * 2. http://www.pms.ifi.lmu.de/lehre/compgeometry/Gosper/shortest_path/shortest_path.html#visualization
  *
  */
function getDegreeFloyd() {
   global $relationship;
   global $max;

   $k = 0;

   for ($i = 0; $i < $max; $i++) {
      for ($j = 0; $j < $max; $j++) {
         for ($k = 0; $k < $max; $k++) {
            // If the relationship between 2 users are not defined, then the
            // shortest path should be set to infinity (or a very large number)
            if (!isset($relationship[$j][$k])) $relationship[$j][$k] = 999999;
            if (!isset($relationship[$j][$i])) $relationship[$j][$i] = 999999;
            if (!isset($relationship[$i][$k])) $relationship[$i][$k] = 999999;

            $relationship[$j][$k] = min($relationship[$j][$k], $relationship[$j][$i] + $relationship[$i][$k]);
         }
      }
   }
}

// Start of the program to call the function (Driver code some people may call it)
$cnt = 0;
$max = 100;
$total = 0;

// the following 4 lines is to get the script run time. It can be ommited safely. I just
// put it here for my own curiousity as to how fast the script is.
$mtime = microtime();
$mtime = explode(" ",$mtime);
$mtime = $mtime[1] + $mtime[0];
$tstart = $mtime;

getDegreeFloyd();

// By now, the array $relationship should be completed.
// To calculate the average number of degrees of separation,
// just sum the top half values of the matrix then divides
// by the number of entry. Remember the example result table
// of 5 users above.
for ($i = 0; $i < $max; $i++) {
   for ($j = $i + 1; $j < $max; $j++) {
      $tmp = $relationship[$i][$j];
      $total += $tmp;
      $cnt++;
   }
}

// Continuation of the time calculation to get end time.
$mtime = microtime();
$mtime = explode(" ",$mtime);
$mtime = $mtime[1] + $mtime[0];
$tend = $mtime;
$rtime = ($tend - $tstart);

echo "Total is $total. Count is $cntn";
echo "The average degree is " . ($total / $cnt) . "n";
echo "Script running time: $rtime.n";
?>

Running it, I got the following result:

mnirwan@dev ~ $ ./degree.php
Total is 12195. Count is 4950
The average degree is 2.46363636364
Script running time: 2.62368488312.
mnirwan@dev ~ $

There you have it … I’ve also written a version that runs purely on PostgreSQL database. Please see the plpgsql version here. I hope this article helps you. As always, I welcome comments / questions / critics that will help me and other readers understand better.

Pages: 1 2

3 Responses to “Computing degrees of separation in social networking.”

  1. james Says:

    hi
    i want to know which platform will handle larger and quicker searches (PHP or SQL)?

    also do u have implementations of BFS and DFS in POSTGRESQL.

    Great Work
    thanks

  2. deepak Says:

    Degree pf seperation required in c programing code

  3. deepak Says:

    really good

Leave a Reply


1 × five =