A first stab at a perl script to create Twitter friend/follow matrices

Geek alert: if the title of this post isn’t a dead giveaway I should tell you — unless you’re interested in APIs and badly-put-together bits of code — this probably isn’t for you.

I’ve recently found myself using a service provided by Damon Clinkscales called DoesFollow. All it does is answer the simple question “does Twitter user A follow Twitter user B?” Apart from a frill which lets you reverse the order of your question (“does Twitter user B follow Twitter user A?”) that’s all it does. You can even interrogate it from the address bar like this: http://doesfollow.com/barackobama/mediaczar

[Screenshot: DoesFollow]

While I was thinking about how useful a service this is, I was suddenly struck by a moment of clarity. A lot of the research I’ve been doing could be simplified by something like this.

Quite often I want to find out whether MPs or congressmen or PR people follow each other on Twitter.

The way that I’ve been doing this until now is:

  1. make a list of the people who I’m interested in researching
  2. for each person on that list, grab the list of all the Twitter people whom they follow
  3. process the list so that only relationships between the people on the list show up
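
Step 3 above can be sketched in a few lines of perl. This is only an illustration: the usernames and the follow lists are made-up sample data standing in for whatever step 2 returns, and nothing here talks to Twitter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: the people being researched...
my @list = qw(alice bob carol);

# ...and, for each of them, everyone they follow (as step 2 would return it).
my %follows = (
    alice => [qw(bob dave erin frank)],
    bob   => [qw(alice carol zed)],
    carol => [qw(erin)],
);

# Step 3: keep only the relationships where both ends are on the list.
my %on_list = map { $_ => 1 } @list;
my %kept;
for my $user (@list) {
    $kept{$user} = [ grep { $on_list{$_} } @{ $follows{$user} } ];
}

for my $user (@list) {
    print "$user follows: @{ $kept{$user} }\n";
}
```

Most of the names fetched in step 2 get thrown away by the grep, which is exactly the waste described in the next paragraph.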

If all I’m doing is checking to see who follows whom, then this is a horribly wasteful way of doing things. The Twitter API limits the number of calls one can make on it — so this wastage leads to things taking much longer.

If only I could cycle all the names I want to check through something like DoesFollow!

Well – it turns out that I can. And in theory it’s not much harder than using DoesFollow. The Twitter API (which is what DoesFollow uses, after all) has a method called friendships/exists. All we have to do is send Twitter the following request:

http://twitter.com/friendships/exists.xml?user_a=barackobama&user_b=mediaczar

and it will come back with the answer:

<friends>true</friends>
or
<friends>false</friends>
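
As a rough sketch of how that request and answer could be handled in perl: the two helper names below (exists_url and parse_friends) are my own invention for illustration, and the sample responses are hard-coded rather than fetched. It also assumes usernames contain nothing that needs URL-escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the request URL for one ordered pair of usernames.
# (Hypothetical helper; assumes the names need no URL-escaping.)
sub exists_url {
    my ($user_a, $user_b) = @_;
    return "http://twitter.com/friendships/exists.xml?user_a=$user_a&user_b=$user_b";
}

# Turn the XML answer into 1 (follows), 0 (doesn't follow),
# or undef if the response isn't in the expected shape.
sub parse_friends {
    my ($xml) = @_;
    return unless defined $xml;
    if ($xml =~ m{<friends>\s*(true|false)\s*</friends>}) {
        return $1 eq 'true' ? 1 : 0;
    }
    return;
}

print exists_url('barackobama', 'mediaczar'), "\n";
print parse_friends('<friends>true</friends>'),  "\n";
print parse_friends('<friends>false</friends>'), "\n";
```

Matching on the friends element, rather than stripping every tag, means an error page or any other unexpected response comes back as undef instead of being mistaken for an answer.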

Kludge-y perl code

[Image: poor man’s hot water heater]

(This fabulous picture courtesy of There, I Fixed It)

So I tried to do this using Yahoo! Pipes, but there are too many nested loops. You need to do something like this:


get list of names

for each user_a (in list) {
    for each user_b (in list) {
        does friendship exist (user_a, user_b)?
    }
}

There’s no easy way to get Pipes to do this, as far as I can see (I’ll keep trying, but if someone else can help, I’d be v. grateful.)

So I’ve pulled together a badly-written perl script to do the work for me.

The script

[code lang="perl"]
#!/usr/bin/perl
# checks the Twitter API to find the friendships between a list of usernames
# this should really use the NEW API call that would let us halve the number
# of API calls
# author: Mat Morrison
# date: Friday July 10, 2009
use strict;
use warnings;
use LWP::Simple;

# set up variables
# we're just using a whitespace delimited list for the moment
my @usernames = qw(kerrymg mediaczar timhoang titusbicknell);

# let's build the matrix with a hash of hashes...
# to begin with, we'll include diagonal values -
# that is -- we'll check to see whether @mediaczar follows @mediaczar
my %matrix;
foreach my $user_a (@usernames) {
    foreach my $user_b (@usernames) {
        # we should put in a conditional clause that will check for the diagonal values
        # and not bother checking whether someone is a friend of themselves...
        my $url = 'http://twitter.com/friendships/exists.xml?user_a='
            . $user_a
            . '&user_b='
            . $user_b;

        # get XML file from Twitter -- it's an astonishingly simple XML file that reads
        # <friends>true</friends>
        # or
        # <friends>false</friends>
        # so we don't need to do much with it...
        my $follows = get $url;
        # double quotes here, so that $url is actually interpolated
        die "Can't get $url" unless defined $follows;

        # strip the tags - I'm using a generic "HTML stripping" regex -
        # then trim any whitespace left around the value
        $follows =~ s/<(.|\n)+?>//g;
        $follows =~ s/^\s+|\s+$//g;

        # we should probably convert "true" values to 1 and "false" values to zero or blank
        # now let's push data into the matrix
        $matrix{$user_a}{$user_b} = $follows;
    }
}

# spit out the data as a tab-delimited table
# print the top line first; we walk @usernames rather than keys %matrix
# so that the header and every row come out in the same order
for my $user_b (@usernames) {
    print "\t$user_b";
}

# now print the values, one row per user_a
for my $user_a (@usernames) {
    print "\n$user_a";
    for my $user_b (@usernames) {
        print "\t$matrix{$user_a}{$user_b}";
    }
}
print "\n";
[/code]

Where next?

Most of my thinking is included above in the code comments. An obvious mistake I’m making is checking to see whether, say, @mediaczar follows @mediaczar. That wastes n API calls per search. But a more serious mistake is not to be using the new friendships/show method. Because it tells you whether user A follows user B and whether user B follows user A at the same time, it would save me lots of API calls. How many lots? Well take a look at this.

This is what I’m doing at the moment — checking each and every cell in the matrix:

[Figure: clumsy API call matrix]

This is what I’d be doing if I removed the diagonals:

[Figure: matrix with diagonals removed]

And this is what I’d be doing if I used the newer API call:

[Figure: matrix using the new API call]

I had to look up the formula for working this out without colouring in little boxes. With a little tweaking (to prevent the diagonals from creeping back in), here it is:

((n-1)^2 + n - 1)/2
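
For anyone who wants to check the algebra, the expression appears to simplify to the familiar count of unordered pairs that can be drawn from n people (the factoring step is mine, not something from the API docs):

```latex
\[
\frac{(n-1)^2 + (n-1)}{2}
  = \frac{(n-1)\bigl((n-1)+1\bigr)}{2}
  = \frac{n(n-1)}{2}
  = \binom{n}{2}
\]
```

For the four names in the script above, that gives 4 × 3 / 2 = 6 calls instead of 16.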

So, for a list of congresspeople (156 on Twitter as at Tuesday July 14, 2009), that’d be ((156-1)^2 + 156 - 1)/2 = 12,090 API calls. Which is still a lot and will require some careful throttling, but (literally) not half as many as the 156^2 = 24,336 API calls that I’d need to run it as the script currently stands.
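
One possible way to walk only those cells, sketched in perl: loop over the list by index and start the inner loop one place after the outer one, so each pair turns up exactly once and the diagonal never appears. The unordered_pairs helper is a made-up name, and no API call is made here; the comment marks where a friendships/show request would go.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every unordered pair from a list, skipping the diagonal:
# (a,b) appears once; (b,a) and (a,a) never appear.
sub unordered_pairs {
    my @names = @_;
    my @pairs;
    for my $i (0 .. $#names - 1) {
        for my $j ($i + 1 .. $#names) {
            push @pairs, [ $names[$i], $names[$j] ];
        }
    }
    return @pairs;
}

my @usernames = qw(kerrymg mediaczar timhoang titusbicknell);
for my $pair (unordered_pairs(@usernames)) {
    # one friendships/show call per pair would go here
    print "$pair->[0] <-> $pair->[1]\n";
}

# Count the pairs for a list of 156 names, to compare with the formula.
my $count = () = unordered_pairs(1 .. 156);
print "$count pairs\n";
```

A real run would still need to pause between requests to stay inside the API limit; this sketch only deals with choosing which pairs to ask about.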

So – back to the drawing board for a while. I really can’t work out a programmatic way of doing this. Hmph.

