Geek alert: if the title of this post isn’t a dead giveaway I should tell you — unless you’re interested in APIs and badly-put-together bits of code — this probably isn’t for you.
I’ve recently found myself using a service provided by Damon Clinkscales called DoesFollow. It answers one simple question: “does Twitter user A follow Twitter user B?” Apart from a frill which lets you reverse the order of your question (“does Twitter user B follow Twitter user A?”), that’s all it does. You can even interrogate it from the address bar like this: http://doesfollow.com/barackobama/mediaczar
While I was thinking about how useful a service this is, I was suddenly struck by a moment of clarity. A lot of the research I’ve been doing could be simplified by something like this.
Quite often I want to find out whether MPs or congressmen or PR people follow each other on Twitter.
The way that I’ve been doing this until now is:
- make a list of the people who I’m interested in researching
- for each person on that list, grab the list of all the Twitter people whom they follow
- process the list so that only relationships between the people on the list show up
If all I’m doing is checking to see who follows whom, then this is a horribly wasteful way of doing things. The Twitter API limits the number of calls one can make on it — so this wastage leads to things taking much longer.
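For comparison, the filtering step in that older approach might look something like the rough sketch below. The data is made up for illustration: %friends_of is a hypothetical hash standing in for friend lists that have already been fetched, and filter_to_list is an invented helper name, not anything from the Twitter API.

[code lang="perl"]
use strict;
use warnings;

# hypothetical, made-up data: who each person on the list follows
my %friends_of = (
    mediaczar => [qw(kerrymg someone_else timhoang)],
    kerrymg   => [qw(mediaczar another_account)],
    timhoang  => [qw(titusbicknell)],
);

# keep only the relationships where both ends are on the list
sub filter_to_list {
    my ($friends_of, @usernames) = @_;
    my %on_list = map { $_ => 1 } @usernames;
    my %kept;
    for my $user (@usernames) {
        # a user with no fetched friend list is treated as following nobody
        my @friends = @{ $friends_of->{$user} || [] };
        $kept{$user} = [ grep { $on_list{$_} && $_ ne $user } @friends ];
    }
    return %kept;
}

my %kept = filter_to_list(\%friends_of, qw(kerrymg mediaczar timhoang titusbicknell));
for my $user (sort keys %kept) {
    print "$user follows: @{ $kept{$user} }\n";
}
[/code]

Everything outside the list gets fetched and then thrown away, which is where the waste comes from.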
If only I could cycle all the names I want to check through something like DoesFollow!
Well – it turns out that I can. And in theory it’s not much harder than using DoesFollow. The Twitter API (which is what DoesFollow uses, after all) has a method called friendships/exists. All we have to do is send Twitter the following request:
http://twitter.com/friendships/exists.xml?user_a=barackobama&user_b=mediaczar
and it will come back with the answer:
<friends>true</friends>
or
<friends>false</friends>
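In Perl, building that request and reading the answer could be sketched roughly as follows. The helper names exists_url and parse_friends are invented for illustration, and the sample responses are pasted in rather than fetched, so nothing here actually calls Twitter.

[code lang="perl"]
use strict;
use warnings;

# build the friendships/exists request for a pair of usernames
sub exists_url {
    my ($user_a, $user_b) = @_;
    return "http://twitter.com/friendships/exists.xml?user_a=$user_a&user_b=$user_b";
}

# turn <friends>true</friends> into 1 and <friends>false</friends> into 0;
# return undef for anything unexpected
sub parse_friends {
    my ($xml) = @_;
    return undef
        unless defined $xml && $xml =~ m{<friends>\s*(true|false)\s*</friends>};
    return $1 eq 'true' ? 1 : 0;
}

print exists_url('barackobama', 'mediaczar'), "\n";
print parse_friends('<friends>true</friends>'), "\n";
print parse_friends('<friends>false</friends>'), "\n";
[/code]

Returning undef for anything that isn’t a plain true or false makes it easier to tell “doesn’t follow” apart from “something went wrong”.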
Kludge-y perl code
(This fabulous picture courtesy of There, I Fixed It)
So I tried to do this using Yahoo! Pipes, but there are too many nested loops. You need to do something like this:
get list of names
for each user_a (in list) {
    for each user_b (in list) {
        does friendship exist?
    }
}
There’s no easy way to get Pipes to do this, as far as I can see (I’ll keep trying, but if someone else can help, I’d be v. grateful.)
So I’ve pulled together a badly-written perl script to do the work for me.
The script
[code lang="perl"]
#!/usr/bin/perl
# checks the Twitter API to find the friendships between a list of usernames
# this should really use the NEW API call that would let us halve the number
# of API calls
# author: Mat Morrison
# date: Friday July 10, 2009
use strict;
use warnings;
use LWP::Simple;

# set up variables
# we're just using a whitespace delimited list for the moment
my @usernames = qw(kerrymg mediaczar timhoang titusbicknell);

# let's build the matrix with a hash of hashes...
# to begin with, we'll include diagonal values -
# that is -- we'll check to see whether @mediaczar follows @mediaczar
my %matrix;
foreach my $user_a (@usernames) {
    foreach my $user_b (@usernames) {
        # we should put in a conditional clause that will check for the diagonal values
        # and not bother checking whether someone is a friend of themselves...
        my $url = 'http://twitter.com/friendships/exists.xml?user_a='
            . $user_a
            . '&user_b='
            . $user_b;

        # get XML file from Twitter -- it's an astonishingly simple XML file that reads
        # <friends>true</friends>
        # or
        # <friends>false</friends>
        # so we don't need to do much with it...
        my $follows = get($url);
        # double quotes here, so that $url is interpolated into the message
        die "Can't get $url" unless defined $follows;

        # strip the tags - I'm using a generic "HTML stripping" regex
        $follows =~ s/<(.|\n)+?>//g;
        # trim any whitespace left around the value
        $follows =~ s/^\s+|\s+$//g;

        # we should probably convert "true" values to 1 and "false" values to zero or blank
        # now let's push data into the matrix
        $matrix{$user_a}{$user_b} = $follows;
    }
}

# spit out the data as a tab-delimited table
# print the top line first
# (we loop over @usernames rather than keys %matrix, because hash keys
# come back in no particular order and the columns wouldn't line up)
for my $user_b (@usernames) {
    print "\t$user_b";
}

# now print the values, one row per user_a
for my $user_a (@usernames) {
    print "\n$user_a";
    for my $user_b (@usernames) {
        print "\t$matrix{$user_a}{$user_b}";
    }
}
print "\n";
[/code]
Where next?
Most of my thinking is included above in the code comments. An obvious mistake I’m making is checking to see whether, say, @mediaczar follows @mediaczar. That wastes n API calls per search. But a more serious mistake is not to be using the new friendships/show method. Because it tells you whether user A follows user B and whether user B follows user A at the same time, it would save me lots of API calls. How many lots? Well, take a look at this.
This is what I’m doing at the moment, checking each and every cell in the matrix (n^2 calls):
This is what I’d be doing if I removed the diagonals (n^2 - n calls):
And this is what I’d be doing if I used the newer API call (half of that again, (n^2 - n)/2 calls):
I had to look up the formula for working this out without colouring in little boxes. With a little tweaking (to prevent the diagonals from creeping back in), here it is:
((n-1)^2 + n - 1)/2 = n(n-1)/2
So, for a list of congresspeople (159 on Twitter as at Tuesday July 14, 2009), that’d be ((156-1)^2-1+156)/2 = 12,090 API calls. Which is still a lot and will require some careful throttling, but (literally) not half as many as the 156^2 = 24,336 API calls that I’d need to run it as the script currently stands.
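The arithmetic is easy enough to check with a few lines of Perl. This is only a sketch: call_counts is a hypothetical helper name, and it simply counts cells in the matrix rather than calling Twitter.

[code lang="perl"]
use strict;
use warnings;

# number of API calls needed for a list of $n usernames, three ways
sub call_counts {
    my ($n) = @_;
    return (
        every_cell   => $n * $n,            # every cell in the matrix
        no_diagonals => $n * $n - $n,       # skip "does A follow A?"
        pairs_only   => $n * ($n - 1) / 2,  # one call covers both directions
    );
}

my %calls = call_counts(156);
printf "%-12s %d\n", $_, $calls{$_} for qw(every_cell no_diagonals pairs_only);
[/code]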
So – back to the drawing board for a while. I really can’t work out a programmatic way of doing this. Hmph.
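For what it’s worth, one way this might be approached, sketched under some assumptions: walk the list by position and only pair each name with the names that come after it, so that every pair turns up once and nobody is paired with themselves. The each_pair name and the callback are illustrative inventions. In real use the callback would be where the friendships/show request goes, which isn’t attempted here.

[code lang="perl"]
use strict;
use warnings;

# call $callback once for every unordered pair of names in the list
# and return the number of calls made
sub each_pair {
    my ($callback, @usernames) = @_;
    my $calls = 0;
    for my $i (0 .. $#usernames - 1) {
        # start at $i + 1: skips the diagonal and everything already paired
        for my $j ($i + 1 .. $#usernames) {
            $callback->($usernames[$i], $usernames[$j]);
            $calls++;
        }
    }
    return $calls;
}

my @usernames = qw(kerrymg mediaczar timhoang titusbicknell);
my @pairs;
my $calls = each_pair(sub { push @pairs, "$_[0] <-> $_[1]" }, @usernames);
print "$_\n" for @pairs;
print "$calls calls\n";
[/code]

For the four names in the script that comes to six calls instead of sixteen.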