Communications Traffic Data and you

“Private firm may track all email and calls”, reports The Guardian. It’s a pretty remarkable story, and on the face of it, it’s already enough to get lots of people very upset.

I think it’s also a lot more insidious than it seems. The trick is in the term “Communications Traffic Data”, and what this really means in practice.

As with phone calls now, the government claims it will only record “Communications Traffic Data” and not the content of internet traffic. It’s still quite Big Brother, of course:

By building up a database about our movements - our morning rituals of
checking emails, visiting web sites, buying online - this will build up
a pattern. This in itself is “content”. This will create a pattern of
recognition about our movements. Plus how long would it be before they
start to argue that they need to see the content as well? Curiously,
because so few people in China - relatively speaking - are online
and/or using credit cards, China will look pretty free compared to our
electronically driven society.

It’s a lot worse than this, I’m afraid.

How do we decide what Communications Traffic Data is? All traffic over the internet is transmitted in packets (called datagrams) according to the Internet Protocol (IP). These datagrams have a header and a body. The header contains the IP Communications Traffic Data, and the body contains the content. The IP header contains, amongst other things, the source IP address and destination IP address of the datagram.
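To make the header/body split concrete, here is a minimal Python sketch that pulls the addresses out of a raw IPv4 datagram. It assumes the standard 20-byte fixed IPv4 header layout; the function names and the sample addresses are made up for illustration, and `build_sample_datagram` is a hypothetical helper that leaves the checksum as zero.

```python
import socket
import struct

# Fixed part of the IPv4 header: 20 bytes, network byte order.
IPV4_FIXED_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4(datagram: bytes) -> dict:
    """Split a raw IPv4 datagram into a few header fields and its body."""
    if len(datagram) < IPV4_FIXED_HEADER.size:
        raise ValueError("too short to be an IPv4 datagram")
    (version_ihl, _tos, total_length, _ident, _frag,
     _ttl, protocol, _checksum, src, dst) = IPV4_FIXED_HEADER.unpack_from(datagram)
    header_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    return {
        "version": version_ihl >> 4,
        "protocol": protocol,  # 6 means the body is TCP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "body": datagram[header_len:total_length],  # "just data" as far as IP cares
    }


def build_sample_datagram(body: bytes) -> bytes:
    """Hypothetical helper: minimal IPv4 header (checksum left as zero) plus body."""
    header = IPV4_FIXED_HEADER.pack(
        0x45, 0, 20 + len(body), 0, 0, 64, 6, 0,
        socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"),
    )
    return header + body


if __name__ == "__main__":
    info = parse_ipv4(build_sample_datagram(b"everything else"))
    print(info["src"], "->", info["dst"], "protocol", info["protocol"])
```

At this layer the only things on record are the two addresses and a protocol number; everything else is an opaque body.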

So is this what the government counts as Communications Traffic Data? Well, not quite. The IP addresses are part of what they want to record, but not everything. How do we know what type of traffic this is? Surely whether this is part of an email or part of a web page is Communications Traffic Data too? Also, who initiated this conversation? This could be a web request, or a page in response, and which way the traffic is going is important too, isn’t it?

Well, that information isn’t in the IP header.  It’s inside the content of the IP packet.  For Web and Email, it’ll be inside a Transmission Control Protocol (TCP) packet, carried within IP.

So, we look at the IP packet, see it’s a TCP packet, unpack the content and look at the TCP header.  The TCP header is the Communications Traffic Data for TCP, and the body is the data.  In the header for the TCP packet we have the information we need to see what port numbers this communication is between, and we know that port 80 is normally web traffic, so now we know it’s a web page.

The TCP header’s flags also show who started the conversation (the opening packet carries the SYN flag without an ACK), so we can keep track of who is asking whom for what.
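The same step one layer down can be sketched in Python. This assumes the standard 20-byte fixed TCP header and takes the body of an IP datagram as input; the names are invented for illustration, `build_sample_segment` is a hypothetical helper, and treating port 80 as "web" is only a convention, not a guarantee.

```python
import struct

# Fixed part of the TCP header: 20 bytes, network byte order.
TCP_FIXED_HEADER = struct.Struct("!HHIIBBHHH")
FLAG_SYN = 0x02
FLAG_ACK = 0x10


def parse_tcp(segment: bytes) -> dict:
    """Read ports and flags from a TCP segment (the body of an IP datagram)."""
    if len(segment) < TCP_FIXED_HEADER.size:
        raise ValueError("too short to be a TCP segment")
    (src_port, dst_port, _seq, _ack, offset_byte,
     flags, _window, _checksum, _urgent) = TCP_FIXED_HEADER.unpack_from(segment)
    header_len = (offset_byte >> 4) * 4  # data offset counts 32-bit words
    syn = bool(flags & FLAG_SYN)
    ack = bool(flags & FLAG_ACK)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "opens_connection": syn and not ack,  # first packet of the handshake
        "looks_like_web": 80 in (src_port, dst_port),  # convention only
        "body": segment[header_len:],  # "just data" as far as TCP cares
    }


def build_sample_segment(src_port: int, dst_port: int, flags: int,
                         body: bytes = b"") -> bytes:
    """Hypothetical helper: minimal TCP header (no options, zero checksum) plus body."""
    header = TCP_FIXED_HEADER.pack(src_port, dst_port, 0, 0, 5 << 4, flags, 65535, 0, 0)
    return header + body


if __name__ == "__main__":
    first = parse_tcp(build_sample_segment(49152, 80, FLAG_SYN))
    print(first["opens_connection"], first["looks_like_web"])
```

Whoever sent the SYN-only packet is the one doing the asking; the reply from the server carries SYN and ACK together.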

But, well, what URL is being requested?  Is this Communications Traffic Data? It’s not as far as TCP is concerned, but it is for the Hypertext Transfer Protocol (HTTP).  And surely anyone reasonable would say it’s Communications Data as far as the government is concerned?

So, we unpack the TCP packet, find the HTTP request and look at that. There’s not much left of this packet now, is there? For an HTTP request, in fact, we can reasonably claim that the entire packet is Communications Traffic Data.
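A rough Python sketch shows how little "content" a typical page request has. It assumes a well-formed HTTP/1.1 request that arrives whole in one TCP body; the function name, URL and header values are invented for illustration.

```python
def split_http_request(raw: bytes) -> dict:
    """Separate an HTTP/1.x request into its header section and its body."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "target": target,  # the URL path and query being asked for
        "version": version,
        "headers": headers,
        "header_bytes": len(head) + len(separator),
        "body_bytes": len(body),
    }


# Made-up sample request; a plain GET has no body at all.
SAMPLE_REQUEST = (
    b"GET /some/page?q=private HTTP/1.1\r\n"
    b"Host: example.org\r\n"
    b"User-Agent: demo\r\n"
    b"\r\n"
)

if __name__ == "__main__":
    request = split_http_request(SAMPLE_REQUEST)
    print(request["method"], request["headers"]["host"] + request["target"])
    print(request["header_bytes"], "header bytes,", request["body_bytes"], "body bytes")
```

For the sample GET, every byte falls on the header side of the split, which is the point: by HTTP’s own definition there is no body left to call content.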

As I understand both existing and planned legislation, there’s no strict definition of what “Communications Traffic Data” really is, and the possible database could well end up storing all of these data.

HTTP requests aren’t such a contrived example either. But try some more on for size. Imagine an MSN chat conversation. The IP packets just record that your computer talks to an MSN server somewhere, and that’s it. That’s a lot less than telephone communications data, which at least records the virtual circuit endpoints (i.e. phone numbers). So, is it reasonable to unpack all these packets to find the usernames of who is communicating? Probably. Again, very little of the actual data is considered to be “content”, and almost all of it is “Communications Data”.

Imagine you are playing an MMO such as World of Warcraft, and you start a private chat with someone else.  Is this Communications Traffic Data?  How about your emotes to someone? 

What about email attachments? Their size, filenames and types are recorded in the MIME headers within an email, and these could be “Communications Data”. The actual contents of the attachment would be hard to justify as “Communications Data”, but that’s about it.
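As a sketch of how cheaply that attachment information can be read, the following uses Python’s standard `email` package. The message, addresses and filename are invented for illustration, and `build_sample_message` is a hypothetical helper; only the listing function does the "monitoring".

```python
import email
from email import policy
from email.message import EmailMessage


def list_attachment_metadata(raw_message: bytes) -> list:
    """Return filename, type and size of each attachment, ignoring the contents."""
    message = email.message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in message.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        found.append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(payload),  # decoded size in bytes
        })
    return found


def build_sample_message() -> bytes:
    """Hypothetical helper: a small message with one made-up attachment."""
    message = EmailMessage()
    message["From"] = "alice@example.org"
    message["To"] = "bob@example.org"
    message["Subject"] = "report"
    message.set_content("See attached.")
    message.add_attachment(b"\x00" * 1234, maintype="application",
                           subtype="pdf", filename="report.pdf")
    return message.as_bytes()


if __name__ == "__main__":
    for item in list_attachment_metadata(build_sample_message()):
        print(item)
```

Nothing in the listing step needs to understand the attachment itself, yet the filename and type alone can say a great deal.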

As should be clear by now, the Internet is built as layers within layers within layers. Every layer considers its contents to be “just data”. It’s the most powerful abstraction we have, and without it we would never have been able to build the Internet.

But the Internet was never designed to facilitate state monitoring and control of all communications, and it doesn’t have the ready control knobs that an authoritarian government would have required. It’s also not easy to retrofit them, which is what this government really requires. They use loose language in statute to allow them to adapt to a technology that they don’t really appreciate. Because the media don’t appreciate the technology either, we’re likely to find a government with strongly authoritarian instincts being granted vast powers entirely by accident.

That would be tragic.
