Databases and data-mining: a favoured method for security and law enforcement agencies

Chris Jones, Statewatch

The documents released by whistleblower Edward Snowden on the US National Security Agency (NSA) and the UK’s Government Communications Headquarters (GCHQ) have revealed the vast extent of state surveillance undertaken in secret in the name of “national security”. However, absorbing vast amounts of digital information is not solely the preserve of secretive security agencies: increasingly, police forces are taking a similar approach. Nevertheless, organisation and campaigning have the potential to slow and possibly halt such developments.

A potted history of the NSA

In 1946, following their cooperation on signals intelligence in World War II, the UK and USA signed an agreement – known as the UKUSA agreement – which permits “the exchange of the products of the following operations relating to foreign communications”:

(1) collection of traffic

(2) acquisition of communication documents and equipment

(3) traffic analysis

(4) cryptanalysis

(5) decryption and translation

(6) acquisition of information regarding communication organisations, practices, procedures and equipment.

More specifically, the agreement was between the US Army-Navy Communication Intelligence Board (“representing the U.S. State, Navy and War Departments and all other U.S. Communication Intelligence authorities which may function”) and the London Signal Intelligence (SIGINT) Board (“representing the Foreign Office, Admiralty, War Office, Air Ministry, and all other British Empire Communication Intelligence authorities which may function”).[1]

Canada joined the Agreement in 1948, and Australia and New Zealand in 1956. Norway (1952), Denmark (1954), Germany (1955), Italy, Turkey, the Philippines and Ireland are also party to the agreement.

Together the UK, USA, Canada, Australia and New Zealand make up the ‘Five Eyes’ states. They share intelligence and information on a variety of issues, and from a whole host of sources – not just SIGINT – including human intelligence, defence intelligence, and security intelligence.[2] There are a number of other ‘Eyes’ groups with varying countries involved.[3]

During the Cold War the primary interest of these agencies was communism and the activities of the Soviet Union, its satellite states, and their agents. Now they appear chiefly focused upon terrorism, industrial espionage, and cyber-security (for example with a focus on hackers and cyber-attacks).


In 1988 the journalist Duncan Campbell revealed that Australia, Canada, New Zealand, the UK and the USA were operating a global telecommunications interception system, which became popularly known as ECHELON.[4] Agencies such as GCHQ and the NSA played a key role in this network.

A decade later the issue began to receive political attention in Europe, most notably with the publication of three European Parliament reports, one of which concluded, amongst other things, that Member States involved with the system were probably in breach of Article 8 of the European Convention on Human Rights.

Despite the furore, nothing happened. Rather than taking action to rein in the NSA’s capabilities, US politicians apparently gave the agency the green light to expand. According to NSA whistleblower Adrienne J. Kinne, after 9/11 “basically all the rules were thrown out the window” and extensive surveillance of US citizens as well as foreign nationals increased massively, in the name of combating terrorism.[5]

One of the EP’s reports noted of ECHELON that:

“In areas characterised by a high volume of communications only a very small proportion of those communications are transmitted by satellite… this means that the majority of communications cannot be intercepted by earth stations, but only by tapping cables and intercepting radio signals, something which – as the investigations carried out in connection with the report have shown – is possible only to a limited extent… the UKUSA states have access to only a very limited proportion of cable and radio communications and can analyse an even more limited proportion of those communications… the extremely high volume of traffic makes exhaustive, detailed monitoring of all communications impossible in practice.”[6]

Changing strategies

The US government was obviously well aware of these problems. In 2001 the Wall Street Journal reported that “the NSA’s snooping capabilities are in jeopardy, undermined by advances in telecommunications technology.”[7] According to Thomas Drake, another NSA whistleblower, in 2003 the government intervened in the sale of a US undersea cable company, Global Crossing, to an Asian firm. A ‘Network Security Agreement’ was signed between the government and the company, which was obliged to maintain an internal corporate cell of American citizens with government clearances responsible for ensuring that surveillance requests were fulfilled quickly and […]

The vast global increases in digital telecommunications and transactions led the US Department of Defence (of which the NSA is a part) to state in 2007 that the Pentagon aimed to expand its systems to be able to handle yottabytes of data (a yottabyte is a septillion bytes, or 10^24; a gigabyte is about a billion bytes). A March 2012 article by James Bamford[9] in Wired magazine makes clear the expansion of the NSA’s operations to deal with the digital age. It examined the NSA’s new data centre in Utah, which was intended to gather, store and analyse:

“[T]he complete contents of private emails, cell phone calls, and Google searches as well as all sorts of personal data trails – parking receipts, travel itineraries, bookstore purchases, and other digital ‘pocket litter’.”[10]

As an aside, the centre will have a significant environmental impact: according to Bamford it will draw 65 megawatts of electricity and its water system will have the ability to pump 1.7 million gallons of liquid per day. It also has its own sewage system, and a massive air conditioning system to keep computers cool.

The Snowden revelations

Then, in June 2013, investigations based on documents obtained by Edward Snowden confirmed the involvement of Global Crossing and other firms in the worldwide surveillance operation and the extent to which the NSA and its allies, such as GCHQ, are able to monitor personal communications.

Agreements with companies such as Global Crossing and BT, as part of the FAIRVIEW program, allow the NSA to extract internet traffic as it travels across the globe; the agency extracts information directly from the servers of major US internet corporations such as Google, Facebook and Yahoo; gathers information on hundreds of millions of text messages and phone calls every day; and has broken or circumvented digital encryption standards.

GCHQ, meanwhile, which over the last three years has received £100 million from the NSA for its efforts, also engages in cable tapping, monitors phone calls and text messages, and so forth. It provides information and analysis to MI5, MI6, the government, and also the NSA, which apparently considers the comparatively weaker regulation of its British counterpart a “selling point”.[11]

The haystack

Broadly speaking, it seems that the NSA and its partner agencies are aiming to collect, if not everything, then as much digital information as they possibly can. People generate this information through increasingly-ubiquitous devices such as smartphones, laptops, etc.; the advances that have led to these devices allow state agencies to store and process the information.

The result is an enormous haystack which is apparently used to sift out the various needles: terrorists, hackers, spies, companies breaking sanctions and trade embargoes, etc. The NSA, GCHQ and their supporters claim that the agencies are incredibly judicious with this information, but a significant lack of transparent oversight arrangements means it is not clear that this is the case. In any case, judicious use would not justify collecting it all in the first place.

Nevertheless, the “haystack” approach – the collection and analysis of vast sets of digitally-stored data – is increasingly being adopted in law enforcement.


There are numerous examples that could be used, but a particularly useful one, due to its scale and nature, is Passenger Name Record (PNR) information. PNR is generated by travel agencies and transport operators when individuals book travel tickets. For airlines, which so far have been the key focus of state PNR collection and analysis efforts, the PNR data is made up of:

  (1) PNR record locator code;
  (2) Date of reservation/issue of ticket;
  (3) Date(s) of intended travel;
  (4) Name(s);
  (5) Available frequent flier and benefit information (i.e., free tickets, upgrades, etc.);
  (6) Other names on PNR, including number of travellers on PNR;
  (7) All available contact information (including originator information);
  (8) All available payment/billing information (not including other transaction details linked to a credit card or account and not connected to the travel transaction);
  (9) Travel itinerary for specific PNR;
  (10) Travel agency/travel agent;
  (11) Code share information;
  (12) Split/divided information;
  (13) Travel status of passenger (including confirmations and check-in status);
  (14) Ticketing information, including ticket number, one way tickets, and Automated Ticket Fare Quote;
  (15) All baggage information;
  (16) Seat information, including seat number;
  (17) General remarks including OSI (Other Service Information), SSI (Special Service Information) and SSR (Special Service Request) information, e.g. wheelchair requirements, seating preferences, special meal requests, etc.;
  (18) Any collected APIS information;
  (19) All historical changes to the PNR listed in numbers 1 to 18.

All this information is collected by commercial companies for their own use. Law enforcement agencies, however, decided that it may be useful to them.

In 2004 the US Department of Homeland Security (DHS) and the EU signed an agreement on airline PNR which has been renewed a number of times, most recently in 2012. The agreement requires the PNR data of all airline passengers travelling from the EU to be taken from airlines’ and travel agencies’ computer reservation databases and sent to the DHS for processing and examination for involvement in “terrorism and related crimes” and “other crimes that are punishable by a sentence of imprisonment of three years or more and that are transnational in nature.”[12]

It is worth noting that although the NSA apparently collects “travel itineraries” (see Bamford quote, above), it presumably does not share them with the DHS. If it did, there would have been no need to conclude a separate agreement with the EU to ensure that airlines and travel agencies hand over PNR information.

Currently there is EU legislation on the table that if passed would extend the system to flights coming into the EU – and possibly plane, train and boat travel within the EU – despite the fact that no evidence has been presented to show the value of PNR in dealing with terrorism and crime. Justifications from law enforcement agencies that have been made public have relied on anecdotes. The same problem surrounds the EU’s controversial Data Retention Directive, which mandates the collection by telecommunications service providers of metadata on telephone calls and internet use for up to two years, for law enforcement purposes.[13]


The tendency towards the collection and collation of large sets of data, personal or otherwise, and the application of new modes of analysis to already existing datasets, appears endemic: the introduction of “predictive policing” systems,[14] the attempt to give law enforcement agencies powers held by GCHQ through the “Snooper’s Charter”,[15] the EU’s Data Retention Directive, providing law enforcement authorities with access to the European database of asylum seekers’ fingerprints and the European visa database,[16] not to mention use of data-mining and analysis tools by private companies…

These systems are frequently marketed to governments by multinational IT and security corporations as “solutions” to the problems of terrorism, crime and “migration management”. The fact that states and state agencies frequently purchase such systems shows the extent to which they see large-scale collection and processing of data as a valid and effective way of dealing with security problems. Given the amount of personal data that is routinely generated, collected and stored by private companies and non-law-enforcement government departments, there are clearly vast possibilities for more extensive surveillance.

However, such developments are by no means inevitable. Campaigning prevented the Snooper’s Charter from becoming law, legal challenges from NGOs may well see the Data Retention Directive struck down,[17] and the negotiations on an EU PNR system have for some time now been stalled due to civil liberties concerns.[18] Ensuring transparency and basic accountability of security and intelligence agencies (as called for by campaigns such as Don’t Spy On Us in the UK[19] and Stop Watching Us in the US[20]) is the first step towards ensuring that these bodies are compelled to act within a framework based on human rights and the rule of law. There’s always lots of bad news, but there are plenty of reasons for optimism too.


[9] Author of a number of books on the NSA: The Puzzle Palace (1982 and 2001), Body of Secrets (2002) and The Shadow Factory (2008).


Posts on this blog represent the views of their authors, not of Breaking the Frame, unless otherwise noted.
