Is complete data anonymity possible?

Claire_Vishik · 03-27-2015

Claire Vishik is Trust & Security Technology & Policy Director.

According to IBM, 2.5 billion gigabytes of data were generated every day in 2012, a figure that has presumably grown considerably in the three years since. A lot of thought has been dedicated to data protection, data privacy, and data anonymity, and interest in anonymity is increasing because of the technical and legal implications of today's connected environment.
Ubiquitous interoperability and connectivity mean that data can flow seamlessly through a variety of applications, networks, and gateways, with each component able to authenticate and forward the request. These activities always leave an electronic trail, and a large amount of machine-generated data is created in the process. Typically, only user-facing components, data attributable to users, and some major protocols (such as HTTP) are analyzed for privacy and subjected to data minimization when and where required. But machine-generated log and protocol data can also contain information that could identify a device, a network, an application, or a location. This inevitable electronic trail of machine-generated data, even if not all of it is recorded and stored, makes complete anonymity very difficult or impossible to achieve.
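
To make the point about machine-generated data concrete, here is a minimal sketch of data minimization applied to a single log line. Everything in it is an assumption made for illustration: the log format, the field names, the salt, and the helper names (`truncate_ip`, `pseudonymize_mac`, `minimize`) are hypothetical and do not come from any particular product or standard.

```python
import hashlib
import ipaddress
import re

# Hypothetical gateway log line; the format and field names are invented.
SAMPLE = ("2015-03-27T10:15:02Z gw-7 fwd src=192.168.14.73 "
          "mac=00:1A:2B:3C:4D:5E path=/sensor/42")

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def truncate_ip(match):
    """Zero the last octet so the address names a /24 network, not one host."""
    text = match.group(0)
    try:
        network = ipaddress.ip_network(text + "/24", strict=False)
    except ValueError:
        return text  # looked like an address but is not valid; leave it alone
    return str(network.network_address)


def pseudonymize_mac(match, salt):
    """Replace a hardware address with a short salted hash (a pseudonym)."""
    value = (salt + match.group(0).lower()).encode("utf-8")
    return "dev-" + hashlib.sha256(value).hexdigest()[:8]


def minimize(line, salt="example-salt"):
    """Apply both reductions to one log line."""
    line = IPV4_RE.sub(truncate_ip, line)
    return MAC_RE.sub(lambda m: pseudonymize_mac(m, salt), line)


if __name__ == "__main__":
    print(minimize(SAMPLE))
```

Even after this step, the timestamp, the gateway name, and the request path remain in the line, and the salted hash is a stable pseudonym rather than an anonymous value. Combined with other records, these leftovers may still narrow down a device or location, which is why minimization of this kind reduces the trail without guaranteeing anonymity.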

The concept of anonymity has gained additional importance in relation to the application of European legislation on personal data protection. This legislation covers the protection of individuals with regard to the processing of personal data, as well as privacy and electronic communications. The only data that are not “personal data”, and are therefore not covered by the legislation, are “anonymous data”.

Definitions of anonymous data are provided in some national regulations, for example in Germany and Italy. Opinion 4/2007, adopted by the Article 29 Working Party, states that “‘anonymous data’ can be defined as any information relating to a natural person where the person cannot be identified, whether by the data controller or by any other person, taking account of all the means likely reasonably to be used either by the controller or by any other person to identify that individual” (p. 21).

Although the definition cited above focuses on human-generated data, the advent of ubiquitous connectivity makes it important to include relevant machine-generated data when forming a full picture of data anonymity in the modern technology environment. Given the constraints of real-time communications and low-power devices operating over sensor networks, the technology community will need to develop additional approaches to mitigate the potential for indirect re-identification of users from machine-generated data. But before this can be done, we need to expand our views on data anonymity and adjust them to the changing technology environment.