Without a Data Privacy Law, India Must Consider Hazards of ‘Deanonymisation’ of Non-Personal Data

Deanonymisation is performed by combining anonymised datasets to identify information about a particular user in different contexts, which can reveal layered and comprehensive personal information about an individual.

Around August 30, with little to no fanfare, the Ministry of Electronics and Information Technology (MeitY) opened for public consultation draft guidelines on how government projects can anonymise and harness e-governance-related data.

Why would the Ministry propose anonymising data?

Large datasets are useful for research, governance and commerce. They often contain personally identifiable data alongside descriptive data about the same individuals. The descriptive data, which supposedly does not identify a person on its own, can be valuable to access and analyse. However, as long as it sits alongside personal data (which is usually protected by data protection laws), processing it poses privacy risks to individuals.

So, to make use of this information, organisations and governments ‘scrub’ datasets of personal data. This supposedly leaves them full of ‘non-personal data’ that is ‘anonymised’ because it doesn’t actually link back to an individual anymore or harm their privacy. The datasets are then released for public use.

The guidelines were another step for non-personal data governance in India

Like many countries, the Indian government is pushing for the utilisation of anonymised non-personal data to improve governance, research and competition between businesses. State governments are keen on the idea of utilising anonymised non-personal data too – in April, the Tamil Nadu government released ‘masked’ data on the Tamil Nadu Public Service Commission selection process under its open data policy.

However, around September 6, the guidelines were withdrawn from the ‘e-Governance Standards and Guidelines’ website, almost as unceremoniously as when they were first uploaded. Reports suggest that the Ministry withdrew the guidelines because “they were released without adequate expert consultation”. A new document will be released soon.

Perhaps the Ministry’s decision was wise. As experts told MediaNama, the ‘anonymisation’ of personal data does not guarantee individual privacy – techniques to protect information are easily susceptible to being reversed. This can lead to the ‘re-identification’ or ‘deanonymisation’ of a dataset – revealing the identity of an individual or a group of people, while violating their privacy and exposing them to a wide range of harms.

What’s worse, deanonymisation is a relatively easy exercise for well-trained malicious actors. And, in the event that it does happen, Indian citizens have no recourse to protect themselves given the absence of a data protection law.

Also read: What Could the Future of Indian Data Protection Law Look Like?

“With data being the new gold, cybercriminals or other individuals will target areas with large stores of personally identifiable data [or potentially identifiable data],” argues Utsav Mittal, a Certified Information Systems Security Professional and founder-CEO of Indian cybersecurity firm Xiarch.

“Ultimately, the likelihood of a sector or organisation being hacked [or its available information being used for malicious purposes] is directly proportional to the amount of personal data they have,” Mittal said.

The privacy risks posed by large anonymised datasets – which capture the characteristics of India’s population, businesses and landscapes in granular detail – are self-evident. However, none of these risks can be mitigated – or penalised – without a data protection law in force. Even the withdrawn e-governance guidelines were “voluntary” and lacked statutory backing.

“Ultimately, data-driven policymaking will be successful only if we address the triad of intertwined deficits India is facing in the fields of democracy, data and development,” surmises Vikas Kumar, faculty at the School of Development at Azim Premji University, whose recent work includes Numbers in India’s Periphery: The Political Economy of Government Statistics.

“Think of these like a tripod. If you raise the height of one leg [data processing] but do not raise the height of the other two legs [democracy and development], then data-driven policymaking will be ineffective. You need to raise all three legs at the same time – but for that, you need to be concerned about the other two legs in the first place.”

What is deanonymisation and how does it work?

“[To explain anonymisation] In the simplest of terms, once collected by an entity, the data is stripped of Personal Identifiers (PI) and released in small segments (only 1% of a larger dataset or only the anonymised medical information of 1000 patients in a hospital housing 100,000),” explains Ayushman Kaul, senior analyst at Logically.

“Once treated in this manner, the data is widely distributed, and even modern research institutions and academicians are encouraged to release anonymised datasets of their work to have their workings independently verified by the broader community. In fact, after a dataset has been through the anonymisation process, it is no longer considered to be ‘personal information’ and thus is often exempt from many of the judicial safeguards designed to protect an individual’s privacy. These datasets can subsequently be freely used, shared, and sold.”

Deanonymisation, on the other hand, is performed by combining scrubbed datasets to identify information about the same user in different contexts. This linking of datasets can reveal layered and comprehensive personal information about an individual, which is why experts suggest that anonymisation is not a foolproof technique to protect privacy.
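To make the mechanics concrete, here is a minimal sketch of such a linkage attack in Python. The datasets, column names and values are all invented for illustration; real attacks follow the same join-on-quasi-identifiers logic, just at far larger scale and with messier data.

```python
# Hypothetical illustration of a linkage ("deanonymisation") attack.
# Dataset A: a "scrubbed" health extract with names removed.
# Dataset B: a public register that still carries names.
# Neither identifies anyone on its own; joined on shared
# quasi-identifiers (PIN code, birth date, gender), they do.

scrubbed_health_records = [
    {"pin": "110001", "dob": "1980-03-14", "gender": "F", "diagnosis": "diabetes"},
    {"pin": "600042", "dob": "1975-11-02", "gender": "M", "diagnosis": "hypertension"},
]

public_register = [
    {"name": "A. Sharma", "pin": "110001", "dob": "1980-03-14", "gender": "F"},
    {"name": "R. Kumar", "pin": "600042", "dob": "1975-11-02", "gender": "M"},
]

QUASI_IDENTIFIERS = ("pin", "dob", "gender")

def link(records, register):
    """Join two datasets on quasi-identifiers, re-attaching names."""
    index = {tuple(p[q] for q in QUASI_IDENTIFIERS): p["name"] for p in register}
    for r in records:
        key = tuple(r[q] for q in QUASI_IDENTIFIERS)
        if key in index:
            yield {"name": index[key], **r}

for hit in link(scrubbed_health_records, public_register):
    print(hit)  # e.g. {'name': 'A. Sharma', ..., 'diagnosis': 'diabetes'}
```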

“The term ‘anonymised data’ can convey a false sense of security because it’s almost impossible to be sure that personal data has been made truly anonymous and will always be anonymous,” notes Christine Runnegar, senior director of Internet Trust at the Internet Society. “A better term to use when trying to anonymise personal data by removing identifying information is ‘de-identified data’. It conveys the idea that known identifying information has been removed. Be aware though that there may still be some unrecognised identifying information, or the data could be re-identified when combined with other data.”

Also read: Examining India’s Quest to Regulate, Govern and Exploit Non-Personal Data

As far back as 2006, researchers studying US census data found that 63% of the population sampled could be identified by combining just three demographic indicators: gender, zip code and birth date. The researchers were building on studies from the early 2000s – which found that 87% of the US population could be identified by using the same indirect identifiers.

In 2019, researchers found that “99.98% of Americans would be correctly re-identified in any [anonymised] dataset using 15 demographic attributes”. The Belgium and London-based researchers concluded that “even heavily sampled anonymised datasets are unlikely to satisfy the modern standards for anonymisation set forth by GDPR (the European Union’s privacy-protecting law)”.
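The arithmetic behind such findings is easy to reproduce: group the records by the chosen attributes and count what share of people are alone in their group. A minimal sketch, using a handful of invented records in place of census-scale data:

```python
from collections import Counter

# Invented records of (gender, postal code, birth date); the cited studies
# ran the same uniqueness test over census-scale data.
records = [
    ("F", "110001", "1980-03-14"),
    ("F", "110001", "1980-03-14"),  # shares its combination with the record above
    ("M", "110001", "1980-03-14"),  # unique combination
    ("M", "600042", "1975-11-02"),  # unique combination
]

counts = Counter(records)
unique = sum(1 for r in records if counts[r] == 1)
print(f"{unique / len(records):.0%} of records are unique on these attributes")  # 50%
```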

What’s more, accessing anonymised data and performing these analyses on it isn’t necessarily difficult – at least, for those in the know.

“Any dataset is very easy to access for a reasonably resourceful cybercriminal or person,” explains Mittal. “This data can be easily bought off the dark web using cryptocurrencies like Bitcoin, which provide these actors with a degree of anonymity too. The data is ‘cheap’ and its price often decreases – after all, this is a marketplace of information.”

For example, the BBC recently reported that 80GB of NATO’s confidential security data was being sold online for 15 Bitcoin (around £273,000). India has seen many private and public datasets breached online too.

“Once procured, technical actors deanonymise the data,” says Mittal. As Kaul explains, these can include “governments, law enforcement agencies, data brokerage firms, social media platforms, digital marketers, scammers, journalists, and security researchers (..) The complexity of this process is directly correlated with the ‘granularity’ of the anonymous dataset and the number of ‘auxiliary’ datasets available for cross-referencing.”

“Then, the technical actors simply sell the data to the next groups looking to buy this information on the ground,” adds Mittal.

How does deanonymisation harm people?

“De-identified data, even if it cannot ever be linked back to a particular known individual, can still have serious privacy implications, if it can be used to single out an individual or section of the community,” warns Runnegar. “One of the key risks is discrimination.”

For example, in 2006, AOL released “anonymised” search logs of half a million of its users. While names weren’t included, reporters from the New York Times were still able to quickly identify 62-year-old Thelma Arnold from the dataset.

In 2014, New York City released “anonymised” data on over 173 million taxi rides for public use. However, some experts were able to identify which taxis carried out specific trips (and thus who drove them). This was because the dataset wasn’t anonymised strongly enough, making it easier to triangulate identities.
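Contemporaneous analyses reportedly traced the weakness to the way taxi medallion and licence numbers had been pseudonymised with an unsalted hash: because the space of valid numbers is tiny, every possibility can be hashed and compared within seconds. A minimal sketch of that failure mode, with an invented identifier format:

```python
import hashlib

def pseudonymise(medallion: str) -> str:
    # Unsalted hashing of a short, structured identifier - weak pseudonymisation.
    return hashlib.md5(medallion.encode()).hexdigest()

leaked_hash = pseudonymise("5X55")  # as it might appear in the "anonymised" release

def brute_force(target_hash: str):
    # Enumerate the whole (small) identifier space and compare hashes.
    for digit in range(10):
        for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
            for suffix in range(100):
                candidate = f"{digit}{letter}{suffix:02d}"
                if pseudonymise(candidate) == target_hash:
                    return candidate
    return None

print(brute_force(leaked_hash))  # recovers "5X55" almost instantly
```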

More recently, in 2021, the sexual activities of a high-ranking US Catholic priest, Monsignor Jeffrey Burrill, were exposed by triangulating his location using “aggregated” Grindr usage data procured from data brokers. All of this information was procured legally by a newsletter covering the Catholic church – and deanonymised successfully by it too. Burrill resigned after the news broke.

In India specifically, the identity-based risks of combined public datasets are already apparent.

“I’ve seen census data being misused in the run-up to violence while on the field,” recalls Kumar. “Vulnerable communities anticipate this, so they sometimes falsify the information they provide to government data collectors to prevent this from happening. For example, in the 1984 riots in the aftermath of Indira Gandhi’s assassination, electoral rolls were used to identify and target Sikhs living in Delhi.” Rioters reportedly used school registration and ration lists too.

During the 2020 communal riots in Northeast Delhi, reports further suggested that data from the Ministry of Road Transport and Highways’ vehicle registration database ‘Vahan’ may have been used to identify vehicles owned by Muslims and set them ablaze.

Also read: Delhi Police Affidavit Shows Muslims Bore Brunt of Riots, Silent on Who Targeted Them and Why

The probability of deanonymisation taking place is also constantly shifting.

“Given the sheer amount of data collected on individuals, the growing sophistication of machine learning algorithms that can be trained on incomplete or heavily segmented datasets, and the ease of access to auxiliary datasets, the methodology of ‘anonymisation’ is becoming largely redundant,” argues Kaul.

“In fact, a research paper published in 2013 analysing mobility data [one of the most sensitive forms of anonymous datasets as it contains the approximate location of an individual and can be used to reconstruct individuals’ movements across space and time] found that due to the uniqueness of human mobility traces, little outside information was needed to re-identify and trace targeted individuals in sparse, large-scale and coarse mobility datasets.”
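The intuition behind that result can be sketched in a few lines: take a couple of (place, time) points known about a target, say from a geotagged post and a card payment, and count how many traces in the dataset are consistent with all of them. The traces below are invented; the point is how quickly the candidate set collapses.

```python
# Hypothetical, toy mobility traces: user -> set of (cell tower, hour) points.
traces = {
    "user_a": {("tower_12", 8), ("tower_40", 13), ("tower_12", 19)},
    "user_b": {("tower_12", 8), ("tower_77", 13), ("tower_03", 22)},
    "user_c": {("tower_55", 9), ("tower_40", 13), ("tower_12", 19)},
}

known_points = {("tower_40", 13), ("tower_12", 19)}  # two points known about the target

matches = [user for user, trace in traces.items() if known_points <= trace]
print(matches)  # ['user_a', 'user_c'] - one more known point leaves a single match
```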

Further, as The Register notes while quoting the United Kingdom’s guidelines on data anonymisation:

“The problem (..) is that you can never be sure what other data is out there and how someone might map it against your anonymous data set. Neither can you tell what data will surface tomorrow, or how re-identification techniques might evolve. Data brokers readily selling location access data without the owners’ knowledge amplifies the dangers.”

Citizens in India currently have no recourse against such harms. “We’re discussing this next phase of data governance in India [of non-personal data] before we even have a data protection law in place,” argues Shashank Mohan, project manager at the Centre for Communication Governance at the National Law University, Delhi.

“Other countries may promote the sharing and processing of non-personal data, but they have mature, evolving, and robust data protection laws in place. This conversation on non-personal data in India becomes merely academic as a result – leaving aside non-personal data, even if my personal data is breached today, or if an entity is not adhering to basic data protection principles, as a citizen I have next to no redressal mechanisms.”

Why is there a push for utilising non-personal data, especially in the notable absence of a data protection law?

The use of anonymised non-personal data for India’s governance and economic growth was notably fleshed out in 2020’s Kris Gopalakrishnan Committee report, which defined non-personal data as “any data that is not related to an identified or identifiable natural person, or is personal data that has been anonymised”. The Committee’s vision partially translated to subsequent draft data protection laws, indicating a clear governmental enthusiasm for collecting and processing citizens’ data for economic benefits.

“There is significant value to anonymising datasets for better governance,” explains Astha Kapoor, co-founder and director of the Aapti Institute. “Ultimately, this value is defined in two ways: public value and economic value. These are not mutually exclusive, at least in the eyes of the Kris Gopalakrishnan Committee’s report on non-personal data. There’s an economic value to improved efficiency.”

“The UNDP-incubated innovation lab ‘Pintig’ really ramped up efforts to collect non-personal data about COVID-19 infection patterns in the Philippines,” illustrates Soujanya Sridharan, research analyst at the Aapti Institute. “Data on the infections was used to create a dashboard primarily for policymakers and municipal administrators to determine how to deliver aid and care. In Finland, legislation for the ‘Secondary Use of Health and Social Data‘ unlocks non-personal data to use for specific purposes. Among them is obviously planning and governance, but also research, innovation, and education.”

Kapoor adds that in the Indian context, the NITI Aayog has also developed the National Data and Analytics Platform, which seeks to “democratise data delivery by making government datasets readily accessible, implementing rigorous data sharing standards, enabling interoperability across the Indian data landscape, and providing a seamless user interface and user-friendly tools”.

The Gopalakrishnan Committee also pushed for the anonymisation and sharing of company data to spur innovation within Indian industries and reduce the often hegemonic advantages large companies can have within a sector.

Besides companies and governments, non-personal data is useful to localised groups and communities too. “We’ve seen patients with Multiple Sclerosis pool their data, anonymise it and share it with researchers investigating the disease. Indigenous communities in Australia and Canada also use non-personal data on their water bodies or land to negotiate with the government on specific issues,” shares Kapoor.

However, regulators the world over may also be interested in utilising anonymised datasets because they lie outside the usually stringent provisions of personal data protection law. Deanonymisation can act as a ‘workaround’ in the face of laws protecting personal data.

“Traditionally, data protection law has covered personal data, as that is what’s intrinsically linked to your privacy and can lead to your identification,” says Mohan. “But, as technology evolves, multiple players in the data ecosystem have realised there is immense value in processing data. That’s where the whole conversation around NPD and data governance is hinged on – the processing of data that’s not largely protected under the ‘burdens’ of data protection laws.”

“Right now, if you obtain personal data in India and use it without consent, there still might be certain drawbacks,” adds Anushka Jain, associate policy counsel (surveillance and transparency) at the Internet Freedom Foundation.

“For example, the Information Technology (Reasonable security practices and procedures and sensitive personal data or information) Rules, 2011 [2011 Rules] prevent companies from doing anything with sensitive personal data that the person has not consented to. The enforcement of those rules is another question, but they at least exist. So, the moment you’re processing personal data without consent, you’re doing it illegally. Why do that, when you can deanonymise non-personal data, and then do whatever you want with it?”

What are the legal and practical gaps in how the government views anonymised data?

In its now-withdrawn guidelines, the government suggested a wide array of anonymisation techniques that departments could use, perhaps to mitigate the privacy risks of deanonymisation. However, while a strong anonymisation technique may make a malicious actor’s job that much harder, it does not always protect a dataset from deanonymisation.

“Poor de-identification methods such as simply removing an individual’s name from the data pose a high risk of re-identification, but even better methods of de-identification may not prevent re-identification some day in the future,” notes Runnegar. “However, organisations can reduce privacy risks of re-identification by treating the de-identified data as personal data, applying good data protection practices such as data minimisation, limitations on use, access and sharing, and applying security such as encryption. Also, for data collected from groups or populations, secure multiparty computation (MPC) can be used to analyse aggregate data while protecting the privacy of the data.”
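One concrete pre-release check in that spirit, a standard technique though not one Runnegar names here, is k-anonymity: before publishing, verify that every combination of quasi-identifiers is shared by at least k records, so that nobody is alone in their group. A minimal sketch with invented data:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k=5):
    """True if every quasi-identifier combination appears at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Invented example: ages generalised into bands, PIN codes truncated.
rows = [
    {"age_band": "30-39", "pin_prefix": "1100", "diagnosis": "flu"},
    {"age_band": "30-39", "pin_prefix": "1100", "diagnosis": "asthma"},
    {"age_band": "40-49", "pin_prefix": "6000", "diagnosis": "flu"},
]

print(is_k_anonymous(rows, ("age_band", "pin_prefix"), k=2))  # False: one group has a single record
```

Even a k-anonymous release can leak information, for instance when everyone in a group shares the same sensitive value, which is why the broader advice above is to keep treating de-identified data as personal data.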

The fact that India doesn’t have a data protection law in place – whether for personal or non-personal data – perhaps renders such good practices redundant. As the government drags its feet on introducing the law, companies themselves remain unsure as to what information should be anonymised and how, leading to a patchwork approach to privacy protection. This points to a larger classification problem in India, where definitions of personal and non-personal data, and what constitutes privacy, remain in a state of flux.

Also read: Privacy Delayed Is Privacy Denied

For example, as seen above, deanonymisation may allow for the identification of not only individuals, but of larger groups of people centred around specific characteristics. In this light, as Varunavi Bangia has previously argued for MediaNama, the state must “‘fundamentally alter (..) [its] conception of the subjects of the right to privacy from protecting individuals to protecting groups (..) The focus must (..) be on conceptualising a right available to a group not merely because each individual in that group has an independent right to privacy, but a right that belongs to the group as a group. The most important regulatory intervention is to ensure that collective rights are neither subordinated to nor seen in conflict with individual rights.”

While Indian courts have recognised that the right to privacy extends beyond individual privacy to collective rights too, iterations of India’s proposed data protection law continued to club personal and non-personal data together for regulatory purposes. “This is because of a lack of understanding of the value of non-personal data to corporations in creating group profiles,” argues Bangia.

With different re-identification scenarios outside the regulator’s ambit, accountability for breaches and harms is hard to come by.

Some hope appeared in the now-withdrawn draft Data Protection Bill, 2021. “While the Bill combined the protection of personal and non-personal data, it acknowledged that anonymisation can fail, which is why it brought all data under its regulatory ambit,” recalls Kapoor. “It imposed penalties for deanonymisation, making it a punishable offence.”

With the Bill now withdrawn, MeitY is murmuring that non-personal data will not be governed by the personal data protection law. A potential successor is months (if not years) away from being passed. In the meantime, how users are supposed to file grievances or hold data processors accountable for re-identification also remains murky.

How do citizens seek redressal for re-identification now?

“Any re-identification of non-personal data will be counted as a data breach in India,” explains Sridharan. “While we don’t have a personal data protection law, we do have the 2011 Rules. But, there are no penalties or costs actually attached to data breaches or re-identification. The only avenue people have is filing writ petitions in courts to protect their right to privacy. But, institutional bottlenecks arise in that context too.”

Currently, then, data processors may be liable for re-identification. “How these actors will be held liable is using a cocktail of existing laws,” says Jain. “This may involve stretching the provisions of the Indian Penal Code and the Information Technology Act, 2000, and applying them to individual instances of re-identification.”

Tejasi Panjiar, associate policy counsel at the Internet Freedom Foundation, adds that given this regulatory vacuum, “neither private actors nor governments are held accountable or mandated to follow international best practices like data minimisation and storage and purpose limitation. Even the way the Gopalakrishnan Committee report approached deanonymisation is very post-facto. If it occurred, the data would then fall under the personal data protection legislation and penal provisions would be imposed. We don’t have robust, enforceable policies on anonymisation that ensure that identifiable aspects of personal data are removed to a high degree.”

This approach may be at odds with global standards for anonymisation imbued in privacy laws, and against the privacy-protecting model MeitY purportedly aims to embed in its new batch of Internet laws.

For example, the European Union’s General Data Protection Regulation (GDPR) considers a dataset “anonymous” when each person is individually protected. Even when datasets are scrubbed of identifiers, if they do contain data that could lead to re-identification, then this ‘anonymised’ dataset would fall within the provisions of the GDPR.

The stringency of the GDPR’s provisions on anonymisation came to light in March 2019 when the Danish Data Protection Regulator fined taxi company Taxa approximately $180,000 for retaining data on nine million taxi rides across five years. The company argued that it was exempted from the GDPR’s provisions on data minimisation and storage limitations because the dataset was anonymised by deleting individual names. This meant it could use and store the anonymised data for much longer.

While Taxa’s actions were in line with Recital 26 of the GDPR, the Danish regulator argued that the company failed to meet the high standards of anonymisation set out in the same recital. The fact that individuals could still be easily re-identified meant that the dataset was not anonymised, subjecting it to the personal data protections of the GDPR.

How can the state approach regulation?

“There’s always a risk of fire consuming a building. But does that mean you don’t put laws in place to mitigate the chance of a fire?” asks Mittal. “Just because a risk of deanonymisation exists doesn’t mean we throw [the potential benefits] of anonymisation out of the window. It means we need to push for stronger standards and laws to be brought in.”

“In some ways, personal data protection laws risk becoming obsolete unless they keep up with data processing technologies,” argues Mohan provocatively. “Scholars are now suggesting that we need to have a right to reasonable inference, instead. This concept discards the distinction between personal data and non-personal data, arguing that if a data processor is making an inference about a user using their data and if there is potential harm that could arise, then the user needs to be protected against that.”

Given the existing rights framework, however, the government faces a crossroads in terms of how to regulate the sector, which largely hinges on how personal and non-personal data are defined and processed.

“To govern personal and non-personal data under the same law, India’s regulatory structure for data governance may need to develop and mature quickly by learning from other countries and experiences,” says Mohan, alluding to the different ways in which group privacy can be harmed by data processing. “I appreciate that there’s an attempt by the government to shift the power dynamics of business models and data gathering practices in India. But for such non-personal data policies to work for everyone involved, we need a multi-pronged approach: robust laws for personal data protection, non-personal data, and antitrust need to work together.”

The ways in which non-personal data can be protected under a new law are nuanced and purpose-driven, argue Sridharan and Kapoor.

“The purpose of data use should be the starting point for protection,” explains Kapoor. “A farmer, for example, may use a certain kind of fertiliser for their soil. That may be because that fertiliser – and the underlying farming style – is their intellectual property. So, even non-human NPD pertaining to that fertiliser may need to be anonymised and protected to honour their intellectual property rights.”

Sridharan adds that there is a need for separate legislation that “clearly defines the rules and responsibilities for not just the state, but also the rights of the community who helped produce this non-personal data in the first place.”

Panjiar diverges, arguing that as long as non-personal data is viewed through the prism of economic value, as is the case in India, there is a need to bring it under the stringencies of personal data protection.

“As opposed to how personal data is approached, the Gopalakrishnan Committee report and the draft National Data Governance Framework very explicitly premised the regulation of non-personal data on commercial and financial motives instead of data privacy and user safety,” says Panjiar.

“When you separate the two to the extent that you don’t have strict provisions in place to regulate and protect non-personal data, then the risks that arise out of the deanonymisation of data become even more real. So, in my opinion, we need to consider the regulation of non-personal data through an expert independent body, which would most likely be the proposed Data Protection Authority (DPA) [introduced in the Data Protection Bill, 2021], and is focused on promoting user safety and privacy over commercial motives.”

“There was a reason why India’s founding fathers provided a high degree of privacy to the sharing of census data [as seen in Section 15 of the Census Act],” concludes Kumar. “There is a need to embed that similar notion of privacy into how we approach data now too. Currently, we are looking at things in an abstract fashion. That needs to change. The level of protection provided should be informed by a good understanding of where data processing policies are headed.”

 

This article was originally published in MediaNama and has been reproduced here under a CC-BY-SA 4.0 license.

Has India’s Privacy Bill Considered the Dangers of Unrestricted Processing of ‘Anonymised’ Data?

If global lessons are anything to go by, it is hard to imagine an ‘irreversibility standard’ which will ensure complete anonymity.

Anonymisation of data is neither a guarantee of privacy protection nor antithetical to the idea of privacy. Instead, it is more likely a gateway to a possible privacy breach, one which has not been addressed in the government’s Personal Data Protection Bill, 2019.

The Bill, heralded as a much-needed safeguard to rein in the digital Wild West, is an embodiment of the constitutional spirit of privacy evoked by the Puttaswamy-I judgment. It seeks to protect the personal data of individuals collected by companies and the state by laying down a comprehensive framework for processing such data. Accordingly, it also outlines what forms of data processing are exempt from this framework.

One such exemption is the processing of anonymised data. The Bill excludes the coverage of anonymised data, except under Section 91 which covers the use of anonymised and non-personal data by the central government for targeted delivery of services and formulation of evidence-based policies. 

In view of the sweeping exemption of anonymised data and its effects on privacy, it is important to understand its meaning, historical treatment and the requisite safeguards necessary for its protection. 

What is anonymised data?

Anonymisation is a technique applied to personal data to strip it of the characteristics, traits and identifiers that could be used to identify individuals. On account of this, it is usually not covered by legislation protecting personal data, as it is considered not to affect the privacy of an individual.

The Bill understands ‘anonymisation’ distinctly from ‘de-identification’. Broadly, anonymisation is subject to a regulatory standard of irreversibility while de-identification is carried out to mask identifying data in accordance with the code of practice prescribed by the Data Protection Authority (Authority). Since de-identification can be reversed, its reversal without adequate consent and transparency to the data principal is now a punishable offence. However, anonymisation is still premised on irreversibility and the impossibility of identification. 

Generally, the identifiability of an individual is considered a spectrum, with identification on one end and perfect anonymity on the other. The legal protections correspond with this understanding: personal data and its privacy are protected by law, while no equivalent protection is granted to anonymised data. However, the efficacy and possibility of anonymisation itself are considered suspect by many. Some researchers argue that data can be truly anonymised only by deletion, while others argue that various technological tools can be used to achieve a practical degree of anonymity. In the same vein, there is also a growing recognition of the trade-off between the utility and privacy of a dataset.

Also read: Does the Data Protection Bill Solve the Dilemma Posed by Dominance of ‘Foreign’ Apps?

Is it possible to truly anonymise data?

This article seeks to question the assumption of irreversibility and perfect anonymity attached to such data. Over the last decade, substantial research has pointed towards the shaky standing of anonymised data. For example, even during the 1990s, an MIT graduate student named Latanya Sweeney identified the governor of Massachusetts from three data points in an anonymised database.

The European General Data Protection Regulation (“GDPR”) assesses anonymisation on a standards-based approach, examining whether an individual can be singled out from an anonymised dataset, whether the dataset can be linked with other datasets to identify an individual, and whether inferences about an individual can be drawn from it. The Article 29 Working Party, established under the erstwhile data protection regime in Europe, had highlighted the potential for every available anonymisation technique to fall short of this standard in one situation or another.

It assessed various techniques used to anonymise data – randomisation, generalisation, aggregation and so on – and concluded that, depending on the technique used, the data may be subject to re-identification when processed and combined with other datasets. The risk of re-identification arises when certain data points can indirectly identify an individual, either in isolation or in combination with more data. Thus, effective anonymity may be hard to ensure in practice.
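Generalisation, one of the techniques assessed, trades precision for protection: coarsening a birth date into a decade and truncating a postal code means each released record matches many more people. A minimal, hypothetical illustration (field names and values invented):

```python
def generalise(record):
    # Coarsen quasi-identifiers so each combination matches many people.
    return {
        "birth_decade": record["dob"][:3] + "0s",  # "1983-03-14" -> "1980s"
        "pin_prefix": record["pin"][:3],           # "110001"     -> "110"
        "gender": record["gender"],
    }

record = {"dob": "1983-03-14", "pin": "110001", "gender": "F"}
print(generalise(record))
# {'birth_decade': '1980s', 'pin_prefix': '110', 'gender': 'F'}
# Linkability against auxiliary datasets is reduced - but, as the Working
# Party noted, not eliminated.
```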

In this light, the article then explores whether anonymised data retains some value for privacy and whether individuals should continue to have a right in it.

Reasonable expectations of privacy in anonymised data

In view of the residual risk and the traits of personhood that anonymised or non-personal data always retains, its coverage in the Bill through Section 91 has the potential to lay down interesting jurisprudence regarding the contours of reasonable expectations of privacy. While the right to privacy is attached to personal data, the aim of this article is to suggest that a residual privacy right also exists in anonymised data.

This is because personal data, under the Bill, includes the derivatives of personal data, or ‘data about or relating to a natural person who is directly or indirectly identifiable’, which may also arise in ‘combination with other information’. Anonymised data used to target the delivery of services, coupled with the risk of de-anonymisation, invariably renders such data an extension of personal data. It is important to examine whether reasonable expectations of privacy arise in such data, especially in its use.

This enquiry is important as the right to privacy extends only up to where it can be reasonably expected to extend. For example, there is no right to privacy in the investigation, subject to a warrant, of a criminal's personal diary in which he or she has confessed to a crime. On the other hand, a right to privacy and bodily autonomy extends to my face and movements as they are recorded by CCTV cameras in public spaces.

Also read: Privacy Bill Will Allow Government Access to ‘Non-Personal’ Data

This is a standard and test derived from American jurisprudence which suggests that the constitutional privacy protection for an individual is derived by balancing an objective component of privacy against the subjective expectations of that person. While Justice Nariman rejected this test in the case of Puttaswamy-I, it was endorsed in Puttaswamy-II and currently lays down the dominant strand of interpretation for privacy law in India.

The use or processing of anonymised data carries within it the risk of being de-anonymised and turning into personal data. It can be argued that the risk of it being misused is a mere possibility and is not a sufficient reason for the recognition of privacy in such data, especially when anonymisation is normally understood to be irreversible and thus protective. There are two responses to this presumption of the relative sanctity of anonymised data. First, such data may not need to be subject to the same level of privacy protection as personal data; the protection can be graded to ensure the protection of the principal, by laying down strict standards of anonymisation and punishing de-anonymisation. Second, since the privacy right subsists in the culmination of the risk of de-anonymisation – namely, the creation of personal data – a more nuanced regulatory framework is necessary. Meanwhile, it must be kept in mind that both the state and private parties are involved in the usage of non-personal data and in making data-based decisions that affect us, individually or collectively.

It may also be argued that an individual does not have a subjective expectation of privacy in anonymised data, by virtue of its nature, and thus the question of carving out reasonable expectations of privacy does not arise. This does not hold much validity, because in the absence of a subjective expectation of privacy, the balancing test turns to the objective expectation. For example, a state university announcing and disclosing the details of a top scorer to newspapers does not imply that the person had no right to privacy in that information. While he or she may not want to conceal or protect such information, it is legally protected.

The entire construct can also be looked at from another perspective. Personal data includes data which indirectly identifies an individual, whether through certain specified traits or in combination with other information. The degree of indirect identifiability has not yet been explained or laid down in the Indian context. To that extent, any semblance of recognition of a person in an anonymised dataset may overlap with indirectly identifying personal data, where reasonable expectations of privacy naturally subsist. Thus, the Authority would also do well to lay down the extent of indirect identifiability in contrast to anonymisation.

The impact of de-anonymisation on an individual under the Bill

This enquiry arises as the envisaged use of non-personal data, under the Bill, opens up a wide range of possibilities of public use of anonymised data. Even otherwise, anonymisation was generally considered a legal way out for companies to circumvent the application of law. In view of these practices, the primary concern is what happens if the data is de-anonymised by any kind of processor/fiduciary, after further processing – intentionally or otherwise?

Also read: Interview | Dilution of Privacy Bill Makes Govt Surveillance a Cakewalk: Justice Srikrishna

The moment anonymisation is removed from data, it becomes personal data and falls within the purview of the Bill. Ex-ante compliance with the irreversibility standards simply allows the conversion of personal data to anonymised data. There are two possibilities in the event of de-anonymisation: the fiduciary complies with the Bill, or it does not. The uncertainty is more pronounced in the case of de-anonymisation because there is no way for individuals or the Authority to know that the data has been compromised.

It is within the exclusive domain and knowledge of a fiduciary. Envisaging this possibility, the private member Data (Privacy and Protection) Bill, 2017 provided individuals the right to be informed of a personal data breach arising due to de-anonymisation. Similarly, an ex-post sanction on the re-identification of de-identified data, which includes anonymised data, has been put in place in the UK Data Protection Act, 2018. This is necessary to allocate responsibility where it is due. The processor or fiduciary which collects the data should comply with the irreversibility standard, while the ultimate processor which handles the data and re-identifies it should be sanctioned for the negligence and offence.

As things currently stand, there is no way for individuals to be informed that data which was once part of an anonymised dataset has been de-anonymised and is being used for identification or profiling. To an extent, a data principal may obtain information from a data fiduciary under Section 17. However, this assumes active attention expended by an individual to track their informational privacy, something society is still grappling to understand in real terms.

The exercise of Section 17 by the data principal, to identify a fiduciary which may be using an erstwhile anonymised dataset, is a stretch of both the imagination and the provision. Given the lack of incentives and oversight for processors and fiduciaries to maintain the integrity of anonymised data, it is important to ensure efficient checks through audit oversight by the Authority and an ex-post sanction to curb de-anonymisation.

Section 82 of the Bill, in its current form, only punishes the reversal of de-identification, with no sanction for the reversal of anonymisation. It also punishes such re-identification with no exemptions for the research community. This has a perceptible chilling effect and is an inadequate safeguard for the anonymised data of individuals. Thus, any sanction on re-identification must expressly guard against this chilling effect.

The way forward

The possibility of de-anonymisation can be effectively curtailed by the irreversibility standards laid down by the Authority. But if global lessons are anything to go by, it is hard to imagine a standard which will ensure complete anonymity.

Also read: Final Privacy Bill Could Turn India into ‘Orwellian State’: Justice Srikrishna

If that is so, it is important to lay down safeguards: sanctioning de-anonymisation, granting data principals a right to transparency over the use of such data, obliging fiduciaries to inform principals the moment they possess de-anonymised personal data (currently, such notice is required to be given ‘as soon as reasonably practicable’ under Section 7), and imposing periodic audit requirements to check the integrity of anonymised data.

It is also hoped that the development of the irreversibility standard will be informed by a technical consideration of the reasonable expectations of privacy arising in such data.

Anushka M. is a research associate at ‘IT for Change,’ a Bangalore-based NGO.