The risks of centralising data

As federated, centralised identity systems have proliferated, identity and personal data have become one and the same, placing consumers and organisations at risk.

May 2, 2024

The modern digital economy has driven an unprecedented centralisation of data, creating vast, internet-connected databases that act as both irresistible targets for malicious actors and systemic points of failure.⁴ When these repositories contain Personally Identifiable Information (PII), their compromise can lead to immediate and devastating consequences, including identity theft, financial fraud, and personal endangerment.⁵ The very architecture of these centralised systems—storing millions of records in a single logical location—means that a single successful intrusion can yield a disproportionately massive reward for attackers, a reality proven by the relentless cadence of large-scale data breaches affecting corporations and government agencies alike.¹ The security measures protecting these "honeypots" must be flawless, yet they are pitted against a determined and ever-evolving threat landscape, making a breach not a matter of if, but when.⁶

Beyond the immediate risk of a direct breach, centralisation creates a more subtle but equally pernicious threat known as correlation risk. While organisations may diligently protect overtly sensitive PII such as passport numbers or financial details, they often collect a wide array of seemingly innocuous data points: location check-ins, purchase histories, website browsing habits, or even smartwatch heart rate data. Individually, these data points may appear anonymous. However, when aggregated within a single, vast database, they can be cross-referenced and correlated (by malicious insiders, by external attackers who have gained access, or even by the data controller itself) to de-anonymise individuals with alarming accuracy.² A landmark study showed how supposedly anonymous Netflix movie rating data could be correlated with public IMDb ratings to re-identify specific users.⁷
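To make the idea of correlation concrete, here is a minimal sketch of a linkage attack on invented data. It is not the algorithm from the Netflix study, which uses a more robust weighted similarity score. It simply counts exact rating overlaps, and every name, token, and record in it is hypothetical.

```python
# Toy linkage attack. All names, tokens, and ratings are invented for illustration.

# "Anonymised" release: real user IDs replaced with opaque tokens.
anonymous_ratings = {
    "user_a91": {("Film A", 5), ("Film B", 2), ("Film C", 4)},
    "user_f03": {("Film A", 3), ("Film D", 5), ("Film E", 1)},
}

# Public profiles: real names attached to only a couple of ratings each.
public_ratings = {
    "Alice Example": {("Film A", 5), ("Film C", 4)},
    "Bob Example": {("Film D", 5), ("Film E", 1)},
}


def link(anonymous, public, min_overlap=2):
    """Match each public profile to the anonymous record sharing the most ratings.

    min_overlap is an assumed threshold: matches with fewer shared
    ratings are discarded as too weak to count as a re-identification.
    """
    matches = {}
    for name, known in public.items():
        best_token, best_overlap = None, 0
        for token, ratings in anonymous.items():
            overlap = len(known & ratings)  # ratings present in both sets
            if overlap > best_overlap:
                best_token, best_overlap = token, overlap
        if best_overlap >= min_overlap:
            matches[name] = best_token
    return matches


matches = link(anonymous_ratings, public_ratings)
for name, token in matches.items():
    print(f"{name} is probably {token}")
```

Even in this tiny example, two public ratings are enough to single out each "anonymous" record, which in turn exposes every other rating stored under that token.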

This risk is magnified because data is rarely static. Datasets from different sources, often acquired through mergers, third-party agreements, or data brokers, can be combined.⁸ A user's seemingly anonymous activity on one platform can be linked to their real-world identity from another, creating a composite "super-profile" without their explicit knowledge or consent. This process of re-identification can reveal intimate details about a person's life, beliefs, and vulnerabilities, transforming disparate, non-sensitive data points into a highly invasive and detailed personal dossier.³ The fundamental problem is that as datasets grow and are linked, the possibility of identifying an individual from a small number of unique data points approaches certainty, making the very concept of "anonymous data" in a centralised system a dangerously flawed assumption.
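The effect of linking more attributes can be illustrated with a small sketch, assuming a handful of invented records. It measures what fraction of records become unique as each additional attribute is joined on. The attribute names and values are hypothetical and carry no statistical weight.

```python
from collections import Counter

# Invented records: (postcode area, birth year, gender, gym check-in day).
records = [
    ("SW1", 1985, "F", "Mon"),
    ("SW1", 1985, "F", "Tue"),
    ("SW1", 1985, "M", "Mon"),
    ("SW1", 1990, "F", "Mon"),
    ("N1", 1985, "F", "Mon"),
    ("N1", 1985, "F", "Mon"),
]


def unique_fraction(rows, n_attributes):
    """Fraction of rows whose first n_attributes values appear exactly once."""
    keys = [row[:n_attributes] for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Uniqueness when linking on 1, 2, 3, then all 4 attributes.
fractions = [unique_fraction(records, n) for n in range(1, 5)]
for n, fraction in enumerate(fractions, start=1):
    print(f"{n} attribute(s): {fraction:.0%} of records are unique")
```

In this toy dataset no record is unique on postcode area alone, but most are once all four attributes are combined. Each attribute looks harmless in isolation; it is the combination that identifies.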

This core problem is at the heart of our research into personal data, especially as it relates to identity: how can data be made useful to a relying party without placing the relying party or the subject at risk? Leaving this problem unsolved has made delivering on the Seven Laws of Identity almost impossible, because organisations could not benefit from implementing them.

¹ Identity Theft Resource Center. (2025). 2024 Annual Data Breach Report.

² Ohm, P. (2010). Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. UCLA Law Review, Vol. 57.

³ Information Commissioner's Office (ICO). Anonymisation, pseudonymisation and privacy enhancing technologies guidance.

⁴ World Economic Forum. (2024). The Global Risks Report 2024.

⁵ UK National Cyber Security Centre (NCSC). The cyber threat to UK business.

⁶ Spitzner, L. (2003). Honeypots: Tracking Hackers. Addison-Wesley Professional.

⁷ Narayanan, A. & Shmatikov, V. (2008). Robust De-anonymization of Large Sparse Datasets. Proceedings of the 2008 IEEE Symposium on Security and Privacy.

⁸ Federal Trade Commission (FTC). (2014). Data Brokers: A Call for Transparency and Accountability.
