Lowering the cost of anonymization

a PhD thesis

5  Conclusion

From the ground, we stand. From our ships, we live. By the stars, we hope.
(Becky Chambers, Record of a Spaceborn Few)

Trying to lower the cost of anonymization, in 2020, can sometimes feel like a pointless academic endeavor. Is data privacy really worth focusing on, when climate change is quickly making our planet unlivable [362]? Only a global pandemic has seemed able to temporarily slow down global warming [239], while existing plans to tackle this enormous challenge seem both inefficient and doomed to failure [240]. Not only do we appear to be unable to mitigate this looming threat, but we also seem woefully unprepared to handle its consequences. Migration crises will likely intensify over the next few decades, and nations all over the world are already showing their systematic inability to handle them with any shred of humanity or kindness [175, 229, 343]. We urgently need to build more just societies that acknowledge their white supremacist past and uphold fundamental human rights for everyone, including marginalized communities. Instead, world superpowers are quickly descending into fascism [386], building technological infrastructure that empowers belligerent authoritarianism [319, 377], and getting away with committing cultural genocide [403].

Fighting off these potentially catastrophic shifts is critical to upholding every value we hold dear. But preventing our societies from succumbing to apocalyptic threats is only half the battle: we also need to actively build a fairer, kinder, more sustainable world. Systemic challenges require systemic solutions: we need to change incentives and structures so that people and organizations naturally tend towards doing the right thing for all the stakeholders they impact.

The development of privacy-preserving technologies can play a positive role in pushing some of those incentives in the right direction. For example, encrypting Web traffic at scale is globally beneficial, and efforts that make secure protocols easier to use and deploy are making it comparatively harder for organizations to resist calls to migrate to safer practices. Similarly, we envision a world where there are fewer and fewer reasons for sharing or publishing data in an unsafe, identifiable manner that puts individuals at risk. Such a shift would primarily benefit those who, today, pay the highest price when their personal information is leaked: marginalized groups targeted by online harassment, victims of domestic abuse, or political dissidents under authoritarian regimes.

Nonetheless, it is worth examining the implicit power dynamics at play in differential privacy, especially in its central form discussed throughout this thesis: almost all the techniques we introduced, developed, or improved upon assume that a central organization collects and stores raw data from individuals. Once all this personal and sensitive information has been collected, this central organization is the one responsible for anonymizing it and then sharing or publishing it in a safe manner. This is a vertical model of differential privacy: it presupposes a high level of trust that the individual must put in the central organization to take great care of their data and not abuse it. This trust might be completely misguided, as many recent high-profile data leaks and questionable uses of data suggest. Further, this relationship is often not as consensual as it first seems: for instance, can you really choose not to participate in a service that all your friends use to organize social events? Finally, even organizations with good intent are subject to applicable laws and regulations, some of which can compel them to share their users' data with authoritarian governments.

This model will likely never disappear entirely: retailers need to know who their customers are and where to ship their packages; hospitals need information about the patients they treat; government organizations like the US Census Bureau have a mandate to collect demographic data from residents and use it to make decisions that impact their democratic system. Further, as we mentioned in the introduction of this thesis, and as exemplified by use cases like the Community Mobility Reports or the Search Trends symptoms dataset, such data publications can have a significantly positive social impact. Thus, we argue that improving incentives for such use cases is still worth doing.

However, these implicit power dynamics, which indirectly cause massive systemic problems, suggest that lowering the cost of central differential privacy is not enough. To minimize the negative societal impact of data hoarding, we need to incentivize organizations not to collect sensitive data in the first place. This suggests investing in alternative models, like local or distributed differential privacy. In these models, a central aggregator can learn useful information about global trends, but never has access to the raw data: it cannot abuse it, leak it inadvertently, or hand it over to governments.

Randomized response was an early example of such a system, and a number of very promising advances have been made in that direction in the past few years. Trusted shufflers and secure aggregation techniques can provide strong guarantees and high utility with a minimal level of trust. Just like any other privacy-enhancing technology, making them both useful and usable will be paramount to their success, and will unlock their positive impact globally.
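To make the local model concrete, here is a minimal sketch of randomized response for a yes/no question. It is written in Python purely for illustration; the function names and parameters are ours, not from the thesis or any specific library. Each respondent tells the truth with probability e^ε/(1+e^ε) and lies otherwise, which satisfies ε-local differential privacy; the aggregator then debiases the noisy counts to recover an accurate estimate of the true proportion.

```python
import math
import random

def randomized_response(true_answer: bool, epsilon: float) -> bool:
    """Report the true answer with probability e^eps / (1 + e^eps),
    and the opposite answer otherwise. The ratio between the probabilities
    of either report is at most e^eps: eps-local differential privacy."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return true_answer if random.random() < p_truth else not true_answer

def estimate_proportion(reports, epsilon: float) -> float:
    """Debias the aggregated noisy reports: if t is the true proportion
    of "yes" answers, the expected observed proportion is
    p_truth * t + (1 - p_truth) * (1 - t), which we invert to recover t."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

# Example: 100,000 respondents, 30% of whom truly answer "yes".
random.seed(0)
epsilon = 1.0
true_answers = [random.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(answer, epsilon) for answer in true_answers]
print(estimate_proportion(reports, epsilon))  # close to 0.3
```

Each individual report is deniable, since it is wrong with sizable probability, yet the aggregate estimate converges to the true value as the number of respondents grows: this is the essential trade-off that shufflers and secure aggregation improve upon.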

We predict that rolling out these protocols at scale will raise issues very similar to the ones discussed in this thesis about central differential privacy: improving usability will be difficult but paramount; unexpected vulnerabilities will have to be identified and fixed; tricky operational challenges will appear and will need to be tackled. Collaboration across industry, governments, nonprofit organizations, and academia will be necessary for such a project to be successful. It will be difficult, frustrating, tiring, fascinating, and exhilarating: anything worth doing usually is.
