Ted is writing things

On privacy, research, and privacy research.

A friendly, non-technical introduction to differential privacy


Differential privacy is getting a lot of attention lately. Companies and governments are starting to publish data anonymized with this notion. Universities are offering courses about it. Statisticians are getting acquainted with this new approach to protecting data. Open-source organizations are publishing tooling to make differential privacy easier to use.

So, you might be wondering: what's the hype all about? What even is differential privacy? What makes it so special? How does it work in practice? And, perhaps more importantly, can I understand it without having to read a bunch of complicated equations?

The good news is: you've come to the right place. Welcome to my friendly blog post series about differential privacy! It provides simple explanations for the core concepts behind differential privacy. It is meant for a wide, non-technical audience: it doesn't assume any prior knowledge, uses as little math as possible, and illustrates everything with simple examples and diagrams.

[Diagram: the same process run twice, once on a database missing one person; a double arrow labeled "basically the same" points to the two outputs.]

Sounds interesting? Excellent! Start with these two articles.

Then, this blog post series splits in two branches. You can read one or the other in any order, depending on what you're most interested in.

The first branch is about the how: what techniques can you use to achieve differential privacy? It's a little bit technical, though I still keep it as simple as I can. If that doesn't sound interesting, feel free to skip over and go directly to the next section!

  • Differential privacy in practice (easy version) explains how to make simple statistics differentially private: counts, sums, histograms… In many cases, you don't need much more!
  • Almost differential privacy describes how to publish private histograms without knowing the categories in advance. It also introduces an important variant: \((\varepsilon,\delta)\)-differential privacy.
  • The privacy loss random variable introduces this central concept in differential privacy. Then, it uses it to explain the real meaning of \(\delta\) in \((\varepsilon,\delta)\)-differential privacy.
  • The magic of Gaussian noise introduces Gaussian noise and its shiny properties. In particular, it explains why Gaussian noise is so useful when releasing many statistics.
  • Getting more useful results with differential privacy presents five simple techniques to improve the utility of your anonymized data. If you're rolling out your first differential privacy pipeline and it's not performing well, this is the article for you!
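To give a taste of the techniques covered in this branch, here is a minimal sketch of the kind of mechanism used to make a simple count differentially private: adding Laplace noise scaled to 1/ε. The function name and parameters are my own illustration, not code from the articles above.

```python
import math
import random

def noisy_count(true_count, epsilon):
    """Return a differentially private version of a count.

    A count changes by at most 1 when one person is added or removed,
    so adding Laplace noise with scale 1/epsilon gives epsilon-DP.
    """
    # Sample Laplace noise via inverse transform sampling:
    # u is uniform on (-0.5, 0.5), and the formula below maps it
    # to a Laplace distribution with mean 0 and scale 1/epsilon.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    noise = -(1.0 / epsilon) * sign * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Each call returns the true count plus a small random perturbation; smaller values of ε mean more noise and stronger privacy.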

The second branch of this series is about the why. In which contexts can differential privacy be used? Why do organizations decide to adopt it? What policy questions does it raise? The articles in this branch are accessible to non-technical folks, and are all self-contained. You can read them in any order you like!

  • Local vs. global differential privacy presents the two main models of differential privacy, depending on who the attacker is. Publishing provably anonymized data is very different from collecting it, and this post explains the difference.
  • Why not differential privacy? explores what it means for an algorithm to not be differentially private. Don't understand why people insist on using differential privacy, even when it seems overkill? This is the resource for you.
  • Demystifying the US Census Bureau's reconstruction attack explains why the government agency decided to use differential privacy. They ran an attack on the data they published in 2010, and the results are quite scary.
  • Don't worry, your data's noisy makes the case that all data is noisy, even without differential privacy. In fact, the noise used in differential privacy is the best kind of noise: the one you can quantify.

Finally, one article lists the known real-world deployments of DP, along with their privacy parameters.

This series isn't finished. I have a list of future articles I'd like to write… and I'm adding new ideas to this list faster than I'm writing blog posts! If you're looking for further things to read on differential privacy, you can do two things.

  • You can check out this reading list I curated. I particularly recommend it if you're looking for more formal content: textbooks with mathematical proofs, scientific papers, etc.
  • You can follow me on Twitter or subscribe to this blog's RSS feed to keep updated about future posts.
All opinions here are my own, not my employer's.   |   Feedback on these posts is very welcome! Please reach out via e-mail (se.niatnofsed@neimad) or Twitter (@TedOnPrivacy) for comments and suggestions.   |   Interested in deploying formal anonymization methods? My colleagues and I at Tumult Labs can help. Contact me at oi.tlmt@neimad, and let's chat!