Ted is writing things

On privacy, research, and privacy research.

Choosing things privately with the exponential mechanism


This post is part of a series on differential privacy. Check out the table of contents to see the other articles!


The goal of most differential privacy mechanisms is to publish statistics: numbers giving some information about groups of people. But to build more complex DP mechanisms, we sometimes need a different kind of building block. In this article, we won't be adding noise to numbers. Instead, we'll make a choice among multiple options, in a differentially private way.

A simple example

Let's say that we're designing a poll to pick the best science-fiction books published in 2020. First, we'll collect a big list of books published that year. Then, we'll ask people to select the books they liked. Each person can pick as many books as they want, and we'll select the book with the most votes. If there is a tie, we'll pick the winner randomly.

How do we publish this in a privacy-preserving way? Let's take a look at the books with the most votes in our voting results.

[Figure: bar chart of the number of votes for 6 books; the x-axis ends in a dotted line, suggesting more books are not shown. "Network Effect" has 50 votes, "Hench" and "The Hidden Girl" have 49, "The Relentless Moon" has 47, "Axiom's End" and "Riot Baby" have 46.]

We can't simply publish the true answer without randomness. Otherwise, we would be publishing the most popular book (here, Network Effect) 100% of the time. But imagine that we add a single person, who only votes for one of the second-most popular books.

[Figure: the same bar chart as before, except "The Hidden Girl" now has 50 votes; its bar is a slightly darker shade of blue.]

In that case, there is a tie, so we would want to release one book or the other with equal probability. But this is a violation of differential privacy: 50% of the time, we would publish The Hidden Girl, and this could never happen without the added user!

[Figure: the same bar chart, with the y-axis now labeled "Probability of being selected as winner". In light blue ("Original data"), there is a single bar at probability 1 for "Network Effect". In dark blue ("With one added vote"), there are two bars at probability 0.5, for "Network Effect" and "The Hidden Girl". The bars for "The Hidden Girl" are labeled 0 and 0.5 in red and circled, with red text above saying "DP violation!".]

To avoid this, we need to add more randomness to our process. How do we do that, exactly?

Well, we already know how to publish histograms with differential privacy: we add well-calibrated Laplace noise to each of the statistics. So why don't we just do that? If we make the entire histogram private, we can release all of it. In particular, we can see which book has the highest noisy vote count, and declare it the winner.

But we have a problem here. The noise has to be scaled by the number of statistics that each user can contribute to. In our setting, a particularly enthusiastic user could vote for all the books. If our list has 10,000 books to choose from, then we have to multiply the noise scale by 10,000. This seems… not great. And it also feels unnecessary: we don't want to release the entire histogram, we only want to pick a winner. Could we use that fact to inject less noise into the process?
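To make the scaling problem concrete, here is a minimal Python sketch of this naive baseline (the function names are mine, for illustration only): add Laplace noise, scaled to the maximum number of books one user can vote for, to every count, then pick the highest noisy count.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def noisy_histogram_winner(votes, epsilon, max_votes_per_user):
    """Naive baseline: privatize the whole histogram with Laplace noise,
    then declare the book with the highest noisy count the winner. The
    noise scale grows with the number of books one user can vote for."""
    scale = max_votes_per_user / epsilon
    noisy_counts = {book: count + laplace_noise(scale)
                    for book, count in votes.items()}
    return max(noisy_counts, key=noisy_counts.get)

votes = {"Network Effect": 50, "Hench": 49, "The Hidden Girl": 49}
# With 10,000 books on the list, the noise scale is 10,000 / epsilon:
# far larger than any realistic gap between vote counts.
winner = noisy_histogram_winner(votes, epsilon=1.0, max_votes_per_user=10_000)
```

With noise that large, the "winner" is essentially picked at random, which is exactly why this approach wastes utility when all we want is a single choice.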

Let's try to think about what a good strategy would look like. Say we are using \(\varepsilon=\ln(2)\). Let's look at the votes for the most popular books again.

[Figure: the same bar chart as the first one, showing vote counts for 6 books, with a dotted line ending the x-axis: "Network Effect" 50, "Hench" and "The Hidden Girl" 49, "The Relentless Moon" 47, "Axiom's End" and "Riot Baby" 46.]

The true winner is Network Effect. So we want to select this true answer with some probability, hopefully as high as possible.

[Figure: bar chart of the probability distribution of the book selected as winner; a single bar for "Network Effect", all other bars empty and replaced by question marks.]

Now, what is the probability of selecting one of the second-best choices? They both have one fewer vote than the winner. They're not the correct answer, so we want to select them with as small a probability as possible. But we're also constrained by our differential privacy guarantee.

Imagine that we add a single new person to the data, and they vote for Hench. Then, we should be selecting Network Effect and Hench with equal probability.

[Figure: the same bar chart, with an additional grey, dotted-line bar for "Hench", equal in height to the one for "Network Effect".]

With DP, we must select Hench with similar probability as in this hypothetical scenario. How similar? We chose \(e^\varepsilon=2\), so there can be a factor of at most 2 between these probabilities. We want it to be as small as possible, so let's make it exactly half.

[Figure: the same bar chart, with an additional blue bar for "Hench" at half the height of the grey bar; a red arrow labeled "Selection probability divided by e^ε = 2" goes from the grey bar to the blue bar.]

The exact same reasoning holds for The Hidden Girl, which has as many votes as Hench.

[Figure: probabilities for the three books with the most votes: "Network Effect" has the biggest bar, "Hench" and "The Hidden Girl" each have a bar half as high; all other books have a question mark.]

What about one of the books that is a little further away from the winner? The Relentless Moon, for example, is 3 votes short: it would need three more votes to tie with the winner. If we add one person at a time, it takes three steps to get there.

[Figure: the vote counts again, with three arrows above the bar for "The Relentless Moon", labeled "3 more votes needed to reach the winner".]

And each time we add one person, we have to respect the differential privacy constraint: we can at most double the probability of selecting The Relentless Moon. To arrive there after these three steps, we need to start from at least \(\left(\frac{1}{2}\right)^3=1/8\) of the maximum probability.

[Figure: probabilities for the four books with the most votes: "Network Effect" has the biggest bar, "Hench" and "The Hidden Girl" each have a bar half as high, "The Relentless Moon" has a bar 8 times smaller; the remaining two books have a question mark.]

We can repeat this idea and draw our full probability distribution. For each book that is \(k\) votes short of the true winner, its selection probability should be \(1/2^k\) times the winner's.

[Figure: probabilities for all the books displayed on the chart: "Network Effect" has the biggest bar, "Hench" and "The Hidden Girl" each have a bar half as high, "The Relentless Moon" has a bar 8 times smaller, "Axiom's End" and "Riot Baby" each have a bar 16 times smaller.]

Achieving this for an arbitrary \(\varepsilon\) is straightforward: the probability of selecting a book \(i\) with \(k_i\) votes should be proportional to \(\exp\left(\varepsilon \cdot k_i\right)\). After normalizing to make the probabilities sum to \(1\), we get the following formula:

$$ \mathbb{P}\left[\text{We choose book }i\right] = \frac{\exp\left(\varepsilon \cdot k_i\right)}{\sum_j \exp\left(\varepsilon \cdot k_j\right)}. $$

We call this DP procedure the exponential mechanism.
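Here is a minimal Python sketch of this mechanism on the voting example (the function name and the sampling strategy are my own illustrative choices, not code from this post). Subtracting the maximum score before exponentiating avoids overflow and cancels out after normalization.

```python
import math
import random

# Vote counts from the example above (only the six displayed books).
votes = {
    "Network Effect": 50,
    "Hench": 49,
    "The Hidden Girl": 49,
    "The Relentless Moon": 47,
    "Axiom's End": 46,
    "Riot Baby": 46,
}

def exponential_mechanism(scores, epsilon):
    """Pick one key with probability proportional to exp(epsilon * score)."""
    # Shift scores by their maximum before exponentiating: this avoids
    # overflow, and the shift cancels out once weights are normalized.
    max_score = max(scores.values())
    weights = {k: math.exp(epsilon * (s - max_score)) for k, s in scores.items()}
    total = sum(weights.values())
    # Inverse-transform sampling over the (unnormalized) weights.
    r = random.uniform(0, total)
    cumulative = 0.0
    for book, weight in weights.items():
        cumulative += weight
        if r <= cumulative:
            return book
    return book  # guard against floating-point rounding at the boundary

winner = exponential_mechanism(votes, epsilon=math.log(2))
```

Restricted to these six books, with \(\varepsilon=\ln(2)\), this returns Network Effect with probability \(1/2.25\approx 44\%\) and Hench or The Hidden Girl with about \(22\%\) each, matching the distribution in the last chart.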

A generic statement and a simple optimization

The example above is very simple: each book is simply associated with the number of votes it received. But the exponential mechanism is much more generic, and we can use it in more complex settings. Let's say we have a database \(D\), and we have to choose among many items \(O_1\), \(O_2\), and so on. We assume that each item \(O_i\) has a score \(s_i(D)\), which depends on the dataset. Let \(\Delta\) be the sensitivity of the scoring function: the maximum change to any \(s_i(D)\) when one person is added to (or removed from) \(D\). Then the exponential mechanism \(\mathcal{M}\) is defined as:

$$ \mathbb{P}\left[\mathcal{M}(D) = O_i\right] = \frac{\exp\left(\varepsilon\cdot\frac{s_i(D)}{2\Delta}\right)}{\sum_j \exp\left(\varepsilon\cdot\frac{s_j(D)}{2\Delta}\right)}. $$

Proving that it satisfies ε-DP is very easy: the proof in the original paper is just three lines long! Try to come up with it by yourself.
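(Spoiler below.) For reference, here is a sketch of that standard argument. Take any two databases \(D\) and \(D'\) that differ in one person, and any item \(O_i\):

$$ \frac{\mathbb{P}\left[\mathcal{M}(D)=O_i\right]}{\mathbb{P}\left[\mathcal{M}(D')=O_i\right]} = \underbrace{\frac{\exp\left(\varepsilon\cdot\frac{s_i(D)}{2\Delta}\right)}{\exp\left(\varepsilon\cdot\frac{s_i(D')}{2\Delta}\right)}}_{\le\, e^{\varepsilon/2}} \cdot \underbrace{\frac{\sum_j \exp\left(\varepsilon\cdot\frac{s_j(D')}{2\Delta}\right)}{\sum_j \exp\left(\varepsilon\cdot\frac{s_j(D)}{2\Delta}\right)}}_{\le\, e^{\varepsilon/2}} \le e^{\varepsilon}. $$

Each factor is bounded by \(e^{\varepsilon/2}\) because every score changes by at most \(\Delta\), so every exponent changes by at most \(\varepsilon/2\).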

In our example above, the \(O_i\) are the books, and the score of each book is its number of votes. Adding or removing one person modifies the scores by at most one, so \(\Delta=1\), and we get the same…

Wait a second. We're not getting the same thing! There's a factor of \(2\) in the generic formula that we didn't have in our voting example. So if we were using the generic formula, we would get worse utility: the probability of selecting the winner would be smaller. Can we get rid of this multiplicative factor?

The answer is yes, because our scores are monotonic: adding a user can only make scores larger (or leave them unchanged), and removing a user can only make them smaller. That's a common special case, and in it, you can remove the factor of \(2\):

$$ \mathbb{P}\left[\mathcal{M}(D) = O_i\right] = \frac{\exp\left(\varepsilon\cdot\frac{s_i(D)}{\Delta}\right)}{\sum_j \exp\left(\varepsilon\cdot\frac{s_j(D)}{\Delta}\right)}. $$

Again, this is quite easy to prove, especially if you've understood the previous proof.

Note that here, we assumed that we're protecting the addition or removal of a single person in the dataset. If we want to protect any change in a single person's votes instead, the privacy analysis changes: someone could add one vote to a book and remove a vote from another. The scores would no longer be monotonic, and we would need to pay the factor of \(2\) in our formula.
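Putting the generic definition and the monotonic optimization together, a sketch might look like this (in Python; the function signature is an illustrative choice, not a standard API):

```python
import math
import random

def exponential_mechanism(items, scores, epsilon, sensitivity, monotonic=False):
    """Generic exponential mechanism: select items[i] with probability
    proportional to exp(epsilon * scores[i] / (2 * sensitivity)), or to
    exp(epsilon * scores[i] / sensitivity) when the scores are monotonic."""
    factor = sensitivity if monotonic else 2 * sensitivity
    # Shifting by the max score is for numerical stability only; the
    # shift cancels out after normalization.
    max_score = max(scores)
    weights = [math.exp(epsilon * (s - max_score) / factor) for s in scores]
    r = random.uniform(0, sum(weights))
    cumulative = 0.0
    for item, weight in zip(items, weights):
        cumulative += weight
        if r <= cumulative:
            return item
    return item  # guard against floating-point rounding at the boundary

books = ["Network Effect", "Hench", "The Hidden Girl"]
counts = [50, 49, 49]
# Vote counts have sensitivity 1, and are monotonic under adding or
# removing a person, so we can drop the factor of 2.
choice = exponential_mechanism(books, counts, epsilon=math.log(2),
                               sensitivity=1, monotonic=True)
```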

More results?!

The exponential mechanism is a central building block in differential privacy. It's been studied from many different angles, so there is a lot to say about it. This blog post is long enough already, but here are a few ✨ selected facts ✨. Follow the links if you'd like to learn more!

  • The exponential mechanism can be implemented in a simple way: add noise from a Gumbel distribution to each score, and choose the item with the highest noisy score.
  • Its privacy guarantees can be finely analyzed using a notion called bounded range. This allows you to prove that an exponential mechanism calibrated for \(\varepsilon\)-DP also satisfies \(\rho\)-zCDP with \(\rho=\frac{1}{8}\varepsilon^2\): a lot better than the typical conversion of \(\rho=\frac{1}{2}\varepsilon^2\).
  • Using the exponential mechanism several times? Don't use regular composition theorems! Instead, using the special structure of this mechanism can get you tighter results.
  • You can do better than the exponential mechanism, with a mechanism called Permute-and-Flip. Its original definition is somewhat complicated, but people found a nice characterization afterwards: add noise from a geometric distribution to each score, and pick the highest noisy score.
  • However, the exponential mechanism still retains one advantage: it can also be used when the space of possible choices is continuous, like "every real number between 0 and 1"1. This is very useful, for example to compute the median of values in a dataset.
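As a sketch of the first bullet point above, here is a hypothetical Gumbel-max implementation (names are mine): adding standard Gumbel noise to each rescaled score and taking the argmax samples from exactly the same distribution as computing the exponential weights directly.

```python
import math
import random

def exponential_mechanism_gumbel(scores, epsilon, sensitivity=1, monotonic=True):
    """Gumbel-max implementation of the exponential mechanism: add
    standard Gumbel noise to each rescaled score and return the argmax."""
    factor = sensitivity if monotonic else 2 * sensitivity
    best_item, best_value = None, -math.inf
    for item, score in scores.items():
        # Standard Gumbel sample via inverse-CDF of a Uniform(0, 1) draw.
        gumbel = -math.log(-math.log(random.random()))
        noisy = epsilon * score / factor + gumbel
        if noisy > best_value:
            best_item, best_value = item, noisy
    return best_item

votes = {"Network Effect": 50, "Hench": 49, "The Hidden Girl": 49}
winner = exponential_mechanism_gumbel(votes, epsilon=math.log(2))
```

One appeal of this formulation is that it never normalizes the weights, which makes it easy to run in one streaming pass over the candidates.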

I'm thankful to Daniel Simmons-Marengo and PeoriaBummer for helpful feedback on this post.


  1. In that case, the density of the probability distribution on \(x\) must be proportional to \(\exp\left(\varepsilon\cdot\frac{s_x(D)}{2\Delta}\right)\). The normalization factor (in the denominator) also becomes an integral instead of a discrete sum. 

All opinions here are my own, not my employer's.   |   Feedback on these posts is very welcome! Please reach out via e-mail (se.niatnofsed@neimad) or Twitter (@TedOnPrivacy) for comments and suggestions.   |   Interested in deploying formal anonymization methods? My colleagues and I at Tumult Labs can help. Contact me at oi.tlmt@neimad, and let's chat!