Lowering the cost of anonymization

2.2.2 Quantification of privacy loss (Q)

The risk model associated with differential privacy is a worst-case property: it quantifies not only over all possible neighboring datasets but also over all possible outputs. However, in many real-life risk assessments, events with vanishingly small probability are ignored, or their risk is weighted according to their probability. It is natural to consider analogous relaxations, especially since these relaxations often have better composition properties, and enable natural mechanisms like the Gaussian mechanism to be considered private [131].

Most of the definitions within this section can be expressed using the privacy loss random variable, first defined in [110] as the adversary’s confidence gain, so we first introduce this concept. Roughly speaking, it measures how much information is revealed by the output of a mechanism.

Definition 13 (Privacy loss random variable [110]). Let $M$ be a mechanism, and $D_{1}$ and $D_{2}$ two datasets. The privacy loss random variable between $M (D_{1})$ and $M (D_{2})$ is defined as:

L_{M (D_{1}) ∕ M (D_{2})} (O) = ln (\frac{P [M (D_{1}) = O]}{P [M (D_{2}) = O]}) .

if neither $P [M (D_{1}) = O]$ nor $P [M (D_{2}) = O]$ is 0; in case only $P [M (D_{2}) = O]$ is zero then $L_{M (D_{1}) ∕ M (D_{2})} (O) = \infty$ , otherwise $L_{M (D_{1}) ∕ M (D_{2})} (O) = - \infty$ . When the mechanism is clear from context, we simply write $L_{D_{1} ∕ D_{2}}$ .

For simplicity, we only consider the case where the set of possible outputs of the mechanism, $O$ , is countable; otherwise the PLRV can be reformulated using the density functions of $M (D_{1})$ and $M (D_{2})$ instead of specific outputs.
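To make Definition 13 concrete in the countable case, the following minimal sketch computes the privacy loss of each output for two discrete output distributions. The mechanism (randomized response answering truthfully with probability 3/4), the function name `privacy_loss`, and the dictionary representation are assumptions chosen for illustration; they are not taken from the cited works.

```python
import math

def privacy_loss(p1, p2, output):
    """Privacy loss L_{D1/D2}(output) for discrete output distributions.

    p1 and p2 are dicts mapping each output to its probability under
    M(D1) and M(D2) respectively.
    """
    a = p1.get(output, 0.0)
    b = p2.get(output, 0.0)
    if a > 0 and b > 0:
        return math.log(a / b)
    if a > 0:
        # Only P[M(D2) = output] is zero.
        return math.inf
    # All remaining cases fall under the "otherwise" branch of Definition 13.
    return -math.inf

# Hypothetical mechanism: randomized response that reports the true answer
# with probability 3/4. D1 holds the answer "yes", D2 the answer "no".
rr_yes = {"yes": 0.75, "no": 0.25}
rr_no = {"yes": 0.25, "no": 0.75}

for output in ("yes", "no"):
    print(output, privacy_loss(rr_yes, rr_no, output))
```

In this example the privacy loss is $ln (3)$ for the output that matches the true answer in $D_{1}$ and $- ln (3)$ for the other one, so the mechanism is $ln (3)$ -DP for this pair of inputs.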

Differential privacy bounds the maximum value of $L_{D_{1} ∕ D_{2}}$ . Instead of considering the maximum value, which corresponds to the worst possible output, relaxations of this section will allow a small probability of error, consider the average of the privacy loss random variable, or describe its behavior in finer ways.

Allowing a small probability of error

The first option, whose introduction is commonly attributed to [126], relaxes the definition of $ε$ -indistinguishability by allowing an additional small density of probability on which the upper $ε$ bound does not hold. This small density, denoted $δ$ , can be used to compensate for outputs for which the privacy loss is larger than $ε$ . This led to the definition of approximate differential privacy, often simply called $(ε, δ)$ -DP. This is, by far, the most commonly used relaxation in the scientific literature.

Definition 14 ( $(ε, δ)$ -differential privacy [126]). Two random variables $A$ and $B$ are $(ε, δ)$ -indistinguishable, denoted $A \approx_{ε, δ} B$ , if for all measurable sets $X$ of possible events:

P [A \in X] \leq e^{ε} \cdot P [B \in X] + δ and P [B \in X] \leq e^{ε} \cdot P [A \in X] + δ .

A privacy mechanism $M$ is $(ε, δ)$ -DP (or $(ε, δ)$ -approximate DP) if for any datasets $D_{1}$ and $D_{2}$ that differ only on one record, $M (D_{1}) \approx_{ε, δ} M (D_{2})$ .
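For small finite output spaces, the quantification over all sets $X$ in Definition 14 can be checked by brute force. The sketch below enumerates every subset of outputs and returns the smallest $δ$ for which two discrete distributions are $(ε, δ)$ -indistinguishable. The function name `smallest_delta` and the example distributions are hypothetical, and the enumeration is exponential in the number of outputs, so this is only meant as an illustration of the definition.

```python
import itertools
import math

def smallest_delta(p1, p2, eps):
    """Smallest delta such that p1 and p2 are (eps, delta)-indistinguishable.

    p1 and p2 are dicts mapping outputs to probabilities. Every subset X of
    outputs is checked in both directions, as in Definition 14.
    """
    outputs = sorted(set(p1) | set(p2))
    worst = 0.0
    for size in range(len(outputs) + 1):
        for subset in itertools.combinations(outputs, size):
            a = sum(p1.get(o, 0.0) for o in subset)
            b = sum(p2.get(o, 0.0) for o in subset)
            worst = max(worst, a - math.exp(eps) * b, b - math.exp(eps) * a)
    return worst

# Hypothetical output distributions of a mechanism on two neighboring inputs.
p = {"x": 0.6, "y": 0.4}
q = {"x": 0.4, "y": 0.6}

print(smallest_delta(p, q, math.log(1.25)))
print(smallest_delta(p, q, math.log(1.5)))
```

With $e^{ε} = 1.25$ , the set $X = \{x\}$ is the worst case and forces $δ$ to be about $0.1$ ; with $e^{ε} = 1.5$ , no additional $δ$ is needed (up to floating-point rounding).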

This definition is equivalent to Max-KL stability [37], a special case of algorithmic stability, which requires that one change in an algorithm’s inputs does not change its output “too much”.

The $δ$ in $(ε, δ)$ -DP is sometimes explained as the probability that the privacy loss of the output is larger than $ε$ (or, equivalently, that the $ε$ -indistinguishability formula is not satisfied). In fact, this intuition corresponds to a different definition, first introduced in [267] as probabilistic DP, also called $(ε, δ)$ -DP in distribution in [60]. A detailed explanation of the distinction between the two definitions can be found in [277].

Definition 15 ( $(ε, δ)$ -probabilistic differential privacy [277]). A privacy mechanism $M$ is $(ε, δ)$ -probabilistically DP (ProDP) if for any datasets $D_{1}$ and $D_{2}$ that differ only on one record there is a set $S_{1} \subseteq O$ where $P [M (D_{1}) \in S_{1}] \leq δ$ , such that for all measurable sets $S \subseteq O$ :

P [M (D_{1}) \in S ∖ S_{1}] \leq e^{ε} \cdot P [M (D_{2}) \in S ∖ S_{1}] .

It is straightforward to show that $(ε, δ)$ -ProDP is stronger than $(ε, δ)$ -DP (with no change in parameters); a proof of the reverse implication (with a change in parameters) is given in [408]. Both definitions can be reformulated using the privacy loss random variable.

Proposition 8. A mechanism $M$ is:

$ε$ -DP $\Leftrightarrow$ $P_{O \sim M (D_{1})} [L_{D_{1} ∕ D_{2}} (O) > ε] = 0$ for all neighboring $D_{1}$ and $D_{2}$ .
$(ε, δ)$ -DP $\Leftrightarrow$ $E_{O \sim M (D_{1})} [\max (0, 1 - e^{ε - L_{D_{1} ∕ D_{2}} (O)})] \leq δ$ for all neighboring $D_{1}$ and $D_{2}$ .
$(ε, δ)$ -ProDP $\Leftrightarrow$ $P_{O \sim M (D_{1})} [L_{D_{1} ∕ D_{2}} (O) > ε] \leq δ$ for all neighboring $D_{1}$ and $D_{2}$ .
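The difference between the last two characterizations of Proposition 8 can be seen numerically. The sketch below evaluates both expressions on the privacy loss random variable of one ordered pair of discrete output distributions; a full check would repeat this for every pair of neighboring datasets, in both orders. The function names and the example distributions are assumptions made for illustration.

```python
import math

def plrv(p1, p2):
    """Return (probability under M(D1), privacy loss) for each output of M(D1)."""
    pairs = []
    for output, a in p1.items():
        if a == 0:
            continue
        b = p2.get(output, 0.0)
        loss = math.log(a / b) if b > 0 else math.inf
        pairs.append((a, loss))
    return pairs

def delta_approx_dp(p1, p2, eps):
    """E[max(0, 1 - e^(eps - L))]: the delta needed for approximate DP."""
    return sum(a * max(0.0, 1.0 - math.exp(eps - loss)) for a, loss in plrv(p1, p2))

def delta_pro_dp(p1, p2, eps):
    """P[L > eps]: the delta needed for probabilistic DP."""
    return sum(a for a, loss in plrv(p1, p2) if loss > eps)

# Hypothetical output distributions; output "c" never occurs under M(D2).
p1 = {"a": 0.5, "b": 0.4, "c": 0.1}
p2 = {"a": 0.25, "b": 0.75, "c": 0.0}
eps = math.log(1.5)

print(delta_approx_dp(p1, p2, eps))
print(delta_pro_dp(p1, p2, eps))
```

Here the approximate DP expression is about $0.225$ while the probabilistic DP expression is about $0.6$ : every output whose privacy loss exceeds $ε$ counts fully towards the ProDP bound, but only partially towards the approximate DP bound.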

Approximate and probabilistic differential privacy can be combined to form $(ε, δ_{a}, δ_{p})$ -relaxed DP (RelDP) [407], which requires $(ε, δ_{a})$ -DP with probability at least $1 - δ_{p}$ .

Averaging the privacy loss

As $ε$ -DP corresponds to a worst-case risk model, it is natural to consider relaxations to allow for larger privacy loss for some outputs. It is also natural to consider average-case risk models: allowing larger privacy loss values only if lower values compensate for it in other cases. One such relaxation is called Kullback-Leibler privacy [31 , 88]: it considers the arithmetic mean of the privacy loss random variable, which measures how much information is revealed when the output of a private algorithm is observed.

Definition 16 ( $ε$ -Kullback-Leibler privacy [31 , 88]). A privacy mechanism $M$ is $ε$ -Kullback-Leibler private (KLPr) if for all $D_{1}$ , $D_{2}$ differing in one record:

E_{O \sim M (D_{1})} [L_{D_{1} ∕ D_{2}} (O)] \leq ε .

(2.2)

Note that this formula can be expressed as $D_{KL} (M (D_{1}) \| M (D_{2})) \leq ε$ where $D_{KL}$ is the Kullback-Leibler divergence.
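As a small numerical illustration of Definition 16, the sketch below computes the expected privacy loss, that is, the Kullback-Leibler divergence, between two discrete output distributions. The randomized response example and the function name `kl_divergence` are assumptions made for illustration.

```python
import math

def kl_divergence(p1, p2):
    """Expected privacy loss E_{O ~ p1}[ln(p1(O) / p2(O))] for discrete distributions."""
    total = 0.0
    for output, a in p1.items():
        if a == 0:
            continue
        b = p2.get(output, 0.0)
        if b == 0:
            # An output possible under p1 but impossible under p2 has infinite loss.
            return math.inf
        total += a * math.log(a / b)
    return total

# Hypothetical mechanism: randomized response, truthful with probability 3/4.
rr_yes = {"yes": 0.75, "no": 0.25}
rr_no = {"yes": 0.25, "no": 0.75}

worst_case_loss = math.log(3)  # maximum of the privacy loss over both outputs
print(kl_divergence(rr_yes, rr_no), worst_case_loss)
```

For this example the average privacy loss is $\frac{1}{2} ln (3)$ , half of the worst-case value $ln (3)$ .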

$ε$ -KL privacy considers the arithmetic mean of the privacy loss random variable or, equivalently, the geometric mean of $e^{L_{D_{1} ∕ D_{2}}}$ . This choice of averaging function does not attribute a lot of weight to worst-case events, where $L_{D_{1} ∕ D_{2}}$ takes high values. Rényi DP extends this idea by adding a parameter $α \geq 1$ , which allows controlling the choice of averaging function by bounding the $α$ th moment of the privacy loss random variable.

Definition 17 ( $(α, ε)$ -Rényi differential privacy [284]). Given $α > 1$ , a privacy mechanism $M$ is $(α, ε)$ -Rényi DP (RenyiDP) if for all pairs of neighboring datasets $D_{1}$ and $D_{2}$ :

E_{O \sim M (D_{1})} [e^{(α - 1) L_{D_{1} ∕ D_{2}} (O)}] \leq e^{(α - 1) ε} .

Note that this formula can be expressed as $D_{α} (M (D_{1}) \| M (D_{2})) \leq ε$ where $D_{α}$ is the Rényi divergence of order $α$ .

This definition can be naturally extended by continuity to $α = 1$ (where it is equivalent to $ε$ -KL privacy) and $α = \infty$ (where it is equivalent to $ε$ -DP). Larger values of $α$ lead to more weight being assigned to worst-case events: $(α, ε)$ -Rényi DP $≻$ $(α^{'}, ε)$ -Rényi DP iff $α > α^{'}$ . Besides $α = 1$ and $α = \infty$ , Rényi DP has a simple interpretation for some values of $α$ : $α = 2$ imposes a bound on the arithmetic mean of $e^{L_{D_{1} ∕ D_{2}}}$ , $α = 3$ imposes it on the quadratic mean, $α = 4$ on the cubic mean, etc. A related technique is the moments accountant [1], which keeps track of a bound on the moments of the privacy loss random variable during composition.
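The role of $α$ can be illustrated numerically. The sketch below computes the Rényi divergence of order $α > 1$ between two discrete output distributions, which is the smallest $ε$ satisfying the inequality of Definition 17 for that ordered pair. The function name, the randomized response example, and the assumption that every output has non-zero probability under the second distribution are ours.

```python
import math

def renyi_divergence(p1, p2, alpha):
    """Rényi divergence of order alpha > 1 between discrete distributions.

    Assumes every output with positive probability under p1 also has
    positive probability under p2.
    """
    # E_{O ~ p1}[e^((alpha - 1) L(O))] equals the sum of p1^alpha * p2^(1 - alpha).
    total = sum(a ** alpha * p2[o] ** (1 - alpha) for o, a in p1.items() if a > 0)
    return math.log(total) / (alpha - 1)

# Hypothetical mechanism: randomized response, truthful with probability 3/4.
rr_yes = {"yes": 0.75, "no": 0.25}
rr_no = {"yes": 0.25, "no": 0.75}

for alpha in (1.0001, 2, 3, 200):
    print(alpha, renyi_divergence(rr_yes, rr_no, alpha))
```

On this example the divergence increases with $α$ : it is close to the KL divergence $\frac{1}{2} ln (3)$ for $α$ near 1, and approaches the worst-case privacy loss $ln (3)$ as $α$ grows.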

It is possible to use other divergence functions to obtain other relaxations. For example, in [380], the authors introduce two technical definitions, binary- $| χ |^{α}$ DP (b- $| χ |^{α}$ DP) and ternary- $| χ |^{α}$ DP (t- $| χ |^{α}$ DP), as part of a proof on amplification by sampling. Other examples of divergences can lead to other variants, like $ε$ -total variation privacy [31] ( $ε$ -TVPr, using the total variation distance) and quantum DP [81] (QDP, using the quantum divergence).

Another possibility to average the privacy loss is to use mutual information to formalize the intuition that any individual record should not “give out too much information” on the output of the mechanism (or vice-versa). This is captured by $ε$ -mutual-information DP (MIDP) [88], which guarantees that the mutual information between $M (D)$ and $D (i)$ conditioned on $D_{- i}$ is under a certain threshold. The bound is taken over all possible priors on $D$ , which avoids having to reason about the attacker’s background knowledge. This definition, like KL privacy, is technically stronger than approximate DP, but the change in parameters was criticized for not providing a strong enough guarantee [272].

Proposition 9. For all $ε > 0$ , $δ \leq 1$ , and $α > 1$ :

$ε$ -DP $⟹$ $\min \{ε, ε^{2}\}$ -KLPr (Lemma 1 in [88])
$ε$ -KLPr $⟹$ $ε$ -MIDP $⟹$ $(0, \sqrt{2 ε})$ -DP (Lemmas 1 and 2 in [88])
$ε$ -DP $⟹$ $(α, ε)$ -RényiDP $⟹$ $(ε + \ln (\frac{α - 1}{α}) - \frac{\ln (δ) + \ln (α)}{α - 1}, δ)$ -DP (Theorem 21 in [26])
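The conversion from Rényi DP to approximate DP in the last line of Proposition 9 is a closed-form expression, so it can be evaluated directly. The following sketch transcribes that expression; the function name and the example parameters are assumptions made for illustration.

```python
import math

def renyi_to_approx_dp(alpha, eps, delta):
    """Epsilon of the (epsilon, delta)-DP guarantee implied by (alpha, eps)-Rényi DP.

    Transcribes the expression listed in Proposition 9; requires alpha > 1
    and 0 < delta <= 1.
    """
    if alpha <= 1 or not 0 < delta <= 1:
        raise ValueError("need alpha > 1 and 0 < delta <= 1")
    return (eps
            + math.log((alpha - 1) / alpha)
            - (math.log(delta) + math.log(alpha)) / (alpha - 1))

# Hypothetical parameters: (10, 0.5)-Rényi DP, converted at delta = 1e-5.
print(renyi_to_approx_dp(alpha=10, eps=0.5, delta=1e-5))
```

With these hypothetical parameters the resulting $ε$ is roughly $1.42$ . When a mechanism satisfies Rényi DP for several orders $α$ , the conversion can be evaluated for each of them and the smallest resulting $ε$ kept.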

Controlling the tail distribution of the privacy loss

Some definitions go further than simply considering a worst-case bound on the privacy loss, or averaging it across the distribution. They try to obtain the benefits of $(ε, δ)$ -DP with a smaller $ε$ which holds in most cases, but control the behavior of the bad cases better than $(ε, δ)$ -DP, which allows for catastrophic privacy loss in rare cases.

The first attempt to formalize this idea was proposed in [132], where the authors introduce concentrated DP (later renamed to mean-concentrated DP (mCoDP) in [57]). In this definition, a parameter controls the privacy loss variable globally, and another parameter allows for some outputs to have a greater privacy loss, while still requiring that the difference is smaller than a Gaussian distribution. In [57], the authors show that this definition does not satisfy the post-processing axiom, and propose another formalization of the same idea called zero-concentrated DP (zCoDP) [57], which requires that the privacy loss random variable is concentrated around zero.

Definition 18 ( $(ξ, ρ)$ -zero-concentrated differential privacy [57]). A mechanism $M$ is $(ξ, ρ)$ -zero-concentrated DP if for all pairs of neighboring datasets $D_{1}$ and $D_{2}$ and all $α > 1$ :

E_{O \sim M (D_{1})} [e^{(α - 1) L_{D_{1} ∕ D_{2}} (O)}] \leq e^{(α - 1) (ξ + ρα)} .
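A standard example for this definition is additive Gaussian noise. For two Gaussian distributions with the same standard deviation $σ$ whose means differ by $Δ$ , the Rényi divergence of order $α$ is $α Δ^{2} / (2 σ^{2})$ , which has the form $ξ + ρα$ with $ξ = 0$ and $ρ = Δ^{2} / (2 σ^{2})$ . The sketch below checks this numerically with a midpoint Riemann sum; the function names, the integration range, and the parameter values are assumptions made for illustration.

```python
import math

def log_gaussian_pdf(x, mean, sigma):
    """Logarithm of the density of N(mean, sigma^2) at x."""
    return -((x - mean) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))

def renyi_divergence_gaussians(alpha, shift, sigma, half_width=40.0, steps=20_000):
    """Numerically estimate D_alpha(N(0, sigma^2) || N(shift, sigma^2)) for alpha > 1."""
    step = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * step
        # Work in log space to avoid overflow when raising densities to powers.
        log_term = (alpha * log_gaussian_pdf(x, 0.0, sigma)
                    + (1 - alpha) * log_gaussian_pdf(x, shift, sigma))
        total += math.exp(log_term) * step
    return math.log(total) / (alpha - 1)

shift, sigma = 1.0, 2.0           # hypothetical sensitivity and noise scale
rho = shift ** 2 / (2 * sigma ** 2)

for alpha in (1.5, 2.0, 5.0):
    print(alpha, renyi_divergence_gaussians(alpha, shift, sigma), rho * alpha)
```

The numerical estimate and $ρα$ agree for each tested order, which is consistent with this pair of output distributions satisfying the inequality of Definition 18 with parameters $(0, ρ)$ .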

Four more variants of concentrated DP exist.

$(ξ, ρ, δ)$ -approximate zero-concentrated DP [57] (AzCoDP), which relaxes $(ξ, ρ)$ -zCoDP by only taking the Rényi divergence on events with probability higher than $1 - δ$ instead of on the full distribution.
$(ξ, ρ, ω)$ -bounded CoDP [57] (bCoDP) relaxes $(ξ, ρ)$ -zCoDP by requiring the inequality to hold only for $α \leq ω$ .
$(ρ, ω)$ -truncated CoDP [56] (tCoDP^[56]) relaxes $(0, ρ)$ -zCoDP in the same way.
$(ξ, τ)$ -truncated CoDP [81] (tCoDP^[81]) requires the Rényi divergence to be smaller than $min (ξ, ατ)$ for all $α \geq 1$ .

The relations between these definitions and other notions in this section are well understood. Besides the special cases (e.g., $(ρ, \infty)$ -tCoDP^[56] is the same as $(0, ρ)$ -zCoDP) and the relations that are a direct consequence of the definitions (e.g., $(ξ, ρ)$ -zCoDP is the same as the condition “ $(α, ξ + ρα)$ -RényiDP for all $α > 1$ ”), we list known relations below.

Proposition 10. For all $ε > 0$ , $δ > 0$ , $μ > 0$ , $τ > 0$ , $ξ \geq 0$ and $ω > 1$ :

$ε$ -DP $⟹$ $(\frac{ε (e^{ε} - 1)}{2}, ε)$ -mCoDP (Theorem 3.5 in [132])
$ε$ -DP $⟹$ $(0, \frac{ε^{2}}{2})$ -zCoDP (Lemma 8.3 in [57])
$ε$ -DP $⟺$ $(ε, 0)$ -zCoDP (Lemma 3.2 in [57])
$(μ, τ)$ -mCoDP $⟹$ $(μ - \frac{τ^{2}}{2}, \frac{τ^{2}}{2})$ -zCoDP (Lemma 4.2 in [57])
$(ξ, ρ)$ -zCoDP $⟹$ $(ξ + ρ, O (\sqrt{ξ + 2 ρ}))$ -mCoDP (Lemma 4.3 in [57])
$(ξ, ρ)$ -zCoDP $⟹$ $(ξ + ρ + \sqrt{4 ρ \ln (\frac{\min (1, \sqrt{πρ})}{δ})}, δ)$ -DP (Lemma 3.5 and 3.6 in [57])
$(ξ + \sqrt{ρ \ln \frac{1}{δ}}, δ)$ -DP $⟹$ $(ξ - \frac{ρ}{4} + 5 \sqrt[4]{ρ}, \frac{ρ}{4})$ -zCoDP (Lemma 3.7 in [57])
$(ρ, ω)$ -tCoDP^[56] $⟹$ $(\hat{ε}, δ)$ -DP, where $\hat{ε} = ρ + 2 \sqrt{ρ \ln \frac{1}{δ}}$ if $\ln \frac{1}{δ} \leq {(ω - 1)}^{2} ρ$ , and $\hat{ε} = ρω + \frac{\ln \frac{1}{δ}}{ω - 1}$ otherwise (Lemma 6 in [56])
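The conversion from zero-concentrated DP to approximate DP listed in Proposition 10 can be evaluated directly. The sketch below transcribes that expression; the function name and the example parameters are hypothetical, and the guard against a negative logarithm is our own assumption about when the expression is meaningful, not a condition stated in [57].

```python
import math

def zcdp_to_approx_dp(xi, rho, delta):
    """Epsilon of the (epsilon, delta)-DP guarantee implied by (xi, rho)-zCoDP.

    Transcribes the expression listed in Proposition 10.
    """
    ratio = min(1.0, math.sqrt(math.pi * rho)) / delta
    if ratio < 1.0:
        # Assumption: outside this range the square root is not real-valued.
        raise ValueError("expression needs min(1, sqrt(pi * rho)) >= delta")
    return xi + rho + math.sqrt(4 * rho * math.log(ratio))

# Hypothetical parameters.
print(zcdp_to_approx_dp(xi=0.0, rho=0.5, delta=1e-6))
print(zcdp_to_approx_dp(xi=0.0, rho=0.01, delta=1e-6))
```

For small values of $ρ$ , the $\sqrt{πρ}$ term inside the minimum slightly improves the bound compared to using $ln (1 / δ)$ alone.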

Extension

Most definitions of this section can be seen as bounding the divergence between $M (D_{1})$ and $M (D_{2})$ , for different possible divergence functions. In [31], the authors use this fact to generalize them and define $(f, ε)$ -divergence DP (DivDP), which takes the particular divergence used as a parameter $f$ .

Definition 19 ( $(f, ε)$ -divergence differential privacy [31]). Let $f$ be a convex function such that $f (1) = 0$ . A privacy mechanism $M$ is $(f, ε)$ -divergence DP if for all pairs of neighboring datasets $D_{1}$ , $D_{2}$ :

E_{O \sim M (D_{1})} [f (e^{L_{D_{1} ∕ D_{2}}})] \leq ε .

An instance of this definition was presented in [118] as $(f_{k}, ε)$ -divergence DP, which requires that $E_{O \sim M (D_{1})} [{| e^{L_{D_{1} ∕ D_{2}}} - 1 |}^{k}] \leq ε^{k}$ . This definition is mainly used to prove technical results on privacy/utility trade-offs in the local model. For any $k \geq 1$ , $ε$ -DP implies $(f_{k}, e^{ε} - 1)$ -DivDP, and when $k = 2$ , it is equivalent to $(2, \ln (1 + ε^{2}))$ -RényiDP (Section 2 in [118]).

Moreover, capacity bounded differential privacy (CBDP) was introduced in [69], which uses $H$ -restricted $f$ -divergence: $D_{f}^{H} (P \| Q) = \sup_{h \in H} [E_{x \sim P} [h (x)] - E_{x \sim Q} [f^{*} (h (x))]]$ where $f$ is a divergence, $H$ is a family of functions, and $f^{*}$ is the Fenchel conjugate⁵. In other words, it requires the supremum condition to hold only for a selected set of functions (queries) instead of all possible ones. The interpretation of this definition is slightly different from that of other definitions in this section: $H$ represents the possible attacks that the attacker is allowed to perform.

Finally, most definitions in this section taking two real-valued parameters can be extended to use a family of parameters rather than a single pair of parameters. As shown in [348] (Theorem 2) for approximate DP, probabilistic DP, and Rényi DP, finding the tightest possible family of parameters (for any of these definitions) for a given mechanism is equivalent to specifying the behavior of its privacy loss random variable entirely.

Multidimensional definitions

Allowing a small probability of error $δ$ by using the same concept as in $(ε, δ)$ -DP is very common; many new DP definitions were proposed in the literature with such a parameter. Unless it creates a particularly notable effect, we do not mention it explicitly and present the definitions without this parameter.

Definitions in this section can be used as standalone concepts: $(ε, δ)$ -DP is omnipresent in the literature, and the principle of averaging risk is natural enough for Rényi privacy to be used in practical settings, like posterior sampling [163] or resistance to adversarial inputs in machine learning [317]. Most variants in this section, however, are only used as technical tools to get better results on composition or privacy amplification [131 , 149 , 245 , 380].

⁵The Fenchel conjugate for a function $f$ with a domain $R$ is $f^{*} (x) = \sup_{y \in R} [xy - f (y)]$ .
