Lowering the cost of anonymization

3.2.4 Composition

Composition theorems enable the modular analysis of complex systems and the continued use of mechanisms over time. In this section, we study two kinds of composition: sequential composition, introduced in Section 2.1.6.0, and nested composition, where post-processing noise is added to the result of the aggregation.

Sequential composition

We saw in the previous section that noiseless mechanisms can be private under partial knowledge. For such mechanisms, composition does not hold in general. We explain why dependencies between mechanisms are the root cause of this failure, and how bounding these dependencies allows us to derive usable composition results. First, we show that noiseless composition fails in general.

Example 8. Going back to the voting example, consider the queries “How many people voted Yes?” and “How many people who are not $X$ voted Yes?”, for some individual $X$. As shown in Section 3.2.1, each query can be private on its own. However, publishing both results reveals $X$'s vote: the composition of both queries is not private.
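The attack in Example 8 is a differencing attack. The sketch below illustrates it under assumptions of our own: votes are encoded as integers 0 (No) and 1 (Yes), and the position `x` of individual $X$ in the list is known. The helper names and the simulated votes are hypothetical.

```python
import random

def count_yes(votes):
    """Noiseless counting query: number of Yes votes (encoded as 1)."""
    return sum(votes)

def differencing_attack(votes, x):
    """Recover the vote at index x from the two counts published in Example 8."""
    everyone = count_yes(votes)                            # "How many people voted Yes?"
    everyone_but_x = count_yes(votes[:x] + votes[x + 1:])  # "... who are not X voted Yes?"
    return everyone - everyone_but_x

rng = random.Random(0)
votes = [rng.randint(0, 1) for _ in range(1000)]  # simulated votes, for illustration only
print(differencing_attack(votes, 42) == votes[42])  # True
```

On its own, each count hides $X$'s vote within the uncertainty about the other records. The difference between the two counts removes that uncertainty entirely.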

Are there special cases where noiseless counting queries can be composed? In this section, we propose a criterion, $(μ, ν)$ -boundedness, under which sequential composition does hold.

The core problem with Example 8 is that the two queries are heavily dependent on each other. In fact, knowing the result of the first query leaves only two options for the result of the second query: it drastically reduces an attacker’s uncertainty about the second query’s result. We show that this dependency between queries is the main obstacle to a composition result, and prove that mechanisms whose dependency is bounded (Definition 63) can actually be composed (Theorem 6).

How can we formalize the bounded dependency between mechanisms? A natural approach is to quantify how much the additional knowledge of the first mechanism impacts the privacy loss of the second mechanism.

Definition 62. Given two mechanisms $M_{1}$ and $M_{2}$, two records $t, t' \in T$, a distribution $\theta$, an index $i$, and a possible value $\hat{B}$ of the background knowledge compatible with both $D(i) = t$ and $D(i) = t'$, the dependency of $M_{2}$ on $M_{1}$ for distinguishing $D(i) = t$ from $D(i) = t'$ is the function $\mathcal{O}^{2} \to \mathbb{R} \cup \{-\infty, +\infty\}$ that maps two outputs $(O_{1}, O_{2})$ to:

\mathrm{Dep}_{i,t,t'}^{M_{1},M_{2},\theta}(O_{1}, O_{2} \mid \hat{B}) = L_{i,t,t'}^{M_{2},\theta}\big(O_{2}, (\hat{B}, M_{1}(D) = O_{1})\big) - L_{i,t,t'}^{M_{2},\theta}(O_{2}, \hat{B})

using the convention $\pm \infty - x = \pm \infty$ for all $x$ .

Intuitively, this value quantifies the amount of additional information that $M_{1}$ gives the attacker when analyzing the privacy loss of $M_{2}$. In Example 8, the first term is $\pm \infty$, as knowing the results of both $M_{1}$ and $M_{2}$ reveals the value of $D(i)$, while the second term is typically finite. So $\mathrm{Dep}_{i,t,t'}^{M_{1},M_{2},\theta}(O_{1}, O_{2} \mid \hat{B})$ takes infinite values, which captures the fact that the two mechanisms together leak a lot of information.
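The infinite dependency in Example 8 can be checked by brute force on a tiny instance. This sketch rests on simplifying assumptions of our own: $n = 8$ voters whose votes are i.i.d. Bernoulli(1/2), $X$ at index 0, $t = 1$, $t' = 0$, and empty background knowledge $\hat{B}$. The helpers `cond_prob` and `log_ratio` are hypothetical.

```python
import itertools
import math

def cond_prob(event, given, t, n, p):
    """P[event(D) | given(D), D[0] = t], where D[1], ..., D[n-1] are i.i.d. Bernoulli(p)."""
    num = den = 0.0
    for rest in itertools.product((0, 1), repeat=n - 1):
        db = (t,) + rest
        weight = math.prod(p if v else 1 - p for v in rest)
        if given(db):
            den += weight
            if event(db):
                num += weight
    return num / den

def log_ratio(a, b):
    """ln(a / b) for probabilities a, b, with ln(a / 0) = +inf and ln(0 / b) = -inf."""
    if a == 0 and b == 0:
        raise ValueError("output impossible under both hypotheses")
    if b == 0:
        return math.inf
    if a == 0:
        return -math.inf
    return math.log(a / b)

n, p = 8, 0.5
o1, o2 = 4, 3

def m1(db):  # "How many people voted Yes?"
    return sum(db)

def m2(db):  # "How many people who are not X voted Yes?", with X at index 0
    return sum(db[1:])

def out2(db):
    return m2(db) == o2

def knows_m1(db):
    return m1(db) == o1

def no_knowledge(db):
    return True

# First term of Dep: PLRV of M2 when the attacker also knows that M1(D) = o1.
first = log_ratio(cond_prob(out2, knows_m1, 1, n, p), cond_prob(out2, knows_m1, 0, n, p))
# Second term: PLRV of M2 with empty background knowledge.
second = log_ratio(cond_prob(out2, no_knowledge, 1, n, p), cond_prob(out2, no_knowledge, 0, n, p))
dep = first - second  # inf - 0.0 == inf, as in the convention of Definition 62
print(first, second, dep)  # inf 0.0 inf
```

The first term is infinite for the following reason. Given $M_{1}(D) = 4$, the output $M_{2}(D) = 3$ is certain when $D(0) = 1$ and impossible when $D(0) = 0$. The second term is $0$ because $M_{2}$ ignores record 0.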

Bounding this dependency can be done in the same way as the PLRV is used to define differential privacy: we bound $\mathrm{Dep}$ by $\mu$ except in rare events, and use a small quantity $\nu$ to capture these rare events where $\mathrm{Dep} > \mu$.

Definition 63 ($(\mu, \nu)$-bounded dependency). Given a family of distributions $\Theta$, two mechanisms $M_{1}$ and $M_{2}$ are $(\mu, \nu)$-bounded dependent for $\Theta$ if for all $\theta \in \Theta$, all indices $i$ and records $t, t' \in T$, and all $\hat{B} \in B$:

\mathbb{E}_{D \sim \theta_{|D(i)=t,\hat{B}},\; O_{1} \sim M_{1}(D),\; O_{2} \sim M_{2}(D)}\left[\max\left(0, 1 - e^{\mu - \mathrm{Dep}_{i,t,t'}^{M_{1},M_{2},\theta}(O_{1}, O_{2} \mid \hat{B})}\right)\right] \leq \nu .
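For a fixed $\mu$ and a single choice of $\theta$, $i$, $t$, $t'$ and $\hat{B}$, the smallest admissible $\nu$ is the expectation itself. Definition 63 then takes the worst case over all of these. The sketch below evaluates this expectation for a discrete distribution of $\mathrm{Dep}$ values. We assume that distribution is already available as (probability, value) pairs. The function name is hypothetical.

```python
import math

def smallest_nu(dep_distribution, mu):
    """Smallest nu with E[max(0, 1 - exp(mu - Dep))] <= nu, for one fixed choice of
    theta, i, t, t' and B-hat. dep_distribution: (probability, Dep value) pairs,
    where Dep values may be +inf or -inf."""
    total = 0.0
    for prob, dep in dep_distribution:
        # 1 - exp(mu - dep) is positive only when dep > mu; exp(-inf) == 0.0 covers dep = +inf.
        if dep > mu:
            total += prob * (1.0 - math.exp(mu - dep))
    return total

# Example 8: Dep is +inf with probability 1, so no finite mu gives nu < 1.
print(smallest_nu([(1.0, math.inf)], mu=10.0))  # 1.0
```

When $\mathrm{Dep} = +\infty$ almost surely, as in Example 8, every finite $\mu$ yields $\nu = 1$. The pair of mechanisms is then not usefully bounded dependent.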

This notion formalizes the intuition that the result of the first mechanism should not impact the result of the second mechanism “too much”. As we show in the following theorem, the dependency of $M_{2}$ on $M_{1}$ can be used to express the PLRV of the composed mechanism as a function of the PLRVs of the two original mechanisms. As a direct consequence, two $(\mu, \nu)$-bounded dependent mechanisms can be sequentially composed.

Theorem 6. Given a distribution $\theta$, two mechanisms $M_{1}, M_{2}$, an index $i$, records $t, t' \in T$, and $\hat{B} \in B$, the PLRV of the composed mechanism $M(D) := (M_{1}(D), M_{2}(D))$ satisfies, for every output $O = (O_{1}, O_{2})$:

L_{i \leftarrow t / i \leftarrow t'}^{M,\theta}(O, \hat{B}) = \mathrm{Dep}_{i,t,t'}^{M_{1},M_{2},\theta}(O_{1}, O_{2} \mid \hat{B}) + L_{i,t,t'}^{M_{1},\theta}(O_{1}, \hat{B}) + L_{i,t,t'}^{M_{2},\theta}(O_{2}, \hat{B}) .

As a corollary, if $M_{1}, M_{2}$ are $(\mu, \nu)$-bounded dependent for $\Theta$, $M_{1}$ is $(\Theta, \varepsilon_{1}, \delta_{1})$-APK, and $M_{2}$ is $(\Theta, \varepsilon_{2}, \delta_{2})$-APK, then $M$ is $(\Theta, \varepsilon_{1} + \varepsilon_{2} + \mu, \delta_{1} + \delta_{2} + \nu)$-APK.

Proof. Fix $i$, $t$, $t'$, $O = (O_{1}, O_{2})$ and $\hat{B}$. The main statement is straightforward to prove by decomposing:

\begin{aligned}
L_{i \leftarrow t / i \leftarrow t'}^{M,\theta}(O, \hat{B}) &= \ln \frac{\mathbb{P}_{\theta}[M(D) = (O_{1}, O_{2}) \mid D(i) = t, \hat{B}]}{\mathbb{P}_{\theta}[M(D) = (O_{1}, O_{2}) \mid D(i) = t', \hat{B}]} \\
&= \ln \frac{\mathbb{P}_{\theta}[M_{2}(D) = O_{2} \mid M_{1}(D) = O_{1}, D(i) = t, \hat{B}]}{\mathbb{P}_{\theta}[M_{2}(D) = O_{2} \mid M_{1}(D) = O_{1}, D(i) = t', \hat{B}]} + \ln \frac{\mathbb{P}_{\theta}[M_{1}(D) = O_{1} \mid D(i) = t, \hat{B}]}{\mathbb{P}_{\theta}[M_{1}(D) = O_{1} \mid D(i) = t', \hat{B}]}
\end{aligned}

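We then note that the first term is $L_{i,t,t'}^{M_{2},\theta}\big(O_{2}, (\hat{B}, M_{1}(D) = O_{1})\big)$ and the second is $L_{i,t,t'}^{M_{1},\theta}(O_{1}, \hat{B})$, and plug in the definition of $\mathrm{Dep}_{i,t,t'}^{M_{1},M_{2},\theta}(O_{1}, O_{2} \mid \hat{B})$.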
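The decomposition in this proof can be sanity-checked by exact enumeration. The sketch is illustrative only. The toy mechanisms `m1` and `m2` (overlapping noiseless counts), the i.i.d. Bernoulli(1/2) records, $t = 1$, $t' = 0$, and the empty background knowledge are our own assumptions. It computes each PLRV directly from exact output probabilities. It then compares the PLRV of the composed mechanism with $\mathrm{Dep} + L^{M_{1},\theta} + L^{M_{2},\theta}$.

```python
import math
from fractions import Fraction
from itertools import product

def output_dist(mechs, t, n, p):
    """Exact distribution of (M_1(D), ..., M_k(D)) given D[0] = t,
    where D[1], ..., D[n-1] are i.i.d. Bernoulli(p)."""
    dist = {}
    for rest in product((0, 1), repeat=n - 1):
        weight = Fraction(1)
        for v in rest:
            weight *= p if v else 1 - p
        key = tuple(m((t,) + rest) for m in mechs)
        dist[key] = dist.get(key, 0) + weight
    return dist

def marginal(dist, k, value):
    """P[k-th output == value] under a joint output distribution."""
    return sum(w for key, w in dist.items() if key[k] == value)

def m1(db):  # hypothetical M1: count over records 0..2
    return sum(db[:3])

def m2(db):  # hypothetical M2: count over records 0..4
    return sum(db[:5])

n, p = 6, Fraction(1, 2)
o1, o2 = 2, 3
j1 = output_dist([m1, m2], 1, n, p)  # D[0] = t = 1
j0 = output_dist([m1, m2], 0, n, p)  # D[0] = t' = 0

plrv_m = math.log(float(j1[(o1, o2)] / j0[(o1, o2)]))
plrv_m1 = math.log(float(marginal(j1, 0, o1) / marginal(j0, 0, o1)))
plrv_m2 = math.log(float(marginal(j1, 1, o2) / marginal(j0, 1, o2)))
# PLRV of M2 when the attacker additionally knows M1(D) = o1 (conditional probabilities).
plrv_m2_given_m1 = math.log(float(
    (j1[(o1, o2)] / marginal(j1, 0, o1)) / (j0[(o1, o2)] / marginal(j0, 0, o1))
))
dep = plrv_m2_given_m1 - plrv_m2
print(plrv_m, dep + plrv_m1 + plrv_m2)
```

In this toy, $M_{2}$ adds no information about record 0 once $M_{1}$ is known. Hence $\mathrm{Dep} = -L^{M_{2},\theta}(O_{2}, \hat{B}) = -\ln 1.5$, and the composed PLRV equals $L^{M_{1},\theta}(O_{1}, \hat{B}) = \ln 2$.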

To prove the composition theorem, we have to show that, if we denote $ε = ε_{1} + ε_{2} + μ$ and $δ = δ_{1} + δ_{2} + ν$ :

\mathbb{E}_{D \sim \theta_{|D(i)=t,\hat{B}},\; O \sim M(D)}\left[\max\left(0, 1 - e^{\varepsilon - L_{i \leftarrow t / i \leftarrow t'}^{M,\theta}(O, \hat{B})}\right)\right] \leq \delta .

Note that the function $f(x) = \max(0, 1 - e^{x})$ is subadditive: for all $x$ and $y$, $f(x + y) \leq f(x) + f(y)$. Indeed, if $x \geq 0$, then $f(x) = 0$ and $e^{x + y} \geq e^{y}$, so $f(x + y) \leq f(y) = f(x) + f(y)$. The same reasoning holds when $y \geq 0$. Finally, suppose $x < 0$ and $y < 0$. Then $f(x + y) = 1 - e^{x + y}$ and $f(x) + f(y) = 2 - e^{x} - e^{y}$, so we must show that $1 - e^{x + y} \leq 2 - e^{x} - e^{y}$. This is equivalent to $(e^{x} - 1)(e^{y} - 1) \geq 0$, which holds since both factors are negative.
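A randomized sanity check of this subadditivity argument is easy to write. It is not a proof, and the sampling range and tolerance below are arbitrary choices.

```python
import math
import random

def f(x):
    """f(x) = max(0, 1 - e^x), as in the proof of Theorem 6."""
    return max(0.0, 1.0 - math.exp(x))

rng = random.Random(1)
for _ in range(100_000):
    x, y = rng.uniform(-20.0, 20.0), rng.uniform(-20.0, 20.0)
    # Subadditivity, up to floating-point rounding.
    assert f(x + y) <= f(x) + f(y) + 1e-12
```

For $x, y < 0$, the gap $f(x) + f(y) - f(x + y)$ is exactly $(1 - e^{x})(1 - e^{y})$, the product used in the last case of the argument.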

We then define:

\begin{aligned}
a &= \varepsilon_{1} - L_{i,t,t'}^{M_{1},\theta}(O_{1}, \hat{B}) \\
b &= \varepsilon_{2} - L_{i,t,t'}^{M_{2},\theta}(O_{2}, \hat{B}) \\
c &= \mu - \mathrm{Dep}_{i,t,t'}^{M_{1},M_{2},\theta}(O_{1}, O_{2} \mid \hat{B})
\end{aligned}

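We then apply subadditivity twice: $f(a + b + c) \leq f(a) + f(b) + f(c)$. As we have $a + b + c = \varepsilon - L_{i \leftarrow t / i \leftarrow t'}^{M,\theta}(O, \hat{B})$, we can plug this into the expectation above. Here the expectation is taken over $D \sim \theta_{|D(i)=t,\hat{B}}$ and $O \sim M(D)$. The term $f(a)$ only depends on $O_{1} \sim M_{1}(D)$, so its expectation is at most $\delta_{1}$ since $M_{1}$ is $(\Theta, \varepsilon_{1}, \delta_{1})$-APK. Similarly, the expectation of $f(b)$ is at most $\delta_{2}$, and that of $f(c)$ is at most $\nu$ by $(\mu, \nu)$-bounded dependency. Summing these bounds gives $\delta_{1} + \delta_{2} + \nu = \delta$. □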

Note that this characterization of $L_{i \leftarrow t / i \leftarrow t'}^{M,\theta}(O, \hat{B})$ enables the use of more sophisticated composition bounds for differential privacy, such as the advanced composition theorem [131], Rényi Differential Privacy [284], or privacy buckets [278]. For simplicity, we only used the standard (non-tight) composition bound for DP here.

A common special case directly leads to $(0, 0)$ -bounded dependent mechanisms: two mechanisms that work on distinct parts of a database are $(0, 0)$ -bounded dependent if these two parts are independent.

Proposition 29. Let $Θ$ be a family of distributions, and let $M_{1}$ and $M_{2}$ be mechanisms. Assume that for any $θ \in Θ$ , there are functions $π_{1}$ and $π_{2}$ such that $π_{1} {(D)}_{| D \sim θ}$ and $π_{2} {(D)}_{| D \sim θ}$ are independent, and functions $M_{1}^{'}$ and $M_{2}^{'}$ such that $M_{1} (D) = M_{1}^{'} (π_{1} (D))$ and $M_{2} (D) = M_{2}^{'} (π_{2} (D))$ . Then $M_{1}$ and $M_{2}$ are $(0, 0)$ -bounded dependent.

We now present a natural example of a practical scenario where we can use these composition results.

Example 9. Consider a regularly updated database, like usage information about an online service. Statistics $q$ are computed from this database: for example, among registered users, how many of them used a specific feature on any given day. This count is released daily, and we want to understand how the privacy of a particular user is impacted over time.

This can be represented by a database $D$ where each record $i$ is a series of binary values $D(i)_{j}$, where $j = 0, 1, 2, \dots$, and we release a series of mechanisms $M_{j}(D) = \sum_{i} q(D(i)_{j})$. The results of Section 3.2.1 can be used to determine the privacy of each $M_{j}$ depending on the data-generating distribution $\theta$. The goal is to determine the privacy of multiple queries, assuming independence between $D(i_{1})$ and $D(i_{2})$ for all $i_{1} \neq i_{2}$.

The privacy guarantees offered by this setting over time depend on $\theta$, and in particular on the correlations between the different values of a record. If $D(i)_{j_{1}}$ is independent from $D(i)_{j_{2}}$ for all $j_{1} \neq j_{2}$, then Proposition 29 applies and the result is direct. Otherwise, we must quantify the maximum amount of correlation between $D(i)_{j}$ and $D(i)_{j+1}$. This can be done using indistinguishability: we can assume, for example, that there is a $c \geq 0$ such that for all $t \in T$ and all indices $i$ and $j$:

(D(i)_{j+1})_{|D(i)_{j} = t} \approx_{c} D(i)_{j} .

Under this assumption, it is easy to verify that mechanisms $M_{j}$ and $M_{j + 1}$ are $(2 c, 0)$ -bounded dependent, so we can use the composition result of Theorem 6 and derive bounds on the privacy leakage over time.

This approach can be extended to other scenarios, for example a referendum with multiple questions whose answers are correlated, or updates in which only a subset of users participate. The latter can be represented by having $D(i)_{j}$ be either a categorical value (which encodes e.g. the type of interaction) or a special value $\bot$ that encodes “user $i$ did not participate in this update”. The probabilities and correlation relationships of different values associated with the same user can be set to capture different scenarios (e.g. the probability that $D(i)_{j} = \bot$ can be large, to capture a scenario where few users participate in each round).

Nested composition

The results of Sections 3.2.1 to 3.2.3 show that noiseless mechanisms can be considered private under some additional assumptions on the attacker’s background knowledge. With enough records, even pessimistic assumptions (considering an attacker who knows a large fraction of records) can lead to very small values of $ε$ and $δ$. However, one could still consider these assumptions too brittle, and decide to add a small amount of noise to the mechanism so that it satisfies differential privacy in its original form.

Such mechanisms have a double privacy guarantee. Under realistic assumptions, their privacy level is very high thanks to the attacker’s uncertainty. The additional noise also provides a “worst-case” privacy level that the mechanism satisfies regardless of the attacker’s capabilities. Without noise, we can use results like Theorem 3 to show that a given aggregation over $n$ records is $(Θ(|B|), ε(|B|), δ(|B|))$-APKDP (or PPKDP), where $|B|$ is the number of records that the attacker knows. In situations like the ones we have seen so far, $ε(|B|)$ and $δ(|B|)$ can be very small when $|B|$ is close to $0$, but might become unacceptably high when $|B|$ gets close to $n$. Adding noise can guarantee that $ε(|B|)$ and $δ(|B|)$ never exceed a certain level: when the data no longer provides enough randomness, the guarantee from post-processing noise takes over. Figure 3.5 illustrates this phenomenon.

Figure 3.5: $\varepsilon$ from the closed-form formula of Theorem 3 for $\delta = 10^{-10}$, $\lambda = 0.05$, and $n = 100{,}000$, as a function of the number of records known by the attacker $|B|$. We compare two scenarios: either we do not add any post-processing noise, or we add Laplace noise of scale 2 to the output.

Of course, it is also natural to wonder whether the two sources of uncertainty could be combined. The privacy guarantees from Theorem 3 come from the shape of the binomial distribution, just like the shape of Laplace noise is the reason why adding it to the result of an aggregation can provide $ε$ -DP. It seems intuitive that combining two sources of noise would have a larger effect.

In some cases, this effect can be numerically estimated. Given a noise distribution $X$ added to a mechanism of sensitivity $s$ , the PLRV can be obtained by comparing the distributions of $X$ and $X + s$ . To estimate the PLRV coming from two noise sources summed together (for example, binomial and geometric noise), we can simply compute the convolution of the corresponding two distributions, and use the result to compute the PLRV, and thus, the $(ε, δ)$ graph. We demonstrate this approach in Figure 3.6, where we add two-sided geometric noise (see Definition 8) to a noiseless counting query.
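The numerical approach just described can be sketched as follows. The modeling choices are ours and purely illustrative. We model the attacker's uncertainty as $X \sim B(m, \lambda)$ over a hypothetical $m = 1000$ unknown records. We take two-sided geometric noise with $\mathbb{P}[Z = z] \propto p^{|z|}$, truncated to $|z| \leq 60$; this parameterization may differ from Definition 8. Finally, we compute $\delta(\varepsilon)$ as the hockey-stick divergence between the output distributions when the target record is $0$ or $1$. The sketch does not reproduce the exact setting of Figure 3.6.

```python
import math
import numpy as np

def binomial_pmf(m, lam):
    """PMF of Binomial(m, lam) on {0, ..., m}, computed in log space to avoid overflow."""
    log_pmf = [
        math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
        + k * math.log(lam) + (m - k) * math.log1p(-lam)
        for k in range(m + 1)
    ]
    return np.exp(np.array(log_pmf))

def two_sided_geometric_pmf(p, width):
    """P[Z = z] proportional to p**|z| on {-width, ..., width} (truncated, renormalized)."""
    z = np.arange(-width, width + 1)
    w = p ** np.abs(z)
    return w / w.sum()

def delta_of_eps(pmf, eps):
    """Smallest delta such that X and X + 1 are (eps, delta)-indistinguishable, X ~ pmf."""
    x = np.append(pmf, 0.0)            # target record is 0: output is X
    x_plus_1 = np.insert(pmf, 0, 0.0)  # target record is 1: output is X + 1
    d1 = np.maximum(0.0, x - math.exp(eps) * x_plus_1).sum()
    d2 = np.maximum(0.0, x_plus_1 - math.exp(eps) * x).sum()
    return max(d1, d2)

data_noise = binomial_pmf(1000, 0.05)  # hypothetical attacker uncertainty
# Distribution of the sum of both independent noise sources: a convolution of PMFs.
combined = np.convolve(data_noise, two_sided_geometric_pmf(0.5, 60))
for eps in (0.1, 0.25, 0.5, 1.0):
    print(f"eps={eps}: no noise {delta_of_eps(data_noise, eps):.3g}, "
          f"with noise {delta_of_eps(combined, eps):.3g}")
```

The added noise is independent post-processing, so `delta_of_eps(combined, eps)` can never exceed `delta_of_eps(data_noise, eps)`. The numerical question is by how much it decreases.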

Figure 3.6: Numerical computation of the $(\varepsilon, \delta)$ bounds given by Theorem 3 with $n = 10{,}000$ and $\lambda = 0.05$ (in dashed blue), compared with the bounds obtained by adding two-sided geometric noise of parameter $p = 0.5$ or $p = 0.75$ (see Definition 8) and combining both probability distributions.

It is natural to ask whether we could obtain generic results that quantify the combined effect of noise coming from the input data and noise added after the aggregation mechanism, without numerical evaluation. In [176], the authors propose such a result, based on the fact that Gaussian distributions are closed under convolution. The noise from the input data is approximated by a Gaussian using the central limit theorem, and their Theorem 6 shows that adding Gaussian noise leads to a smaller $ε$ . However, since the $δ$ term comes from the central limit theorem approximation, it cannot be improved beyond $δ = O (1 ∕ \sqrt{n})$ in general.

We could solve this by simply assuming that the input data unknown to the attacker actually follows a Gaussian distribution. Sadly, the corresponding result would be very brittle: if the attacker’s uncertainty does not exactly follow this distribution, then the result no longer holds. This is a major criticism of privacy definitions which assume the input data has inherent randomness [350]. The results of this chapter are not so brittle, as the privacy guarantees degrade gracefully with the assumptions we make on the attacker’s partial knowledge (e.g. the number of records known, or the value of $p$ in Theorems 3 or 4).

Another approach would be to choose the post-processing noise based on the natural noise distributions emerging from the partial knowledge assumption. For example, the proof of Theorem 3 uses the fact that the attacker’s uncertainty corresponds to binomial noise. We could therefore also add binomial noise as post-processing, since the sum of independent $B(n, p)$ and $B(m, p)$ variables follows $B(n + m, p)$. However, this property requires the post-processing noise to use exactly the same $p$ as the data, which again creates the brittleness we were trying to avoid.
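The closure property and its dependence on $p$ can be checked numerically. This sketch uses sizes and probabilities of our own choosing. It convolves the probability mass functions of two independent binomial variables and compares the result with a single binomial.

```python
import math
import numpy as np

def binomial_pmf(m, p):
    """PMF of Binomial(m, p) on {0, ..., m}."""
    return np.array([math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)])

n, m = 30, 20
# Sum of independent B(n, 0.3) and B(m, 0.3) variables: the convolution of their PMFs.
same_p = np.convolve(binomial_pmf(n, 0.3), binomial_pmf(m, 0.3))
# With a different p for the second variable (mean 30 * 0.3 + 20 * 0.4 = 17 = 50 * 0.34),
# the sum is not the binomial with the same mean.
mixed_p = np.convolve(binomial_pmf(n, 0.3), binomial_pmf(m, 0.4))
print(np.allclose(same_p, binomial_pmf(n + m, 0.3)))    # True
print(np.allclose(mixed_p, binomial_pmf(n + m, 0.34)))  # False
```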

The question of computing the privacy loss in situations where multiple sources of randomness are combined appears in other scenarios. Amplification by sampling or amplification by shuffling are examples of such results. These two classes of results are generic: they do not depend on the exact mechanism used to obtain the initial $(ε, δ)$ -DP guarantee. It is unlikely that such generic results exist when combining two arbitrary sources of noise, each of which satisfies $(ε, δ)$ -DP.

Some other results depend on additional assumptions on the noise distribution, like amplification by iteration [149] or amplification by mixing and diffusion mechanisms [25]. These do not seem to bring significant improvements in scenarios like Theorem 3 with post-processing noise. Amplification by iteration is tailored for a case where noise is added many times (not only once), while amplification by mixing and diffusion requires stronger assumptions on the original noise distribution.

A generic result on the privacy guarantee of chained $ε$-DP mechanisms appears in [139] (Appendix B). This tight result is only valid for pure $ε$-DP, but its main building block holds for $(ε, δ)$-DP mechanisms: proving a fully generic chained composition result is equivalent to solving the special case where the input and output of both mechanisms have values in $\{0, 1\}$. This result can likely be extended to $(ε, δ)$-DP, although the analysis is surprisingly non-trivial. Moreover, fully generic optimality results do not necessarily imply optimality for the special case of additive noise mechanisms.
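To make the binary special case concrete, the sketch below chains two randomized-response mechanisms on $\{0, 1\}$. Each mechanism keeps its input bit with probability $e^{\varepsilon_k} / (1 + e^{\varepsilon_k})$ and flips it otherwise. This only illustrates chaining on binary inputs and outputs. We do not claim it reproduces the statement or proof of [139]. The function names are hypothetical.

```python
import math

def randomized_response_flip_prob(eps):
    """Randomized response on {0, 1}: flip the bit with probability 1 / (1 + e^eps)."""
    return 1.0 / (1.0 + math.exp(eps))

def chained_epsilon(eps1, eps2):
    """Smallest eps such that RR_eps2(RR_eps1(x)) is eps-DP for a single bit x."""
    q1 = randomized_response_flip_prob(eps1)
    q2 = randomized_response_flip_prob(eps2)
    q = q1 * (1 - q2) + q2 * (1 - q1)  # the bit ends up flipped exactly once
    return math.log((1 - q) / q)

eps1, eps2 = 1.0, 2.0
closed_form = math.log((math.exp(eps1 + eps2) + 1) / (math.exp(eps1) + math.exp(eps2)))
print(chained_epsilon(eps1, eps2), closed_form)
```

The chained mechanism is $\ln\frac{e^{\varepsilon_1 + \varepsilon_2} + 1}{e^{\varepsilon_1} + e^{\varepsilon_2}}$-DP, which is smaller than both $\varepsilon_1$ and $\varepsilon_2$. Chaining the two randomizers thus gives a strictly better guarantee than either one alone.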
