
2.2.7  Relativization of the knowledge gain (R)

A differentially private mechanism does not reveal more than a bounded amount of probabilistic information about a user. However, this view does not explicitly account for other ways information can leak, such as side-channel functions or knowledge about the structure of a social network. We found two approaches that attempt to include such auxiliary functions in DP variants. One possibility is to weaken DP by allowing a certain amount of leakage; another is to explicitly forbid the mechanism from revealing more than another function, which is considered safe for release.

Taking into account an auxiliary leakage function

In [257], the authors define bounded leakage DP, which quantifies the privacy that is maintained by a mechanism despite bounded, additional leakage of information by some leakage function. Interestingly, this leakage function shares the randomness of the privacy mechanism: it can, for example, capture side-channel leakage from the mechanism's execution. In the formal definition of this DP variant, the randomness is explicit: both the privacy mechanism and the leakage function take the random bits as an additional parameter.

Definition 42 ($(\mathcal{L},\varepsilon,\delta)$-bounded leakage differential privacy [257]). Let $\mathcal{L}$ be a leakage function. A privacy mechanism $\mathcal{M}$ is $(\mathcal{L},\varepsilon,\delta)$-bounded leakage differentially private (BLDP) if for all pairs of neighboring datasets $D$ and $D'$, all outputs $\ell$ of $\mathcal{L}$ such that $\mathbb{P}_r\left[\mathcal{L}(D,r)=\ell\right]>0$ and $\mathbb{P}_r\left[\mathcal{L}(D',r)=\ell\right]>0$, and all sets $S$ of outputs:

$$\mathbb{P}_r\left[\mathcal{M}(D,r)\in S \mid \mathcal{L}(D,r)=\ell\right] \le e^{\varepsilon}\cdot\mathbb{P}_r\left[\mathcal{M}(D',r)\in S \mid \mathcal{L}(D',r)=\ell\right]+\delta$$

where the randomness is taken over the random bits $r$.
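To make the role of the shared randomness concrete, here is a minimal Python sketch of the setup, assuming a counting query with Laplace noise and a hypothetical side channel (the function names and the specific leakage are illustrative assumptions, not taken from [257]): the mechanism and the leakage function both read the same random bits $r$.

```python
import numpy as np

def sample_random_bits(seed=None):
    # The shared random bits r: a single uniform draw that both the
    # mechanism and the leakage function read.
    return np.random.default_rng(seed).uniform()

def mechanism(dataset, r, epsilon=1.0):
    # Laplace mechanism for a counting query; the noise is derived
    # deterministically from the shared random bits r via the inverse CDF.
    noise = -np.sign(r - 0.5) * np.log(1 - 2 * abs(r - 0.5)) / epsilon
    return len(dataset) + noise

def leakage(dataset, r):
    # Hypothetical side channel: reveals whether the noise draw was
    # unusually large, as a timing channel might.
    return abs(r - 0.5) > 0.45

r = sample_random_bits(seed=42)
dataset = ["alice", "bob", "carol"]
print(mechanism(dataset, r), leakage(dataset, r))
```

The definition then asks that the mechanism's output, conditioned on the observed leakage value, still satisfies the usual $(\varepsilon,\delta)$ bound between neighboring datasets.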

As expected, if there is no leakage ($\mathcal{L}$ is a constant function), this is simply $(\varepsilon,\delta)$-DP. The authors also show that the definition is closed under post-processing and composable. Furthermore, if the privacy mechanism is independent of the leakage function, it is strictly weaker than differential privacy.

Borrowing concepts from zero-knowledge proofs

When using the associative interpretation with the independence assumption, it is unclear how to adapt DP to correlated datasets like social networks: data about someone's friends might reveal sensitive information about this person. The causal interpretation of DP does not suffer from this problem, but how can the associative view be adapted to such correlated contexts? Changing the definition of the neighborhood is one possibility (see Section 2.2.3.0), but it requires knowing in advance the exact impact of someone on other records. A more robust option is to require that the released information does not contain more than the result of some predefined algorithms computed on the data without the individual in question. The method for formalizing this intuition borrows ideas from zero-knowledge proofs [170].

Instead of imposing that the result of the mechanism is roughly the same on neighboring datasets $D$ and $D'$, the intuition is to impose that the result of the mechanism on $D$ can be simulated using only some aggregate information about $D$. The corresponding definition, called zero-knowledge privacy and introduced in [161], captures the idea that the mechanism does not leak more information on a given target than a certain class of aggregate metrics. This class, called the model of aggregate information in [161], is formalized by a family $\mathsf{Agg}$ of (possibly randomized) algorithms.

Definition 43 ($(\mathsf{Agg},\varepsilon)$-zero-knowledge privacy [161]). Let $\mathsf{Agg}$ be a family of (possibly randomized) algorithms. A privacy mechanism $\mathcal{M}$ is $(\mathsf{Agg},\varepsilon)$-zero-knowledge private (ZKPr) if there exists an algorithm $\mathsf{agg}\in\mathsf{Agg}$ and a simulator $\mathrm{Sim}$ such that for all datasets $D$ and all indices $i$, $\mathcal{M}(D) \approx_{\varepsilon} \mathrm{Sim}\left(\mathsf{agg}\left(D_{-i}\right)\right)$, where $D_{-i}$ denotes the dataset $D$ without its $i$-th record.
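As an illustration, here is a small Python sketch of this simulation idea, assuming record values in [0, 1] and a noisy-mean mechanism (the choice of aggregate and all names are assumptions made for the example, not taken from [161]): the simulator only sees the sum and count of everyone except individual $i$, yet its output distribution stays within a factor $e^{\varepsilon}$ of the mechanism's.

```python
import numpy as np

def mechanism(data, epsilon=1.0, rng=None):
    # Releases a Laplace-noised mean of the full dataset (values in [0, 1]);
    # the sensitivity of the mean is 1/n, so the noise scale is 1/(n*epsilon).
    rng = rng or np.random.default_rng()
    return float(np.mean(data)) + rng.laplace(scale=1.0 / (len(data) * epsilon))

def aggregate(data_without_i):
    # The aggregate agg(D_{-i}) considered safe to reveal: sum and count
    # over everyone except the target individual i.
    return sum(data_without_i), len(data_without_i)

def simulator(agg_output, epsilon=1.0, rng=None):
    # Simulates the mechanism while only seeing agg(D_{-i}): it treats the
    # missing record as 0, which shifts the mean by at most 1/n, so the two
    # Laplace distributions (same scale 1/(n*epsilon)) stay e^epsilon-close.
    rng = rng or np.random.default_rng()
    total, count = agg_output
    n = count + 1
    return total / n + rng.laplace(scale=1.0 / (n * epsilon))

data = [0.2, 0.9, 0.4, 0.7]
i = 2  # target individual
print(mechanism(data), simulator(aggregate(data[:i] + data[i + 1:])))
```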

In [161], the authors show that $(\mathsf{Agg},\varepsilon)$-ZKPr implies $2\varepsilon$-DP for any $\mathsf{Agg}$, while $\varepsilon$-DP implies $(\mathsf{Agg},\varepsilon)$-ZKPr if the identity function is in $\mathsf{Agg}$. This is yet another way to formalize the intuition that differential privacy protects against attackers who have full background knowledge.

Multidimensional definitions

Using a simulator allows making statements of the type "this mechanism does not leak more information on a given target than a certain class of aggregate metrics". Similarly to noiseless privacy, it is possible to explicitly limit the attacker's background knowledge using a data-generating probability distribution, as well as to vary the neighborhood definition to protect other types of information than the presence and characteristics of individuals. This is done in [36] with coupled-worlds privacy, a generalization of distributional DP, in which a family of functions $\Gamma$ represents the protected attribute.

Definition 44 ($(\Theta,\Gamma,\varepsilon)$-coupled-worlds privacy [36]). Let $\Gamma$ be a family of pairs of functions $(\mathrm{priv},\mathrm{gen})$. A mechanism $\mathcal{M}$ satisfies $(\Theta,\Gamma,\varepsilon)$-coupled-worlds privacy (CWPr) if there is a simulator $\mathrm{Sim}$ such that for all distributions $\theta\in\Theta$, all $(\mathrm{priv},\mathrm{gen})\in\Gamma$, and all possible values $z$ of $\mathrm{priv}$:

$$\left(\mathcal{M}(D)\;\middle|\;D\sim\theta,\ \mathrm{priv}(D)=z\right) \approx_{\varepsilon} \left(\mathrm{Sim}\left(\mathrm{gen}(D)\right)\;\middle|\;D\sim\theta,\ \mathrm{priv}(D)=z\right)$$

A special case of coupled-worlds privacy is also introduced in [36] as distributional DP, already mentioned in Section 2.2.5: each function $\mathrm{priv}_i$ captures the value of a single record $i$, and the corresponding function $\mathrm{gen}_i$ outputs all other records.
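Concretely, under the notation of Definition 44, this special case can be written by instantiating $\Gamma$ with one pair of functions per record index $i$ (a reconstruction for illustration, following the definitions above):

$$\mathrm{priv}_i(D) = D_i, \qquad \mathrm{gen}_i(D) = D_{-i}, \qquad\text{so that CWPr requires}\qquad \left(\mathcal{M}(D)\;\middle|\;D\sim\theta,\ D_i=z\right) \approx_{\varepsilon} \left(\mathrm{Sim}\left(D_{-i}\right)\;\middle|\;D\sim\theta,\ D_i=z\right).$$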

Coupled-worlds privacy is a good example of combining variants from different dimensions: it changes several aspects of the original definition, corresponding to dimensions N, B, and R. Moreover, Q and F can easily be integrated with this definition by using $(\varepsilon,\delta)$-indistinguishability with a Bayesian reformulation. This is done explicitly in inference-based coupled-worlds privacy [36]; the same paper also introduces inference-based distributional differential privacy (IBDDP).

Definition 45 ($(\Theta,\Gamma,\varepsilon,\delta)$-inference-based coupled-worlds privacy [36]). Given a family $\Theta$ of probability distributions on $\mathcal{D}$, and a family $\Gamma$ of pairs of functions $(\mathrm{priv},\mathrm{gen})$, a mechanism $\mathcal{M}$ satisfies $(\Theta,\Gamma,\varepsilon,\delta)$-inference-based coupled-worlds privacy (IBCWPr) if there is a simulator $\mathrm{Sim}$ such that for all distributions $\theta\in\Theta$ and all $(\mathrm{priv},\mathrm{gen})\in\Gamma$:

$$\left(\mathrm{priv}(D)\;\middle|\;D\sim\theta,\ \mathcal{M}(D)=O\right) \approx_{\varepsilon} \left(\mathrm{priv}(D)\;\middle|\;D\sim\theta,\ \mathrm{Sim}\left(\mathrm{gen}(D)\right)=O\right)$$

with probability at least $1-\delta$ over the choice of $D\sim\theta$ and of the output $O$.

Random DP was also combined with an idea similar to ZKPr: in [35], the authors introduce typical stability, which combines random DP with approximate DP, except that rather than using $(\varepsilon,\delta)$-indistinguishability between two outputs of the mechanism, it uses a simulator that only knows the data-generating distribution.

Definition 46 ($(\Theta,\varepsilon,\delta,\gamma)$-typical stability [35]). Given a family $\Theta$ of probability distributions on $\mathcal{D}$, a mechanism $\mathcal{M}$ satisfies $(\Theta,\varepsilon,\delta,\gamma)$-typical stability (TypSt) if for all distributions $\theta\in\Theta$, there is a simulator $\mathrm{Sim}$ such that with probability at least $1-\gamma$ over the choice of $D\sim\theta$, $\mathcal{M}(D) \approx_{\varepsilon,\delta} \mathrm{Sim}(\theta)$.
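To illustrate, here is a small Python sketch of a mechanism and simulator that plausibly fit this template under simple assumptions (i.i.d. values in [0, 1], a mean query, $\delta=0$; the names and the Hoeffding-based calibration are illustrative, not taken from [35]): with probability at least $1-\gamma$ over the sampled dataset, the empirical mean falls within a concentration bound of the distribution mean, and in that case the mechanism's output is $\varepsilon$-indistinguishable from that of a simulator that only knows $\theta$.

```python
import numpy as np

def typical_width(n, gamma):
    # Hoeffding bound: with probability >= 1 - gamma over D ~ theta^n
    # (i.i.d. values in [0, 1]), |mean(D) - E[theta]| <= this width.
    return np.sqrt(np.log(2.0 / gamma) / (2.0 * n))

def mechanism(data, epsilon, gamma, rng=None):
    # Releases the empirical mean plus Laplace noise calibrated to the
    # typical deviation rather than to the worst-case sensitivity.
    rng = rng or np.random.default_rng()
    scale = typical_width(len(data), gamma) / epsilon
    return float(np.mean(data)) + rng.laplace(scale=scale)

def simulator(theta_mean, n, epsilon, gamma, rng=None):
    # Only knows the data-generating distribution theta (here, its mean):
    # on "typical" datasets the two output distributions are e^epsilon-close.
    rng = rng or np.random.default_rng()
    scale = typical_width(n, gamma) / epsilon
    return theta_mean + rng.laplace(scale=scale)

rng = np.random.default_rng(0)
theta_mean, n = 0.3, 1000
data = rng.binomial(1, theta_mean, size=n)  # D ~ theta^n
print(mechanism(data, epsilon=0.5, gamma=0.05), simulator(theta_mean, n, 0.5, 0.05))
```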

In the same paper, the authors introduce a variant of the same definition with the same name, which compares two outputs of the mechanism; this is essentially a combination of DlPr [53, 333] and approximate DP.

We did not find any evidence that the variants and extensions of this section are used outside of theoretical papers exploring the guarantees they provide.
