
2.2.7  Relativization of the knowledge gain (R)

A differentially private mechanism does not reveal more than a bounded amount of probabilistic information about a user. However, this view does not explicitly account for other ways information can leak, such as side-channel functions or knowledge about the structure of a social network. We found two approaches that attempt to include such auxiliary functions in DP variants. One possibility is to weaken DP by allowing a certain amount of leakage; another is to explicitly forbid the mechanism from revealing more than another function, which is considered safe for release.

Taking into account an auxiliary leakage function

In [257], the authors define bounded leakage DP, which quantifies the privacy that is maintained by a mechanism despite bounded, additional leakage of information by some leakage function. Interestingly, this leakage function shares the randomness of the privacy mechanism: it can, for example, capture side-channel leakage from the mechanism's execution. In the formal definition of this DP variant, the randomness is explicit: both the privacy mechanism and the leakage function take the random bits as an additional parameter.

Definition 42 ($(\mathcal{L},\varepsilon,\delta)$-bounded leakage differential privacy [257]). Let $\mathcal{L}$ be a leakage function. A privacy mechanism $\mathcal{M}$ is $(\mathcal{L},\varepsilon,\delta)$-bounded leakage differentially private (BLDP) if for all pairs of neighboring datasets $D$ and $D'$, all outputs $\ell$ of $\mathcal{L}$ such that $\mathbb{P}_r\left[\mathcal{L}(D,r)=\ell\right]>0$ and $\mathbb{P}_r\left[\mathcal{L}(D',r)=\ell\right]>0$, and all sets $S$ of outputs:

$$\mathbb{P}_r\left[\mathcal{M}(D,r)\in S \mid \mathcal{L}(D,r)=\ell\right] \le e^{\varepsilon}\cdot\mathbb{P}_r\left[\mathcal{M}(D',r)\in S \mid \mathcal{L}(D',r)=\ell\right]+\delta$$

where the randomness is taken over the random bits $r$.
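To make the role of the shared randomness concrete, here is a minimal Python sketch of the setup, assuming a counting query with Laplace noise and a hypothetical side channel (the function names and the specific leakage are illustrative assumptions, not taken from [257]): the mechanism and the leakage function both read the same random bits $r$.

```python
import numpy as np

def sample_random_bits(seed=None):
    # The shared random bits r: a single uniform draw that both the
    # mechanism and the leakage function read.
    return np.random.default_rng(seed).uniform()

def mechanism(dataset, r, epsilon=1.0):
    # Laplace mechanism for a counting query; the noise is derived
    # deterministically from the shared random bits r via the inverse CDF.
    noise = -np.sign(r - 0.5) * np.log(1 - 2 * abs(r - 0.5)) / epsilon
    return len(dataset) + noise

def leakage(dataset, r):
    # Hypothetical side channel: reveals whether the noise draw was
    # unusually large, as a timing channel might.
    return abs(r - 0.5) > 0.45

r = sample_random_bits(seed=42)
dataset = ["alice", "bob", "carol"]
print(mechanism(dataset, r), leakage(dataset, r))
```

The definition then asks that the mechanism's output, conditioned on the observed leakage value, still satisfies the usual $(\varepsilon,\delta)$ bound between neighboring datasets.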

As expected, if there is no leakage ($\mathcal{L}$ is a constant function), this is simply $(\varepsilon,\delta)$-DP. The authors also show that the definition is closed under post-processing and composable. Furthermore, if the privacy mechanism is independent of the leakage function, it is strictly weaker than differential privacy.

Borrowing concepts from zero-knowledge proofs

When using the associative interpretation with the independence assumption, it is unclear how to adapt DP to correlated datasets like social networks: data about someone's friends might reveal sensitive information about this person. The causal interpretation of DP does not suffer from this problem, but how can the associative view be adapted to such correlated contexts? Changing the definition of the neighborhood is one possibility (see Section 2.2.3.0), but it requires knowing in advance the exact impact of someone on other records. A more robust option is to require that the released information does not contain more than the result of some predefined algorithms computed on the data without the individual in question. The method for formalizing this intuition borrows ideas from zero-knowledge proofs [170].

Instead of imposing that the result of the mechanism is roughly the same on neighboring datasets $D$ and $D'$, the intuition is to impose that the result of the mechanism on $D$ can be simulated using only some aggregate information about $D$. The corresponding definition, called zero-knowledge privacy and introduced in [161], captures the idea that the mechanism does not leak more information on a given target than a certain class of aggregate metrics. This class, called the model of aggregate information in [161], is formalized by a family $\mathsf{Agg}$ of (possibly randomized) algorithms.

Definition 43 ($(\mathsf{Agg},\varepsilon)$-zero-knowledge privacy [161]). Let $\mathsf{Agg}$ be a family of (possibly randomized) algorithms. A privacy mechanism $\mathcal{M}$ is $(\mathsf{Agg},\varepsilon)$-zero-knowledge private (ZKPr) if there exists an algorithm $\mathsf{agg}\in\mathsf{Agg}$ and a simulator $\mathrm{Sim}$ such that for all datasets $D$ and all indices $i$, $\mathcal{M}(D) \approx_{\varepsilon} \mathrm{Sim}\left(\mathsf{agg}\left(D_{-i}\right)\right)$, where $D_{-i}$ denotes the dataset $D$ without its $i$-th record.
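As an illustration, here is a small Python sketch of this simulation idea, assuming record values in [0, 1] and a noisy-mean mechanism (the choice of aggregate and all names are assumptions made for the example, not taken from [161]): the simulator only sees the sum and count of everyone except individual $i$, yet its output distribution stays within a factor $e^{\varepsilon}$ of the mechanism's.

```python
import numpy as np

def mechanism(data, epsilon=1.0, rng=None):
    # Releases a Laplace-noised mean of the full dataset (values in [0, 1]);
    # the sensitivity of the mean is 1/n, so the noise scale is 1/(n*epsilon).
    rng = rng or np.random.default_rng()
    return float(np.mean(data)) + rng.laplace(scale=1.0 / (len(data) * epsilon))

def aggregate(data_without_i):
    # The aggregate agg(D_{-i}) considered safe to reveal: sum and count
    # over everyone except the target individual i.
    return sum(data_without_i), len(data_without_i)

def simulator(agg_output, epsilon=1.0, rng=None):
    # Simulates the mechanism while only seeing agg(D_{-i}): it treats the
    # missing record as 0, which shifts the mean by at most 1/n, so the two
    # Laplace distributions (same scale 1/(n*epsilon)) stay e^epsilon-close.
    rng = rng or np.random.default_rng()
    total, count = agg_output
    n = count + 1
    return total / n + rng.laplace(scale=1.0 / (n * epsilon))

data = [0.2, 0.9, 0.4, 0.7]
i = 2  # target individual
print(mechanism(data), simulator(aggregate(data[:i] + data[i + 1:])))
```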

In [161], the authors show that $(\mathsf{Agg},\varepsilon)$-ZKPr implies $2\varepsilon$-DP for any $\mathsf{Agg}$, while $\varepsilon$-DP implies $(\mathsf{Agg},\varepsilon)$-ZKPr if the identity function is in $\mathsf{Agg}$. This is yet another way to formalize the intuition that differential privacy protects against attackers who have full background knowledge.

Multidimensional definitions

Using a simulator allows making statements of the type "this mechanism does not leak more information on a given target than a certain class of aggregate metrics". Similarly to noiseless privacy, it is possible to explicitly limit the attacker's background knowledge using a data-generating probability distribution, as well as to vary the neighborhood definition to protect other types of information than the presence and characteristics of individuals. This is done in [36] with coupled-worlds privacy, a generalization of distributional DP, in which a family of functions $\Gamma$ represents the protected attribute.

Definition 44 ($(\Theta,\Gamma,\varepsilon)$-coupled-worlds privacy [36]). Let $\Gamma$ be a family of pairs of functions $(\mathrm{priv},\mathrm{gen})$. A mechanism $\mathcal{M}$ satisfies $(\Theta,\Gamma,\varepsilon)$-coupled-worlds privacy (CWPr) if there is a simulator $\mathrm{Sim}$ such that for all distributions $\theta\in\Theta$, all $(\mathrm{priv},\mathrm{gen})\in\Gamma$, and all possible values $z$ of $\mathrm{priv}$:

$$\left(\mathcal{M}(D)\;\middle|\;D\sim\theta,\ \mathrm{priv}(D)=z\right) \approx_{\varepsilon} \left(\mathrm{Sim}\left(\mathrm{gen}(D)\right)\;\middle|\;D\sim\theta,\ \mathrm{priv}(D)=z\right)$$

A special case of coupled-worlds privacy is also introduced in [36] as distributional DP, already mentioned in Section 2.2.5: each function $\mathrm{priv}_i$ captures the value of a single record $i$, and the corresponding function $\mathrm{gen}_i$ outputs all other records.
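Concretely, under the notation of Definition 44, this special case can be written by instantiating $\Gamma$ with one pair of functions per record index $i$ (a reconstruction for illustration, following the definitions above):

$$\mathrm{priv}_i(D) = D_i, \qquad \mathrm{gen}_i(D) = D_{-i}, \qquad\text{so that CWPr requires}\qquad \left(\mathcal{M}(D)\;\middle|\;D\sim\theta,\ D_i=z\right) \approx_{\varepsilon} \left(\mathrm{Sim}\left(D_{-i}\right)\;\middle|\;D\sim\theta,\ D_i=z\right).$$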

Coupled-worlds privacy is a good example of combining variants from different dimensions: it changes several aspects of the original definition, corresponding to dimensions N, B, and R. Moreover, Q and F can easily be integrated with this definition by using $(\varepsilon,\delta)$-indistinguishability with a Bayesian reformulation. This is done explicitly in inference-based coupled-worlds privacy [36]; the same paper also introduces inference-based distributional differential privacy (IBDDP).

Definition 45 ($(\Theta,\Gamma,\varepsilon,\delta)$-inference-based coupled-worlds privacy [36]). Given a family $\Theta$ of probability distributions on $\mathcal{D}$, and a family $\Gamma$ of pairs of functions $(\mathrm{priv},\mathrm{gen})$, a mechanism $\mathcal{M}$ satisfies $(\Theta,\Gamma,\varepsilon,\delta)$-inference-based coupled-worlds privacy (IBCWPr) if there is a simulator $\mathrm{Sim}$ such that for all distributions $\theta\in\Theta$ and all $(\mathrm{priv},\mathrm{gen})\in\Gamma$:

$$\left(\mathrm{priv}(D)\;\middle|\;D\sim\theta,\ \mathcal{M}(D)=O\right) \approx_{\varepsilon} \left(\mathrm{priv}(D)\;\middle|\;D\sim\theta,\ \mathrm{Sim}\left(\mathrm{gen}(D)\right)=O\right)$$

with probability at least $1-\delta$ over the choice of $D\sim\theta$ and of the output $O$.

Random DP was also combined with an idea similar to ZKPr: in [35], the authors introduce typical stability, which combines random DP with approximate DP, except that rather than using $(\varepsilon,\delta)$-indistinguishability between two outputs of the mechanism, it uses a simulator that only knows the data-generating distribution.

Definition 46 ($(\Theta,\varepsilon,\delta,\gamma)$-typical stability [35]). Given a family $\Theta$ of probability distributions on $\mathcal{D}$, a mechanism $\mathcal{M}$ satisfies $(\Theta,\varepsilon,\delta,\gamma)$-typical stability (TypSt) if for all distributions $\theta\in\Theta$, there is a simulator $\mathrm{Sim}$ such that with probability at least $1-\gamma$ over the choice of $D\sim\theta$, $\mathcal{M}(D) \approx_{\varepsilon,\delta} \mathrm{Sim}(\theta)$.
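To illustrate, here is a small Python sketch of a mechanism and simulator that plausibly fit this template under simple assumptions (i.i.d. values in [0, 1], a mean query, $\delta=0$; the names and the Hoeffding-based calibration are illustrative, not taken from [35]): with probability at least $1-\gamma$ over the sampled dataset, the empirical mean falls within a concentration bound of the distribution mean, and in that case the mechanism's output is $\varepsilon$-indistinguishable from that of a simulator that only knows $\theta$.

```python
import numpy as np

def typical_width(n, gamma):
    # Hoeffding bound: with probability >= 1 - gamma over D ~ theta^n
    # (i.i.d. values in [0, 1]), |mean(D) - E[theta]| <= this width.
    return np.sqrt(np.log(2.0 / gamma) / (2.0 * n))

def mechanism(data, epsilon, gamma, rng=None):
    # Releases the empirical mean plus Laplace noise calibrated to the
    # typical deviation rather than to the worst-case sensitivity.
    rng = rng or np.random.default_rng()
    scale = typical_width(len(data), gamma) / epsilon
    return float(np.mean(data)) + rng.laplace(scale=scale)

def simulator(theta_mean, n, epsilon, gamma, rng=None):
    # Only knows the data-generating distribution theta (here, its mean):
    # on "typical" datasets the two output distributions are e^epsilon-close.
    rng = rng or np.random.default_rng()
    scale = typical_width(n, gamma) / epsilon
    return theta_mean + rng.laplace(scale=scale)

rng = np.random.default_rng(0)
theta_mean, n = 0.3, 1000
data = rng.binomial(1, theta_mean, size=n)  # D ~ theta^n
print(mechanism(data, epsilon=0.5, gamma=0.05), simulator(theta_mean, n, 0.5, 0.05))
```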

In the same paper, the authors introduce a variant of the same definition with the same name, which compares two outputs of the mechanism; this is essentially a combination of DlPr [53, 333] and approximate DP.

We did not find any evidence that the variants and extensions of this section are used outside of theoretical papers exploring the guarantees they provide.
