Lowering the cost of anonymization

a PhD thesis

2.2.9  Summarizing table

In this section we summarize the results from the previous 7 sections into a table where we list the known relations and show the properties with either referring to the original proof or creating a novel one.

In Table 2.20 , we list the differential privacy variants and extensions introduced in this work. For each, we specify their name, parameters and where they were introduced (column 1), which dimensions they belong to (column 2), which axioms they satisfy (column 3, post-processing on the left and convexity on the right), whether they are composable (column 4) and how they relate to other differential privacy notions (column 5). We do not list definitions whose only difference is that they apply DP to other types of input, like those from Section 2.2.3.0 , or geolocation-specific definitions.

The references for each claim present in Table 2.20 are listed after the table, while the proofs follow in Section 2.2.9.0 . Like in the rest of this chapter, the following abbreviations are used for dimensions:

  • Q: Quantification of privacy loss
  • N: Neighborhood definition
  • V: Variation of privacy loss
  • B: Background knowledge
  • F: Formalism of privacy loss
  • R: Relativization of knowledge gain
  • C: Computational power

Name & references Dimensions P.P.1 Cv.2 Cp.3 Relations
-DP, or -approximate DP [ 126]
also known as max-KL stability [ 37]
Q 5 5 11 -DP  -DP
-probabilistic DP [ 267 , 277]
also known as -DP in distribution [ 60]
Q 4 6 11 -DP  -ProDP  -DP
-Relaxed DP [ 407] Q 4 6 11 -ProDP  -RelDP  -DP
-Kullback-Leiber Pr [ 31 , 88] Q 5 5 11 -DP  -KLPr  -DP
-Rényi DP [ 284] Q 5 5 11 -KLPr  -RenyiDP  -DP
binary- DP [ 380] Q ? ? ? b- DP  -DP
tenary- DP [380] Q ? ? ? t- DP  -DP
-total variation Pr [31] Q ? ? ? -TVPr  b- DP
-quantum DP [ 81] Q ? ? ?
-mutual-information DP [88] Q 5 5 11 -DP  -MIDP  -KLPr
-mean concentrated DP [ 132] Q 4 ? 11 -DP  -mCoDP  -DP
-zero concentrated DP [ 57] Q 5 5 11 -zCoDP  -mCoDP
-approximate CoDP [57] Q 5 ? 11 -DP  -ACoDP  -zCoDP
-bounded CoDP [57] Q 5 5 11 -bCoDP  -zCoDP
-truncated CoDP [81] Q 5 5 11 -tCoDP[81]  -DP
-truncated CoDP [ 56] Q 5 5 11 -tCoDP[56]  -bCoDP
-divergence DP [31] Q 5 5 ? -DivDP  most definitions in Q
-divergence DP [ 118] Q 5 5 ? -DivDP  -DivDP
-capacity bounded DP [ 69] Q 5 5 ? -CBDP  -DivDP
-unbounded DP [ 227] N 4 4 12 -DP  -uBoDP  -GrDP
-bounded DP [227]
also known as per-person DP [ 149]
N 4 4 12 -BoDP  -DP
-attribute/bit DP [227] N 4 4 12 -BitDP  -AttDP  -BoDP
-element DP [ 22] N 4 4 12 -ELDP  -DP
-one-sided DP [ 231] N 4 4 12 -OnSDP  -BoDP
-sensitive privacy [ 23] N 4 4 12 -SenPr  --OnSDP
-anomaly-restricted DP [ 51] N 4 4 12 -ARDP  -DP
-group DP [ 123]
also known as DP under correlation [ 73]
N 4 4 12 -GrDP  -DP
-dependent DP [ 258] N 4 4 12 -DepDP  -GrDP
-bayesian DP [ 399] N 4 4 12 -BayDP[399]  -DepDP
-correlated DP [ 392 , 393] N 4 4 12 -CorDP  -BayDP[399]
-prior DP [ 256] N 4 4 12 -PriDP  -BayDP[399]
-free lunch Pr [227] N 4 4 12 -FLPr  all definitions in N
-individual DP [ 349]
also known as conditioned DP [ 64]
N 4 4 12 -IndDP  -DP
-per-instance DP [ 379] N 4 4 12 -PIDP  -IndDP
-generic DP [ 145, 227] N 4 4 12 -GcDP[227]  most definitions in N
-constrained DP [ 409]
also known as adjacent DP [ 235]
and DP under a neighborhood [145]
N 4 4 12 -ConsDP  -GcDP
-distributional Pr [409] N 4 4 12 -DlPr[409]  -GcDP
-sensitivity-induced DP [ 335] N 4 4 12 -SIDP  -GcDP
-induced-neighbors DP [227] N 4 4 12 -INDP  -GcDP
-blowfish Pr [ 187 , 193] N 4 4 12 -BFPr  -GcDP  -INDP
-adjacency-relation div. DP [ 218] Q,N 5, 4 5, 4 ? -GcDP[227]  -ARDDP  -DivDP
-personalized DP [ 134 , 165 , 212 , 261 , 302]
also known as heterogeneous DP [ 10]
V 7 7 12 -PerDP  -DP
-tailored DP [ 263] V 7 7 12 -TaiDP  -PerDP
-outlier Pr [263] V 7 7 12 -OutPr  -TaiDP
-simple-outlier Pr [263] V 7 7 12 -SOPr  -OutPr
-simple outlier DP [263] V 7 7 12 -DP  -SODP  -OutPr
-staircase outlier DP [263] V 7 7 12 -SCODP  -OutPr
-Pareto DP [263] V 7 7 12 -ParDP  -TaiDP
-random DP [ 185] V ? 8 11 -RanDP  -DP
-predictive DP [ 184]
also known as model-specific DP [ 270]
V ? ? ? -RanDP  -PredDP  -DP
-generalized DP [ 111] V ? ? ? -GdDP  -DP
-Pr [ 67]
also known as extended DP [ 219]
N,V 7 7 12 -DP  -Pr
-weighted DP [ 322] N,V 7 7 12 -WeiDP  -Pr
-smooth DP [31] N,V 7 7 12 -SmoDP  -Pr
-earth mover’s Pr [ 151] N,V 7 7 12 -EMDP  -Pr
-DP on location set [ 396] N,V 7 7 12 -LocSetDP
-distributional Pr [ 333, 409] N,V ? ? ? -FLPr  -DlPr[ 53, 333]
-endogenous DP [ 234] Q,V 7 7 12 -DP  -EndDP  -PerDP
-weak Bayesian DP [ 367] Q,V 4 ? 12 -DP  -WBDP  -RanDP
-on average KL Pr [ 150 , 381]
also known as average leave-one-out KL stability [150]
Q,V 5,4 ? ?16 -KLPr  -avgKLPr  -RanDP
-Bayesian DP [367] Q,V 4 8 12 -ProDP  -BayDP[367]  -RanDP
-privacy at risk [ 92] Q,V 4 8 12 -ProDP  -PAR  -RanDP
-pseudo-metric DP [ 106] Q,N,V ? ? 11 -DP  -PsDP  -Pr
-extended divergence DP [218] Q,N,V 7 7 ? -Pr  -EDivDP  -Div DP
-generic DP [ 225] Q,N,V 4 4 ? -GcDP[225]  -DP
-abstract DP [225] Q,N,V 4 4 ? -AbsDP  -GcDP[225]
-noiseless Pr [ 46 , 116 , 164 , 391] B 4 4 13 -DP  -NPr
-causal DP [ 100] B 4 4 13 -CausDP  -DP
-DP under sampling [ 255] Q,B 4 4 13 -NPr  -SamDP  -DP
-active PK DP [ 36, 46, 100] Q,B 4 4 13 -APKDP  -NPr
-passive PK DP [100] Q,B 4 4 13 -APKDP  -PPKDP  -NPr
-pufferfish Pr [ 206, 219 , 228] N,B 4 4 13 -NPr  -PFPr  -GcDP
-Bayesian DP [ 248] N,B 4 4 13 BayDP[248]  -PFPr
-distribution Pr [219] Q,N,B 7 7 13 -DnPr  -APKDP
-profile-based DP [ 162] Q,N,B 7 7 13 -PBDP  -DnPr
-probabilistic DnPr [219] Q,N,B 4 6 13 -ProDP  -PDnPr  -DnPr
-divergence DnPr [218] Q,N,B 7 7 13 -DP  -DDnPr  -DnPr
-extended DnPr [219] N,V,B 7 7 13 -Pr  -EDnPr  -DnPr
-ext. div. DnPr [218] Q,N,V,B 7 7 13 -DDPr  -EDDnPr  -EDnPr
-indistinguishable Pr [ 260] F 14 14 14 -IndPr  -DP
-DP [ 113] Q,F 4 ? 4 -DP  -DP
Gaussian DP [113] Q,F 4 ? 4 GaussDP  -Pr
-positive membership Pr [ 254] B,F 9 9 13 -PMPr  -BoDP
-negative membership Pr [254] B,F 9 9 13 -NMPr  -BoDP
-membership Pr [254] B,F 9 9 13 -PMPr  -MPr  -NMPr
-adversarial Pr [ 326, 391]
also known as information privacy [391]
Q,B,F 9 9 13 -DP  -AdvPr  -PMPr
-aposteriori noiseless Pr [46] B,F 9 9 ? -ANPr  -NPr
-semantic Pr [ 158 , 217] F ? ? ? -SemPr  -DP
-range-bounded Pr [ 121] F ? ? ? -RBPr  -DP
-inference-based causal DP [100] B,F ? ? ? -IBCDP  -CausDP
-information Pr [ 119] N,B,F ? ? ? -InfPr  -DP
-zero-knowledge Pr [ 161] R 4 4 ?16 -ZKPr  -DP
-bounded leakage DP [ 257] Q,R 4 4 11 -BLDP  -DP
-coupled-worlds Pr [36] N,B,R 4 4 7 -CWPr  -DP
-distributional DP [36] N,B,R 4 4 7 -CWPr  -DistDP  -DP
-inference-based CW Pr [36] Q,N,B,F,R ? ? 7 -IBCWPr  -CWPr
-inference-based DistDP [36] Q,N,B,F,R ? ? 7 -DDP  -IBDDP  -IBCWPr
-typical stability [ 35] Q,V,R 4 ? 11
-SIM-computational DP [ 285] C 10 10 17 -SimCDP  -DP
-IND-computational DP [285] C 10 10 17 -IndCDP  -SimCDP
-DP for Record Linkage [ 194] C 10 10 17 -RLDP  -OCDP
-output constrained DP [194] N,C 10 10 17 -OCDP  -IndCDP
-computational ZK Pr [161] R,C 10 10 ? -CZKPr  -ZKPr

1 Post-processing

2 Convexity

3 Composition

4 See Proposition 11 .

5 See Proposition 12 .

6 See Proposition 13 .

7 See Proposition 14 .

8 See Proposition 15 .

9 See Proposition 16 .

10 See Proposition 17 .

11 See Proposition 18 .

12 See Proposition 19 .

13 See Proposition 20 .

14 Follows directly from its equivalence to -DP.

15 A modified definition was presented in [228], which is an instance of PF Pr.

16 A proof for a restricted scenario appears in the paper introducing the definition.

17 This claim appears in [284], its proof is in the unpublished full version.

Table 2.20: Summary of variants/extensions of DP representing the main options in each combination of dimensions.

Proofs of properties

We first list known results for variants and extensions satisfying privacy axioms, prove additional results, then we do the same for composition.

Axioms

Proposition 11.

1.
ProDP, ACoDP, and mCoDP do not satisfy the post-processing axiom [ 57 , 277].
2.
AbsDP satisfies neither privacy axiom, while GlDP satisfies both [ 225 , 228]22.
3.
WBDP satisfies the post-processing axiom [ 367].
4.
TypSt satisfies the post-processing axiom [ 35].
5.
GaussDP satisfies the post-processing axiom [ 113].
6.
PFPr satisfies both privacy axioms [228].
7.
CWPr satisfies both privacy axioms 23 [ 36].
8.
APKDP and PPKDP satisfy both privacy axioms [ 100].
9.
BLDP satisfies both privacy axioms [ 257].

Proposition 12. All instantiations of DivDP satisfy both privacy axioms. In particular, approximate DP, MIDP, KLPr, RenDP, and zCoDP satisfy both axioms.

Proof. The post-processing axiom follows directly from the monotonicity property of the -divergence. The convexity axiom follows directly from the joint convexity property of the -divergence. □

Proposition 13. ProDP and ACoDP do not satisfy the convexity axiom.

Proof. Consider the following mechanisms and , with input and output in .

  • , with probability , and with probability .
  • .

Both mechanisms are -ProDP. Now, consider the mechanism which applies with probability and with probability . is a convex combination of and , but the reader can verify that it is not -ProDP. The result for -ACoDP is a direct corollary, since is is equivalent to -ProDP when . □

Proposition 14. -Pr satisfies both privacy axioms. Further, EDivDP also satisfies both privacy axioms.

Proof. The proof of Proposition 11 for PFPr (Appendix B in [ 228]) is a proof by case analysis on every possible protected property. The fact that is the same for every protected property has no influence on the proof, so we can directly adapt the proof to -Pr, and its combination with PFPr. Similarly, the proof can be extended to arbitrary divergence functions, like in Proposition 12 . □

Proposition 15. RanDP does not satisfy the convexity axiom.

Proof. Let be the uniform distribution on , let be generated by picking records according to , and by flipping one record at random. Let return if all records are , and otherwise. Let return if all records are , and otherwise.

Note that both mechanisms are -RanDP. Indeed, will only return for with probability , and for with probability (if only has one , which happens with probability , and this record is flipped, which happens with probability ). In both cases, will return for the other database; which will be a distinguishing event. Otherwise, will return for both databases, so . The reasoning is the same for .

However, the mechanism obtained by applying either or uniformly randomly does not satisfy -RanDP: the indistinguishability property does not hold if or have all their records set to either or , which happens twice as often as either option alone. □

Proposition 16. All variants of MPr, AdvPr, and ANPr satisfy both axioms. As a direct corollary, InfPr also satisfies both axioms.

Proof. We prove it for AdvPr. A mechanism satisfies -AdvPr if for all , , and , . We first prove that it satisfies the convexity axiom. Suppose is a convex combination of and . Simplifying into , we have:

Denoting for , this gives:

The proof for the post-processing axiom is similar, summing over all possible outputs . It is straightforward to adapt the proof to all other definitions which change the shape of the prior-posterior bounds. □

Proposition 17. Both versions of CDP satisfy both privacy axioms; where the post-processing axiom is modified to only allow post-processing with functions computable on a probabilistic polynomial time Turing machine. As a direct corollary of Proposition 11 for CWPr, CZKPr also satisfies both privacy axioms.

Proof. For Ind-CDP and the post-processing axiom, the proof is straightforward: if post-processing the output could break the -indistinguishability property, the attacker could do this on the original output and break the -indistinguishability property of the original definition.

For Ind-CDP and the convexity axiom, without loss of generality, we can assume that the sets of possible outputs of both mechanisms are disjoint (otherwise, this give strictly less information to the attacker). The proof is then the same as for the post-processing axiom.

For SimCDP, applying the same post-processing function to the “true” differentially private mechanism immediately leads to the result, since DP satisfies post-processing. The same reasoning holds for convexity. □

Composition

In this section, if and are two mechanisms, we denote the mechanism defined by .

Proposition 18 (Existing results). If and are respectively…

1.
-DP and -DP, then is -DP [ 126].
2.
-ProDP and -ProDP, then is -ProDP [ 60 , 267 , 277].
3.
-RelDP and -RelDP, then is -RelDP [ 407].
4.
-MIDP and -MIDP, then is -MIDP [ 88].
5.
-KLDP and -KLDP, then is -KLDP [ 31, 88].
6.
-RenDP and -RenDP, then is -RenyiDP [ 284].
7.
-mCoDP and -mCoDP, then is -mCoDP [ 57 , 132].
8.
-zCoDP and -zCoDP, then is -zCoDP [57].
9.
-ACoDP and -ACoDP, then is -ACoDP [57].
10.
-bCoDP and -bCoDP, then is -bCoDP [57].
11.
-CCoDP and -CCoDP, then is -CCoDP [ 81].
12.
-DP and -DP, than then is -DP [ 113].
13.
-RanDP and -RanDP, then is -RanDP [ 184].
14.
-BayDP[ 367] and -BayDP[367], then is -BayDP[367]. The same result holds for WBDP as an immediate consequence of Theorem 1 in [367].
15.
-TypSt and -TypSt, then is -TypSt [ 35].
16.
-PsDP and -PsDP then is -PsM DP [ 106].
17.
-BLDP and -BLDP, then is -BLDP, where is the concatenation of and (where the randomness is not shared between and , nor between and ) [ 257].

Proposition 19. If is -private and is -private, then is -private, where .

Proof. The proof is essentially the same as for -DP. ’s randomness is independent from ’s, so:

Each definition listed in Proposition 18 can also be combined with -privacy, and the composition proofs can be similarly adapted. □

Proposition 20. In general, definitions which assume limited background knowledge from the adversary do not compose.

Proof. The proof of Proposition 19 cannot be adapted to a context in which the attacker has limited background knowledge: as the randomness partially comes from the data-generating distribution, the two probabilities are no longer independent. A typical example considers two mechanisms which answer e.g., queries “how many records satisfy property ” and “how many records satisfy property and have an ID different from 4217”: the randomness in the data might make each query private, but the combination of two queries trivially reveals something about a particular user. Variants of this proof can easily be obtained for all definitions with limited background knowledge. □

22As an immediate corollary, all definitions which combine notions in N and NPr also satisfy both axioms.

23As an immediate corollary, DistDP and ZKPr also satisfy both axioms.

All opinions here are my own, not my employers.
I'm always glad to get feedback! If you'd like to contact me, please do so via e-mail (se.niatnofsed@neimad) or Twitter (@TedOnPrivacy).