Lowering the cost of anonymization

a PhD thesis

2.3  Conclusion

In this chapter, we first listed four syntactic privacy definitions, each a different attempt at defining anonymization based on properties of the dataset. These definitions illustrated the various attack models that are relevant in different contexts, and the difficulty of finding a satisfying notion of anonymity. We then introduced differential privacy, a definition that changes the conceptual perspective: rather than seeing anonymization as a property of a database, it considers it a property of the process. Thanks to this view, we can make strong statements about the information that an attacker can get about an individual, bypassing the need for error-prone attack modeling. Differential privacy also comes with useful properties, such as post-processing and composition, which make it easier for researchers and engineers to build mechanisms that satisfy it.
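To make the two properties mentioned above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production mechanism: the names `noisy_count` and `BudgetAccountant` are hypothetical, neighboring datasets are assumed to differ by one added or removed record (so a count has sensitivity 1), and the floating-point Laplace sampling used here is known to be unsafe for real deployments.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    # Assumption: adding or removing one record changes the count by at
    # most 1, so Laplace noise of scale 1/epsilon gives epsilon-DP.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


class BudgetAccountant:
    """Hypothetical helper tracking basic sequential composition:
    running an eps1-DP and an eps2-DP mechanism on the same data
    is (eps1 + eps2)-DP overall."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon


rng = random.Random(0)
ages = [23, 35, 41, 58, 62, 19, 77]  # made-up toy data
accountant = BudgetAccountant(total_epsilon=1.0)

# Composition: two queries on the same data add up their epsilons.
accountant.charge(0.4)
over_40 = noisy_count(ages, lambda a: a > 40, 0.4, rng)
accountant.charge(0.6)
under_30 = noisy_count(ages, lambda a: a < 30, 0.6, rng)

# Post-processing: rounding and clamping only touch the noisy output,
# never the raw data, so they consume no additional budget.
published = max(0, round(over_40))
```

The accountant only implements the simplest composition result (epsilons add up); tighter composition theorems exist but would need a different bookkeeping scheme.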

Differential privacy itself has been extremely successful, and the intuition behind it has inspired hundreds of alternative definitions. But this has also created a situation where a newcomer to the field of anonymization can have trouble navigating it, understanding how different notions relate to each other, and choosing which notion is most appropriate for their use case. To address this issue, we proposed a classification of DP variants and extensions using the concept of dimensions. When possible, we compared definitions from the same dimension, and we showed that definitions from different dimensions can be combined to form new, meaningful definitions. In theory, this means that even if there were only three possible ways to change a dimension (e.g., making it weaker or stronger), the number of possible definitions would grow exponentially with the number of dimensions: the existing definitions shown in Figure 2.7 only scratch the surface of the space of possible notions. Using these dimensions, we unified and simplified the different notions proposed in the literature. We highlighted their properties, such as composability and whether they satisfy the privacy axioms, by either collecting existing results or creating new proofs, and whenever possible, we showed how they relate to one another. We hope that this work will make the field of data privacy more organized and easier to navigate, especially for new practitioners.

All opinions here are my own, not my employers'.
I'm always glad to get feedback! If you'd like to contact me, please do so via e-mail (se.niatnofsed@neimad) or Twitter (@TedOnPrivacy).