Lowering the cost of anonymization

a PhD thesis

2.2.8  Computational power (C)

The $\varepsilon$-indistinguishability property in DP is information-theoretic: the attacker is implicitly assumed to have infinite computing power. This is unrealistic in practice, so it is natural to consider definitions where the attacker only has polynomial computing power. Changing this assumption leads to weaker data privacy definitions. In [285], two approaches have been proposed to formalize this idea: either model the distinguisher explicitly as a polynomial Turing machine, or allow a mechanism not to be technically differentially private, as long as one cannot distinguish it from a truly differentially private one.

Using a polynomial distinguisher on the output

The attacker is not explicit in the formalization of DP based on $\varepsilon$-indistinguishability. It is possible to change the definition to make this attacker explicit: model it as a distinguisher, who tries to guess whether a given output comes from a dataset $D$ or its neighbor $D'$. In doing so, it becomes straightforward to require this attacker to be computationally bounded: simply model it as a probabilistic polynomial-time Turing machine. In [285], the authors introduce IndCDP, short for indistinguishability-based computational differential privacy.

Definition 47 ($\varepsilon_\kappa$-IndCDP [285]). A family of privacy mechanisms $\left(\mathcal{M}_\kappa\right)_{\kappa\in\mathbb{N}}$ provides $\varepsilon_\kappa$-IndCDP if there exists a negligible function $\nu$ such that for all non-uniform probabilistic polynomial-time Turing machines $\mathcal{A}$ (the distinguisher), all polynomials $p(\cdot)$, all sufficiently large $\kappa$, and all datasets $D$ and $D'$ of size at most $p(\kappa)$ that differ in only one record:
\[
\mathbb{P}\left[\mathcal{A}\left(\mathcal{M}_\kappa(D)\right) = 1\right] \le e^{\varepsilon_\kappa} \cdot \mathbb{P}\left[\mathcal{A}\left(\mathcal{M}_\kappa(D')\right) = 1\right] + \nu(\kappa)
\]
where $\nu$ is a function that converges to zero asymptotically faster than the reciprocal of any polynomial.
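To make the quantities in this definition concrete, here is a minimal Monte Carlo sketch. The Laplace mechanism over a counting query, the threshold distinguisher, and all numeric values are illustrative assumptions (not taken from [285]); since the Laplace mechanism is information-theoretically DP, the $\nu(\kappa)$ term is essentially zero here and the inequality should hold up to sampling error.

```python
import random
import math

def laplace_mechanism(true_count, epsilon):
    """Release a count with Laplace noise of scale 1/epsilon (sensitivity 1)."""
    u = random.random() - 0.5
    noise = -math.copysign(1.0, u) * math.log(1 - 2 * abs(u)) / epsilon
    return true_count + noise

def distinguisher(output, threshold=100.5):
    """A cheap distinguisher A: output 1 ('looks like D') if the value is large."""
    return 1 if output > threshold else 0

def acceptance_probability(true_count, epsilon, trials=100_000):
    """Monte Carlo estimate of P[A(M(count)) = 1]."""
    hits = sum(distinguisher(laplace_mechanism(true_count, epsilon))
               for _ in range(trials))
    return hits / trials

epsilon = 1.0
p_d = acceptance_probability(101, epsilon)        # dataset D (count 101)
p_d_prime = acceptance_probability(100, epsilon)  # neighbor D' (count 100)

# IndCDP inequality: p_d <= e^epsilon * p_d_prime + nu(kappa),
# with nu ~ 0 for this information-theoretically private mechanism.
lhs = p_d
rhs = math.exp(epsilon) * p_d_prime
```

Analytically, `p_d` should be close to $1 - \tfrac{1}{2}e^{-0.5} \approx 0.70$ and `p_d_prime` close to $\tfrac{1}{2}e^{-0.5} \approx 0.30$, so the bound holds with plenty of slack.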

This definition can also be expressed using differential indistinguishability, a notion defined in [24] that adapts $\varepsilon$-indistinguishability to a polynomially bounded attacker.

Using a polynomial distinguisher on the mechanism

Another natural option is to require that the mechanism “looks like” a truly differentially private mechanism, at least to a computationally bounded distinguisher. In [285], the authors introduce SimCDP, short for simulation-based computational differential privacy.

Definition 48 ($\varepsilon_\kappa$-SimCDP [285]). A family of privacy mechanisms $\left(\mathcal{M}_\kappa\right)_{\kappa\in\mathbb{N}}$ provides $\varepsilon_\kappa$-SimCDP if there exists a family $\left(\mathcal{M}'_\kappa\right)_{\kappa\in\mathbb{N}}$ of $\varepsilon_\kappa$-DP mechanisms and a negligible function $\nu$ such that for all non-uniform probabilistic polynomial-time Turing machines $\mathcal{A}$, all polynomials $p(\cdot)$, all sufficiently large $\kappa$, and all datasets $D$ of size at most $p(\kappa)$:
\[
\left|\mathbb{P}\left[\mathcal{A}\left(\mathcal{M}_\kappa(D)\right) = 1\right] - \mathbb{P}\left[\mathcal{A}\left(\mathcal{M}'_\kappa(D)\right) = 1\right]\right| \le \nu(\kappa)
\]
where $\nu$ is a function that converges to zero asymptotically faster than the reciprocal of any polynomial.
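The following sketch illustrates the SimCDP quantity being bounded: a "real-world" mechanism $\mathcal{M}$ (here, hypothetically, a Laplace mechanism whose output is rounded to a finite grid, which may break the exact DP guarantee) is compared to a truly $\varepsilon$-DP mechanism $\mathcal{M}'$, and we estimate the distinguisher's advantage $|\mathbb{P}[\mathcal{A}(\mathcal{M}(D)) = 1] - \mathbb{P}[\mathcal{A}(\mathcal{M}'(D)) = 1]|$. All mechanisms and parameters are illustrative assumptions, not constructions from [285].

```python
import random
import math

def laplace_noise(scale):
    """Sample from Laplace(0, scale) by inverse transform sampling."""
    u = random.random() - 0.5
    return -math.copysign(1.0, u) * scale * math.log(1 - 2 * abs(u))

def ideal_mechanism(count, epsilon):
    """M': a genuinely epsilon-DP Laplace mechanism (sensitivity 1)."""
    return count + laplace_noise(1 / epsilon)

def real_mechanism(count, epsilon, grid=1e-6):
    """M: a variant that rounds its output to a finite grid. Rounding can
    technically break exact DP, but a coarse distinguisher should barely
    tell it apart from the ideal mechanism M'."""
    return round(ideal_mechanism(count, epsilon) / grid) * grid

def acceptance_probability(mechanism, count, epsilon, trials=100_000):
    """Estimate P[A(mechanism(count)) = 1] for a threshold distinguisher A."""
    return sum(mechanism(count, epsilon) > count + 0.5
               for _ in range(trials)) / trials

epsilon = 1.0
p_real = acceptance_probability(real_mechanism, 100, epsilon)
p_ideal = acceptance_probability(ideal_mechanism, 100, epsilon)

# SimCDP asks for this advantage to be negligible for every efficient A.
advantage = abs(p_real - p_ideal)
```

Here the advantage should be on the order of the Monte Carlo sampling error, i.e., well below one percent.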

In [285], the authors show that $\varepsilon_\kappa$-SimCDP implies $\varepsilon_\kappa$-IndCDP.

Multidimensional definitions

IndCDP has been adapted to different settings, and extended to arbitrary neighborhood relationships. In output constrained DP (OCDP), introduced in [194], the setting is a two-party computation, each party can have its own set of privacy parameters ($\varepsilon_A$, $\delta_A$, $\varepsilon_B$, and $\delta_B$; the $\delta$ parameters correspond to the $\nu(\kappa)$ term in IndCDP), and the neighborhood relationship is determined by a function $f$. The authors also propose DP for Record Linkage (DPRL), an instance of OCDP that uses a specific function $f$ capturing the need to protect non-matching records during the execution of a private record linkage protocol.

Some DP variants that explicitly model an adversary with a simulator can relatively easily be adapted to model a computationally bounded adversary, simply by requiring that the simulator be polynomial. This is done explicitly in [161], where the authors define computational zero-knowledge privacy (CZKPr); the same adaptation could also be applied to, e.g., the two coupled-worlds privacy definitions.

Further, although we have not seen this done in practice in existing literature, the idea behind SimCDP can in principle be adapted to any other definition: rather than requiring that a given definition holds in an information-theoretic fashion, it should be possible to require that the mechanism “looks like” a mechanism which genuinely satisfies the definition.

Limiting the computational power of the attacker is a reasonable assumption, but for a large class of queries, it cannot provide significant benefits over classical DP in the typical client-server setting [177]. Thus, existing work using it focuses on multi-party settings [42].
