Lowering the cost of anonymization

a PhD thesis

Notations

Mathematical notations

The set of nonnegative integers
The set of integers
The set of real numbers
Cardinality of set
Set difference of set and set
Cartesian product between set and set
Probability of event
Probability of the conjunction of event and event B
A probability distribution (or , , …).
Probability of event , when is sampled from
Probability of event , conditioned on event
Expected value of random variable
Expected value of , when is sampled from
Probability distribution conditioned on event
Random variable , when is sampled from
Function, possibly probabilistic, from set to set

Notations for datasets and records

Set of possible records
A record (or , , …)
Set of possible datasets (sequences of records)
A dataset (or , , , …)
The number of records in dataset (typically denoted )
The -th record of dataset ()
Dataset with its -th record removed

Some of these notations are slightly modified when we introduce the concept of record ownership in Chapter 4; see the “Datasets with user identifiers” notation table further down.

Differential privacy and its variants

Set of possible outputs of a privacy mechanism
An output of the privacy mechanism
A subset of outputs of a privacy mechanism
A privacy mechanism
-indistinguishability (see Definition 5)
-indistinguishability (see Definition 14)
Privacy loss random variable (PLRV) between and , often abbreviated (see Definition 13)
Set of possible partial knowledges
A value of the partial knowledge (or , , , …)
Family of probability distributions on or
A probability distribution on or , also used as an abbreviation for or , e.g. in or
A probability distribution on
A specific dataset, also used as an abbreviation for the event
A specific value of the partial knowledge, also used as an abbreviation for the event
A function from to called a simulator (see Definition 50)
Dataset with its -th record replaced by
A normalization of a distribution (see Definition 53)
, , … Parameters of a normalization (see Definition 53)
Privacy loss random variable (PLRV) of an output given partial knowledge , when distinguishing between or (see Definition 56)
Abbreviation for
Abbreviation for
Abbreviation for
Bijective mapping between and for record (see Definition 59)
Dependency of on to distinguish and (see Definition 62)

Cardinality estimators

A set of records (or , , …)
Set of possible sketches
A sketch (or , , , …)
Set of probability distributions over
A probability distribution over sketches (or )
The empty sketch
Sketch (or distribution of sketches), obtained by adding record to a sketch (or distribution of sketches)
Estimated number of distinct records added to a sketch
Sketch obtained by adding all records of set into
Distribution of sketches obtained by adding records of set to
Sketch obtained by merging sketches and
Distribution of sketches obtained by merging distributions of sketches and (also denoted )
Set of subsets of of cardinality
Probability of event , when is taken uniformly in

Datasets with user identifiers

A dataset column (or , , …)
The value of a column (or , , …)
Set of possible user identifiers
A user identifier (or , , …)
The set of user identifiers associated with a column value
A hash function
Set of possible records, composed of multiple columns
A row, associating a user identifier with a record
A dataset, formally a multiset of rows
Row-level distance between datasets and (see Definition 78)
User-level distance between datasets and (see Definition 78)
A vector-valued function
norm of vector
-sensitivity of function (see Definition 80)
User-level -sensitivity of function (see Definition 80)

All opinions here are my own, not my employers'.
I'm always glad to get feedback! If you'd like to contact me, please do so via e-mail (se.niatnofsed@neimad) or Twitter (@TedOnPrivacy).