A glossary of differential privacy terms
This is the first blog post in a series about differential privacy. Check out the table of contents to see the next articles!
Differential privacy has developed quite a sprawling zoo of new terms over the years… and a fair share of idiosyncratic uses of common terminology. This is an attempt to list the most common ones, with concise explanations and links to further reading. I try to favor friendlier references like blog posts or lecture notes whenever available.
- Above Threshold
- Adaptive composition
- Add-or-remove
- Amplification
- Approximate differential privacy
- Binary tree mechanism
- Bounded differential privacy
- Central differential privacy
- Clamping
- Clipping
- Composition
- Continual release
- \(\delta\) ("Delta")
- Differential privacy ("DP")
- Distributed differential privacy
- DP-SGD
- \(\varepsilon\) ("Epsilon")
- \(\varepsilon\)-differential privacy
- \((\varepsilon,\delta)\)-differential privacy
- Event-level
- Exponential mechanism
- \(f\)-DP
- Fully adaptive composition
- Gaussian differential privacy
- Gaussian mechanism
- Global differential privacy
- Global sensitivity
- Hockey stick divergence
- Item-level differential privacy
- \(L_1\)-sensitivity
- \(L_2\)-sensitivity
- Laplace mechanism
- Local differential privacy
- Local sensitivity
- Mechanism
- Neighboring relation
- Non-private
- Noise
- Pan-privacy
- PATE
- Post-processing
- Privacy accounting
- Privacy budget
- Privacy filter
- Privacy loss
- Privacy loss distribution ("PLD")
- Privacy odometer
- Private
- Private selection
- Public data
- Pure differential privacy
- Randomized response
- Replace-one
- Report Noisy Max
- Rényi differential privacy ("Rényi DP")
- Shuffling
- Smooth sensitivity
- Sensitivity
- Sparse vector technique ("SVT")
- Streaming
- Top-\(k\) selection
- Truncated distributions
- Unbounded differential privacy
- User-level differential privacy
- Variant
- Zero-concentrated DP ("zCDP")
- Zero-out
Above Threshold
A building block of the sparse vector technique: given a list of queries and a fixed threshold, it returns the first query whose result is above the threshold. See sparse vector technique.
Adaptive composition
A kind of composition where each mechanism takes the output of previously run mechanisms as input. This blog post has a simple explanation of this setting and how it differs from other kinds of composition.
Add-or-remove
A neighboring relation where the two databases differ by adding or removing a single record. Also called add-or-remove one record, or unbounded DP.
Amplification
An operation that modifies a DP mechanism in some way, leading to improved privacy guarantees. The term is used for multiple results, for different kinds of operations.
- Amplification by subsampling (or simply amplification by sampling) means that when taking a random sample of the input data before passing it to a DP mechanism, we obtain stronger guarantees than when using all the data (or a non-random subset). An overview of results can be found in this book chapter.
- Amplification by shuffling means that after running a local DP mechanism on each input record, randomly reordering the noisy records provides stronger guarantees. This paper introduced this idea; this blog post presents more recent improvements.
- Amplification by iteration means that under some conditions, when running an adaptive sequence of DP mechanisms, releasing only the output of the last mechanism provides a better guarantee than releasing all intermediary outputs. This was introduced in this paper.
Also called privacy amplification.
Approximate differential privacy
A variant of pure differential privacy that allows for a non-zero probability that the privacy loss is larger than \(\varepsilon\). Also often called \((\varepsilon,\delta)\)-DP. This blog post has an introduction to the definition, and this one gives a more precise characterization.
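Formally, a mechanism \(M\) is \((\varepsilon,\delta)\)-DP if, for all neighboring databases \(D_1\) and \(D_2\) and all sets \(S\) of possible outputs:
\[ \mathbb{P}\left[M(D_1)\in S\right] \le e^\varepsilon\cdot\mathbb{P}\left[M(D_2)\in S\right] + \delta. \]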
Binary tree mechanism
A mechanism to maintain an increasing counter in a streaming setting; it is used as a building block for more complex mechanisms in this setting. An introduction to this can be found in this blog post.
Bounded differential privacy
See replace-one.
Central differential privacy
A setting in which a central entity holds the data of every individual in an input dataset, and then runs a DP mechanism on this data. This makes it possible to add relatively little noise for a given privacy budget, but requires trusting this central entity. This blog post presents this in more detail.
Also sometimes called global differential privacy.
Clamping
A simple technique to bound the sensitivity of numeric functions like the sum or average of numeric values. Given clamping bounds \([low,high]\), the operation consists in changing all individual values below \(low\) to \(low\) and all values above \(high\) to \(high\). A few examples can be found in this blog post.
Also called clipping (and the bounds clipping bounds).
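As a minimal illustration, here is what clamping could look like in Python; the function name and the example bounds are arbitrary, and NumPy users would typically just call np.clip.

```python
def clamp(values, low, high):
    """Clamp each value to the interval [low, high]."""
    return [min(max(value, low), high) for value in values]

# With bounds [0, 100], each individual contribution to a sum is now between
# 0 and 100, which bounds the sensitivity of that sum.
clamp([3, -7, 120, 45], low=0, high=100)  # returns [3, 0, 100, 45]
```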
Clipping
See clamping.
Composition
Composing two DP mechanisms means running them both on the sensitive input data and returning both of their outputs.
The composition property states that the composition of two DP mechanisms is also DP. Composition theorems provide a formula to compute the privacy parameters of the composition, depending on the original parameters. This is useful to build complex mechanisms from simple building blocks, or to quantify the privacy guarantees of multiple DP releases. This blog post gives the statement and proof of the original and simplest composition theorem for pure DP.
There can be multiple ways to combine two mechanisms, and so there are many different kinds of composition. This blog post outlines a number of them.
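For example, the simplest composition theorem states that if a mechanism \(M_1\) is \((\varepsilon_1,\delta_1)\)-DP and a mechanism \(M_2\) is \((\varepsilon_2,\delta_2)\)-DP, then releasing the outputs of both satisfies:
\[ \left(\varepsilon_1+\varepsilon_2,\ \delta_1+\delta_2\right)\text{-DP}. \]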
Continual release
A setting in which a server is continuously receiving new data in a streaming fashion, and regularly publishes updated output statistics. The goal is to maintain differential privacy guarantees for the entirety of the outputs, typically either under event-level or user-level privacy. This model was introduced in this paper.
\(\delta\) ("Delta")
The second parameter in approximate DP. It can be interpreted as the maximal probability with which the mechanism has infinite privacy loss. A more complete explanation of its meaning can be found in this blog post.
Differential privacy ("DP")
The term has two meanings.
- A precise mathematical definition that, when enforced, limits the maximum information that an algorithm can leak about any individual data point. This blog post provides and explains this formal notion. Also called pure DP.
- The more general idea, or framework, to mathematically quantify and limit the privacy loss of operations performed on data. This second meaning is more of an umbrella term that refers to the entire field of study, and also covers DP variants.
The blog post series that this glossary belongs to is a good starting point to learn about differential privacy.
Distributed differential privacy
A setting in-between central and local DP, where the aggregator doesn't collect data directly from each user, but only receives the output of some distributed computation over the users' data. Distributed DP protocols typically require much less noise than local DP mechanisms, and avoid the need for a central aggregator that knows everyone's data. This blog post explains this in more detail.
DP-SGD
Short for Differentially Private Stochastic Gradient Descent. This is one of the main mechanisms used to train machine learning models with DP.
DP-SGD works like stochastic gradient descent, except that at each iteration, the gradient is clipped, and noise is added to it. This blog post explains the algorithm at a high level; this paper presents a survey of main results, oriented towards practical usage.
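Here is a very rough NumPy sketch of a single DP-SGD step, assuming per-example gradients have already been computed; the parameter names (clip_norm, noise_multiplier) are illustrative, and real implementations differ in many details (sampling, micro-batching, privacy accounting…).

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, learning_rate, clip_norm, noise_multiplier, rng):
    """One simplified DP-SGD step: clip each example's gradient in L2 norm,
    average the clipped gradients, add Gaussian noise, then update the parameters."""
    clipped = []
    for grad in per_example_grads:
        norm = np.linalg.norm(grad)
        clipped.append(grad * min(1.0, clip_norm / max(norm, 1e-12)))
    noisy_average = np.mean(clipped, axis=0) + rng.normal(
        0, noise_multiplier * clip_norm / len(per_example_grads), size=params.shape)
    return params - learning_rate * noisy_average
```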
\(\varepsilon\) ("Epsilon")
The main parameter in the original definition of differential privacy (pure DP). \(\exp(\varepsilon)\) is a measure of how much probabilistic information an attacker can learn by looking at the output of a DP mechanism. This blog post provides more detail about this.
It is also used as a parameter in multiple variants of differential privacy and usually corresponds to the same idea. There are exceptions, like Rényi DP.
\(\varepsilon\)-differential privacy
See pure differential privacy.
\((\varepsilon,\delta)\)-differential privacy
See approximate differential privacy.
Event-level
A neighboring relation where the databases contain a list of individual events (e.g. someone visiting a webpage, or using a specific app feature), and neighboring databases differ by a single event. Often used in contrast to user-level DP, especially in streaming applications.
Exponential mechanism
A mechanism to select the best choice out of a list of options with scores, in a DP way, when each record in the dataset can have an influence on some or all of the scores — the private selection problem. This is an important building block in a large number of complex DP mechanisms. This blog post is a gentle introduction to this technique.
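As an illustration, here is a minimal NumPy sketch of the exponential mechanism, assuming each score has a known sensitivity; the function and parameter names are made up for this example.

```python
import numpy as np

def exponential_mechanism(scores, sensitivity, epsilon, rng):
    """Return an index sampled with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * (scores - scores.max()) / (2 * sensitivity)  # shift for numerical stability
    probabilities = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(scores), p=probabilities))
```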
\(f\)-DP
A variant of differential privacy that expresses the privacy guarantees by bounding the success of an attacker using the formalism of hypothesis testing. It was introduced in this paper.
Fully adaptive composition
A kind of composition where each mechanism takes the output of previously run mechanisms as input, and where the privacy parameters of each mechanism can be influenced by the results of past queries. This blog post explains this in more detail.
Gaussian differential privacy
A variant of differential privacy that enforces that the privacy loss is identical to the one obtained from a one-dimensional Gaussian mechanism. It is particularly well-suited to do privacy accounting for algorithms based on the Gaussian mechanism. This blog post presents the definition and the intuition behind it.
Gaussian mechanism
A function that adds noise sampled from a normal distribution (more often called a "Gaussian distribution" in DP papers) to a numerical value or a vector of numbers. A central building block in differential privacy, it can be used to achieve many DP variants (but not pure DP). This blog post gives a more detailed presentation.
To add noise to integer-valued data, there is a discrete version of the Gaussian mechanism, which provides the same privacy guarantees.
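As an illustration, here is a minimal NumPy sketch of the continuous Gaussian mechanism, using the classical calibration (valid for \(\varepsilon<1\)); tighter calibrations exist, and the names are only illustrative.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng):
    """Add Gaussian noise with standard deviation
    sigma = sqrt(2 * ln(1.25 / delta)) * l2_sensitivity / epsilon."""
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return value + rng.normal(0, sigma, size=np.shape(value))

rng = np.random.default_rng()
noisy_sums = gaussian_mechanism(np.array([10.0, 20.0]), l2_sensitivity=1.0,
                                epsilon=0.5, delta=1e-5, rng=rng)
```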
Global differential privacy
Another, less common name for central differential privacy.
Global sensitivity
The global sensitivity of a function, sometimes simply called "sensitivity", is the maximum possible change in its output when a single record is added to its input. The maximum is taken over all possible inputs. Simple DP mechanisms work by adding noise to the result of a function; the scale of the noise is typically proportional to ("calibrated to") the global sensitivity.
Formally, the global sensitivity of a function \(f\) is the maximal distance between \(f\left(D_1\right)\) and \(f\left(D_2\right)\), where \(D_1\) and \(D_2\) are neighboring databases. This distance can be quantified in different ways, see \(L_1\)-sensitivity and \(L_2\)-sensitivity for examples.
Hockey stick divergence
A measure of distance between two probability distributions that is closely linked with the definition of approximate DP. This blog post illustrates this notion and explains how it can be used in the context of privacy accounting.
Item-level differential privacy
A neighboring relation where each user can contribute multiple items (often data points used to train machine learning models), and neighboring databases differ by a single item, in contrast to user-level DP.
\(L_1\)-sensitivity
The sensitivity, measured using the \(L_1\) distance, more commonly known as the Manhattan distance. Typically denoted by \(\Delta_1\), it is often used to calibrate the scale of Laplace noise to obtain an \(\varepsilon\)-DP mechanism.
\(L_2\)-sensitivity
The sensitivity, measured using the \(L_2\) distance, more commonly known as the Euclidean distance. Typically denoted by \(\Delta_2\), it is often used to calibrate the scale of Gaussian noise to obtain a mechanism satisfying approximate DP and other DP variants. This blog post explains this in more detail.
Laplace mechanism
A function that adds noise sampled from the Laplace distribution to a numeric value, or a vector of numbers. This is one of the most fundamental building blocks of differential privacy. It can be used to achieve pure DP with well-chosen parameters.
To add noise to integer-valued data, a common alternative is the two-sided geometric distribution, sometimes simply called the "discrete Laplace distribution".
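Here is a minimal NumPy sketch of what this could look like; the function and parameter names are only for illustration.

```python
import numpy as np

def laplace_mechanism(value, l1_sensitivity, epsilon, rng):
    """Add Laplace noise with scale l1_sensitivity / epsilon, which gives epsilon-DP."""
    return value + rng.laplace(0, l1_sensitivity / epsilon, size=np.shape(value))

rng = np.random.default_rng()
noisy_count = laplace_mechanism(42, l1_sensitivity=1, epsilon=0.5, rng=rng)
```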
Local differential privacy
A setting in which each person in the data randomizes their own data, then passes it to a central entity, who computes an output based on this noisy data. This has a key benefit compared to central DP: the aggregator does not need to be trusted, since they only ever see DP data.
Local DP mechanisms are typically much less accurate than central DP mechanisms for the same task: the noise has to be added to every single data point, not only to the output of statistics. This blog post presents this setting in more detail.
Local sensitivity
The maximum change in a mechanism's output when a single person's data is added to its input, for a fixed input.
Formally, the local sensitivity of a function \(f\) on input \(D\) is the maximal distance between \(f\left(D\right)\) and \(f\left(D'\right)\), where \(D'\) is a neighboring database of \(D\). Some functions, like the median, have a very small local sensitivity for some inputs, and very large for others: calibrating noise to the local sensitivity is not enough to achieve DP. The local sensitivity is used as a building block in complex mechanisms; this page lists a few examples.
Somewhat confusingly, the use of the word "local" here has nothing to do with local DP.
Mechanism
A computer program (or function, if you prefer math terminology) that takes some data as input, and typically provides a privacy guarantee on its output. The word is typically used in two different contexts.
- It can refer to individual building blocks that are used as part of a larger program. Examples include the Laplace mechanism or the Gaussian mechanism.
- It can also refer to the larger program itself. In this case, "DP mechanism" is a shorthand for "differentially private mechanism", which means "mechanism that satisfies differential privacy".
Neighboring relation
The neighboring relation defines how the two databases in the definition of differential privacy differ from each other. It determines what exactly is protected by the DP guarantees: this can be the data from a single record, or all records from the same user, or the value of a single attribute, or anything in between. A list of some of the most common options can be found in Section 4 of this paper.
Non-private
Refers to an operation that is performed on the sensitive input data, but does not provide any differential privacy guarantee. Not to be confused with public.
Noise
Randomness injected into a process to make it differentially private. This is often a number sampled from a well-chosen probability distribution (like the Laplace or Gaussian distribution), and added to the result of some operation. But this randomness can also take other forms, like directly sampling from some well-chosen distribution, flipping coins, and so on.
People refer to this process as noise addition, or noise infusion, or sometimes noise injection.
Pan-privacy
A setting within the streaming model where the DP guarantee covers not only the output data, but also the internal state of the algorithm at some intermediary points. This was introduced in this paper.
PATE
A mechanism to train classification models with differential privacy. It works by training multiple models on separate parts of the input data (in a non-private way), having them vote on the classification label for public, unlabeled data points, adding noise to the votes to select the winning label, and training a model on this labeled data. This blog post presents the technique in more detail.
Post-processing
Using the results of a DP mechanism as an input to some other operation (which doesn't otherwise take the sensitive data as input). Differential privacy is preserved by post-processing: the output of that other operation will still be DP, with the same parameters as the original mechanism. This is true for all common DP variants.
Post-processing can be used for many reasons:
- improving the usability of results, for example by removing negative counts or otherwise correcting impossible values;
- improving the accuracy of results, for example by combining multiple DP measurements at different levels of a hierarchy;
- generating synthetic data by post-processing DP statistics computed on the sensitive data;
- using a machine learning model trained with DP (the inference step can be viewed as a post-processing operation);
- and so on.
Privacy accounting
The task of quantifying the privacy guarantee of a given mechanism; typically, finding the smallest \(\varepsilon\) and \(\delta\) values such that the mechanism is \((\varepsilon,\delta)\)-DP.
For complex mechanisms such as DP-SGD, or when releasing a large number of DP statistics, finding the best possible privacy parameters can be very complex. This led to a flourishing literature on different approaches to privacy accounting.
Privacy budget
The maximum privacy loss allocated to a given mechanism; that is, the \(\varepsilon\) value (or \((\varepsilon,\delta)\) values, or more generally the numeric values of the definition's parameters) of this mechanism. Sometimes called the privacy loss budget.
The term "budget" typically implies that the value is fixed in advance. The building blocks of the DP mechanism that take the sensitive data as input "consume" part of this budget.
Privacy filter
An object that keeps track of the privacy loss of different DP mechanisms run by an analyst over time, and prevents the analyst from running further queries if the privacy loss exceeds a certain budget. This contrasts with a privacy odometer, which does not enforce a fixed budget. The two distinct models were introduced in this paper.
Privacy loss
A measure of the privacy leakage of a specific output of a DP mechanism. It answers the question: if the attacker observes this output, how much information can they learn about a single record? In differential privacy, the privacy budget \(\varepsilon\) bounds the privacy leakage regardless of output; by contrast, the privacy loss is the "actual" leakage observed for a specific output. This blog post introduces this notion in more detail.
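For a mechanism \(M\) with discrete outputs, the privacy loss of a specific output \(o\) with respect to neighboring databases \(D_1\) and \(D_2\) is typically defined as:
\[ \mathcal{L}_{D_1,D_2}(o) = \ln\left(\frac{\mathbb{P}\left[M(D_1)=o\right]}{\mathbb{P}\left[M(D_2)=o\right]}\right) \]
(probability densities are used instead when outputs are continuous).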
Since the output of a DP mechanism is randomized, the privacy loss can be interpreted as a random variable. Considering the full distribution of this random variable is a fundamental tool in privacy accounting, called the privacy loss distribution (or PLD). An overview of related definitions and main results can be found in this paper.
Privacy loss distribution ("PLD")
See privacy loss.
Privacy odometer
An object that keeps track of the privacy loss of different DP mechanisms run by an analyst over time, and can return the overall privacy loss consumption over time. This contrasts with a privacy filter, which enforces a maximum budget. The two distinct models were introduced in this paper, which also proves a surprising separation result between the best possible composition theorems in those two cases.
Private
Confusingly, this word can have very distinct meanings, depending on the context.
- "Private data" can refer to the data used as input to a DP mechanism, which needs to be protected (as opposed to public data). Other words used for this include sensitive data, confidential data, or protected data.
- "Private data" can also refer to the output of a DP mechanism, as a shorthand for "differentially private data". Other words used for this include privatized data, noisy data, or… privacy-protected data.
- Finally, "private" can also refer to the DP mechanism itself, again as a shorthand for "differentially private" (as opposed to non-private).
The best option is probably to avoid using this overloaded word altogether.
Private selection
The problem that consists in picking the best choice out of a fixed list of options with scores, where the scores depend on the sensitive input data. The most common DP mechanisms for this task are the exponential mechanism and Report Noisy Max. An introduction to both can be found in these lecture notes.
Public data
In the context of differential privacy, this refers to input data that is not protected by any DP guarantee. This does not always correspond to the common usage of the word: for example, inside a company, a list of commercial partners could be used as a side input to a DP mechanism that only protects user data, even if this list is not public information. Also called unprotected data.
Pure differential privacy
The original definition of differential privacy, which gives a single bound on the worst-case privacy loss of a mechanism, typically denoted by \(\varepsilon\). This blog post outlines it in more detail.
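Formally, a mechanism \(M\) is \(\varepsilon\)-DP if, for all neighboring databases \(D_1\) and \(D_2\) and all sets \(S\) of possible outputs:
\[ \mathbb{P}\left[M(D_1)\in S\right] \le e^\varepsilon\cdot\mathbb{P}\left[M(D_2)\in S\right]. \]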
Also called \(\varepsilon\)-differential privacy, or simply "differential privacy".
Randomized response
A mechanism used to collect a single binary data point with local DP. It randomizes the input data of each participant, returning the true value with probability \(e^\varepsilon/(1+e^\varepsilon)\), and the other value with probability \(1/(1+e^\varepsilon)\). Extensions exist for data that can take more than two values, and this mechanism is used as a fundamental building block for much more complex algorithms satisfying local DP.
A detailed example can be found in this blog post. Interestingly, randomized response was used to collect sensitive data for social sciences decades before the invention of differential privacy.
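Here is a minimal Python sketch of the binary version described above; each participant would run this locally on their own bit before sending it to the aggregator.

```python
import numpy as np

def randomized_response(true_bit, epsilon, rng):
    """Return the true bit with probability e^eps / (1 + e^eps), and flip it otherwise."""
    keep_probability = np.exp(epsilon) / (1 + np.exp(epsilon))
    return true_bit if rng.random() < keep_probability else 1 - true_bit

rng = np.random.default_rng()
reported_bit = randomized_response(true_bit=1, epsilon=1.0, rng=rng)
```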
Replace-one
A neighboring relation where the two databases differ by changing the value of a single record (but not adding or removing records). Also called replace-one-record or bounded differential privacy.
This neighboring relation can be convenient to use in theory work, but has a somewhat counter-intuitive consequence: the total number of records in the dataset can be published without any noise, regardless of the \(\varepsilon\) value.
Report Noisy Max
A simple DP mechanism for private selection, which consists in adding Laplace noise to each score, and returning the option with the highest noisy score. These lecture notes outline this in more detail.
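Here is a minimal NumPy sketch, using the conservative choice of Laplace noise with scale \(2\Delta/\varepsilon\), where \(\Delta\) is the sensitivity of each score; other calibrations exist depending on the setting, and the names are illustrative.

```python
import numpy as np

def report_noisy_max(scores, sensitivity, epsilon, rng):
    """Add Laplace noise (scale 2 * sensitivity / epsilon) to each score,
    then return the index of the highest noisy score."""
    noisy_scores = np.asarray(scores, dtype=float) + rng.laplace(
        0, 2 * sensitivity / epsilon, size=len(scores))
    return int(np.argmax(noisy_scores))
```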
Rényi differential privacy ("Rényi DP")
A variant of differential privacy that quantifies the average privacy loss of a mechanism. It is almost never used on its own, but is used as a tool for some privacy accounting techniques. An illustrated explanation of this definition can be found in this blog post.
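Formally, a mechanism \(M\) satisfies \((\alpha,\varepsilon)\)-Rényi DP if, for all neighboring databases \(D_1\) and \(D_2\):
\[ D_\alpha\left(M(D_1)\,\middle\|\,M(D_2)\right) \le \varepsilon, \]
where \(D_\alpha\) denotes the Rényi divergence of order \(\alpha>1\).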
Shuffling
A technique to achieve distributed DP by adding noise to each individual data point (with a local DP mechanism), then randomly reordering the noisy outputs before passing them to a central aggregator. This idea was introduced in this paper; it was later proven that this method has nice privacy amplification properties.
Smooth sensitivity
The smooth sensitivity of a function \(f\) on input database \(D\) is a value obtained from the local sensitivities of this function on \(D\) and on datasets that are "not too far" from \(D\). Adding noise scaled by the smooth sensitivity can provide \((\varepsilon,\delta)\)-DP; this can be convenient for certain functions where the smooth sensitivity is often much smaller than the global sensitivity. However, it is not always easy, or even feasible, to compute in a reasonable time. More details can be found in the paper that introduced this tool.
Sensitivity
In the context of differential privacy, this is generally used as a synonym of global sensitivity.
Confusingly, this has absolutely nothing to do with how sensitive the data is (e.g. private health data being more important to protect than public blog posts).
Sparse vector technique ("SVT")
A technique that makes it possible to run arbitrarily many queries on the input data, and learn which queries are the first \(c\) (where \(c\) is fixed) to pass a certain test. It relies on Above Threshold, a simpler version of this mechanism, which simply returns the first query in a list whose result is above a fixed threshold (see the sketch below). SVT is an important building block used as part of more complex DP mechanisms. Introductions to this technique can be found on this page or in these lecture notes.
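Here is a minimal NumPy sketch of Above Threshold, following the classic textbook version for queries of sensitivity 1; in a real deployment, the query results would be computed on the sensitive data one at a time.

```python
import numpy as np

def above_threshold(query_results, threshold, epsilon, rng):
    """Return the index of the first query whose noisy result exceeds a noisy
    threshold, or None if no query does. Assumes each query has sensitivity 1."""
    noisy_threshold = threshold + rng.laplace(0, 2 / epsilon)
    for index, result in enumerate(query_results):
        if result + rng.laplace(0, 4 / epsilon) >= noisy_threshold:
            return index
    return None
```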
Streaming
A setting in which the input data isn't collected all at once, but incrementally over time. This typically comes with additional challenges, which can include one or more of the following.
- The server collecting data cannot hold all the input data in memory, and has to use memory-efficient data structures to compute the output.
- The server must update the output statistics regularly, as new data points come in: this is the continual release model.
- An attacker is allowed to have access to some intermediary states of the computation, which must therefore also be covered by the DP guarantee: this is the pan-private model.
Top-\(k\) selection
An extension of the private selection problem where the goal is to return the \(k\) items with the highest scores among a list of options. The most common mechanisms used for this task are the exponential mechanism and Report Noisy Max. A summary of results for this problem can be found in this blog post.
Truncated distributions
A noise distribution that has been modified to never return an output that is too far from the real value. Using such distributions typically comes at some cost in privacy parameters, in particular \(\delta\). This blog post outlines possible approaches for this task.
Unbounded differential privacy
See add-or-remove.
User-level differential privacy
A neighboring relation where the databases differ by adding or removing all the records that have been contributed by a single user (typically, of the online service where data is collected). It approximates the common goal of protecting all the data coming from a single individual.
Variant
A variant of differential privacy (or "DP variant") is a definition that reuses the principles of DP, but changes one or more aspects of the original notion. For example, some variants quantify privacy loss in a different way, or use an unusual neighboring relation. A large number of variants can be found in the scientific literature; this blog post mentions a few of those, and this survey paper presents a more comprehensive list.
Zero-concentrated DP ("zCDP")
A variant of differential privacy that enforces many Rényi DP constraints simultaneously. It is particularly convenient for applications such as releasing a large number of statistics with the Gaussian mechanism. An in-depth explanation of this definition can be found in this blog post.
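Formally, a mechanism \(M\) satisfies \(\rho\)-zCDP if, for all neighboring databases \(D_1\) and \(D_2\) and all \(\alpha\in(1,\infty)\):
\[ D_\alpha\left(M(D_1)\,\middle\|\,M(D_2)\right) \le \rho\alpha, \]
where \(D_\alpha\) denotes the Rényi divergence of order \(\alpha\).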
Zero-out
A neighboring relation where the databases differ by replacing one record with a fixed value, typically 0 for numeric data. This is sometimes used instead of the add-or-remove neighboring relation to simplify the analysis.
I am grateful to Alexander Knop, Anatoly Zavyalov, Arun Ganesh, Audra McMillan, Clément Canonne, Debanuj Nayak, Kunal Tawar, Marika Swanberg, Matthew Joseph, Shlomi Hod, Vikrant Singhal, and Xingyu Zhou for their comments and suggestions on early versions of this post.