The search for better empirical privacy metrics
This post is co-authored by Samuel Haney, David Pujol, and myself. Samuel and David did most of the research presented here.
How do you determine whether a synthetic data generation method is safe enough to use? This is an important question: despite being advertised as technology that inherently protects privacy, synthetic data generation can often reveal a lot of personal information. There are two major approaches to evaluating this risk.
- Formal privacy approaches, like differential privacy (DP), provide a mathematical guarantee about the maximum level of risk.
- Empirical privacy metrics attempt to estimate the level of risk in the synthetic data, by simulating practical attacks or using heuristics.
Both are sensible approaches, and in principle, they can complement each other well. A wide variety of algorithms already exists to obtain synthetic data with provable privacy guarantees. However, on the empirical side, the situation is more problematic: existing privacy metrics have a long list of serious flaws.
To address this problem, we set out to develop empirical privacy metrics that would avoid existing issues. In this blog post, we give an overview of what we have learned so far, and share preliminary thoughts on what we believe are the most promising approaches.
What does a good metric look like?
We believe that a good measure of empirical privacy risk should satisfy the following high-level requirements.
- Interpretability. It should correspond to a meaningful notion of privacy risk: the score should provide an understanding of the success of a well-defined attack, with clear adversarial assumptions.
- Robustness. It should be difficult to "game": a good metric should not assume a fixed-strategy adversary, but be generic enough to capture a variety of possible attacks.
- Specificity. It should be able to quantify the privacy risk for various classes of participants, including the average case as well as specific sub-populations or outliers.
- Controllable uncertainty. Any inherent uncertainty it might have should be quantifiable, and it should be possible for users to reduce this uncertainty to acceptable levels.
- Comparability to formal privacy measures. It should be complemented by an analytical upper bound on privacy risk, and it should be possible to convert the metric to have the same unit as this upper bound.
- Performance. It should be computable in a reasonable time using realistic computational resources.
In short, a good metric should simulate a privacy attack with clear goals and assumptions, and measure its success in a rigorous way.
A promising approach: privacy auditing
The above desiderata are aligned with an area of research called privacy auditing, which measures an adversary's ability to perform privacy attacks. A simple way to perform this measurement is as follows.
- Choose two input datasets that differ in a single record.
- Randomly select one of these inputs, and generate synthetic data based on it.
- Model an adversary who has access to the synthetic data and must make a guess on which of the two input datasets was used.
- Repeat steps 1-3 many times to obtain a reliable estimate of the adversary's success rate.
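To make the procedure concrete, here is a minimal sketch of this loop in Python. The `generate_synthetic` and `adversary_guess` functions are hypothetical placeholders for the synthesis method under test and for the attack being evaluated; they are not part of any particular library.

```python
import random

def audit_success_rate(d0, d1, generate_synthetic, adversary_guess, n_trials=1000):
    """Estimate the adversary's success rate at distinguishing two input
    datasets (d0 and d1) that differ in a single record.

    `generate_synthetic` and `adversary_guess` are placeholders for the
    synthesis method under test and for the attack being evaluated.
    """
    correct = 0
    for _ in range(n_trials):
        # Step 2: randomly select one of the two inputs and synthesize from it.
        secret_bit = random.randint(0, 1)
        synthetic = generate_synthetic(d1 if secret_bit else d0)
        # Step 3: the adversary sees the synthetic data (and knows both
        # candidate inputs) and guesses which one was used.
        guess = adversary_guess(synthetic, d0, d1)
        correct += int(guess == secret_bit)
    # Step 4: repeating the experiment gives an estimate of the success rate.
    return correct / n_trials
```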
If the mechanism is differentially private, one can show that the success rate cannot be above a certain score (which depends on the \(\varepsilon\) value). So the two approaches complement each other: DP tells us "the risk is at most this", and privacy auditing tells us "the risk is at least this".
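For intuition, in the pure \(\varepsilon\)-DP case (with a uniformly random choice between the two inputs, and setting aside \(\delta\) and sampling uncertainty), the bound on the adversary's success rate, and the lower bound on \(\varepsilon\) implied by an observed success rate \(\hat{a} > 1/2\), are:

\[
\Pr[\text{correct guess}] \;\le\; \frac{e^{\varepsilon}}{1 + e^{\varepsilon}},
\qquad\text{equivalently}\qquad
\varepsilon \;\ge\; \ln\frac{\hat{a}}{1 - \hat{a}}.
\]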
Using this framework to create empirical privacy metrics is promising, and satisfies many of our desiderata.
- Interpretability. The metric quantifies the success rate of a well-defined membership inference attack. We can model different attack models by changing which information is available to the adversary (more on this later).
- Robustness. The framework can be used regardless of how the adversary makes their guess. They can use a variety of different attacks to distinguish between the two input datasets, and we can compute the risk score based on which attack performs best.
- Specificity. The adversary is running a membership inference attack on the record that differs between the two input datasets. If this record is sampled from the underlying distribution, the score corresponds to average-case risk. If this record is sampled from a specific sub-population instead, or chosen to be an outlier, we can estimate the risk for these populations.
- Controllable uncertainty. By repeating the experiment multiple times, we can compute precise confidence intervals around the adversary's success rate. Running more experiments decreases the uncertainty.
- Comparability to formal privacy measures. The success rate directly translates into a lower bound on the \(\varepsilon\) value.
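As an illustration of the last two points, here is a minimal sketch (again for the pure \(\varepsilon\)-DP case, and for the repeated-experiment setup above) that turns raw guess counts into a Clopper-Pearson confidence interval on the success rate, and converts the interval's lower end into a conservative lower bound on \(\varepsilon\).

```python
import math
from scipy.stats import beta

def clopper_pearson(correct, trials, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    lo = beta.ppf(alpha / 2, correct, trials - correct + 1) if correct > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, correct + 1, trials - correct) if correct < trials else 1.0
    return lo, hi

def epsilon_lower_bound(correct, trials, alpha=0.05):
    """Conservative lower bound on epsilon implied by the observed success
    rate, in the pure epsilon-DP setting (ignores delta)."""
    lo, _ = clopper_pearson(correct, trials, alpha)
    if lo <= 0.5:
        return 0.0  # no statistically significant evidence of leakage
    return math.log(lo / (1 - lo))

# Example: 800 correct guesses out of 1000 experiments.
print(epsilon_lower_bound(800, 1000))
```

Running more experiments tightens the confidence interval, which in turn raises the lower bound that the same observed success rate can support.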
The one shortcoming of this approach is performance: to run a single experiment, we need to run the entire synthetic data generation algorithm! This makes it impractical to run many experiments, even though doing so is needed to get accurate risk estimates. Luckily, recent research has developed a clever solution to this problem.
Making privacy auditing more efficient
Privacy Auditing with One (1) Training Run is a paper by Thomas Steinke, Milad Nasr, and Matthew Jagielski, which received an Outstanding Paper award at NeurIPS 2023. It introduces a key insight: rather than running many experiments that each target a single record, we can run one experiment, and target many records simultaneously. This is accomplished using a set of fake records called canaries, and randomly including some of them in the input dataset. The attacker must then determine which canaries were included, using the synthetic data.
The authors show that both experiments are essentially equivalent. This method was developed in the context of membership inference attacks for machine learning models, but the exact same idea can be used for synthetic data. We believe that this is fundamentally the right approach to measuring empirical privacy risk.
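Here is a rough sketch of what the one-run experiment could look like for synthetic data. As before, `generate_synthetic` and `score_canary` are hypothetical placeholders; turning the resulting guess counts into an \(\varepsilon\) lower bound requires the specific analysis from the paper, which we do not reproduce here.

```python
import random

def one_run_experiment(real_data, canaries, generate_synthetic, score_canary):
    """Single-run canary experiment, in the spirit of Steinke, Nasr, and
    Jagielski (NeurIPS 2023), adapted to synthetic data generation.

    `generate_synthetic` and `score_canary` are placeholders for the
    synthesis method under test and for the attack's per-canary score
    (higher score = the attacker believes the canary was included).
    """
    # Flip an independent coin for each canary to decide inclusion.
    secret_bits = [random.randint(0, 1) for _ in canaries]
    input_data = list(real_data) + [c for c, b in zip(canaries, secret_bits) if b]

    # A single run of the (expensive) synthesis algorithm.
    synthetic = generate_synthetic(input_data)

    # The attacker scores every canary and guesses the top-scoring half as
    # "included" (one of several possible guessing strategies).
    scores = [score_canary(synthetic, c) for c in canaries]
    threshold = sorted(scores)[len(scores) // 2]
    guesses = [int(s >= threshold) for s in scores]

    correct = sum(int(g == b) for g, b in zip(guesses, secret_bits))
    return correct, len(canaries)
```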
Adopting this framework is not, by itself, enough to build empirical privacy metrics for synthetic data: we also need to instantiate it. More precisely, we need to answer the following questions.
- How are the canaries chosen?
- What knowledge does the adversary have about the underlying distribution?
- Which attack(s) does the adversary use to determine their guess?
The answers turn out to have a major impact on whether empirical metrics can do a good job at detecting privacy violations.
How to choose the canaries?
The choice of canaries determines which records we are targeting for the privacy evaluation. There are three distinct options; a short code sketch illustrating them follows the list.
- Measure average-case risk. The simplest option is to pick the canaries from the same distribution as the input data. For example, we can put aside a random subset of the input to use as canaries. In this case, we are estimating the average-case privacy risk across the entire population.
- Measure risk for specific subpopulations. A slightly more complex option is to pick canaries from a subset of the data, for example from demographic minorities. This measures privacy risk for people who are part of this subpopulation. If the subpopulation has unique characteristics or rarer attribute values, this typically leads to higher attack success.
- Measure risk for worst-case targets. A third option is to choose the canaries adaptively, depending on the input dataset and the chosen algorithm, to try and maximize the success of the attack. This requires significantly more effort, but gets closer to the high-level privacy goal of protecting even the most vulnerable records in the population.
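The toy sketch below illustrates these three strategies on a tabular dataset. The `subpop_filter` predicate and the "rarest value per column" heuristic are stand-ins of our own; real adaptive canary crafting also exploits knowledge of the synthesis algorithm and is considerably more involved.

```python
import pandas as pd

def choose_canaries(data: pd.DataFrame, n: int, strategy: str = "average",
                    subpop_filter=None) -> pd.DataFrame:
    """Illustrative (and deliberately simplistic) canary selection strategies."""
    if strategy == "average":
        # Average-case risk: hold out random records from the input distribution.
        return data.sample(n)
    if strategy == "subpopulation":
        # Risk for a sub-population: sample only from records matching a
        # caller-supplied predicate (e.g. a demographic minority).
        return data[subpop_filter(data)].sample(n)
    if strategy == "worst_case":
        # Worst-case targets: craft records out of rare attribute values.
        rare_values = {col: data[col].value_counts().idxmin() for col in data.columns}
        return pd.DataFrame([rare_values] * n)
    raise ValueError(f"unknown strategy: {strategy}")
```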
We found that attacks are much more successful in the third model. Here are two examples.
- If the input dataset is very homogeneous, then outlier data points with many unique attributes are very susceptible to attacks based on e.g. nearest-neighbor distances.
- Some synthetic data generation algorithms determine the possible values of the output data using the values that appear in the input dataset, sometimes in a randomized way. Knowing the details of this process allows an attacker to craft canaries whose presence can be detected simply by checking which attribute values occur in the output dataset.
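As an example of the first point, here is a minimal version of a nearest-neighbor distance score, written so that it can play the role of the `score_canary` placeholder from the earlier sketch. This is only one of many possible distance-based variants.

```python
import numpy as np

def nearest_neighbor_score(synthetic: np.ndarray, canary: np.ndarray) -> float:
    """Distance-based membership score: the (negated) Euclidean distance from
    the canary to its nearest synthetic record. A higher score means the
    canary is closer to the synthetic data, which the attacker interprets as
    evidence that it was part of the input. Variants abound: other distance
    functions, per-attribute normalization, or calibrating against a
    reference distance computed on holdout data.
    """
    distances = np.linalg.norm(synthetic - canary, axis=1)
    return -float(distances.min())
```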
What does the adversary know about the underlying distribution?
The adversary's goal is to distinguish between synthetic data generated using data from an underlying distribution, and synthetic data generated from a specific dataset containing their target. The more information they have about the underlying distribution, the better they can tell the two apart. For example, if the adversary can sample synthetic datasets using the underlying distribution as input, they can use powerful supervised learning techniques as part of their attack. Those typically perform much better than simpler methods.
In practice, priors on the underlying distribution may be obtained using past releases for similar datasets: this often happens when the use case requires generating synthetic data regularly. Adversaries can also get distributional information from the mechanism itself, or even from the output of naive privacy metrics.
Which attack does the adversary use?
Regardless of the choice of canaries and auxiliary knowledge, the adversary has many options to choose from to decide how to compute their guess: distance-based approaches (of which there are many variants), ML classifiers, kernel density estimation, shadow modeling attacks, and so on.
Unsurprisingly, all other things being equal, the choice of the attack method has a major impact on attack success. Through experimentation, we learned a few high-level lessons.
- Shadow modeling attacks perform particularly well. Those require the attacker to be able to generate multiple synthetic datasets from the underlying distribution, and also require more computational power.
- Attacks generally come with parameters; tuning these parameters is often crucial to boost the attack success. Information about the input dataset and the algorithm used can make this tuning step much more effective.
- The success rate can be increased by allowing the adversary to provide a confidence level along with each guess, and only considering high-confidence guesses. This is another way to measure privacy risk for the most vulnerable records.
- Generally, no single attack methodology achieves best performance across all algorithms and input distributions. Even for very simple methods like distance-based classification, some distance functions lead to high attack success against some algorithms and not others, and vice-versa.
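To make the first point concrete, here is a simplified sketch of a shadow-modeling attack against a single canary, assuming the attacker can repeatedly generate synthetic data from a reference distribution. The feature extraction is deliberately crude (a single nearest-neighbor distance); real attacks use much richer features and better-calibrated classifiers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_shadow_classifier(reference_data, canary, generate_synthetic,
                            n_shadow=50, rng=None):
    """Simplified shadow-modeling attack for a single canary.

    The attacker repeatedly synthesizes data from `reference_data`, with and
    without the canary, and trains a classifier to tell the two cases apart.
    `generate_synthetic` is a placeholder for the method under test.
    """
    rng = rng or np.random.default_rng()
    features, labels = [], []
    for _ in range(n_shadow):
        include = rng.integers(0, 2)
        inputs = np.vstack([reference_data, canary[None, :]]) if include else reference_data
        synthetic = generate_synthetic(inputs)
        # Single crude feature: distance from the canary to its nearest
        # synthetic record in this shadow run.
        nn_dist = np.linalg.norm(synthetic - canary, axis=1).min()
        features.append([nn_dist])
        labels.append(include)
    clf = LogisticRegression().fit(features, labels)
    return clf  # clf.predict_proba([[nn_dist]]) gives the attacker's confidence
```

The classifier's predicted probability also provides the confidence level mentioned above, so the attacker can abstain on low-confidence canaries.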
Additional considerations
There are a number of other aspects that can be changed when modeling or evaluating privacy attacks on synthetic data generation. Here are a few examples, which we have not tested ourselves.
- Some synthetic data generation products offer the ability to generate new data points conditionally on the value of some attributes. This increases the attack surface and can create additional vulnerabilities, which should presumably be taken into account in empirical privacy measures.
- Some attacks from the literature assume that the internals of the synthetic data generator (for example, model weights) are known to the attacker. This opens the door for additional attacks, which we have not tested.
- Throughout this work, we assumed that the adversary cannot modify the input data. This excludes data poisoning attacks, which can be of practical relevance in some scenarios, and could lead to higher attack success.
- We focused on membership inference attacks, but it is straightforward to adapt the framework to run attribute inference attacks instead. We expect that doing so would lead to similar conclusions, though we have not yet experimented with this.
Takeaways
Our research so far has made it clear to us that building good empirical privacy metrics is a very hard task! There are many technical details that make a huge difference in the final score, so a score of “low privacy risk” is hard to interpret — does it suggest that the algorithm is safe, or is the measurement just performing poorly?
Our experiments so far suggest that off-the-shelf metrics do a terrible job at estimating risk. For every single dataset and algorithm combination that we tried, we were able to run successful attacks under our framework, even in the (very frequent) case where existing metrics from open-source packages, when used with default options, would classify the generator as “safe”. This problem is distinct from the numerous conceptual flaws of these metrics: adopting a more principled framework is a good first step, but details matter.
Finally, the fact that attack performance varies so much across synthesis mechanisms has two unfortunate consequences. First, to be sufficiently generic, a good empirical privacy metric needs to run many attacks, not just one. Second, manual auditing is still very likely to outperform even a well-designed metric that runs a whole suite of attacks. This underscores the need to use these metrics responsibly: they provide information about the minimum level of risk, and do not on their own provide robust guarantees. To robustly limit privacy risk, formal approaches like differential privacy are still an essential part of the solution.
Addendum: What if you can't inject canaries into the input data?
The one-run privacy auditing procedure we outlined above requires the ability to control the input dataset. This is not always a realistic requirement: in real-world use cases, the synthetic data generation pipeline may already exist, and infrastructure changes may be impractical. In this case, the best we can do is run a retrospective attack, using the synthetic data that has already been generated. Can we still adapt our privacy auditing framework to this setting?
This turns out to be possible, assuming one additional condition: there must exist a holdout dataset, with the same distribution as the input dataset, but which is not used by the synthetic data generation method. The existence of such a dataset is fairly common in practice: most machine learning pipelines randomly split the available input data into a training set and a test set; the test set can be used as holdout data. Then, we use the following insight: rather than injecting canaries in the input data, we obtain the set of “canaries” retrospectively, by randomly sampling from both the training set and the test set.
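Here is a minimal sketch of this retrospective variant, assuming such a train/test split exists; `score_canary` is the same kind of placeholder as before.

```python
import random

def retrospective_audit(train_set, test_set, synthetic, score_canary, n_canaries=200):
    """Retrospective audit: no canary injection, only a holdout set.

    "Canaries" are sampled after the fact from both the training set
    (label 1, "was used") and the test set (label 0, "was not used"); the
    attacker's scores are then evaluated as a membership classifier.
    `score_canary` is a placeholder for the attack being evaluated.
    """
    members = random.sample(list(train_set), n_canaries)
    non_members = random.sample(list(test_set), n_canaries)
    canaries = [(r, 1) for r in members] + [(r, 0) for r in non_members]

    scores = [(score_canary(synthetic, r), label) for r, label in canaries]
    # Simple evaluation: guess "member" for the top-scoring half.
    threshold = sorted(s for s, _ in scores)[len(scores) // 2]
    correct = sum(int((s >= threshold) == bool(label)) for s, label in scores)
    return correct / len(scores)
```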
This model restricts what options are available to measure risk: in particular, it becomes more complicated to measure risk for sub-populations (we can use conditional sampling, but we are limited by what data is available), and we can no longer choose canaries adaptively. In our experience, this leads to fairly severe limitations on what attacks can be modeled, and how generic these privacy metrics can be. The main way we could mount successful attacks in this model was by exploiting specific behavior of the synthetic data generation method under test.