Don't use diffprivlib
If you're looking for a Python library to perform differential privacy computations, diffprivlib seems to be an attractive choice. You'll find it prominently featured in Google search results. It's maintained by IBM, and extensively cited in the scientific literature. Its README states that you can use it to "build your own differential privacy applications"1, and it's regularly updated. Last but not least, it's very easy to pick up: its API mimics well-known tools like NumPy or scikit-learn, making it look simple and familiar to data scientists.
Unfortunately, diffprivlib is flawed in a number of important ways. I think most people should avoid using it. This blog post lists a few reasons why.
Unclear and inconsistent privacy notions
Differential privacy comes in different flavors. A common distinction for the "standard" definition is: are you trying to protect the addition or removal of a single record? Or are you trying to protect a single record changing its value? This distinction is subtle, but critical: a mechanism that satisfies one notion does not necessarily satisfy the other. So if you want to correctly document your DP strategy, or run experiments that compare multiple mechanisms for the same task, you really want to get it right.
Diffprivlib does not document which privacy notion it uses. Worse, the source code suggests that it actually uses different notions in different parts of the API. The implementation of mean divides the noisy sum by the real count, which only works in the change-one-record model. But the implementation of histogram adds geometric noise of scale \(1/\varepsilon\), which only makes sense in the add-or-remove-one-record model.
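To make the mismatch concrete, here's a back-of-the-envelope sketch (plain Python of my own, not diffprivlib code) of the sensitivities involved under each notion, using the standard calibration of noise scale to sensitivity divided by \(\varepsilon\).

```python
# Sensitivities of the two primitives above under each neighboring-dataset
# notion; with Laplace/geometric noise, scale = sensitivity / epsilon.

def count_sensitivity(change_one_record: bool) -> int:
    # Changing a record's value never changes how many records there are,
    # so using the *exact* count (as mean does) is only safe in the
    # change-one-record model.
    return 0 if change_one_record else 1

def histogram_l1_sensitivity(change_one_record: bool) -> int:
    # Add or remove one record: a single bin moves by 1 (L1 sensitivity 1).
    # Change one record's value: one bin loses it, another gains it (L1 = 2).
    return 2 if change_one_record else 1

epsilon = 1.0
for change_one in (False, True):
    notion = "change-one-record" if change_one else "add-or-remove-one-record"
    print(f"{notion}: count sensitivity = {count_sensitivity(change_one)}, "
          f"histogram noise scale = {histogram_l1_sensitivity(change_one) / epsilon}")
```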
Some API methods blur the picture even further. For example, histogram accepts a weights argument, which makes some records "count" more than others, but has no impact on the sensitivity behind the scenes. This is a bad footgun: if one of the weights is larger than 1, you're simply not getting the advertised privacy guarantee.
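Here is the arithmetic behind that claim (my own illustration, with made-up numbers, not diffprivlib's code): a record with weight \(w\) can move a bin by \(w\), so noise calibrated to sensitivity 1 only gives you \(w\varepsilon\)-DP for that record.

```python
# Illustration of the weights footgun (my own arithmetic, not diffprivlib's
# code; the weight values are made up). Laplace/geometric noise of scale b
# gives (sensitivity / b)-DP.

epsilon = 1.0
weights = [1.0, 1.0, 5.0, 1.0]   # hypothetical per-record weights

sensitivity = max(weights)        # one record can move its bin by its weight
noise_scale = 1.0 / epsilon       # the scale you get if weights are ignored

effective_epsilon = sensitivity / noise_scale
print(effective_epsilon)          # 5.0: five times weaker than advertised
```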
Floating-point vulnerabilities
One of the most well-known potential vulnerabilities with DP software is the use of floating-point math in noise addition. If you do this naively, the guarantees vanish: an attacker can confirm a hypothesis about a specific person with 100% certainty. This class of problems is worse than other kinds of vulnerabilities, like timing attacks: it can happen even if the attacker cannot influence the data, and if the person writing the code is trusted.
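The underlying issue is that floating-point addition rounds, and the rounding depends on the secret value being protected. The snippet below is not the attack itself, just a quick illustration of the artifact it exploits: the set of outputs reachable from one input is not the same as from a neighboring one.

```python
# Toy illustration (not the attack itself) of the floating-point artifact that
# naive noise addition exposes: rounding in `secret + noise` depends on the
# secret, so the set of reachable outputs differs between neighboring inputs.
import numpy as np

rng = np.random.default_rng(0)
noise = rng.laplace(scale=1.0, size=100_000)

# If addition were exact, subtracting the secret would always recover the
# noise sample; in floating point it frequently doesn't, and *which* outputs
# are reachable depends on the secret value.
altered = np.mean((0.1 + noise) - 0.1 != noise)
print(f"{altered:.1%} of outputs were changed by rounding")
```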
This has been known for a decade. It's table stakes for any DP software to fix this issue2. Diffprivlib authors proposed a solution to this problem in a 2021 paper. Crucially, they only showed that their fix mitigated one specific attack, not that the result actually satisfied DP.
Three months later, I came up with a simple variant of the attack that completely broke their fix. I let them know about the problem, and about existing approaches that solved this in a more principled way. My colleagues and I later came up with a different solution to this problem, with a readily-available, open-source Python implementation.
That was more than 3 years ago. The vulnerable code in diffprivlib is still there. Most of the noise addition primitives in diffprivlib are broken.
Mishandled edge cases
Real-world data isn't as perfect as our math formulas would like it to be. It has null values. Floating-point values can be NaN or infinite. Such "erroneous" values can encode sensitive data in practice: a null value in a "salary" column can mean "unemployed", a NaN in a test result column can mean "has not been tested yet", and so on. It's important to deal with them in a way that doesn't break DP guarantees.
Diffprivlib largely ignores this problem. Inject a single NaN value into e.g. the median operation and you get NaN as the result. This is deterministic and silent: you just get a result that leaks information about a single data point. Too bad if you're building a data product that runs every day: everything might work fine at first, then your data changes ever so slightly, and all of a sudden you're publishing data that leaks information about individuals, without noticing.
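As an illustration, here's roughly what that looks like. This is a sketch based on the behavior described above, assuming diffprivlib's usual `median(data, epsilon=..., bounds=...)` call.

```python
# Sketch of the failure mode described above, assuming diffprivlib's
# median(data, epsilon=..., bounds=...) interface.
from diffprivlib.tools import median

clean = [0.2, 0.4, 0.6, 0.8]
dirty = [0.2, 0.4, float("nan"), 0.8]

print(median(clean, epsilon=1.0, bounds=(0, 1)))  # a noisy median
print(median(dirty, epsilon=1.0, bounds=(0, 1)))  # nan: silent and deterministic
```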
There are many other subtleties that you have to be mindful of when building DP software. Diffprivlib doesn't seem to have a principled approach to dealing with any of them, leading to many vulnerabilities of this kind: empty data4, overflows3, floating-point precision issues5, and so on.
Misleading documentation
Diffprivlib's safety issues are not documented anywhere. Take the floating-point vulnerabilities, for example: the documentation even suggests that the noise primitives "prevent against reconstruction attacks". It links to the paper, which has neither been retracted nor appended with a comment like "this method is actually not safe, please don't use it".
This has negative consequences not just for diffprivlib users and the people in their data, but for the larger ecosystem. A completely different tool, built recently, uses the same broken method to implement its noise addition primitives6. This probably would not have happened if diffprivlib had adequate disclaimers.
Conclusion
There's more to say about diffprivlib7, and I only looked at the simpler parts of the API, not the more complex ML stuff. But that's enough for me: I'm confident recommending people avoid using it. This recommendation is not only for real-world use cases that would put people at risk: this list of issues can also lead to incorrectly interpreting experiments, which can be a major problem for research use cases as well.
I think there's a path for things to get better. It starts with clearly discouraging the use of the library for real-world use cases, and documenting vulnerabilities and caveats. Then, the authors could e.g. unify and clearly document the privacy notion used by different methods, and make sure that precision issues can't lead to misleading experimental results. At that point, it might become a useful tool for experimentation and research.
I sent a draft of this blog post to diffprivlib maintainers a month before publication. This was their response.
Diffprivlib is a valuable resource for scientists and engineers as an introduction to differential privacy and was never envisaged to be deployed in production use cases as-is. We have amended the Readme to make this more explicit and avoid any confusion. Our research on related topics continues as time allows, a recent example of which is our paper published at ACM CCS 2024. We anticipate this work will be integrated within diffprivlib in due course.
And now for some additional notes and disclaimers.
Open-source
The reason I can write a blog post like this is that diffprivlib is open source. This is why I, or anyone else, can play with it, break it, and understand its flaws. Many other tools claim to achieve differential privacy, but publish neither their source code nor their design decisions, so these claims are completely unverifiable. This is strictly worse. Differential privacy is like cryptography: you should assume that anyone who tells you "my system satisfies DP but I'm keeping the details secret" is a clown at best and dishonest at worst.
Scope and related work
This article focuses on flaws present in a specific tool. It does not attempt to explain what it takes to build a robust, production-ready DP engine. If that's what you're looking for, you might enjoy this blog post. It also doesn't go into detail about what makes a DP vulnerability more or less severe. There's a lot to say about this — that blog post idea is on my backlog, let me know if you're particularly interested in reading about it.
Conflict of interest
My job involves maintaining a different open-source DP library. You may consider me biased when I discuss other tools. This article only represents my personal opinions and has not been vetted by my employer.
Acknowledgments
I am thankful to Alexander Knop, Jonathan Ullman, Juba Ziani, and Naoise Holohan for their helpful comments on previous versions of this post.
1. This has since been updated, see below.
2. For example, this is explicitly called out in the NIST guidelines on differential privacy.
3. `median([0]+[1]*2982, bounds=(0,1))` consistently returns a value; `median([0]+[1]*2983, bounds=(0,1))` consistently crashes.
4. `mean([1], bounds=(0,1))` consistently returns a value; `mean([], bounds=(0,1))` consistently crashes.
5. I have not re-implemented the attack described in this paper, but the exponential mechanism is implemented exactly as described in the paper, without any consideration for precision issues.
6. They are now saying they will fix it, yay.
7. See for example the findings of usability studies like this one or this one, which confirm that these aren't theoretical concerns: when people try using diffprivlib, they don't get the privacy guarantees they expect.