Research highlight: Cliopatra: Extracting Private Information from LLM Insights
When Anthropic came up with a new "privacy-preserving analysis system" to gain insights into AI use, and didn't use any provably robust notion to back up their privacy claims, I was mildly surprised. Surely they have both the money and the scientific maturity level to do better?
But Clio, the system in question, sounded relatively reasonable, with multiple layers of risk mitigation built-in. Maybe adding differential privacy would have been overkill. I also didn't want to publicly criticize their approach in the absence of demonstrated real-world risk. So I didn't comment on their approach.
You can probably guess where this is going.
Fast forward to last week, and a new paper: Cliopatra: Extracting Private Information from LLM Insights, by Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, and Peter Kairouz. The authors show that with carefully designed attacks on Clio, they can bypass all the ad hoc mitigations, and successfully extract users' medical histories1, in a way that provides 100% attacker certainty for some records.
This is a new and clever take on an old attack. We've known for decades that k-anonymity is vulnerable to active attacks. Here, this is combined with prompt injection to encourage the LLM "summarizer" to actually include information from unique records. Perhaps more surprisingly, the authors find that some defensive layers are simply ineffective: the "LLM auditors" systematically report low privacy risk, and entirely fail to detect the attacks.
This work is the newest example of something we see over and over: it's very easy to convince oneself that some ad hoc privacy protections are more effective than they actually are. In the original Anthropic paper, the authors only estimate that Clio may be vulnerable to "PII persistence" and "group privacy violations", missing the potential for data poisoning attacks entirely.
The attack paper ends on another interesting takeaway. The authors run their attacks on Urania, an alternative system built by Google to address the same use cases, but that uses differential privacy to protect the user conversations. Similarly to what we observe in other contexts, this translates to much higher level of empirical risk mitigation — even with large privacy budgets like \(\varepsilon=25\) (apparently necessary to get a reasonable level of utility).
This story underscores a major difference in how we treat poorly-justified claims about privacy compared to security claims. If someone comes up with a new cryptographic protocol and doesn't provide a robust security analysis with well-defined attacker assumptions, we simply don't take them seriously. We should hold "privacy-preserving" systems to the same standard, and demand carefully delineated attacker models and a comprehensive privacy analysis. Differential privacy may not always be necessary, but people should at least explain why they think provably robust approaches are not needed2.
Maybe some day, we'll get there?
Thinking of deploying a privacy-preserving analytics system to the real world? Let's chat! My independent consultancy, Hiding Nemo, specializes in helping organizations unlock data value, with a principled approach to risk analysis and mitigation.
I'm grateful to Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, and Peter Kairouz for their helpful feedback on earlier versions of this post.
-
In lab conditions — no production system nor real user data was targeted in the experiments. ↩
-
Note that DP may not even actually be needed for Clio in practice! They're not sharing the data publicly, probably (?) have reasonable information security & data governance practices, and perform a sampling step which likely provides some risk mitigation. But the privacy analysis doesn't rely on any of that. It presents the design of Clio as inherently privacy-preserving, suggesting that it would still be safe when deployed in a more permissive context. ↩
