Ted is writing things

On privacy, research, and privacy research.

Beginnings

Hi there! I'm Damien. I have no idea how people usually start blogs, so bear with me while I figure this out.

I see this place as a way to publish things that are too long for Twitter, too opinionated for Wikipedia, and not pretentious enough for Medium. I'm going to try and keep it to three themes: privacy, research, and privacy research. I'm not (yet) a specialist in any of these. Hopefully, thanks to my job and my personal interest in these topics, I can add something valuable to what's written online about them.

What follows is my vision of these three themes. It should give you an idea of what I intend to talk about on this blog =)

Privacy

It's difficult to define what privacy encompasses. It's much easier to notice when you don't have enough of it: through bad surprises, uneasy feelings of creepiness, or real risks to your safety.

When a parent or a partner installs stealthy software on your phone to spy on your texts and calls, that's an invasion of your privacy. When a company sells your name, address and purchase history to some sketchy third party that sends you targeted ads, the uneasy feeling you get comes from a lack of privacy. Full-body scanners in certain airports are an attack on one's bodily privacy. Data leaks are a risk to users' privacy.

Privacy issues usually come from a lack of transparency, of control, or both. In an ideal world, everybody would know exactly who has access to which data about them and why. Personal data collection would not happen without informed consent, and people would have a right to access, modify and delete data that other people or organizations hold about them.

The fuzziness and complexity of the issues in this space are part of what I find interesting about them. I have done many privacy reviews for Google products, and each one brings something new and interesting. Would users expect this behavior? Is this deletion action clear enough? Could someone re-identify this aggregated data?

Like security, privacy is of particular importance to marginalized communities. Having your phone number leaked online is much more problematic if you're a high-profile political activist or a closeted LGBTQ+ blogger. Harassment of folks who belong to minorities is a major problem, and badly designed sharing interfaces or insufficient anti-abuse tools can have tragic consequences. Designing tools that deal with potentially sensitive data while failing to consider these specific risks is highly irresponsible. And you can easily guess what I think of compliance-based privacy programs…

I also try to avoid absolutist viewpoints. They are hardly ever constructive, and they are often dangerous. I know people who refuse to use Signal because it's not available without Google Play Services, while continuing to communicate via cleartext SMS messages. For most practical problems, there is no perfect solution. Focusing on defending against a hypothetical all-powerful targeted attacker is usually pointless. Instead, I try to focus on realistic threat models, usable tools, and risk mitigation.

Research

I started a part-time PhD after two years of software engineering at Google.

The path to solving an engineering problem is quite straightforward. Grasp the scope of the problem, design a solution, validate the design with coworkers and stakeholders, write code, verify that the solution is "good enough", then productionize it. Once the problem has disappeared, there's no time to think about it any further: there are other problems to solve, other fires to put out.

The whole process is fun and rewarding, but I'm frustrated by the ending. What if we could design a simpler or more efficient solution? Prove that it works in a wider range of situations? Share the idea behind it with more people, and see whether they get inspired and solve other problems? Doing all of this is not immediately rewarding, but I think it can have a deeper and longer-lasting impact than core engineering work.

I optimistically think that academia is the place to do that. Compare the solution to what's already out there, run more experiments, write proofs, figure out what additional impact it could have. Share the results with as many people as possible. It might not pan out, but I think it's worth a try. There are certainly interesting things to learn along the way.

The one thing I'm afraid of is spending time solving the wrong problems. Finding a "good problem" is not easy: it must be difficult enough that it hasn't already been solved, but simple enough that I have a chance of tackling it. Identifying practical problems and their precise constraints is also hard when the main source of inspiration is other academics' work.

I'm frustrated about the lack of incentives to do research work as a software engineer, but the incentives of academia are even more broken. Publication metrics are a bad way to estimate one's impact, especially in the short term. The peer review process is terribly implemented in practice. The whole system makes it painfully slow to gather feedback, and the little feedback you get is imprecise. The idea of having my work praised only to realize much later that it didn't make a difference in practice… It's even scarier to me than the idea of not finding joy and impact in my research, and deciding to quit.

But I'm not exactly pessimistic :D I feel lucky and enthusiastic about this part-time project. Continuing to do engineering work at Google gives me an endless stream of complicated real-world problems to tackle, many of which seem like good candidates for research projects. I am surrounded by impressively smart and passionate coworkers on both sides, whose feedback is invaluable. And I'm not particularly attached to the idea of having an academic career, or even to getting the title at the end of my PhD, so I don't feel much pressure to publish everything and anything just to increment some counters.

All in all, this sounds like a fun and challenging adventure. I'm excited to see what I'll learn along the way!

Privacy research

My research, like my engineering job at Google, will focus on privacy. It's a field whose boundaries are not very well-defined, and which spans very distinct subfields. Some researchers focus on user research, to understand how real people perceive what happens to their personal data (there are a bunch of them at Google); very little math is involved. Some design algorithms with provable privacy properties, like private set intersection or differentially private surveys; lots of math there! ^^ Some study the problem of anonymizing (or de-identifying) a dataset, so it can be used by more people or shared with third parties. Some focus on onion routing, on online tracking, on cryptocurrencies, on privacy policies, on genetic privacy, on social networks… and the list is far from exhaustive. So… what am I doing exactly?

My PhD project is about making it easier for data owners to understand and protect the personal information contained in their databases. I see this goal as having two main subcomponents.

  1. Risk analysis. Lots of organizations, companies and governments sit on large databases containing personal information, and it's difficult for them to realize how sensitive that information is. Leaking your users' country of origin is intuitively less of a problem than leaking their e-mail addresses, which in turn is not as big a deal as leaking their credit card information. Sadly, doing this type of inventory and risk analysis is currently pretty difficult: it requires time, investment, and specific expertise. It shouldn't have to be this way, so I'm working towards building tools that make it easier (the first sketch after this list gives a toy idea of what such a tool might do).

  2. Anonymization. Once you've realized how sensitive your data is, you will hopefully want to take steps to protect it. There are many ways to lower the risk of bad people gaining access to your database: encryption, access controls, and many other security techniques. Another option is to modify the database itself, in a way that guarantees that somebody with access to it can't deduce creepy things about the individuals whose data it contains. I'm working towards making this process easier and more understandable for data owners (the second sketch below shows one standard building block).
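
To make the first point more concrete, here is a toy sketch of what an automated risk analysis tool might do: scan each column of a table for values that look like well-known sensitive identifiers. Everything in it (the patterns, the threshold, the function names) is a hypothetical simplification I made up for illustration, not an actual tool:

```python
import re

# Toy patterns mapping a risk label to a regex that recognizes it.
# These are hypothetical simplifications: real detection would need to
# handle formatting variants, checksums, context, and much more.
PATTERNS = {
    "email address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "credit card number": re.compile(r"\d{13,19}"),
}

def scan_column(values, threshold=0.8):
    """Return the risk labels whose pattern matches most of the column."""
    labels = []
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in values if pattern.fullmatch(str(v)))
        if values and matches / len(values) >= threshold:
            labels.append(label)
    return labels

# A tiny fake table: column name -> values.
table = {
    "country": ["FR", "CH", "US"],
    "contact": ["a@example.com", "b@example.com", "c@example.com"],
}
for column, values in table.items():
    print(column, "->", scan_column(values) or "no obvious match")
```

A real tool would obviously need much more robust detection and a much finer-grained notion of risk than two regexes and a threshold, but this is the general flavor of the problem.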
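
And for the second point, one standard building block from the literature (not my own method!) is the Laplace mechanism from differential privacy: instead of releasing an exact statistic, you release a noisy version of it, calibrated so that the output barely depends on any single individual. A minimal sketch, with a hypothetical noisy_count helper:

```python
import random

def noisy_count(true_count, epsilon):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one person
    changes the result by at most 1. Adding Laplace noise with scale
    1/epsilon therefore gives epsilon-differential privacy.
    """
    scale = 1.0 / epsilon
    # A Laplace sample is the difference of two i.i.d. exponentials.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

# How many users live in Switzerland? The released number is close to
# the truth, but hides the contribution of any single individual.
print(noisy_count(true_count=1042, epsilon=0.5))
```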

I could (and hopefully, I will!) talk at length about these two topics. They have been studied by many people over the past ~15 years (especially anonymization), but I think there is a lot of room for more accessible explanations, and for significant improvements on the research side. On anonymization in particular, I feel it is urgent to work on bridging the gap between research advances and concrete use cases.

Maybe I'll realize along the way that I'm looking at the wrong problems, or that improving the state of the art is more difficult than I thought. But as I've been told, that's part of what makes it challenging and fun ^^
