# Privacy Preserving Machine Learning

IEEE FOCS 2022 Workshop
November 1, 2022

## Scope

The number of aspects of our everyday life that are affected by data-driven services has increased dramatically in the last decade. Healthcare providers, insurance companies, online retailers, smartphone manufacturers and streaming providers all have access to vast amounts of data about their current and potential customers. The capacity to leverage this data through machine learning technologies has the potential to increase the productivity of these and other economic sectors. On the other hand, the widespread adoption of these technologies raises ethical, technical and regulatory issues about privacy, accountability, and interpretability.

The increasingly broadening integration of machine learning and privacy preserving techniques in the context of many practical applications, has brought to light limitations in the current understanding of the interplay between these areas. It has also underscored the need for expanding the theoretical foundations of this cross-disciplinary field in order to uncover inherent limitations of what could be achievable in the terms of trade-offs between utility, privacy, fairness, efficiency, as well as open new avenues for feasibility results that go beyond the state of art. Our workshop aims to enable different subsets of the CS theory community working in areas such as cryptography, differential privacy, learning, algorithms, and other to come together and jointly develop a holistic treatment of the problem of privacy-aware data analysis.

## Travel Grants

We are able to provide a limited number of grants to cover the workshop registration fees of PPML attendees who have not received other support from FOCS this year. To apply, please send an email to ppml2022@googlegroups.com by October 14 with the subject “PPML22 Travel Grant Application”. Please include a half-page statement of purpose. The notifications will be sent by October 17. Applications after the deadline could be considered on a rolling basis depending on the availability of funding.

## Schedule

The workshop is split into two sessions, one in the morning and one in the afternoon. At the end of each session, we will have a slot for discussing recent open research questions. The talks will also be live-streamed via Zoom (Meeting ID: 96413713962, Passcode: 466402). All times are in MDT (GMT-6).

 Morning Session 11:00–11:10 Welcome & Introduction 11:10–11:40 Kunal Talwar — Shifted Divergences for Privacy and Mixing This talk will discuss a line of work on Shifted Divergences, a notion of distance between distributions over real vectors that is regularized by optimal transport. I will show two applications of this concept, one to differential privacy, and one to rapid mixing. The first result shows a tight privacy analysis of the noisy SGD algorithm and its variants for smooth convex loss function functions. We show that the privacy cost of the algorithm, that grows initially with the number of iterations, plateaus after a certain number of steps. The second result studies the convergence of the Langevin Monte Carlo algorithm, a classical algorithm for sampling from a log concave distribution. We give a tight bound on the mixing time of the (discretized) Langevin algorithm to its stationary distribution, in various measures of distance. No differential privacy background will be assumed. Based on joint works with Jason Altschuler, Vitaly Feldman, Ilya Mironov and Abhradeep Thakurta. 11:40–12:10 Clément Canonne — Differentially Private Statistical Inference: A Few Ideas Go a Long Way I will discuss some old, recent, and ongoing work on distributed learning and testing of discrete data under various privacy models, with a focus on distributed settings where the (sensitive) dataset is spread across many users. I will focus on a handful of simple, yet quite powerful ideas which, as the title of the talk suggests, go a long way in obtaining private and sample-optimal testing algorithms in a variety of settings. Based on joint works with Jayadev Acharya, Abigail Gentle, Cody Freitag, Hongyi Lyu, Yucheng Sun, Ziteng Sun, and Himanshu Tyagi; as well as many other works I did not participate in but am very happy to survey and use. 12:10–12:30 Open Questions Afternoon Session 16:00–16:10 Welcome Back 16:10–16:40 Sofya Raskhodnikova — The Price of Differential Privacy under Continual Observation We will discuss the accuracy of differentially private mechanisms in the continual release model. A continual release mechanism receives a sensitive dataset as a stream of $T$ inputs and produces, after receiving each input, an output that is accurate for all the inputs received so far. In contrast, a batch algorithm receives the data as one batch and produces a single output. We provide the first strong lower bounds on the error of continual release mechanisms. In particular, for two fundamental problems that are widely studied and used in the batch model, we prove that the worst case error of every continual release algorithm is $\tilde \Omega(T^{1/3})$ times larger than that of the best batch algorithm. Previous work shows only a polylogarithimic (in $T$) gap between the worst case error achievable in these two models; further, for many problems, including the summation of binary attributes, the polylogarithmic gap is tight (Dwork et al., 2010; Chan et al., 2010). Our results show that problems closely related to summation---specifically, those that require selecting the largest of a set of sums---are fundamentally harder in the continual release model than in the batch model. Our lower bounds assume only that privacy holds for streams fixed in advance (the "nonadaptive" setting). We also formulate a model that allows for adaptively selected inputs and prove the privacy of several algorithms in that model. In particular we show that our lower bounds are matched by the error of simple algorithms whose privacy holds even for adaptively selected streams. The adaptive model captures dependencies that arise in many applications of continual release. Based on joint work with Palak Jain, Satchit Sivakumar, and Adam Smith. 16:40–17:10 Yuval Ishai — New Techniques for Efficient Secure Computation Secure computation is a powerful cryptographic primitive, enabling two or more parties to perform joint computations on their secret inputs without revealing the inputs to each other. An appealing application domain is privacy-preserving machine learning, where secure computation is helpful for training and classification tasks involving sensitive data. Secure computation is known to be generally possible, under standard cryptographic assumptions, since the 1980s. Unfortunately, security does not come for free. In this talk I will survey some recent techniques for minimizing the overhead of secure computation, along with remaining challenges and open problems. 17:10–17:30 Open Questions

## Organization

### Workshop organizers

• Borja Balle (Deepmind)