Privacy Preserving Machine Learning - PriML and PPML Joint Edition

NeurIPS 2020 Workshop
Virtual workshop, December 11, 2020

Scope

This one-day workshop focuses on privacy-preserving techniques for machine learning and disclosure in large-scale data analysis, both in the distributed and centralized settings, and on scenarios that highlight the importance and need for these techniques (e.g., via privacy attacks). There is growing interest from the Machine Learning (ML) community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for privacy-preserving training and inference, as well as Differential Privacy (DP) for disclosure. Simultaneously, the systems security and cryptography community has proposed various secure frameworks for ML. We encourage both theory and application-oriented submissions exploring the range of approaches listed below (a minimal illustrative sketch of the differential privacy toolkit follows the list). Additionally, given the tension between the adoption of machine learning technologies and the ethical, technical, and regulatory issues around privacy, as highlighted during the COVID-19 pandemic, we invite submissions for the special track on this topic.

  • Special track: privacy of ML and data analytics in a pandemic (e.g., secure contact tracing)
  • Differential privacy and other statistical notions of privacy: theory, applications, and implementations
  • Secure multi-party computation techniques for ML
  • Learning on encrypted data
  • Hardware-based approaches to privacy-preserving ML
  • Trade-offs between privacy and utility
  • Privacy attacks
  • Federated and decentralized privacy-preserving algorithms
  • Programming languages for privacy-preserving data analysis
  • Empirical and theoretical comparisons between different notions of privacy
  • Policy-making aspects of data privacy
  • Privacy in autonomous systems
  • Privacy in online social networks
  • Interplay between privacy and adversarial robustness in machine learning
  • Relations between privacy, fairness and transparency
  • Applications of privacy-preserving ML
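
To make the differential privacy toolkit above concrete, here is a minimal, hypothetical sketch (not drawn from any submission): a Laplace mechanism releasing a noisy count. The function and parameter names are our own.

    import numpy as np

    def laplace_count(records, predicate, epsilon):
        # Differentially private count of records satisfying `predicate`.
        # Adding or removing one record changes the true count by at most 1
        # (sensitivity 1), so Laplace noise with scale 1/epsilon yields
        # epsilon-differential privacy.
        true_count = sum(1 for r in records if predicate(r))
        noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
        return true_count + noise

    # Example: noisy count of ages above a threshold with epsilon = 0.5.
    ages = [23, 35, 41, 29, 62, 57]
    print(laplace_count(ages, lambda a: a >= 40, epsilon=0.5))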

Call For Papers & Important Dates

Download Full CFP


Submission deadline: Oct 02, 2020, 23:59 (Anywhere on Earth)
Notification of acceptance: Oct 23, 2020

Submission Instructions

Submissions in the form of extended abstracts must be at most 4 pages long (not including references; additional supplementary material may be submitted but may be ignored by reviewers), non-anonymized, and adhere to the NeurIPS format. We encourage submission of work that is new to the privacy-preserving machine learning community. Submissions based solely on work that has previously been published in conferences on machine learning and related fields are not suitable for the workshop. On the other hand, we allow submission of work currently under review elsewhere, as well as relevant work recently published in privacy and security venues. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have a link to arXiv or a PDF added to the workshop webpage.

Invited Speakers

View the recordings on SlidesLive!

Schedule

Time Zone Accommodation

The workshop will be hosted in two blocks: BLOCK I accommodates Asia and Europe (morning) time zones; BLOCK II accommodates U.S. and Europe (evening) time zones. Unless otherwise noted, all listed times are CET (UTC+1).

Each block will contain three main components: an hour of recorded talks by two invited speakers followed by a 30-minute live joint Q&A with both; a poster session/social via Gather.Town (a short tutorial for our venue is available); and several contributed talks highlighting submissions to this workshop, each with a corresponding live Q&A session. Due to time-zone constraints, most contributed talks will be in BLOCK II, but all talks will be recorded on SlidesLive for viewing afterwards.

To join the workshop you will need a NeurIPS 2020 workshop registration ticket (see neurips.cc for more information). Instructions on how to join the workshop will be provided by NeurIPS.

BLOCK I, Asia/Europe: (17:20-21:00 Beijing) (14:50-18:30 Delhi) (10:20-14:00 Paris)
10:20-10:30 Welcome & Introduction
10:30-11:00 Invited talk (1): Reza Shokri — Data privacy at the intersection of trustworthy machine learning
Machine learning models leak a significant amount of information about their training data, through their predictions and parameters. In this talk, we discuss the impact of trustworthy machine learning, notably interpretability and fairness, on data privacy. We present the privacy risks of model explanations, and the effects of differential privacy on interpretability. We will also discuss the trade-off between privacy and (group) fairness, and how training fair models can make underrepresented groups more vulnerable to inference attacks.
11:00-11:30 Invited talk (2): Katrina Ligett — The Elephant in the Room: The Problems that Privacy-Preserving ML Can’t Solve
In this talk, I attempt to lay out the problems of the data ecosystem, and to explore which of them can potentially be addressed by the toolkit of privacy-preserving machine learning. What we see is that while privacy-preserving machine learning has made amazing advances over the past decade and a half, there are enormous and troubling problems with the data ecosystem that seem to require an entirely different set of solutions.
11:30-12:00 Invited Talk Q&A with Reza and Katrina
12:00-12:10 Break 10min
12:10-12:30 POSEIDON: Privacy-Preserving Federated Neural Network Learning (contributed talk: 15min presentation + 5min Q&A)
Sinem Sav, Apostolos Pyrgelis, Juan Ramón Troncoso-Pastoriza, David Froelicher, Jean-Philippe Bossuat, João Sá Sousa and Jean-Pierre Hubaux
We address the problem of privacy-preserving training and evaluation of neural networks in an N-party, federated learning setting. We propose a novel system, POSEIDON, that employs multiparty lattice-based cryptography and preserves the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to N−1 parties. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
12:30-14:00 Gather.Town Poster Session and Social (log in with NeurIPS registration credentials)
BLOCK II, U.S./Europe: (8:30-13:25 LA) (11:30-16:25 NYC) (17:30-22:25 Paris)
17:30-17:40 Welcome & Introduction
17:40-18:05 Invited talk (1): Carmela Troncoso — Is Synthetic Data Private?
Synthetic datasets produced by generative models have been advertised as a silver-bullet solution to privacy-preserving data publishing. In this talk, we show that such claims are unfounded. We show how synthetic data does not stop linkability or attribute inference attacks; and that differentially-private training does not increase the privacy gain of these datasets. We also show that some target records receive substantially less protection than others and that the more complex the generative model, the more difficult it is to predict which targets will remain vulnerable to inference attacks. We finally challenge the claim that synthetic data is an appropriate solution to the problem of privacy-preserving microdata publishing.
18:05-18:30 Invited talk (2): Dan Boneh — Proofs on secret shared data: an overview
Many consumer devices these days are Internet-enabled and locally record information about how the device is used by its owner. Manufacturers have a strong interest in mining this data in order to improve their products, but concerns over data privacy often prevent this from taking place. A recent collection of techniques enables companies to process this distributed data without ever seeing the data in the clear. One obstacle is that a malfunctioning device might send invalid data and throw off the analysis; no one will ever know, because no one can see the data in the clear. To prevent this, there is a need for ultra-lightweight zero-knowledge techniques to prove properties about the hidden collected data. This talk will survey some recent progress in this area.
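As a rough illustration of the secret-shared-data setting described above (a toy sketch only, omitting the zero-knowledge validity proofs the talk focuses on), each device can split its value into additive shares sent to two non-colluding servers; neither server learns an individual value, yet their combined aggregates reveal only the total.

    import secrets

    MODULUS = 2**61 - 1  # arbitrary large modulus for this toy example

    def share(value):
        # Split `value` into two additive shares modulo MODULUS.
        r = secrets.randbelow(MODULUS)
        return r, (value - r) % MODULUS

    # Each device secret-shares its private measurement between two servers.
    measurements = [17, 4, 29, 8]
    server_a, server_b = 0, 0
    for m in measurements:
        a, b = share(m)
        server_a = (server_a + a) % MODULUS
        server_b = (server_b + b) % MODULUS

    # Combining the two aggregated shares recovers only the sum.
    total = (server_a + server_b) % MODULUS
    assert total == sum(measurements)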
18:30-19:00 Invited Talk Q&A with Carmela and Dan
19:00-19:10 Break 10min
19:10-20:10 Gather.Town Poster Session and Social (log in with NeurIPS registration credentials)
20:10-20:20 Break 10min

20:20-20:35 On the (Im)Possibility of Private Machine Learning through Instance Encoding (contributed talk: 15min presentation)
Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, Florian Tramèr
A learning algorithm is private if the produced model does not reveal (too much) about its training set. In this work, we study whether a non-private learning algorithm can be made private by relying on an instance-encoding mechanism that modifies the inputs before they are fed to the normal learner. We formalize the notion of instance encoding and its privacy by providing two attack models. We first prove impossibility results for achieving the first (stronger) model. We further demonstrate practical attacks in the second (weaker) attack model on recent proposals that aim to use instance encoding for privacy.
20:35-20:50 Poirot: Private Contact Summary Aggregation (contributed talk: 15min presentation)
Chenghong Wang, David Pujol, Yaping Zhang, Johes Bater, Matthew Lentz, Ashwin Machanavajjhala, Kartik Nayak, Lavanya Vasudevan and Jun Yang
Physical distancing between individuals is key to preventing the spread of a disease such as COVID-19. On the one hand, having access to information about physical interactions is critical for decision makers; on the other, this information is sensitive and can be used to track individuals. In this work, we design Poirot, a system to collect aggregate statistics about physical interactions in a privacy-preserving manner. We show a preliminary evaluation of our system that demonstrates the scalability of our approach even while maintaining strong privacy guarantees.
20:50-21:05 Greenwoods: A Practical Random Forest Framework for Privacy Preserving Training and Prediction (contributed talk: 15min presentation)
Harsh Chaudhari and Peter Rindal
In this work we propose two prediction protocols for a random forest model. The first takes a traditional approach and requires the trees in the forest to be complete in order to hide sensitive information. Our second protocol takes a novel approach that allows the servers to obliviously evaluate only the “active path” of the trees. This approach can easily support trees of large depth while revealing no sensitive information to the servers. We then present a distributed framework for privacy-preserving training that circumvents the expensive procedure of privately training the random forest on a combined dataset, and propose an alternative, efficient collaborative approach with the help of users participating in the training phase.
21:05-21:20 Joint Q&A with the three speakers above
21:20-21:25 Break 5min
21:25-21:40 Shuffled Model of Federated Learning: Privacy, Accuracy, and Communication Trade-offs (contributed talk: 15min presentation)
Antonious Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz and Ananda Theertha Suresh
We study empirical risk minimization (ERM) optimization with communication efficiency and privacy under the shuffled model. We use our communication-efficient schemes for private mean estimation in the optimization solution of the ERM. By combining this with privacy amplification by client sampling and data sampling at each client, as well as the shuffled privacy model, we demonstrate that one can attain the same privacy and optimization-performance operating point as recent methods using full-precision communication, but at a lower communication cost, i.e., effectively getting communication efficiency for “free”.
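For intuition on the shuffled model itself (an illustrative toy, not the scheme of this paper): each client applies a local randomizer, a shuffler strips identities and permutes the reports, and the analyzer sees only the anonymized multiset, which amplifies the local privacy guarantee. The randomized-response example and parameter names below are ours.

    import random

    def randomize_bit(bit, p_keep=0.75):
        # Local randomizer (randomized response): keep the true bit with
        # probability p_keep, otherwise flip it.
        return bit if random.random() < p_keep else 1 - bit

    def shuffled_reports(client_bits, p_keep=0.75):
        reports = [randomize_bit(b, p_keep) for b in client_bits]
        random.shuffle(reports)  # the shuffler removes order and identity
        return reports

    def estimate_mean(reports, p_keep=0.75):
        # Debias the observed frequency of 1s to estimate the true mean.
        observed = sum(reports) / len(reports)
        return (observed - (1 - p_keep)) / (2 * p_keep - 1)

    true_bits = [int(random.random() < 0.3) for _ in range(10000)]  # true mean ~0.3
    print(estimate_mean(shuffled_reports(true_bits)))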
21:40-21:55 Sample-efficient proper PAC learning with approximate differential privacy (contributed talk: 15min presentation)
Badih Ghazi, Noah Golowich, Ravi Kumar and Pasin Manurangsi
In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension d with approximate differential privacy is at most Õ(d^6) (ignoring privacy and accuracy parameters). This result answers a question of Bun et al. (FOCS 2020) by improving upon their upper bound of 2^O(d) on the sample complexity. Prior to our work, finiteness of the sample complexity for privately learning a class of finite Littlestone dimension was only known for improper private learners, and the fact that our learner is proper answers another question of Bun et al. which was also asked by Bousquet et al. (2019). Using machinery developed by Bousquet et al., we show that the sample complexity of sanitizing a binary hypothesis class is at most polynomial in its Littlestone dimension and dual Littlestone dimension. This implies that a class is sanitizable if and only if it has finite Littlestone dimension. An important ingredient of our proofs is a new property of binary hypothesis classes that we call irreducibility, which may be of independent interest.
21:55-22:10 Training Production Language Models without Memorizing User Data (contributed talk: 15min presentation)
Swaroop Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews, Galen Andrew, Brendan McMahan and Françoise Beaufays
This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL) while leveraging the Differentially Private Federated Averaging (DP-FedAvg) technique. There has been prior work on building practical FL infrastructure, including work demonstrating the feasibility of training language models on mobile devices using such infrastructure. It has also been shown (in simulations on a public corpus) that it is possible to train NWP models with user-level differential privacy (DP) using DP-FedAvg. Nevertheless, training production-quality NWP models with DP-FedAvg in a real-world production environment on a heterogeneous fleet of mobile phones requires addressing numerous challenges. For instance, the coordinating central server has to keep track of the devices available at the start of each round and sample devices uniformly at random from them, while ensuring secrecy of the sample, among other requirements. Unlike all prior privacy-focused FL work of which we are aware, for the first time we demonstrate the deployment of a DP mechanism for the training of a production neural network in FL, as well as the instrumentation of the production training infrastructure to perform an end-to-end empirical measurement of unintended memorization.
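For readers unfamiliar with DP-FedAvg, its core aggregation step can be sketched as follows; this is a simplified illustration with variable names of our own, omitting the production concerns the paper addresses (device availability, uniform sampling, secrecy of the sample).

    import numpy as np

    def dp_fedavg_round(client_updates, clip_norm, noise_multiplier):
        # One simplified DP-FedAvg aggregation step:
        # clip each client's model delta to an L2 norm of `clip_norm`,
        # average the clipped deltas, then add Gaussian noise whose scale
        # is calibrated to the sensitivity of the average (clip_norm / n).
        clipped = []
        for u in client_updates:
            norm = np.linalg.norm(u)
            clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
        mean_update = np.mean(clipped, axis=0)
        stddev = noise_multiplier * clip_norm / len(client_updates)
        return mean_update + np.random.normal(0.0, stddev, size=mean_update.shape)

    # Toy round: 100 clients, 10-dimensional model deltas.
    updates = [np.random.randn(10) for _ in range(100)]
    new_delta = dp_fedavg_round(updates, clip_norm=1.0, noise_multiplier=1.1)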
22:10-22:25 Joint Q&A with the three speakers above

Accepted Papers

Nishat Koti, Mahak Pancholi, Arpita Patra, Ajith Suresh
(B3) SWIFT: Super-fast and Robust Privacy-Preserving Machine Learning    [link]
Xianrui Meng, Joan Feigenbaum
(B4) Privacy-preserving XGBoost Inference    [arxiv]
Georgios Damaskinos, Celestine Mendler-Dünner, Rachid Guerraoui, Nikolaos Papandreou, Thomas Parnell
(B6) Differentially Private Stochastic Coordinate Descent    [arxiv]
Carsten Baum, Shahar Segal, Yossi Adi, Benny Pinkas, Joseph Keshet, Chaya Ganesh
(B8) Fairness in the Eyes of the Data: Certifying Machine-Learning Models    [arxiv]
Mimee Xu, Awni Hannun, Laurens van der Maaten
(B9) Data Appraisal Without Data Sharing    [PDF]
Swaroop Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews, Galen Andrew, Brendan McMahan, Françoise Beaufays
Training Production Language Models without Memorizing User Data (contributed talk)    [arxiv]
Om Dipakbhai Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Françoise Beaufays
(B11) Understanding Unintended Memorization in Federated Learning    [arxiv]
Sinem Sav, Apostolos Pyrgelis, Juan Ramón Troncoso-Pastoriza, David Froelicher, Jean-Philippe Bossuat, João Sá Sousa, Jean-Pierre Hubaux
POSEIDON: Privacy-Preserving Federated Neural Network Learning (contributed talk)    [arxiv]
Javier Alvarez-Valle, Pratik Bhatu, Nishanth Chandran, Divya Gupta, Aditya Nori, Aseem Rastogi, Mayank Rathee, Rahul Sharma, Shubham Ugare
(B13) Secure Medical Image Analysis with CrypTFlow    [PDF]
Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta
(B14) Local Differentially Private Regret Minimization in Reinforcement Learning    [PDF]
Vinith M. Suriyakumar, Nicolas Papernot, Anna Goldenberg, Marzyeh Ghassemi
(G1) Challenges of Differentially Private Prediction in Healthcare Settings    [arxiv]
Brian Knott, Shobha Venkataraman, Awni Hannun, Shubho Sengupta, Mark Ibrahim, Laurens van der Maaten
(G3) CrypTen: Secure Multi-Party Computation Meets Machine Learning    [PDF]
Xu Zheng, Nicholas McCarthy, Jeremiah Hayes
(G4) Network Generation with Differential Privacy    [PDF]
Ji Gao, Sanjam Garg, Mohammad Mahmoody, Prashant Nalini Vasudevan
(G5) Data Leakage in the Context of Machine Unlearning    [PDF]
Fatemehsadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor Ruhle, Robert Sim
(G7) Privacy Regularization: Joint Privacy-Utility Optimization in Language Models   
Fatemehsadat Mireshghallah, Mohammadkazem Taram, Ali Jalali, Ahmed Youssef, Dean Tullsen, Hadi Esmaeilzadeh
(G8) A Principled Approach to Learning Stochastic Representations for Privacy in Deep Neural Inference    [PDF]
Joonas Jälkö, Lukas Prediger, Antti Honkela, Samuel Kaski
(G9) Twinify: A software package for differentially private data release    [PDF]
Edwige Cyffers, Aurélien Bellet
(G10) Privacy Amplification by Decentralization    [arxiv]
Antti Koskela, Joonas Jälkö, Lukas Prediger, Antti Honkela
(G11) Tight Approximate Differential Privacy for Discrete-Valued Mechanisms Using FFT    [arxiv]
Tejas Kulkarni, Joonas Jälkö, Antti Koskela, Antti Honkela, Samuel Kaski
(G12) Differentially Private Bayesian Inference For GLMs    [arxiv]
Mikko Heikkilä, Antti Koskela, Kana Shimizu, Samuel Kaski, Antti Honkela
(G13) Differentially private cross-silo federated learning    [arxiv]
Fabian Boemer, Rosario Cammarota, Daniel Demmler, Thomas Schneider, Hossein Yalame
(G14) MP2ML: A Mixed-Protocol Machine Learning Framework for Private Inference    [PDF]
Rahul Rachuri, Daniel Escudero, Matthew Jagielski, Peter Scholl
(O1) Adversarial Attacks and Countermeasures on Private Training in MPC    [PDF]
Andrew Law, Chester Leung, Rishabh Poddar, Raluca Ada Popa, Chenyu Shi, Octavian Sima, Chaofan Yu, Xingmeng Zhang, Wenting Zheng
(O2) Data-oblivious training for XGBoost models    [arxiv]
Théo Jourdan, Antoine Boutet, Carole Frindel, Sébastien Gambs, Rosin Claude Ngueveu
(O3) DYSAN: Dynamically sanitizing motion sensor data against sensitive inferences through adversarial networks    [PDF]
Bargav Jayaraman, Lingxiao Wang, Katherine Knipmeyer, Quanquan Gu, David Evans
(O4) Revisiting Membership Inference Under Realistic Assumptions    [arxiv]
Nick Angelou, Ayoub Benaissa, Bogdan Cebere, William Clark, Adam James Hall, Michael A. Hoeh, Daniel Liu, Pavlos Papadopoulos, Robin Roehm, Robert Sandmann, Phillipp Schoppmann, Tom Titcombe
(O5) Asymmetric Private Set Intersection with Applications to Contact Tracing and Private Vertical Federated Machine Learning    [arxiv]
Antonious Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, Ananda Theertha Suresh
Shuffled Model of Federated Learning: Privacy, Accuracy, and Communication Trade-offs (contributed talk)    [arxiv]
Tianshi Cao, Alex Bie, Karsten Kreis, Sanja Fidler
(O10) Differentially Private Generative Models Through Optimal Transport    [PDF]
Ilaria Chillotti, Marc Joye, Pascal Paillier
(O11) New Challenges for Fully Homomorphic Encryption    [PDF]
Wenlin Chen, Samuel Horváth, Peter Richtárik
(O13) Optimal Client Sampling for Federated Learning    [arxiv]
Vasisht Duddu, Antoine Boutet, Virat Shejwalkar
(O15) Quantifying Privacy Leakage in Graph Embedding    [arxiv]
Peizhao Hu, Asma Aloufi, Adam Caulfield, Kim Laine, Kristin Lauter
(P1) SparkFHE: Distributed Dataflow Framework with Fully Homomorphic Encryption    [PDF]
Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thao Nguyen
(P2) Robust and Private Learning of Halfspaces    [arxiv]
Harsha Nori, Zhiqi Bu, Judy Shen, Rich Caruana, Janardhan Kulkarni
(P4) Accuracy, Interpretability and Differential Privacy via Explainable Boosting    [PDF]
Chenghong Wang, David Pujol, Yaping Zhang, Johes Bater, Matthew Lentz, Ashwin Machanavajjhala, Kartik Nayak, Lavanya Vasudevan, Jun Yang
Poirot: Private Contact Summary Aggregation (contributed talk) [PDF]   
Abhishek Singh, Vivek Sharma, Ayush Chopra, Praneeth Vepakomma, Ramesh Raskar
(P8) Dynamic Channel Pruning for Privacy    [PDF]
Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, Florian Tramèr
On the (Im)Possibility of Private Machine Learning through Instance Encoding (contributed talk)    [arxiv]
Chung-Wei Weng, Yauhen Yakimenka, Hsuan-Yin Lin, Eirik Rosnes, Joerg Kliewer
(P9) Generative Adversarial User Privacy in Lossy Single-Server Information Retrieval    [arxiv]
Vasisht Duddu, Virat Shejwalkar, Antoine Boutet
(P10) Privacy Risks in Embedded Deep Learning    [arxiv]
Nurislam Tursynbek, Aleksandr Petiushko, Ivan Oseledets
(P11) Robustness Threats of Differential Privacy    [arxiv]
Pratyush Maini, Mohammad Yaghini, Nicolas Papernot
(P12) Dataset Inference: Ownership Resolution in Machine Learning    [PDF]
Anshul Aggarwal, Trevor Carlson, Reza Shokri, Shruti Tople
(P13) SOTERIA: In Search of Efficient Neural Networks for Private Inference    [PDF]
James Henry Bell, K. A. Bonawitz, Adrià Gascón, Tancrède Lepoint, Mariana Raykova
(P14) Secure Single-Server Aggregation with (Poly)Logarithmic Overhead    [arxiv]

Organization


Workshop organizers

  • Borja Balle (DeepMind)
  • James Bell (The Alan Turing Institute)
  • Aurélien Bellet (Inria)
  • Kamalika Chaudhuri (University of California, San Diego)
  • Adrià Gascón (Google)
  • Antti Honkela (University of Helsinki)
  • Antti Koskela (University of Helsinki)
  • Casey Meehan (University of California, San Diego)
  • Olya Ohrimenko (University of Melbourne)
  • Mijung Park (MPI Tuebingen)
  • Mariana Raykova (Google)
  • Mary Anne Smart (University of California, San Diego)
  • Yu-Xiang Wang (University of California, Santa Barbara)
  • Adrian Weller (Alan Turing Institute & Cambridge)

Program Committee

  • Pauline Anthonysamy (Google)
  • Kallista Bonawitz (Google)
  • Mark Bun (Boston University)
  • Graham Cormode (University of Warwick)
  • Rachel Cummings (Georgia Institute of Technology)
  • Morten Dahl (Cape Privacy)
  • Martine De Cock (University of Washington)
  • Christos Dimitrakakis (Chalmers University of Technology)
  • Matt Fredrikson (Carnegie Mellon University)
  • Irene Giacomelli (ISI Foundation)
  • Abhradeep Guha Thakurta (Google)
  • Jamie Hayes (UCL)
  • Stratis Ioannidis (Northeastern University)
  • Matthew Jagielski (Northeastern University)
  • Peter Kairouz (Google)
  • Gautam Kamath (University of Waterloo)
  • Marcel Keller (Data61)
  • Niki Kilbertus (Cambridge University)
  • Nadin Kokciyan (University of Edinburgh)
  • Ágnes Kiss (Technical University of Darmstadt)
  • Kim Laine (Microsoft Research)
  • Tancrède Lepoint (Google)
  • Audra McMillan (Apple)
  • Peihan Miao (University of Illinois at Chicago)
  • Richard Nock (Data61 & Australian National University)
  • Catuscia Palamidessi (École Polytechnique & Inria)
  • Jan Ramon (Inria)
  • Anand Sarwate (Rutgers University)
  • Peter Scholl (Aarhus University)
  • Phillipp Schoppmann (Humboldt University of Berlin)
  • Or Sheffet (University of Alberta)
  • Kana Shimizu (Waseda University)
  • Nigel Smart (KU Leuven)
  • Adam Smith (Boston University)
  • Congzheng Song (Cornell University)
  • Thomas Steinke (IBM)
  • Kunal Talwar (Apple)
  • Andrew Trask (DeepMind & Google)

Sponsors


Previous Editions