Virtual ACM CCS 2021 Workshop
November 19, 2021
This one-day workshop focuses on privacy-preserving techniques for training, inference, and disclosure in large-scale data analysis, in both the distributed and centralized settings. We have observed increasing interest from the Machine Learning (ML) community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for privacy-preserving training and inference, as well as Differential Privacy (DP) for disclosure. Simultaneously, the systems security and cryptography community has proposed various secure frameworks for ML. We encourage both theory- and application-oriented submissions exploring a range of approaches, including MPC, HE, and DP.
Submission deadline (extended from July 22): August 1, 2021, 23:59 (Anywhere on Earth)
Notification of acceptance: September 16, 2021
Workshop: November 19, 2021
Submissions in the form of extended abstracts must be at most 4 pages long (not including references), using the double-column ACM format. We encourage submission of work that is new to the privacy-preserving machine learning community. Submissions should be anonymized. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have a link to a preprint or a PDF published on the workshop webpage. Authors of accepted papers are required to register for the workshop but can present their work remotely.
Submit Your Abstract

Thanks to our generous sponsors, we are able to provide a limited number of grants to cover the workshop registration fees of PPML attendees who have not received other support from CCS this year. To apply, please send an email to ppml2021@googlegroups.com with the subject “PPML21 Grant Application”, including a half-page statement of purpose. Please create an account in the CCS online registration system, and include your user ID / email address in the application. The deadline for applications is November 5, 2021 (11:59pm AoE). Notifications will be sent by November 12. Please feel free to send us an email if you have any questions.
The workshop will be hosted in two blocks: BLOCK I accommodates Asia and Europe (morning) time zones, and BLOCK II accommodates U.S. and Europe (evening) time zones. Unless otherwise noted, all listed times are CET (UTC+1).
BLOCK I (Asia/Europe)

9:10–9:20 | Welcome & Introduction
9:20–10:00 | Invited talk (1): Mireille Hildebrandt — PPML and the AI Act's Fundamental Rights Impact Assessment (FRIA) for ML Systems (video)

Abstract: Bringing together the ML community with the MPC, HE, and DP communities should allow for pivotal awareness within the ML community of the myriad security and privacy issues that may affect both the reliability of ML systems and the confidentiality of the information these systems process. In this talk I will discuss how reducing access to identifiable information may nevertheless increase the risk to other fundamental rights, notably those of non-discrimination, the presumption of innocence, and freedom of information. This will be followed by an analysis of the legal requirement of a fundamental rights impact assessment (FRIA), referring to the EU’s GDPR and the EU’s proposed AI Act.

Speaker Bio: Hildebrandt is a Research Professor on ‘Interfacing Law and Technology’ at Vrije Universiteit Brussels (VUB), appointed by the VUB Research Council. She is co-Director of the Research Group on Law, Science, Technology and Society studies (LSTS) at the Faculty of Law and Criminology. She also holds the part-time Chair of Smart Environments, Data Protection and the Rule of Law at the Science Faculty, at the Institute for Computing and Information Sciences (iCIS) at Radboud University Nijmegen. Her research interests concern the implications of automated decisions, machine learning and mindless artificial agency for law and the rule of law in constitutional democracies. Hildebrandt has published 5 scientific monographs, 23 edited volumes or special issues, and well over 100 chapters and articles in scientific journals and volumes. She received an ERC Advanced Grant for her project on ‘Counting as a Human Being in the era of Computational Law’ (2019–2024), which funds COHUBICOL. In that context she co-founded, together with Laurence Diver, the international peer-reviewed Journal of Cross-Disciplinary Research in Computational Law (co-Editors-in-Chief: Virginia Dignum and Frank Pasquale).
10:00–10:20 | Interaction data are identifiable even across long periods of time (video)
Ana-Maria Cretu (Imperial College London); Federico Monti (Twitter); Stefano Marrone (University of Naples Federico II); Xiaowen Dong (University of Oxford); Michael Bronstein, Yves-Alexandre de Montjoye (Imperial College London)
Fine-grained records of people's interactions, both offline and online, are collected at a large scale. These data contain sensitive information about whom we meet, talk to, and when. We demonstrate here how people's interaction behavior is stable over long periods of time and can be used to identify individuals in anonymous datasets. Our attack learns the profile of an individual using geometric deep learning and triplet loss optimization. In a mobile phone metadata dataset of more than 40k people, it correctly identifies 52% of individuals based on their 2-hop interaction graph. We further show that the profiles learned by our method are stable over time and that 24% of people are still identifiable after 20 weeks, thus making identification a real risk in practice. Finally, we show that having access to more auxiliary data can improve the performance of the attack, albeit with decreasing returns. Our results provide strong evidence that disconnected and even re-pseudonymized interaction data can be linked together, making them likely to be personal data under the European Union's General Data Protection Regulation.
10:20–10:30 | Break (10 min)
10:30–11:10 | Invited talk (2): Benny Pinkas — Private Intersection Analytics for Machine Learning (video)

Abstract: Effective data analysis often relies on information from multiple sources, including private information that cannot be released by its owners. The challenge is to analyze the data effectively while protecting its privacy. This talk will provide an overview of efficient cryptographic protocols, some of them based on variants of private set intersection (PSI), that can be applied to privately analyze data.

Speaker Bio: Benny Pinkas is the head of the Cyber Research Center at Bar-Ilan University, Israel. He received his PhD from the Weizmann Institute in 2000. He has worked in the research labs of Intertrust Technologies, Hewlett-Packard, Google and VMware. His main research areas are cryptography, computer security and privacy, with a focus on secure computation.
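For readers unfamiliar with PSI, the following is a toy sketch of the classic Diffie-Hellman-style construction that many PSI variants build on. It is a generic textbook illustration (semi-honest setting, toy parameters), not material from the talk.

```python
# Toy sketch of Diffie-Hellman-based PSI (semi-honest parties, toy group; illustrative only).
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus; real protocols use a standardized large group

def h(item: str) -> int:
    # Hash an item into the group (random-oracle-style toy mapping).
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

alice_items = ["alice@example.com", "bob@example.com", "carol@example.com"]
bob_items = ["bob@example.com", "dave@example.com"]
a = secrets.randbelow(P - 2) + 1   # Alice's secret exponent
b = secrets.randbelow(P - 2) + 1   # Bob's secret exponent

# Alice -> Bob: her items blinded with her exponent, {H(x)^a}.
alice_blinded = [pow(h(x), a, P) for x in alice_items]

# Bob -> Alice: (1) Alice's values re-blinded with his exponent, {H(x)^(ab)},
#               (2) his own items blinded with his exponent, {H(y)^b}.
alice_double = [pow(v, b, P) for v in alice_blinded]
bob_blinded = [pow(h(y), b, P) for y in bob_items]

# Alice raises Bob's values to her exponent and compares: H(x)^(ab) == H(y)^(ba)
# (up to negligible collisions) exactly when x == y, so she learns which of her
# items lie in the intersection and nothing else about Bob's set.
bob_double = {pow(v, a, P) for v in bob_blinded}
intersection = [x for x, v in zip(alice_items, alice_double) if v in bob_double]
print(intersection)  # ['bob@example.com']
```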
11:10–11:30 | SIRNN: A Math Library for Secure RNN Inference (video)
Deevashwer Rathee, Mayank Rathee (UC Berkeley); Rahul Kranti Kiran Goli, Divya Gupta, Rahul Sharma, Nishanth Chandran, Aseem Rastogi (Microsoft Research)
Complex machine learning (ML) inference algorithms like recurrent neural networks (RNNs) use standard functions from math libraries like exponentiation, sigmoid, tanh, and reciprocal of square root. Although prior work on secure 2-party inference provides specialized protocols for convolutional neural networks (CNNs), existing secure implementations of these math operators rely on generic 2-party computation (2PC) protocols that suffer from high communication. We provide new specialized 2PC protocols for math functions that crucially rely on lookup tables and mixed bitwidths to address this performance overhead; our protocols for math functions communicate up to 423× less data than prior work. Furthermore, our math implementations are numerically precise, which ensures that the secure implementations preserve the model accuracy of cleartext inference. Building on these novel protocols, we develop SiRnn, a library for end-to-end secure 2-party DNN inference, that provides the first secure implementations of an RNN operating on time series sensor data, an RNN operating on speech data, and a state-of-the-art ML architecture that combines CNNs and RNNs for identifying all heads present in images. Our evaluation shows that SiRnn achieves up to three orders of magnitude of performance improvement when compared to inference of these models using an existing state-of-the-art 2PC framework.
11:30–11:35 | Move to Gather
11:35–12:35 | Poster Session
BLOCK II (Europe/US)

17:30–17:40 | Welcome (back)
17:40–18:20 | Invited talk (3): Ilya Mironov — Federated Learning: To TEE or Not to TEE? (video)

Abstract: Cross-device Federated Learning (FL) is a distributed learning paradigm that promises to train high-quality models by leveraging data from massive client populations, while ensuring security and privacy of client data. A key component of FL protocols is secure aggregation of clients' updates, which can be implemented either by using traditional MPC techniques or by shifting some processing to a hardware-backed Trusted Execution Environment (TEE). We will discuss both approaches and their implications for supporting Internet-scale FL deployments.

Speaker Bio: Ilya Mironov obtained his Ph.D. in cryptography from Stanford in 2003. From 2003 to 2014 he was a member of Microsoft Research (Silicon Valley Campus), where he contributed to early work on differential privacy. From 2015 to 2019 he worked at Google Brain. Since 2019 he has been part of Responsible AI at Meta Platforms (the company previously known as Facebook), working on privacy-preserving machine learning.
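As background on the secure aggregation building block mentioned in the abstract, here is a toy sketch of the pairwise-masking idea behind MPC-style protocols. It omits key agreement, dropout handling, and everything else a real deployment needs, and is not taken from the talk.

```python
# Toy pairwise-masking secure aggregation: masks cancel in the sum, so the server
# learns only the aggregate of the client updates. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 4, 5
updates = rng.normal(size=(n_clients, dim))

# Each pair (i, j), i < j, agrees on a shared random mask; client i adds it,
# client j subtracts it, so the masks vanish when the server sums everything.
masked = updates.copy()
for i in range(n_clients):
    for j in range(i + 1, n_clients):
        m = rng.normal(size=dim)  # stands in for a mask derived from a shared key
        masked[i] += m
        masked[j] -= m

# Individual masked updates look random to the server, but their sum is exact.
assert np.allclose(masked.sum(axis=0), updates.sum(axis=0))
print(masked.sum(axis=0) - updates.sum(axis=0))
```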
18:20–18:40 | Canonical Noise and Private Hypothesis Tests (video)
Jordan Awan (Purdue University); Salil Vadhan (Harvard University)
In the setting of $f$-DP, we propose the concept of a canonical noise distribution (CND), which captures whether an additive privacy mechanism is tailored for a given $f$, and we give a construction of a CND for an arbitrary tradeoff function $f$. We show that private hypothesis tests are intimately related to CNDs, allowing for the release of private $p$-values at no additional privacy cost as well as the construction of uniformly most powerful (UMP) tests for binary data. We apply our techniques to difference-of-proportions testing.
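Background for the abstract above (standard definitions from the $f$-DP literature, not part of the submission): for distributions $P$ and $Q$, the tradeoff function is
$$T(P,Q)(\alpha) \;=\; \inf\{\, 1-\mathbb{E}_Q[\phi] \;:\; \mathbb{E}_P[\phi] \le \alpha \,\},$$
where the infimum ranges over rejection rules $\phi$, and a mechanism $M$ is $f$-DP if $T(M(D), M(D')) \ge f$ pointwise for all neighboring datasets $D, D'$. Roughly speaking, a CND is an additive noise distribution whose induced tradeoff curve matches $f$ exactly rather than merely dominating it.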
18:40–19:00 | The Skellam Mechanism for Differentially Private Federated Learning (video)
Naman Agarwal, Peter Kairouz, Ziyu Liu (Google Research)
We introduce the multi-dimensional Skellam mechanism, a discrete differential privacy mechanism based on the difference of two independent Poisson random variables. To quantify its privacy guarantees, we analyze the privacy loss distribution via a numerical evaluation and provide a sharp bound on the Rényi divergence between two shifted Skellam distributions. While useful in both centralized and distributed privacy applications, we investigate how it can be applied in the context of federated learning with secure aggregation under communication constraints. Our theoretical findings and extensive experimental evaluations demonstrate that the Skellam mechanism provides the same privacy-accuracy trade-offs as the continuous Gaussian mechanism, even when the precision is low. More importantly, Skellam is closed under summation and sampling from it only requires sampling from Poisson – an efficient routine that ships with all machine learning and data analysis software packages. These features, along with its discrete nature and competitive privacy-accuracy trade-offs, make it an attractive alternative to the newly introduced discrete Gaussian mechanism.
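A minimal sketch of why the mechanism is convenient in practice (illustrative only, not the authors' implementation): a Skellam($\mu$, $\mu$) sample is just the difference of two independent Poisson($\mu$) samples, and sums of independent Skellam noise remain Skellam.

```python
# Minimal sketch of adding Skellam noise to an integer-valued aggregate.
import numpy as np

rng = np.random.default_rng(0)
mu, dim, n_clients = 100.0, 4, 10

def skellam_noise(mu, size, rng):
    # Skellam(mu, mu) = difference of two independent Poisson(mu) variables.
    return rng.poisson(mu, size) - rng.poisson(mu, size)

# Integer-quantized client updates (after scaling and clipping in a real protocol).
updates = rng.integers(-5, 6, size=(n_clients, dim))

# Each client perturbs its own update; because Skellam is closed under summation,
# the aggregate equals the true sum plus Skellam(n_clients*mu, n_clients*mu) noise.
noisy_sum = sum(u + skellam_noise(mu, dim, rng) for u in updates)
print(noisy_sum, updates.sum(axis=0))
```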
19:00–19:30 | Break (30 min)
19:30–20:10 | Invited talk (4): Aleksandra Korolova — Auditing the Hidden Societal Impacts of Ad Delivery Algorithms, with Implications to Privacy (video)

Abstract: Although targeted advertising has been touted as a way to give advertisers a choice in whom they reach, increasingly, ad delivery algorithms designed by the ad platforms are invisibly refining those choices. In this talk, I will present our methodology for "black-box" auditing of the role of ad delivery algorithms in shaping who sees job and political ads, using only the tools and data accessible to any advertiser. I will demonstrate that a platform's algorithmic choices can lead to skew in the delivery of job ads along gender and racial lines, even when such skew is not intended by the advertiser and is not justified by differences in qualifications. Furthermore, I will show that a platform's choices shape political ad delivery by hindering campaigns' ability to reach ideologically diverse voters. I will conclude by discussing the implications of our findings for the necessity of third-party auditing and the open questions of how to enable such auditing while preserving privacy. Based on joint work with Muhammad Ali, Miranda Bogen, John Heidemann, Basileal Imana, Alan Mislove, Aaron Rieke, and Piotr Sapiezynski.

Speaker Bio: Aleksandra Korolova is a WiSE Gabilan Assistant Professor of Computer Science at USC, where she studies the societal impacts of algorithms and develops algorithms that enable data-driven innovations while preserving privacy and fairness. Prior to joining USC, Aleksandra was a research scientist at Google and a Computer Science Ph.D. student at Stanford. Aleksandra is a recipient of the 2020 NSF CAREER award, a co-winner of the 2011 PET Award for outstanding research in privacy enhancing technologies for exposing privacy violations of microtargeted advertising, and a runner-up for the 2015 PET Award for RAPPOR, the first commercial deployment of differential privacy. Aleksandra's most recent research, on discrimination in ad delivery, has received the CSCW Honorable Mention Award and Recognition of Contribution to Diversity and Inclusion, was a runner-up for the WWW Best Student Paper Award, and was invited for a briefing for Members of the House Financial Services Committee.
20:10–20:30 | NeuraCrypt is not private (video)
Nicholas Carlini (Google); Sanjam Garg (University of California, Berkeley and NTT Research); Somesh Jha (University of Wisconsin); Saeed Mahloujifar (Princeton); Mohammad Mahmoody (University of Virginia); Florian Tramèr (Stanford University)
NeuraCrypt (Yala et al., arXiv 2021) is an algorithm that converts a sensitive dataset to an encoded dataset so that (1) it is still possible to train machine learning models on the encoded data, but (2) an adversary with access only to the encoded dataset cannot learn much about the original sensitive dataset. We break NeuraCrypt's privacy claims by perfectly solving the authors' public challenge and by showing that NeuraCrypt does not satisfy the formal privacy definitions posed in the original paper. Our attack consists of a series of boosting steps that, coupled with various design flaws, turn a 1% attack advantage into a 100% complete break of the scheme.
20:30–20:50 | What else is leaked when eavesdropping federated learning? (video)
Chuan Xu, Giovanni Neglia (Inria)
In this paper, we initiate the study of local model reconstruction attacks for federated learning, where an honest-but-curious adversary eavesdrops on the messages exchanged between a client and the server and reconstructs the client's local model. The success of this attack enables better performance of other known attacks, such as membership inference and attribute inference attacks. We provide analytical guarantees for the success of this attack when training a linear least-squares problem with full batch size and an arbitrary number of local steps. A heuristic is proposed to generalize the attack to other machine learning problems. Experiments on logistic regression tasks show high reconstruction quality, especially when clients' datasets are highly heterogeneous (as is common in federated learning).
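To illustrate why the linear least-squares case admits analytical guarantees, here is a simplified simulation of the eavesdropper's problem under strong assumptions (one full-batch gradient step per round, known learning rate). It is a toy reconstruction in the spirit of the abstract, not the authors' attack.

```python
# Toy reconstruction of a client's local least-squares model from eavesdropped
# (incoming model, outgoing update) pairs. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta, rounds = 3, 50, 0.1, 8

# Client's private data and the local least-squares solution to be recovered.
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
A_true, b_true = X.T @ X / n, X.T @ y / n
w_local = np.linalg.solve(A_true, b_true)

def client_step(w_in):
    # One full-batch gradient step on the local least-squares objective.
    return w_in - eta * (A_true @ w_in - b_true)

# Eavesdropper observes the models sent to the client and the updates sent back.
observations = []
for _ in range(rounds):
    w_in = rng.normal(size=d)  # stand-in for whatever global model is broadcast
    observations.append((w_in, client_step(w_in)))

# Each round gives d linear equations in the unknowns (A, b):
#   (w_in - w_out) / eta = A @ w_in - b
rows, rhs = [], []
for w_in, w_out in observations:
    g = (w_in - w_out) / eta
    for i in range(d):
        row = np.zeros(d * d + d)
        row[i * d:(i + 1) * d] = w_in   # coefficients of A[i, :]
        row[d * d + i] = -1.0           # coefficient of b[i]
        rows.append(row)
        rhs.append(g[i])

theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
A_hat, b_hat = theta[:d * d].reshape(d, d), theta[d * d:]
w_hat = np.linalg.solve(A_hat, b_hat)
print(np.allclose(w_hat, w_local, atol=1e-6))  # True: local model reconstructed
```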
20:50–21:10 | FHE-Friendly Distillation of Decision Tree Ensembles for Efficient Encrypted Inference (video)
Karthik Nandakumar (Mohamed Bin Zayed University of Artificial Intelligence); Kanthi Sarpatwar (IBM T. J. Watson Research Center); Nalini Ratha (University of Buffalo); Sharath Pankanti (Microsoft); Roman Vaculin, Karthikeyan Shanmugam, James T. Rayfield (IBM Research)
Data privacy concerns often limit the use of cloud-based machine learning services for processing sensitive personal data. While fully homomorphic encryption (FHE) offers a potential solution by enabling computations on encrypted data, the challenge is to obtain accurate machine learning models that work efficiently within FHE limitations. Though deep neural networks have been very successful in many applications, decision tree ensembles are still considered the state of the art for inference on tabular data. Existing approaches for encrypted inference based on decision trees simply replace hard comparison operations with soft comparators at the cost of accuracy. In this work, we propose a framework to distill knowledge extracted by decision tree ensembles into shallow neural networks (referred to as FDNets) that are highly conducive to encrypted inference. The proposed FDNets are FHE-friendly because they are obtained by searching for the best multilayer perceptron (MLP) architecture that minimizes the accuracy loss while operating within the given depth constraints. Furthermore, the FDNets can be trained using only synthetic data sampled from the training data distribution, without the need to access the original training data. Extensive experiments on real-world datasets demonstrate that FDNets are highly scalable and can perform efficient inference on batched encrypted data (134 bits of security) with amortized times in milliseconds.
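A rough sketch of the distillation recipe described in the abstract, using scikit-learn stand-ins and a Gaussian approximation of the data distribution; the FDNet architecture search and the FHE depth constraints are not modeled here.

```python
# Illustrative sketch of the distillation step only: a tree-ensemble "teacher"
# labels synthetic data drawn from an estimate of the training distribution,
# and a shallow MLP "student" is fit to those soft labels. (A real FHE-friendly
# network would additionally constrain depth and activations for encrypted inference.)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
teacher = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Synthetic inputs from a simple Gaussian fit to the training data; after this
# point the original training data is no longer touched.
rng = np.random.default_rng(0)
X_syn = rng.multivariate_normal(X_train.mean(axis=0), np.cov(X_train, rowvar=False), size=5000)

# The student regresses the teacher's soft scores on the synthetic data.
student = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
student.fit(X_syn, teacher.predict_proba(X_syn)[:, 1])

print("teacher acc:", teacher.score(X_test, y_test))
print("student acc:", ((student.predict(X_test) > 0.5) == y_test).mean())
```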
21:10–21:15 | Move to Gather
21:15–22:15 | Poster Session