CCS 2019 Workshop
London, November 15
This one-day workshop focuses on privacy-preserving techniques for training, inference, and disclosure in large-scale data analysis, in both distributed and centralized settings. We have observed increasing interest from the ML community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for privacy-preserving training and inference, as well as Differential Privacy (DP) for disclosure. Simultaneously, the systems security and cryptography community has proposed various secure frameworks for ML. We encourage both theory- and application-oriented submissions exploring a range of approaches to privacy-preserving machine learning.
We think it will be very valuable to have a forum to unify different perspectives and start a discussion about the relative merits of each approach. The workshop will also serve as a venue for networking people from different communities interested in this problem, and hopefully foster fruitful long-term collaboration.
Submission deadline: July 1, 2019 (11:59pm AoE) [Extended]
Notification of acceptance: August 7, 2019
CCS early registration deadline: October 1, 2019 (11:59pm BST)
Workshop: November 15, 2019
Submissions in the form of extended abstracts must be at most 4 pages long (not including references), using the CCS format. We accept submissions of work that was recently published or is currently under review. Submissions should be anonymized. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have a link to arXiv or a PDF published on the workshop webpage.
Submit Your Abstract
All accepted papers have a slot at the poster session. Please print your poster at a size up to A0 (841 × 1189 mm) and bring it to the conference.
Thanks to our generous sponsors, we are able to provide a limited number of travel grants of up to $800 to help partially cover the expenses of PPML attendees who have not received other travel support from CCS this year. To apply, please send an email to ppml19@easychair.org with the subject “PPML19 Travel Grant Application”, including a half-page statement of purpose and a summary of anticipated travel expenses. If you are an undergraduate or graduate student, we also ask that a half-page recommendation letter supporting your application be sent to the same email address by the deadline. The deadline for applications is September 23, 2019 (11:59pm AoE). Notifications will be sent by September 30. Please feel free to send us an email if you have any questions.
8:50 | Welcome |
9:00 | Invited talk: Kobbi Nissim — Legal Theorems of Privacy |
Significant gaps between legal and technical thinking around data privacy make it
hard to understand exactly how legal standards apply to economic mechanisms that use
personal information. Such mechanisms may apply technical solutions constructed
under frameworks such as differential privacy and k-anonymity. However, there is a
lot of uncertainty regarding how these mathematical concepts meet legal standards.
As a result, arguments that mechanisms apply sufficient technical privacy measures
for satisfying legal privacy often lack rigor, and their conclusions are uncertain.
The uncertainty is exacerbated by a litany of successful privacy attacks on privacy
measures thought to meet legal expectations but then shown to fall short of doing
so.
We examine strategies for introducing mathematical rigor into the analysis, so as to
make formal claims and prove “legal theorems” that technical privacy measures meet
legal expectations. For that, we explore some of the gaps between these two very
different approaches, and present initial strategies towards bridging these gaps
considering examples from US and EU law.
Based on joint works with Aaron Bembenek, Mark Bun, Aloni Cohen, Marco Gaboardi, Urs
Gasser, David R. O’Brien, Thomas Steinke, Alexandra Wood, and Salil Vadhan.
9:45 | Secure parallel computation on national scale volumes of data (contributed talk), by Sahar Mazloom, Le Phi Hung, Samuel Ranellucci and S. Dov Gordon |
In this work, we revisit secure computation of graph parallel algorithms,
simultaneously leveraging all three of the following advances: we assume four
computation servers (with an honest majority, and one malicious corruption), allow
differentially private leakage during computation, and, exploiting the parallelism
that this affords, we construct an MPC protocol that can perform at national scales.
Concretely, we compute histograms on 300 million inputs in 4.18 minutes, and we
perform sparse matrix factorization, which is used in recommendation systems, on 20
million inputs in under 6 minutes. These problems have broad, real-world
applications, and, at this scale, we could imagine supporting the Census Bureau, or a
large company such as Amazon. For comparison, the largest experiments in GraphSC [6]
and OblivGraph [18] had 1M inputs, and required 13 hours and 2 hours of runtime,
respectively, while using 4 times the number of processors that we employ.
End-to-end, our construction is 320X faster than OblivGraph.
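As a point of reference only (this is not the authors' four-party protocol), the basic building block of a secure histogram can be illustrated with plain additive secret sharing: each client splits a one-hot contribution into random shares, and the servers aggregate shares locally, so no single server ever sees an individual input. The sizes and inputs below are illustrative assumptions.

    import secrets

    Q = 2**61 - 1          # prime modulus for additive sharing
    SERVERS, BINS = 3, 4   # toy sizes; the paper uses 4 servers and hundreds of millions of inputs

    def share(value):
        """Split `value` into SERVERS additive shares modulo Q."""
        shares = [secrets.randbelow(Q) for _ in range(SERVERS - 1)]
        shares.append((value - sum(shares)) % Q)
        return shares

    # Each client contributes a one-hot vector for its bin; servers only ever see
    # uniformly random shares, yet the sum of per-server tallies is the histogram.
    tallies = [[0] * BINS for _ in range(SERVERS)]
    for client_bin in [0, 2, 2, 3, 1, 2]:          # toy inputs
        for b in range(BINS):
            for s, sh in enumerate(share(1 if b == client_bin else 0)):
                tallies[s][b] = (tallies[s][b] + sh) % Q

    histogram = [sum(t[b] for t in tallies) % Q for b in range(BINS)]
    print(histogram)   # [1, 1, 3, 1]

This sketch is only passively secure and has no differentially private leakage control; it is meant solely to make the secret-sharing idea concrete.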
10:05 | Poster Session and Coffee Break |
10:45 | Invited talk: Rachel Cummings — Differential Privacy for Dynamic Databases |
Privacy concerns are becoming a major obstacle to using data in the ways we want.
How can data scientists make use of potentially sensitive data, while providing
rigorous privacy guarantees to the individuals who provided data? Over the last
decade, differential privacy has emerged as the de facto gold standard of
privacy-preserving data analysis. Differential privacy ensures that an algorithm does not
overfit to the individuals in the database by guaranteeing that if any single entry
in the database were to be changed, then the algorithm would still have
approximately the same distribution over outputs.
In this talk, we will focus on recent advances in differential privacy for dynamic databases, where the content of the database evolves over time as new data are acquired. First, we will see how to extend differentially private algorithms for static databases to the dynamic setting, with relatively small loss in the privacy-accuracy tradeoff. Next, we will see algorithms for privately detecting changes in data composition. We will conclude with a discussion of open problems in this space, including the use of differential privacy for other types of data dynamism. (Based on joint works with Sara Krehbiel, Kevin Lai, Yuliia Lut, Yajun Mei, Uthaipon (Tao) Tantipongpipat, Rui Tuo, and Wanrong Zhang.)
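For readers unfamiliar with the guarantee paraphrased above, the standard (ε, δ)-differential privacy definition (stated here for reference, not specific to this talk) is:

    \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta

for every pair of databases D and D' differing in a single entry and every set S of possible outputs of the mechanism M.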
11:30 | Garbled Neural Networks are Practical (contributed talk), by Marshall Ball, Brent Carmer, Tal Malkin, Mike Rosulek and Nichole Schimanski |
We show that garbled circuits offer a practical choice for secure evaluation of
neural network classifiers, comparable with complex, specialized protocols using
less robust assumptions, many rounds of interaction, and/or tailor-made neural
networks. In particular, we develop a scheme for garbling “off-the-shelf”
pre-trained neural networks, where the only model preprocessing required is a mild
discretization step as opposed to requiring a specialized SFE-friendly model to be
independently trained. Moreover, as our solution is a garbling scheme, it inherits a
much more diverse range of applications than non-garbling-based solutions, perhaps
most notably, efficient compilers for the malicious setting. At the protocol level,
we start with the garbling scheme of Ball, Malkin, and Rosulek (ACM CCS 2016) for
arithmetic circuits and introduce new optimizations for modern neural network
activation functions. We develop fancygarbling, the first implementation of the
BMR16 garbling scheme along with our new optimizations, as part of a heavily optimized
garbled-circuits tool that is driven by a TensorFlow classifier description. We
evaluate our constructions on a wide range of neural networks. We find that our
approach is up to 100x more efficient than straightforward boolean garbling. It is
also roughly 40% more efficient than DeepSecure (Rouhani et al., DAC 2018), a recent
garbled-circuit-based approach for secure neural network evaluation, which
incorporates significant optimization techniques for boolean circuits. Furthermore,
our approach provides competitive performance tradeoffs (efficiency and latency vs.
communication) also when compared with non-garbled-circuit approaches.
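A minimal sketch of the kind of discretization step such an approach relies on: pre-trained floating-point weights are mapped to bounded integers so the network can be evaluated as an arithmetic circuit. The scale and clipping range below are illustrative assumptions, not the parameters used in the paper.

    import numpy as np

    def discretize(weights, scale=2**8, bound=2**15):
        # Fixed-point quantization: scale, round, and clip float weights to integers
        # suitable for evaluation inside an arithmetic (mod-p) garbled circuit.
        q = np.rint(np.asarray(weights) * scale).astype(np.int64)
        return np.clip(q, -bound, bound - 1)

    w = np.array([[0.031, -1.2], [0.5, 0.0078]])
    print(discretize(w))   # integer weight matrix ready for garbled evaluation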
11:50 | Learning Rate Adaptation for Federated and Differentially Private Learning (contributed talk), by Antti Koskela and Antti Honkela |
We propose an algorithm for adapting the learning rate of stochastic
gradient descent (SGD) that avoids the need for a validation set. The idea for the
adaptiveness comes from the technique of extrapolation: to get an estimate for the
error against the gradient flow which underlies SGD, we compare the result obtained
by one full step and two half-steps. The algorithm is applied in two separate
frameworks: federated and differentially private learning. Using deep neural network
examples, we empirically show that, unlike commonly used optimisation methods, the
method works robustly in the federated learning setting. We also show that the
adaptive algorithm is competitive with state-of-the-art hyperparameter search
methods and with commonly used optimisation methods for differentially private
training.
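A rough plaintext sketch of the extrapolation idea (the function name and tolerance rule are hypothetical; the paper's exact update may differ): take one full SGD step and two half steps from the same point, use their difference as a local error estimate against the underlying gradient flow, and rescale the learning rate to keep that error near a target tolerance.

    import numpy as np

    def sgd_step_with_lr_adaptation(w, grad, lr, tol=1e-2):
        """One adaptive SGD step: compare a full step with two half steps."""
        full = w - lr * grad(w)
        half = w - 0.5 * lr * grad(w)
        two_halves = half - 0.5 * lr * grad(half)
        # The gap between the two estimates approximates the local error against
        # the gradient flow; steer the learning rate so this error stays near `tol`.
        err = np.linalg.norm(full - two_halves) / (np.linalg.norm(two_halves) + 1e-12)
        new_lr = lr * min(2.0, max(0.5, float(np.sqrt(tol / (err + 1e-12)))))
        return two_halves, new_lr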
12:10 | Lightning Talks |
1. Nitin Agrawal, Ali Shahin Shamsabadi, Matthew Kusner and Adrià Gascón. QUOTIENT: Secure Two-Party Neural Network Training and Prediction via Oblivious Transfer
2. Yuantian Miao, Ben Zi Hao Zhao, Minhui Xue, Chen Chao, Lei Pan, Jun Zhang, Dali Kaafar and Yang Xiang. The Audio Auditor: Participant-Level Membership Inference in Internet of Things Voice Services
3. Harsh Chaudhari, Ashish Choudhury, Arpita Patra and Ajith Suresh. ASTRA: High Throughput 3PC over Rings with Application to Secure Prediction
4. Brendan Avent, Javier Gonzalez, Tom Diethe, Andrei Paleyes and Borja Balle. Automatic Discovery of Privacy-Utility Pareto Fronts
5. Marco Romanelli, Konstantinos Chatzikokolakis and Catuscia Palamidessi. Optimal Obfuscation Mechanisms via Machine Learning and Applications to Location Privacy
6. Niek J. Bouman and Niels de Vreede. A Practical Approach to the Secure Computation of the Moore-Penrose Pseudoinverse over the Rationals
7. James Bell, Aurélien Bellet, Adria Gascon and Tejas Kulkarni. Private Protocols for U-statistics in the Local Model and Beyond
8. Mark Abspoel, Niek J. Bouman, Berry Schoenmakers and Niels de Vreede. Fast Secure Comparison for Medium-Sized Integers and Its Application in Binarized Neural Networks
9. Devin Reich, Ariel Todoki, Rafael Dowsley, Martine De Cock and Anderson Nascimento. Secret Sharing based Private Text Classification
10. Qingrong Chen, Chong Xiang, Minhui Xue, Bo Li, Nikita Borisov, Dali Kaafar and Haojin Zhu. Differentially Private Data Sharing: Sharing Models versus Sharing Data
11. Antti Koskela, Joonas Jälkö and Antti Honkela. Computing Exact Guarantees for Differential Privacy
12. Thijs Veugen, Bart Kamphorst, Marie Beth van Egmond and Natasja van de L'Isle. Privacy-Preserving Coupling of Vertically-Partitioned Databases and Subsequent Training with Gradient Descent
13. Sebastian P. Bayerl, Ferdinand Brasser, Christoph Busch, Tommaso Frassetto, Patrick Jauernig, Jascha Kolberg, Andreas Nautsch, Korbinian Riedhammer, Ahmad-Reza Sadeghi, Thomas Schneider, Emmanuel Stapf, Amos Treiber and Christian Weinert. Privacy-Preserving Speech Processing via STPC and TEEs
14. Ranya Aloufi, Hamed Haddadi and David Boyle. Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants
15. Lukas Burkhalter, Alexander Viand, Anwar Hithnawi and Hossein Shafagh. Robust Secure Aggregation for Privacy-Preserving Federated Learning with Adversaries
16. Anders Dalskov, Daniel Escudero and Marcel Keller. Secure Evaluation of Quantized Neural Networks
17. Mohammad Yaghini, Bogdan Kulynych and Carmela Troncoso. Disparate Vulnerability: on the Unfairness of Privacy Attacks Against Machine Learning
12:40 | Poster Session and Lunch Break |
14:00 | Invited talk: Vitaly Shmatikov — Unwanted Machine Learning |
Modern machine learning models exhibit amazing accuracy on tasks from image
classification to natural-language processing, but accuracy does not tell the entire
story of what these models have learned. Does a model memorize and leak its training
data? Does it “accidentally” learn privacy-violating tasks uncorrelated with its
training objective? Can it hide a backdoor introduced by an adversary? All of these
are examples of unwanted learning, which we need to understand and mitigate in order
to solve security and privacy problems in today's AI.
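One deliberately simple example of measuring such leakage, included here only as an illustration and not taken from the talk, is loss-threshold membership inference: records on which a model's loss is unusually low are guessed to be training members. The loss_fn interface below is a hypothetical placeholder.

    import numpy as np

    def loss_threshold_mia(loss_fn, records, labels, threshold):
        # Guess "member" for every record whose loss falls below the threshold,
        # e.g. the model owner's reported average training loss.
        losses = np.array([loss_fn(x, y) for x, y in zip(records, labels)])
        return losses < threshold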
14:45 | On Inferring Training Data Attributes in Machine Learning Models (contributed talk), by Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Raghav Bhaskar and Mohamed Ali Kaafar |
A number of recent works have demonstrated that API access to machine learning
models leaks information about the dataset records used to train the models.
Further, the work of [9] shows that such membership inference attacks (MIAs) may be
sufficient to construct a stronger breed of attribute inference attacks (AIAs),
which given a partial view of a record can guess the missing attributes. In this
work, we show (to the contrary) that MIA may not be sufficient to build a successful
AIA. This is because the latter requires the ability to distinguish between similar
records (differing only in a few attributes), and, as we demonstrate, the current
breed of MIAs is unsuccessful in distinguishing member records from similar
non-member records. We thus propose a relaxed notion of AIA, whose goal is only to
approximately guess the missing attributes, and argue that such an attack is more
likely to succeed if MIA is used as a subroutine for inferring training
record attributes.
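The following sketch illustrates, with hypothetical helper names rather than the authors' code, how an MIA score could serve as a subroutine for such an approximate attribute inference: score every candidate completion of the partial record and return all candidates whose scores are close to the best one, rather than committing to a single guess.

    def approximate_aia(mia_score, partial_record, attribute, candidates, slack=0.05):
        # Complete the partial record with each candidate value, score it with a
        # membership-inference oracle, and keep every value within `slack` of the best.
        scored = [(v, mia_score({**partial_record, attribute: v})) for v in candidates]
        best = max(score for _, score in scored)
        return [v for v, score in scored if best - score <= slack]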
15:05 | Poster Session and Coffee Break |
15:45 | Invited talk: Kim Laine — From Homomorphic Encryption to Private AI: Successes, Challenges, and Opportunities |
In this talk the audience will learn about modern homomorphic encryption and how it
can be used to bring strong privacy guarantees to specific machine learning
applications. We will also discuss limitations of homomorphic encryption in general,
and specifically in the context of private machine learning, and get a glimpse of
how some exciting new research directions may help resolve these challenges in the
future.
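For readers new to the area, the "compute on encrypted data" property can be illustrated with a toy additively homomorphic (Paillier) scheme. This is not the lattice-based homomorphic encryption the talk covers (for example, the schemes implemented in Microsoft SEAL), and the parameters below are far too small for real use.

    import secrets
    from math import gcd

    p, q = 104729, 1299709                          # tiny demo primes (insecure)
    n, n2 = p * q, (p * q) ** 2
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p-1, q-1)
    mu = pow((pow(n + 1, lam, n2) - 1) // n, -1, n)

    def enc(m):
        r = secrets.randbelow(n - 2) + 1
        return (1 + m * n) * pow(r, n, n2) % n2     # with g = n+1, g^m = 1 + m*n (mod n^2)

    def dec(c):
        return (pow(c, lam, n2) - 1) // n * mu % n

    a, b = enc(20), enc(22)
    assert dec(a * b % n2) == 42                    # multiplying ciphertexts adds plaintexts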
16:30 | Efficient Secure Ridge Regression from Randomized Gaussian Elimination (contributed talk), by Frank Blom, Niek J. Bouman, Berry Schoenmakers and Niels de Vreede |
In this paper we present a practical protocol for secure ridge regression. We
develop the necessary secure linear algebra tools, using only basic arithmetic over
prime fields. In particular, we will show how to solve linear systems of equations
and compute matrix inverses efficiently, using appropriate secure random
self-reductions of these problems. The distinguishing feature of our approach is
that the use of secure fixed-point arithmetic is avoided entirely, while
circumventing the need for rational reconstruction at any stage as well. We
demonstrate the potential of our protocol in a standard setting for
information-theoretically secure multiparty computation, tolerating a dishonest
minority of passively corrupt parties. Using the MPyC framework, which is based on
threshold secret sharing over finite fields, we show how to handle large datasets
efficiently, achieving practically the same root-mean-square errors as Scikit-learn.
Moreover, we do not assume that any part of the datasets is held privately by any
of the parties, which makes our protocol much more versatile than existing
solutions.
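To give a flavour of the kind of randomized self-reduction involved, here is a plaintext numpy sketch over the reals (the protocol itself works with secret-shared values over a prime field): to solve the ridge normal equations (X^T X + lambda*I) w = X^T y without revealing the system, the parties can blind the matrix with a jointly generated random R, open only the blinded product, invert it publicly, and recombine.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    A = rng.standard_normal((d, d)) + d * np.eye(d)   # stands in for X.T @ X + lam * I
    b = rng.standard_normal(d)                        # stands in for X.T @ y

    R = rng.standard_normal((d, d))   # random blinding matrix (kept secret-shared in MPC)
    M = A @ R                         # only this blinded product would be revealed
    w = R @ np.linalg.inv(M) @ b      # valid because A^{-1} = R @ (A @ R)^{-1}

    assert np.allclose(A @ w, b)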
16:50 | A verification framework for secure machine learning (contributed talk), by Prasad Naldurg and Karthikeyan Bhargavan |
We propose a programming and verification framework to help developers build
distributed software applications using composite homomorphic encryption (and secure
multi-party computation protocols), to implement secure machine learning and
classification over private data. With our framework, a developer can prove that the
application code is functionally correct, that it correctly composes the various
cryptographic schemes it uses, and that it does not accidentally leak any secrets
(via side channels, for example). Our end-to-end solution results in verified and
efficient implementations of state-of-the-art secure privacy-preserving learning and
classification techniques.
17:10 | Wrap-up |