NeurIPS 2018 Workshop
Montréal, December 8
Palais des Congrès de Montréal
Room: 512CDGH
This one-day workshop focuses on privacy-preserving techniques for training, inference, and disclosure in large-scale data analysis, in both distributed and centralized settings. We have observed increasing interest from the ML community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for privacy-preserving training and inference, as well as Differential Privacy (DP) for disclosure. Simultaneously, the systems security and cryptography communities have proposed various secure frameworks for ML. We encourage both theory- and application-oriented submissions exploring a range of approaches to privacy-preserving machine learning.
We think it will be very valuable to have a forum that unifies these different perspectives and starts a discussion about the relative merits of each approach. The workshop will also serve as a venue for connecting people from different communities interested in this problem, and will hopefully foster fruitful long-term collaborations.
Submission deadline: October 16, 2018 (11:59pm AoE), extended from October 8
Notification of acceptance: November 1, 2018
Workshop: December 8, 2018
Submissions in the form of extended abstracts must be at most 4 pages long (excluding references; an unlimited number of pages of supplemental material may be included, which reviewers are not required to take into account) and must adhere to the NeurIPS format. We accept submissions of work recently published or currently under review. Submissions should be anonymized. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have a link to arXiv or a PDF published on the workshop webpage.
If the new notification date causes issues with a potential visa application that depends specifically on the acceptance at this workshop, please contact us directly at ppml18@easychair.org.
We can offer one author of each accepted paper the opportunity to purchase a NeurIPS registration.
From the Workshop FAQ: reserve tickets guarantee attendance at the workshops and, depending on availability, also at the main conference and tutorials. We expect most reserve tickets to allow registration for the tutorials, the conference, and the workshops, but only workshop attendance is certain.
8:30 | Welcome and Introduction |
8:50 | Invited talk: Ian Goodfellow — Scalable PATE and the Secret Sharer [slides] [keynote] |
Coming soon
9:40 | Invited talk: Shafi Goldwasser — Machine Learning and Cryptography: Challenges and Opportunities |
In the mid-eighties, researchers in computational learning theory pointed the way to examples of hard learning tasks, such as learning parity with noise (LPN) and learning with errors (LWE), that have proved extremely useful for building sophisticated cryptographic primitives such as homomorphic encryption, which remain unbreakable as long as LPN and LWE are hard to learn.
Today, with the rise of machine learning algorithms that use large amounts of data to produce procedures with the potential to replace human decision processes, cryptography in turn stands to provide machine learning with tools for keeping data private during both the training and inference phases of ML, and with methods to verify the adherence of models to the data. These promise to be important steps in ensuring the safe transition from human to algorithmic decision making.
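For readers unfamiliar with these hardness assumptions, a standard statement of the (search) learning with errors problem, included here only as background:
\[
(a_i, b_i), \qquad b_i = \langle a_i, s\rangle + e_i \pmod{q}, \qquad a_i \sim \mathcal{U}(\mathbb{Z}_q^n), \quad e_i \sim \chi,
\]
where $\chi$ is a narrow error distribution; the task is to recover the secret $s$ (the decision version asks to distinguish such samples from uniformly random pairs). LPN is the special case $q = 2$ with Bernoulli noise.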
10:30 | Coffee break |
11:00 | Contributed talk: Privacy Amplification by Iteration (Vitaly Feldman, Ilya Mironov, Kunal Talwar and Abhradeep Thakurta) |
Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then reasoning about the cumulative privacy cost of the algorithm. This is enabled by composition theorems for differential privacy that allow releasing of all the intermediate results. In this work, we demonstrate that for contractive iterations, not releasing the intermediate results strongly amplifies the privacy guarantees.
We describe several applications of this new analysis technique to solving convex optimization problems via noisy stochastic gradient descent. For example, we demonstrate that a relatively small number of non-private data points from the same distribution can be used to close the gap between private and non-private convex optimization. In addition, we demonstrate that we can achieve guarantees similar to those obtainable using the privacy-amplification-by-sampling technique in several natural settings where that technique cannot be applied.
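To fix ideas, here is a minimal illustrative sketch (not the authors' code) of the kind of iteration the analysis covers: noisy projected SGD that touches each point once and releases only the final iterate. The names grad, project, eta, and sigma are placeholders, not parameters from the paper.

import numpy as np

def noisy_projected_sgd(data, grad, project, eta=0.1, sigma=1.0, seed=0):
    # data: (n, d) array; grad(w, x): clipped/Lipschitz per-example gradient;
    # project(w): projection onto a convex set, so each update is a contraction.
    rng = np.random.default_rng(seed)
    w = np.zeros(data.shape[1])
    for x in data:                                   # one data point per iteration
        noise = sigma * rng.standard_normal(w.shape)
        w = project(w - eta * (grad(w, x) + noise))  # noisy, contractive update
    return w                                         # intermediate iterates are never released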
11:15 | Contributed talk: Subsampled Renyi Differential Privacy and Analytical Moments Accountant (Yu-Xiang Wang, Borja Balle and Shiva Kasiviswanathan) [slides] |
We study the problem of subsampling in differential privacy (DP), a question that is at the heart of many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Renyi Differential Privacy (RDP) parameters for algorithms that (1) subsample the dataset and then (2) apply a randomized mechanism M to the subsample, in terms of the RDP parameters of M and the subsampling probability. Our results generalize the moments accountant technique, developed by Abadi et al. [CCS'16] for the Gaussian mechanism, to any subsampled RDP mechanism.
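For context, the two standard RDP facts that such an accountant builds on are composition across steps and conversion back to $(\varepsilon, \delta)$-DP (the tight subsampling bound itself is the paper's contribution):
\[
\varepsilon_{\mathrm{total}}(\alpha) = \sum_i \varepsilon_i(\alpha),
\qquad
(\alpha, \varepsilon(\alpha))\text{-RDP} \;\Longrightarrow\; \Big(\varepsilon(\alpha) + \tfrac{\log(1/\delta)}{\alpha - 1},\, \delta\Big)\text{-DP} \quad \text{for any } 0 < \delta < 1.
\]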
11:30 | Contributed talk: The Power of The Hybrid Model for Mean Estimation (Yatharth Dubey and Aleksandra Korolova) |
In this work we explore the power of the hybrid model of differential privacy (DP) proposed in the Blender paper, where some users desire the guarantees of the local model of DP while others are content with the trusted-curator model guarantees. In particular, we study the accuracy of mean estimation algorithms for arbitrary distributions with bounded support. We show that a hybrid mechanism which combines the sample mean estimates obtained from the two groups in an optimally weighted convex combination performs a constant factor better than natural benchmarks for a wide range of sample sizes. We analyze how this improvement factor is parameterized by the problem setting and how it varies with sample size.
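As a rough illustration of an optimally weighted convex combination of two unbiased estimates (a sketch, not the paper's mechanism; the two DP mean estimates and their variances are assumed to be given):

def combine_estimates(mean_local, var_local, mean_central, var_central):
    # Inverse-variance weighting: the weight on each unbiased estimate is
    # proportional to 1/variance, which minimizes the variance of the result.
    w = var_central / (var_local + var_central)      # weight on the local-model estimate
    combined = w * mean_local + (1.0 - w) * mean_central
    combined_var = w**2 * var_local + (1.0 - w)**2 * var_central
    return combined, combined_var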
11:45 | Contributed talk: Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity (Ulfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar and Abhradeep Thakurta) [slides] |
Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, when each user's value may change only a small number of times. We describe an algorithm whose privacy cost is polylogarithmic in the number of times the statistic is collected.
More fundamentally---by building on anonymity of the users' reports---we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general technique, that any permutation-invariant algorithm satisfying $\varepsilon$-local differential privacy will satisfy $(O(\varepsilon \sqrt{\log \frac 1 \delta} / \sqrt{n}), \delta)$-central differential privacy. In the process, we clarify how the high noise and $\sqrt{n}$ overhead of LDP protocols are a consequence of their being significantly more private in the central model. As a final, practical corollary, our results also imply that industrial deployments of LDP mechanisms may have much lower privacy cost than their advertised $\varepsilon$ would indicate---at least if reports are anonymized.
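A toy sketch of the pipeline this result speaks to: each user applies $\varepsilon$-LDP binary randomized response locally, and a shuffler then discards the order of the reports. Names and parameters below are illustrative, not the paper's algorithm.

import numpy as np

def randomized_response(bit, eps, rng):
    # Report the true bit with probability e^eps / (e^eps + 1), else flip it.
    keep = rng.random() < np.exp(eps) / (np.exp(eps) + 1.0)
    return bit if keep else 1 - bit

def shuffled_reports(bits, eps, seed=0):
    rng = np.random.default_rng(seed)
    reports = [randomized_response(b, eps, rng) for b in bits]
    rng.shuffle(reports)      # anonymity: the analyzer sees only an unordered multiset
    return reports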
12:00 | Lunch break |
13:30 | Invited talk: Kamalika Chaudhuri — Challenges in the Privacy-Preserving Analysis of Structured Data [slides] |
Much data analysis today is done on sensitive data, and particular privacy challenges arise when this data is also structured. In this talk I will describe two such challenges in the privacy-preserving analysis of complex, structured data that we have been working on in my group.
The first is the continual release of graph statistics in an online manner from an expanding graph, motivated by a problem in HIV epidemiology. Even though node differentially private release of graph statistics is highly challenging, I will describe how we can obtain a differentially private solution for this problem that performs better than the natural sequential composition baseline. Next, I will talk about the analysis of sensitive, structured, correlated data while still preserving the privacy of events in the data. Differential privacy does not adequately address privacy issues in this kind of data, and hence we will look at a form of inferential privacy, called Pufferfish, that is more appropriate. We will provide mechanisms, establish their composition properties, and finally evaluate them on real data consisting of physical activity measurements across time.
14:20 | Invited talk: Adam Smith — Models for private data analysis of distributed data [slides] |
This talk will present a partial survey of the proposed (and implemented) models for private analysis of distributed data, together with some new results.
15:10 | Coffee break |
15:30 | Contributed talk: DP-MAC: The Differentially Private Method of Auxiliary Coordinates for Deep Learning (Frederik Harder, Jonas Köhler, Max Welling and Mijung Park) |
Developing a differentially private deep learning algorithm is challenging, due to the difficulty of analyzing the sensitivity of the objective functions typically used to train deep neural networks. Many existing methods resort to stochastic gradient descent and apply a pre-defined sensitivity to the gradients to privatize the weights. However, their slow convergence typically yields a high cumulative privacy loss. Here, we take a different route by employing the method of auxiliary coordinates, which allows us to update the weights of each layer independently by optimizing a per-layer objective function. This objective function can be well approximated by a low-order Taylor expansion, for which sensitivity analysis becomes tractable. We perturb the coefficients of the expansion for privacy and optimize them using more advanced optimization routines than SGD for faster convergence. We empirically show that our algorithm yields decent model quality under a modest privacy budget.
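For reference, the gradient-perturbation baseline the abstract contrasts with, which clips per-example gradients to a pre-defined sensitivity and adds Gaussian noise, looks roughly like this sketch; the constants and names are illustrative, not taken from the paper.

import numpy as np

def dp_gradient_step(w, per_example_grads, lr=0.05, clip=1.0, sigma=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]                     # bound each gradient's norm
    noise = (sigma * clip / len(clipped)) * rng.standard_normal(w.shape)
    return w - lr * (np.mean(clipped, axis=0) + noise)         # noisy average-gradient step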
15:45 | Contributed talk: Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware (Florian Tramer and Dan Boneh) |
As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations. A pragmatic solution comes from Trusted Execution Environments (TEEs), which use hardware and software protections to isolate sensitive computations from the untrusted software stack. However, these isolation guarantees come at a price in performance, compared to untrusted alternatives. This paper initiates the study of high performance execution of Deep Neural Networks (DNNs) in TEEs by efficiently partitioning DNN computations between trusted and untrusted devices. Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate Slalom by executing DNNs in an Intel SGX enclave, which selectively delegates work to an untrusted GPU. For two canonical DNNs, VGG16 and MobileNet, we obtain 20× and 6× increases in throughput for verifiable inference, and 11× and 4× for verifiable and private inference.
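The verification side of such outsourcing can be illustrated with a Freivalds-style check, which tests an outsourced matrix product using only matrix-vector products; this is a generic sketch under assumed naming, not Slalom's actual protocol.

import numpy as np

def freivalds_check(A, B, C, trials=10, seed=0):
    # Probabilistically verify C == A @ B without recomputing the full product.
    # A wrong C passes one random test with probability at most 1/2, so the
    # overall error probability is at most 2**(-trials).
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        r = rng.integers(0, 2, size=(C.shape[1], 1)).astype(A.dtype)
        if not np.allclose(A @ (B @ r), C @ r):
            return False          # the returned product is inconsistent
    return True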
16:00 | Contributed talk: Secure Two Party Distribution Testing (Alexandr Andoni, Tal Malkin and Negev Shekel Nosatzki) [slides] |
We study the problem of discrete distribution testing in the two-party setting. For example, in the standard closeness testing problem, Alice and Bob each have t samples from, respectively, distributions a and b over [n], and they need to test whether a=b or a,b are ε-far (in the ℓ1 distance) for some fixed ε>0. This is in contrast to the well-studied one-party case, where the tester has unrestricted access to samples of both distributions, for which optimal bounds are known for a number of variations. Despite being a natural constraint in applications, the two-party setting has evaded attention so far.
We address two fundamental aspects of the two-party setting: 1) what is the communication complexity, and 2) can it be accomplished securely, without Alice and Bob learning extra information about each other's input. Besides closeness testing, we also study the independence testing problem, where Alice and Bob have t samples from distributions a and b respectively, which may be correlated; the question is whether a,b are independent or ε-far from being independent. Our contribution is three-fold:
∙ Communication: we show how to gain communication efficiency as we have more samples, beyond the information-theoretic bound on t. Furthermore, the gain is polynomially better than what one may obtain by adapting one-party algorithms. For closeness testing, our protocol has communication s=Θ_ε(n^2/t^2) as long as t is at least the information-theoretic minimum number of samples. For independence testing over the domain [n]×[m], where n≥m, we obtain s=O_ε(n^2m/t^2+nm/t+m^1/2).
∙ Security: we define the concept of secure distribution testing and argue that it must leak at least some minimal information when the promise is not satisfied. We then provide secure versions of the above protocols with an overhead that is only polynomial in the security parameter.
∙ Lower bounds: we prove tightness of our trade-off for closeness testing, and show that independence testing requires a tight Ω(m^1/2) communication for an unbounded number of samples. These lower bounds are of independent interest as, to the best of our knowledge, they are the first two-party communication lower bounds for testing problems where the inputs represent a set of i.i.d. samples.
16:15 | Contributed talk: Private Machine Learning in TensorFlow using Secure Computation (Morten Dahl, Jason Mancuso, Yann Dupis, Ben DeCoste, Morgan Giraud, Ian Livingstone, Justin Patriquin and Gavin Uhma) [slides] |
We present a framework for experimenting with secure multi-party computation directly in TensorFlow. By doing so we benefit from several properties valuable to both researchers and practitioners, including tight integration with ordinary machine learning processes, existing optimizations for distributed computation in TensorFlow, high-level abstractions for expressing complex algorithms and protocols, and an expanded set of familiar tooling. We give an open source implementation of a state-of-the-art protocol and report on concrete benchmarks using typical models from private machine learning.
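The basic building block beneath such frameworks is secret sharing of tensors; a minimal additive-sharing sketch in plain NumPy (an illustration, not the framework's API; the modulus is an arbitrary choice).

import numpy as np

MODULUS = 2**31 - 1   # illustrative ring size for the shares

def share(x, rng):
    # Split an integer tensor into two additive shares; each share alone is
    # marginally uniform and reveals nothing about x.
    s0 = rng.integers(0, MODULUS, size=x.shape)
    return s0, (x - s0) % MODULUS

def reconstruct(s0, s1):
    return (s0 + s1) % MODULUS

def add_shared(a, b):
    # Addition of secret-shared tensors is local: each party adds its own shares.
    return tuple((ai + bi) % MODULUS for ai, bi in zip(a, b))

Multiplication of shared tensors, in contrast, requires interaction (for example via Beaver triples), which is where most of the protocol complexity and the distributed-computation optimizations come in.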
16:30 | Spotlight talks |
17:15 | Poster session |
18:15 | Wrap up |
Thanks to our generous sponsors, we are able to provide a limited number of travel grants of up to $800 to partially cover the expenses of authors of accepted papers who have not received other travel support from NeurIPS this year. To apply, please send an email to ppml18@easychair.org with the subject “PPML18 Travel Grant Application”, including your resume, a half-page statement of purpose mentioning the title and authors of your accepted paper, and a summary of anticipated travel expenses. If you are an undergraduate or graduate student, we also ask for a half-page recommendation letter supporting your application, to be sent to us by the deadline. The deadline for applications is November 11, 2018 (11:59pm AoE). Notifications will be sent by November 16. Please feel free to send us an email if you have any questions.