The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-21)
Fully Virtual Workshop - February 8 and 9, 2021
The availability of massive amounts of data, coupled with high-performance cloud computing
platforms, has driven significant progress in artificial intelligence and, in particular,
machine learning and optimization. It has profoundly impacted several areas, including computer
vision, natural language processing, and transportation. However, the use of rich data sets
also raises significant privacy concerns: they often reveal sensitive personal information
that can be exploited, without the knowledge and/or consent of the involved individuals, for
various purposes including monitoring, discrimination, and illegal activities.
The second AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-21) held at the
Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
builds on the success of last year's PPAI workshop
to provide a platform for researchers, AI practitioners, and policymakers to discuss technical
and societal issues and present solutions related to privacy in AI applications.
The workshop will focus on both the theoretical and practical challenges related to the design
of privacy-preserving AI systems and algorithms and will have strong multidisciplinary
components, including soliciting contributions about policy, legal issues, and societal
impact of privacy in AI.
Finally, the workshop will welcome papers that describe the release of privacy-preserving benchmarks and data sets that can be used by the community to solve fundamental problems of interest, including in machine learning and optimization for health systems and urban networks, to mention but a few examples.
The workshop will be a one-and-a-half-day meeting. The first session (half day) will be dedicated to privacy challenges, particularly those raised by COVID-19 contact-tracing and tracking programs. The second, day-long session will be dedicated to the workshop's technical content on privacy-preserving AI. The workshop will include a number of (possibly parallel) technical sessions; a virtual poster session where presenters can discuss their work, with the aim of further fostering collaborations; multiple invited speakers covering crucial challenges for the field of privacy-preserving AI applications, including policy and societal impacts; and a number of tutorial talks. It will conclude with a panel discussion.
Submission URL: https://cmt3.research.microsoft.com/PPAI2021
Rejected AAAI papers with *average* scores of at least 4.5 may be submitted directly to PPAI along with their previous reviews. These submissions may go through a light review process or be accepted if the provided reviews are judged to meet the workshop standard.
All papers must be submitted in PDF format, using the AAAI-21 author kit.
Submissions should include the name(s), affiliations, and email addresses of all authors.
Submissions will be refereed on the basis of technical quality, novelty, significance, and
clarity. Each submission will be thoroughly reviewed by at least two program committee members.
Submissions of papers rejected from the AAAI 2021 technical program are welcome.
For questions about the submission process, contact the workshop chairs.
PPAI Day 1 - February 8, 2021

Time | Talk / Presenter
---|---
08:50 | Introductory remarks
09:00 | Invited Talk by John M. Abowd
 | Session chair: Xi He
09:45 | Spotlight Talk: On the Privacy-Utility Tradeoff in Peer-Review Data Analysis
10:00 | Spotlight Talk: Leveraging Public Data in Practical Private Query Release: A Case Study with ACS Data
10:30 | Invited Talk by Ashwin Machanavajjhala
11:15 | Break
11:20 | Tutorial: A tutorial on privacy amplification by subsampling, diffusion and shuffling, by Audra McMillan
12:50 | Break
 | Session chair: Marco Romanelli
13:30 | Spotlight Talk: Efficient CNN Building Blocks for Encrypted Data
13:45 | Spotlight Talk: Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach
14:00 | Spotlight Talk: A variational approach to privacy and fairness
14:15 | Invited Talk by Steven Wu
15:00 | Poster Session 1 (on Discord)
17:00 | End of Workshop (day 1)

PPAI Day 2 - February 9, 2021

Time | Talk / Presenter
---|---
09:00 | Invited Talk by Reza Shokri
 | Session chair: TBA
09:45 | Spotlight Talk: Coded Machine Unlearning
10:00 | Spotlight Talk: DART: Data Addition and Removal Trees
10:30 | Invited Talk by Nicolas Papernot
11:15 | Break
11:20 | Tutorial: Privacy and Federated Learning: Principles, Techniques and Emerging Frontiers, by Brendan McMahan, Kallista Bonawitz, and Peter Kairouz
12:50 | Break
 | Session chair: Mark Bun
13:30 | Spotlight Talk: Reducing ReLU Count for Privacy-Preserving CNNs
13:45 | Spotlight Talk: Output Perturbation for General Differentially Private Convex Optimization with Improved Population Loss Bounds, Runtimes and Applications to Private Adversarial Training
14:00 | Spotlight Talk: An In-depth Review of Privacy Concerns Raised by the COVID-19 Pandemic
14:15 | Panel: “Differential Privacy: Implementation, deployment, and receptivity. Where are we and what are we missing?”
15:00 | Poster Session 2 (on Discord)
17:00 | End of Workshop
Tutorial: Privacy and Federated Learning: Principles, Techniques and Emerging Frontiers (Brendan McMahan, Kallista Bonawitz, and Peter Kairouz)
Abstract:
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. Similarly, federated analytics (FA) allows data scientists to generate analytical insight from the combined information in distributed datasets without requiring data centralization. Federated approaches embody the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in federated learning and analytics, this tutorial will provide a gentle introduction to the area. The focus will be on cross-device federated learning, including deep dives on differential privacy and secure computation in the federated setting; federated analytics and cross-silo federated learning will also be discussed.
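To make the cross-device setting concrete, here is a minimal sketch of one round of Federated Averaging (FedAvg), the baseline algorithm such tutorials typically build on. The toy linear model, simulated client data, and function names are illustrative assumptions, not material from the tutorial.

```python
# Minimal one-round FedAvg sketch: clients train locally, the server averages.
import numpy as np

def client_update(global_weights, x, y, lr=0.1, epochs=5):
    """Local training on the client's own data; raw data never leaves the device."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w, len(y)

def server_aggregate(client_results):
    """Server averages client models, weighted by local dataset size."""
    total = sum(n for _, n in client_results)
    return sum(w * (n / total) for w, n in client_results)

# One federated round over three simulated clients.
rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
global_w = server_aggregate([client_update(global_w, x, y) for x, y in clients])
print(global_w)
```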
Tutorial: A tutorial on privacy amplification by subsampling, diffusion and shuffling (Audra McMillan)
Abstract:
Practical differential privacy deployments require tight privacy accounting. A toolbox of “privacy amplification” techniques has been developed to simplify the privacy analysis of complicated differentially private mechanisms. These techniques can be used to design new differentially private mechanisms, as well as provide tighter privacy guarantees for existing mechanisms. In this tutorial, we will discuss three main privacy amplification techniques: subsampling, diffusion, and shuffling. We will discuss the intuition for why each technique amplifies privacy, and where it is useful in practice. Finally, we will use differentially private stochastic gradient descent as an example of how each technique can be used to easily provide a tight, or almost tight, privacy analysis.
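As one concrete instance of amplification by subsampling, the sketch below applies the standard bound for pure differential privacy: running an eps-DP mechanism on a random q-fraction of the data satisfies ln(1 + q(e^eps - 1))-DP. The numeric values are illustrative and not taken from the tutorial.

```python
# Privacy amplification by subsampling for a pure eps-DP mechanism:
# eps' = ln(1 + q * (e^eps - 1)) is much smaller than eps when q is small.
import math

def amplified_epsilon(eps, q):
    """Privacy guarantee of an eps-DP mechanism applied to a random q-subsample."""
    return math.log(1 + q * math.expm1(eps))

eps, q = 1.0, 0.01                 # base guarantee and sampling rate (e.g. one minibatch)
print(amplified_epsilon(eps, q))   # ~0.017, i.e. roughly a factor-q improvement
```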
Invited Talk by John M. Abowd
Abstract:
The talk will focus on the implementation of differential privacy used to protect the data products in the 2020 Census of Population and Housing. I will present a high-level overview of the design used for the majority of the data products, known as the TopDown Algorithm. I will focus on the high-level policy and technical challenges that the U.S. Census Bureau faced during the implementation including the original science embodied in that algorithm, implementation challenges arising from the production constraints, formalizing policies about privacy-loss budgets, communicating the effects of the algorithms on the final data products, and balancing competing data users' interests against the inherent privacy loss associated with detailed data publications.
Invited Talk by Nicolas Papernot
Abstract:
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information. To address this problem, algorithms for private machine learning have been proposed. In this talk, we first show that training neural networks with privacy requires rethinking their architectures with the goals of privacy-preserving gradient descent in mind. Second, we explore how private aggregation surfaces the synergies between privacy and generalization in machine learning. Third, we present recent work towards a form of collaborative machine learning that is both privacy-preserving in the sense of differential privacy, and confidentiality-preserving in the sense of the cryptographic community.
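As one concrete example of private aggregation, the sketch below implements a noisy-argmax vote over an ensemble of "teacher" models, in the spirit of PATE-style aggregation; the teacher labels, noise scale, and function names are illustrative assumptions rather than the speaker's exact construction.

```python
# Noisy-argmax aggregation of teacher predictions for a single unlabeled query.
import numpy as np

def noisy_argmax(teacher_labels, num_classes, epsilon, rng):
    """Aggregate teacher votes with Laplace noise on the vote histogram."""
    votes = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    # Changing one teacher's vote moves two counts by 1 each, so Laplace noise of
    # scale 2/epsilon on every count gives epsilon-DP per labeled query.
    votes += rng.laplace(scale=2.0 / epsilon, size=num_classes)
    return int(np.argmax(votes))

rng = np.random.default_rng(0)
teacher_labels = np.array([2, 2, 1, 2, 0, 2, 2, 1, 2, 2])  # ten teachers' predictions
print(noisy_argmax(teacher_labels, num_classes=3, epsilon=1.0, rng=rng))
```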
Invited Talk by Ashwin Machanavajjhala
Abstract:
Several organizations, especially federal statistical agencies, routinely release fine-grained statistical data products for social good that are critical for enabling resource allocation, policy and decision making, as well as research. Differential privacy, the gold standard privacy technology, has long been motivated by this use case. In this talk, I will describe our recent experiences deploying differential privacy at scale at US federal statistical agencies. I will highlight how the process of deploying DP at these agencies differs from the idealized problem studied in the research literature, and illustrate a few key technical challenges we encountered in these deployments.
Invited Talk by Reza Shokri
Abstract:
Machine learning models leak information about their training data. Randomizing gradients during training is a technique to preserve differential privacy and protect against inference attacks. The general method to compute the differential privacy bound is to use composition theorems: to view the training process as a sequence of differentially private algorithms, and to compute the composition of their DP bounds. This results in a loose bound on the privacy loss of the released model, as it accounts for the privacy loss of all training epochs (even if the intermediate parameters are not released). I will present a novel approach for analyzing the dynamics of privacy loss throughout the training process, assuming that the internal state of the algorithm (its parameters during training) remains private. This enables computing how privacy loss changes after each training epoch, and the privacy loss at the time of releasing the model. I show that the differential privacy bound converges, and that it converges to a tight bound.
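For contrast, the sketch below computes the kind of composition-based accounting the abstract describes as loose: basic composition adds up per-epoch budgets, while the advanced composition theorem improves the dependence on the number of epochs to roughly a square root. The per-step budget and epoch count are illustrative assumptions, not numbers from the talk.

```python
# Composition accounting for T repetitions of an eps-DP training step.
import math

def basic_composition(eps_step, steps):
    """T-fold basic composition of eps-DP steps: total budget T * eps."""
    return eps_step * steps

def advanced_composition(eps_step, steps, delta_prime):
    """Advanced composition: (eps', T*delta + delta')-DP with
    eps' = sqrt(2T ln(1/delta')) * eps + T * eps * (e^eps - 1)."""
    return (math.sqrt(2 * steps * math.log(1 / delta_prime)) * eps_step
            + steps * eps_step * math.expm1(eps_step))

eps_step, steps = 0.1, 100
print(basic_composition(eps_step, steps))           # 10.0
print(advanced_composition(eps_step, steps, 1e-5))  # ~5.9: tighter, but still grows with T
```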
Invited Talk by Steven Wu
Abstract:
This talk will focus on differentially private synthetic data---a privatized version of the dataset that consists of fake data records and that approximates the real dataset on important statistical properties of interest. I will present our recent results on private synthetic data that leverage practical optimization heuristics to circumvent the computational bottleneck in existing work. Our techniques are motivated by a modular, game-theoretic framework, which can flexibly work with methods such as integer program solvers and deep generative models.
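A minimal sketch of the underlying "measure privately, then fit" idea is below: a few counting queries are answered with the Laplace mechanism, and a crude random search stands in for the solvers and deep generative models mentioned in the abstract. All names, data, and parameters are illustrative assumptions, not the speaker's method.

```python
# Toy private synthetic data: answer one-way marginals privately, then search
# for a fake dataset whose marginals match the noisy answers.
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(200, 3))                     # real binary dataset, 3 attributes
queries = [lambda d, j=j: d[:, j].mean() for j in range(3)]  # one-way marginal queries

epsilon = 1.0
# Each mean query has sensitivity 1/n; splitting the budget evenly over the
# queries gives Laplace scale |Q| / (epsilon * n).
scale = len(queries) / (epsilon * len(data))
noisy_answers = np.array([q(data) + rng.laplace(scale=scale) for q in queries])

best, best_err = None, np.inf
for _ in range(2000):                                        # crude random search over candidates
    cand = rng.integers(0, 2, size=(200, 3))
    err = np.abs([q(cand) for q in queries] - noisy_answers).sum()
    if err < best_err:
        best, best_err = cand, err

print(noisy_answers, [q(best) for q in queries])             # synthetic marginals track noisy ones
```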