The availability of massive amounts of data, coupled with high-performance cloud computing
platforms, has driven significant progress in artificial intelligence and, in particular,
machine learning and optimization. It has profoundly impacted several areas, including computer
vision, natural language processing, and transportation. However, the use of rich data sets
also raises significant privacy concerns: they often reveal sensitive personal information that can be exploited, without the knowledge or consent of the individuals involved, for various purposes, including monitoring, discrimination, and illegal activities.
The second AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-21) held at the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) builds on the success of last year’s AAAI PPAI to provide a platform for researchers, AI practitioners, and policymakers to discuss technical and societal issues and present solutions related to privacy in AI applications. The workshop will focus on both the theoretical and practical challenges related to the design of privacy-preserving AI systems and algorithms and will have strong multidisciplinary components, including soliciting contributions about policy, legal issues, and societal impact of privacy in AI.
- Algorithmic approaches to protect data privacy in the context of learning, optimization, and decision making that raise fundamental challenges for existing technologies.
- Privacy challenges created by government and tech-industry responses to the Covid-19 outbreak.
- Social issues related to tracking, tracing, and surveillance programs.
- Algorithms and frameworks to release privacy-preserving benchmarks and data sets.
Topics
The workshop organizers invite paper submissions on the following (and related) topics:
- Applications of privacy-preserving AI systems
- Attacks on data privacy
- Differential privacy: theory and applications
- Distributed privacy-preserving algorithms
- Human rights and privacy
- Privacy issues related to the Covid-19 outbreak
- Privacy policies and legal issues
- Privacy-preserving optimization and machine learning
- Privacy-preserving test cases and benchmarks
- Surveillance and societal issues
Finally, the workshop will welcome papers that describe the release of privacy-preserving benchmarks and data sets that can be used by the community to solve fundamental problems of interest, including in machine learning and optimization for health systems and urban networks, to mention but a few examples.
The workshop will be a one-and-a-half-day meeting. The first (half-day) session will be dedicated to privacy challenges, particularly those raised by the Covid-19 pandemic tracing and tracking policy programs. The second, day-long session will be dedicated to the workshop's technical content on privacy-preserving AI. The workshop will include a number of (possibly parallel) technical sessions; a virtual poster session where presenters can discuss their work, with the aim of further fostering collaborations; multiple invited speakers covering crucial challenges for the field of privacy-preserving AI applications, including policy and societal impacts; and a number of tutorial talks. It will conclude with a panel discussion.
- November 16, 2020 – Submission Deadline [Extended]
- December 7, 2020 – AAAI Fast Track Submission Deadline [New]
- January 7, 2021 – Acceptance Notification [Updated]
- February 8 and 9, 2021 – Workshop Date
Submission URL: https://cmt3.research.microsoft.com/PPAI2021
- Technical Papers: Full-length research papers of up to 7 pages (excluding references and appendices) detailing high-quality work in progress or work that could potentially be published at a major conference.
- Short Papers: Position or short papers of up to 4 pages (excluding references and appendices) that describe initial work or the release of privacy-preserving benchmarks and datasets on the topics of interest.
- Technical Track: This track is dedicated to the privacy-preserving AI technical content. It welcomes research contributions centered around the topics described above.
- Privacy Challenges and Social Issues Track: This track is dedicated to the discussion of privacy challenges, particularly those raised by the Covid-19 pandemic tracing and tracking policy programs. It welcomes both technical contributions and position papers.
[New] AAAI Fast Track (Rejected AAAI papers)
Rejected AAAI papers with *average* scores of at least 4.5 may be submitted directly to PPAI along with their previous reviews. These submissions may go through a light review process or be accepted if the provided reviews are judged to meet the workshop's standard.
All papers must be submitted in PDF format, using the AAAI-21 author kit.
Submissions should include the name(s), affiliations, and email addresses of all authors.
Submissions will be refereed on the basis of technical quality, novelty, significance, and clarity. Each submission will be thoroughly reviewed by at least two program committee members.
Submissions of papers rejected from the AAAI 2021 technical program are welcomed.
For questions about the submission process, contact the workshop chairs.
Invited talks, tutorials, and the panel discussion will be live-streamed (recordings available). Spotlight and poster talks are pre-recorded and accessible at any time (click the play button next to the associated paper). There will be additional Q&A and discussion at the poster sessions.
Poster sessions are hosted on Discord. See the instructions below.
PPAI Day 1 - February 8, 2021
|Time||Talk / Presenter|
|09:00||Invited Talk by John M. Abowd|
|Session chair: Xi He|
|09:45||Spotlight Talk: On the Privacy-Utility Tradeoff in Peer-Review Data Analysis|
|10:00||Spotlight Talk: Leveraging Public Data in Practical Private Query Release: A Case Study with ACS Data|
|10:30||Invited Talk by Ashwin Machanavajjhala|
|11:20||Tutorial: Privacy Amplification by Subsampling, Diffusion and Shuffling, by Audra McMillan|
|Session chair: Marco Romanelli|
|13:30||Spotlight Talk: Efficient CNN Building Blocks for Encrypted Data|
|13:45||Spotlight Talk: Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach|
|14:00||Spotlight Talk: A variational approach to privacy and fairness|
|14:15||Invited Talk by Steven Wu|
|15:00||Poster Session 1||join (on Discord)|
|17:00||End of Workshop (day 1)|
PPAI Day 2 - February 9, 2021
|Time||Talk / Presenter|
|09:00||Invited Talk by Reza Shokri|
|Session chair: TBA|
|09:45||Spotlight Talk: Coded Machine Unlearning|
|10:00||Spotlight Talk: DART: Data Addition and Removal Trees|
|10:30||Invited Talk by Nicolas Papernot|
|11:20||Tutorial: Privacy and Federated Learning: Principles, Techniques and Emerging Frontiers by Brendan McMahan, Kallista Bonawitz, and Peter Kairouz|
|Session chair: Mark Bun|
|13:30||Spotlight Talk: Reducing ReLU Count for Privacy-Preserving CNNs|
|13:45||Spotlight Talk: Output Perturbation for General Differentially Private Convex Optimization with Improved Population Loss Bounds, Runtimes and Applications to Private Adversarial Training|
|14:00||Spotlight Talk: An In-depth Review of Privacy Concerns Raised by the COVID-19 Pandemic|
|14:15||Panel: “Differential Privacy: Implementation, deployment, and receptivity. Where are we and what are we missing?”|
|15:00||Poster Session 2||join (on Discord)|
|17:00||End of Workshop|
Poster Presentation instructions on Discord
Each contributed paper is associated with two channels:
- Text channel: Here you will find a short video presentation (2 min) and a poster slide for each paper. You can view the material and leave questions for the authors to answer (outside the poster session time frame).
- Video channel: Used for Q&A and discussion during the poster session. When joining a voice channel, remember to turn on your mic (and, optionally, your camera).
- On the Privacy-Utility Tradeoff in Peer-Review Data Analysis
Wenxin Ding (Carnegie Mellon University); Nihar Shah (CMU); Weina Wang (CMU)
- Leveraging Public Data in Practical Private Query Release: A Case Study with ACS Data
Terrance Liu (Carnegie Mellon University); Giuseppe Vietri (University of Minnesota); Thomas Steinke (Google); Jonathan Ullman (Northeastern University); Steven Wu (Carnegie Mellon University)
- Efficient CNN Building Blocks for Encrypted Data
Nayna Jain (IIIT Bangalore); Karthik Nandakumar (Mohamed Bin Zayed University of Artificial Intelligence, UAE); Nalini Ratha (SUNY Buffalo); Sharath Pankanti (Microsoft); Uttam Kumar (IIIT Bangalore)
- Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach
Cuong Tran (Syracuse University)
- A variational approach to privacy and fairness
Borja Rodríguez Gálvez (KTH Royal Institute of Technology); Ragnar Thobaben (KTH Royal Institute of Technology); Mikael Skoglund (KTH Royal Institute of Technology)
- Coded Machine Unlearning
Nasser Aldaghri (University of Michigan); Hessam Mahdavifar (University of Michigan); Ahmad Beirami (Facebook, USA)
- DART: Data Addition and Removal Trees
Jonathan Brophy (University of Oregon); Daniel Lowd (University of Oregon)
- Reducing ReLU Count for Privacy-Preserving CNNs
Inbar Helbitz (Tel Aviv University); Shai Avidan (Tel Aviv University)
- Output Perturbation for General Differentially Private Convex Optimization with Improved Population Loss Bounds, Runtimes and Applications to Private Adversarial Training
Andrew Lowy (USC); Meisam Razaviyayn (USC)
- An In-depth Review of Privacy Concerns Raised by the COVID-19 Pandemic
Jiaqi Wang (Penn State University)
- Differentially Private Random Forests for Regression and Classification
Shorya Consul (University of Texas at Austin); Sinead Williamson (UT Austin/CognitiveScale)
- An Analysis Of Protected Health Information Leakage In Deep-Learning Based De-Identification Algorithms
Salman Seyedi (Emory University)
- Dopamine: Differentially Private Secure Federated Learning on Medical Data
Mohammad Malekzadeh (Imperial College London); Burak Hasircioglu (Imperial College London); Nitish Mital (Imperial College London); Kunal Katarya (Imperial College London); Mehmet Emre Ozfatura (Imperial College London); Deniz Gunduz (Imperial College London)
- Differential Privacy Meets Maximum-weight Matching
[paper not available]
Panayiotis Danassis (École Polytechnique Fédérale de Lausanne); Aleksei Triastcyn (EPFL); Boi Faltings (EPFL)
- Intelligent Frame Selection as a Privacy-Friendlier Alternative to Face Recognition
Mattijs Baert (Ghent University - IMEC); Sam Leroux (Ghent University - IMEC); Pieter Simoens (Ghent University - imec)
- Accuracy and Privacy Evaluations of Collaborative Data Analysis
Akira Imakura (University of Tsukuba); Anna Bogdanova (University of Tsukuba); Takaya Yamazoe (University of Tsukuba); Kazumasa Omote (University of Tsukuba); Tetsuya Sakurai (University of Tsukuba)
- Maintaining the Utility of Privacy-Aware Schedules
[paper not available]
Arik Senderovich (University of Toronto); Ali Kaan Tutak (Humboldt University of Berlin); Christopher Beck (University of Toronto); Stephan Fahrenkrog-Petersen (Humboldt University of Berlin); Matthias Weidlich (Humboldt-Universität zu Berlin)
- A Study of F0 Modification for X-Vector Based Speech Pseudo-Anonymization Across Gender
Pierre Champion (INRIA); Denis Jouvet (INRIA); Anthony Larcher (Université du Mans - LIUM)
- Private Emotion Recognition with Secure Multiparty Computation
Kyle J Bittner (University of Washington Tacoma); Rafael Dowsley (Monash University); Martine De Cock (University of Washington Tacoma)
- Optimized Data Sharing with Differential Privacy: A Game-theoretic Approach
Nan Wu (Macquarie University and CSIRO's Data61); Farhad Farokhi (The University of Melbourne); David Smith (DATA61, CSIRO); Mohamed Ali Kaafar (Macquarie University and CSIRO-Data61)
- Personalized privacy protection in social networks through adversarial modeling
Sachin G Biradar (Amazon, Inc.); Elena Zheleva (University of Illinois at Chicago)
- Hybrid Privacy Scheme
Yavor Litchev (Lexington High School); Abigail Thomas (Nashua High School South)
- Compressive Differentially-Private Federated Learning Through Universal Vector Quantization
Saba Amiri (University of Amsterdam); Adam Belloum (Multiscale Networked Systems (MNS) Research Group, University of Amsterdam, 1098 XH Amsterdam, The Netherlands); Leon Gommans (Air France KLM); Sander Klous (Vrije Universiteit Amsterdam)
- S++: A Fast and Deployable Secure-Computation Framework for Privacy-Preserving Neural Network Training
Prashanthi Ramachandran (Ashoka University); Shivam Agarwal (Ashoka University); Aastha Shah (Ashoka University); Arup Mondal (Ashoka University); Debayan Gupta (Ashoka University)
- Differentially Private Multi-Agent Constraint Optimization
[paper not available]
Sankarshan Damle (Machine Learning Lab, International Institute of Information Technology, Hyderabad); Aleksei Triastcyn (EPFL); Boi Faltings (EPFL); Sujit P. Gujar (Machine Learning Laboratory, International Institute of Information Technology, Hyderabad)
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. Similarly, federated analytics (FA) allows data scientists to generate analytical insight from the combined information in distributed datasets without requiring data centralization. Federated approaches embody the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in federated learning and analytics, this tutorial will provide a gentle introduction to the area. The focus will be on cross-device federated learning, including deep dives on differential privacy and secure computation in the federated setting; federated analytics and cross-silo federated learning will also be discussed.
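The cross-device setting described above can be illustrated with a minimal federated averaging (FedAvg) sketch: each client runs a few local gradient steps on its private data, and the server averages the returned weights, weighted by dataset size. The linear model, synthetic client data, and all hyperparameters below are illustrative assumptions, not material from the tutorial.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def fedavg_round(w_global, client_data):
    """One communication round: clients train locally on their own data;
    the server averages the returned weights, weighted by dataset size."""
    updates, sizes = [], []
    for X, y in client_data:
        updates.append(local_update(w_global, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Synthetic clients sharing one underlying linear model (for illustration).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):          # 20 communication rounds
    w = fedavg_round(w, clients)
```

Note that only model weights cross the network; the raw `(X, y)` pairs never leave their client, which is the data-minimization principle the paragraph above describes.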
Practical differential privacy deployments require tight privacy accounting. A toolbox of “privacy amplification” techniques has been developed to simplify the privacy analysis of complicated differentially private mechanisms. These techniques can be used to design new differentially private mechanisms, as well as to provide tighter privacy guarantees for existing mechanisms. In this tutorial, we will discuss three main privacy amplification techniques: subsampling, diffusion, and shuffling. We will discuss the intuition for why each technique amplifies privacy, and where it is useful in practice. Finally, we will use differentially private stochastic gradient descent as an example of how each technique can be used to easily provide a tight, or almost tight, privacy analysis.
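The tutorial's running example can be sketched roughly as follows: DP-SGD with Poisson subsampling (the step that amplification by subsampling analyzes), per-example gradient clipping, and Gaussian noise calibrated to the clipping norm. The toy problem, function names, and hyperparameters are all assumptions for illustration; a real deployment would pair this loop with a privacy accountant.

```python
import numpy as np

def clip(g, C=1.0):
    """Clip a per-example gradient to L2 norm at most C."""
    norm = np.linalg.norm(g)
    return g * (C / norm) if norm > C else g

def dp_sgd_step(w, X, y, rng, lr=0.1, C=1.0, noise_mult=1.0, q=0.2):
    """One DP-SGD step: Poisson-subsample the data with rate q, clip each
    per-example gradient, sum, and add Gaussian noise scaled to C."""
    idx = np.flatnonzero(rng.random(len(y)) < q)   # Poisson subsampling
    grads = [clip(2 * X[i] * (X[i] @ w - y[i]), C) for i in idx]
    total = np.sum(grads, axis=0) if len(grads) else np.zeros_like(w)
    total = total + rng.normal(scale=noise_mult * C, size=w.shape)
    return w - lr * total / max(len(idx), 1)

# Toy least-squares problem (synthetic data, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, 0.5])

w = np.zeros(2)
for _ in range(300):
    w = dp_sgd_step(w, X, y, rng)
```

The amplification result the tutorial covers says, informally, that because each example appears in a step only with probability `q`, the per-step privacy loss is much smaller than the clipping-plus-noise analysis alone would give.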
The talk will focus on the implementation of differential privacy used to protect the data products in the 2020 Census of Population and Housing. I will present a high-level overview of the design used for the majority of the data products, known as the TopDown Algorithm. I will focus on the high-level policy and technical challenges that the U.S. Census Bureau faced during the implementation including the original science embodied in that algorithm, implementation challenges arising from the production constraints, formalizing policies about privacy-loss budgets, communicating the effects of the algorithms on the final data products, and balancing competing data users' interests against the inherent privacy loss associated with detailed data publications.
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information. To address this problem, algorithms for private machine learning have been proposed. In this talk, we first show that training neural networks with privacy requires rethinking their architectures with the goals of privacy-preserving gradient descent in mind. Second, we explore how private aggregation surfaces the synergies between privacy and generalization in machine learning. Third, we present recent work towards a form of collaborative machine learning that is both privacy-preserving in the sense of differential privacy, and confidentiality-preserving in the sense of the cryptographic community.
Several organizations, especially federal statistical agencies, routinely release fine-grained statistical data products for social good that are critical for enabling resource allocation, policy and decision making, as well as research. Differential privacy, the gold-standard privacy technology, has long been motivated by this use case. In this talk, I will describe our recent experiences deploying differential privacy at scale at US federal statistical agencies. I will highlight how the process of deploying DP at these agencies differs from the idealized problem studied in the research literature, and illustrate a few key technical challenges we encountered in these deployments.
Machine learning models leak information about their training data. Randomizing gradients during training is a technique to preserve differential privacy and protect against inference attacks. The general method for computing the differential privacy bound is to use composition theorems: to view the training process as a sequence of differentially private algorithms, and to compute the composition of their DP bounds. This results in a loose bound on the privacy loss of the released model, as it accounts for the privacy loss of all training epochs (even if the intermediate parameters are not released). I will present a novel approach for analyzing the dynamics of privacy loss throughout the training process, assuming that the internal state of the algorithm (its parameters during training) remains private. This enables computing how privacy loss changes after each training epoch, and the privacy loss at the time of releasing the model. I show that the differential privacy bound converges, and that it converges to a tight bound.
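To make the composed bound the abstract calls "loose" concrete, the following snippet compares the two standard composition theorems for T training epochs, each satisfying (eps, delta)-DP: basic composition, where epsilon grows linearly in T, and advanced composition, where it grows roughly like sqrt(T). The parameter values are hypothetical and not from the talk.

```python
import math

def basic_composition(eps, delta, T):
    """Basic composition: per-epoch privacy losses simply add up."""
    return T * eps, T * delta

def advanced_composition(eps, delta, T, delta_prime):
    """Advanced composition (Dwork-Rothblum-Vadhan): epsilon grows roughly
    like sqrt(T), at the cost of an extra additive delta_prime."""
    eps_total = (math.sqrt(2 * T * math.log(1 / delta_prime)) * eps
                 + T * eps * (math.exp(eps) - 1))
    return eps_total, T * delta + delta_prime

# Hypothetical per-epoch guarantee composed over 100 epochs.
eps, delta, T = 0.1, 1e-6, 100
print(basic_composition(eps, delta, T))          # epsilon grows linearly in T
print(advanced_composition(eps, delta, T, 1e-6)) # roughly sqrt(T) growth
```

Both bounds charge for every epoch; the talk's point is that when intermediate parameters stay private, the released model's true privacy loss can be much smaller than either composed bound.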
This talk will focus on differentially private synthetic data: a privatized version of the dataset that consists of fake data records and that approximates the real dataset on important statistical properties of interest. I will present our recent results on private synthetic data that leverage practical optimization heuristics to circumvent the computational bottleneck in existing work. Our techniques are motivated by a modular, game-theoretic framework, which can flexibly work with methods such as integer program solvers and deep generative models.
Differential Privacy: Implementation, deployment, and receptivity. Where are we and what are we missing?
Panelists:
- John M. Abowd - U.S. Census Bureau
- Ashwin Machanavajjhala - Duke University
- Nicolas Papernot - University of Toronto
- Reza Shokri - National University of Singapore
- Steven Wu - Carnegie Mellon University
- Aws Albarghouthi - University of Wisconsin-Madison
- Carsten Baum - Aarhus University
- Aurélien Bellet - INRIA
- Mark Bun - Boston University
- Albert Cheu - Northeastern University
- Graham Cormode - University of Warwick
- Rachel Cummings - Georgia Tech
- Xi He - University of Waterloo
- Antti Honkela - University of Helsinki
- Mohamed Ali Kaafar - Macquarie University and CSIRO-Data61
- Kim Laine - Microsoft Research
- Yuliia Lut - Georgia Institute of Technology
- Terrence W.K. Mak - Georgia Institute of Technology
- Olga Ohrimenko - The University of Melbourne
- Catuscia Palamidessi - Laboratoire d'informatique de l'École polytechnique
- Paritosh Ramanan - Georgia Institute of Technology
- Marco Romanelli - INRIA
- Reza Shokri - NUS
- Sahib Singh - Ford and OpenMined
- Vikrant Singhal - Northeastern University
- Keyu Zhu - Georgia Institute of Technology