Differential Privacy for Coverage Analysis of Software Traces
Sat 17 Jul 2021 01:10 - 01:30 at ECOOP 1 - Potpourri (time band 2) Chair(s): Lingming Zhang
This work considers software execution traces, where a trace is a sequence of run-time events. Each user of a software system collects the set of traces covered by her execution of the software, and reports this set to an analysis server. Our goal is to report the local data of each user in a \emph{privacy-preserving manner} by employing local differential privacy, a powerful theoretical framework for designing privacy-preserving data analysis.
A significant advantage of such analysis is that it offers principled “built-in” privacy with clearly-defined and quantifiable privacy protections. In local differential privacy, the data of an individual user is modified using a \emph{local randomizer} before being sent to the untrusted analysis server. Based on the randomized information from all users, the analysis server computes, for each trace, an estimate of how many users have covered it.
Such analysis requires that the domain of possible traces be defined ahead of time. Unlike in prior related work, here the domain is either infinite or, at best, restricted to many billions of elements. Further, the traces in this domain typically have structure defined by the static properties of the software. To capture these novel aspects, we define the trace domain with the help of context-free grammars. We illustrate this approach with two exemplars: a \emph{call chain analysis} in which traces are described through a regular language, and an \emph{enter/exit trace analysis} in which traces are described by a balanced-parentheses context-free language. Randomization over such domains is challenging due to their large size, which makes it impossible to use prior randomization techniques. To solve this problem, we propose to use \emph{count sketch}, a fixed-size hashing data structure for summarizing frequent items. We develop a version of count sketch for trace analysis and demonstrate its suitability for software execution data. In addition, instead of randomizing separately each contribution to the sketch, we develop a much-faster one-shot randomization of the accumulated sketch data.
One important client of the collected information is the identification of high-frequency (“hot”) traces. We develop a novel approach to identify hot traces from the collected randomized sketches. A key insight is that the very large domain of possible traces can be efficiently explored for hot traces by using the frequency estimates of a visited trace and its prefixes and suffixes. Our experimental study of both call chain analysis and enter/exit trace analysis indicates that the frequency estimates, as well as the identification of hot traces, achieve high accuracy and high privacy.
Fri 16 JulDisplayed time zone: Brussels, Copenhagen, Madrid, Paris change
19:00 - 20:00 | |||
19:00 20mTalk | CodeDJ: Reproducible Queries over Large-Scale Software Repositories ECOOP Technical Papers Petr Maj Czech Technical University, Konrad Siek Czech Technical University in Prague, Jan Vitek Northeastern University / Czech Technical University, Alexander Kovalenko Czech Technical University in Prague DOI | ||
19:20 20mTalk | Differential Privacy for Coverage Analysis of Software Traces ECOOP Technical Papers Yu Hao Ohio State University, Sufian Latif Ohio State University, Hailong Zhang Fordham University, Raef Bassily Ohio State University, Atanas Rountev Ohio State University DOI | ||
19:40 20mTalk | Dealing with Variability in API Misuse Specification ECOOP Technical Papers Rodrigo Bonifácio Computer Science Department - University of Brasília, Stefan Krüger Independent Researcher, Krishna Narasimhan TU Darmstadt, Eric Bodden University of Paderborn; Fraunhofer IEM, Mira Mezini TU Darmstadt, Germany DOI |
Sat 17 JulDisplayed time zone: Brussels, Copenhagen, Madrid, Paris change
01:10 - 02:30 | Potpourri (time band 2)ECOOP Technical Papers at ECOOP 1 Chair(s): Lingming Zhang University of Illinois at Urbana-Champaign | ||
01:10 20mTalk | Differential Privacy for Coverage Analysis of Software Traces ECOOP Technical Papers Yu Hao Ohio State University, Sufian Latif Ohio State University, Hailong Zhang Fordham University, Raef Bassily Ohio State University, Atanas Rountev Ohio State University DOI | ||
01:30 20mTalk | Do Bugs Propagate? An Empirical Analysis of Temporal Correlations among Software Bugs ECOOP Technical Papers Xiaodong Gu Shanghai Jiao Tong University, China, Sunghun Kim Hong Kong University of Science and Technology, Yo-Sub Han Yonsei University, Hongyu Zhang University of Newcastle DOI | ||
01:50 20mTalk | Linear Promises: Towards Safer Concurrent Programming ECOOP Technical Papers Ohad Rau Georgia Institute of Technology, Caleb Voss Georgia Institute of Technology, Vivek Sarkar Georgia Institute of Technology DOI | ||
02:10 20mTalk | Dealing with Variability in API Misuse Specification ECOOP Technical Papers Rodrigo Bonifácio Computer Science Department - University of Brasília, Stefan Krüger Independent Researcher, Krishna Narasimhan TU Darmstadt, Eric Bodden University of Paderborn; Fraunhofer IEM, Mira Mezini TU Darmstadt, Germany DOI |