ECOOP 2021
Sun 11 - Sat 17 July 2021 Online
co-located with ECOOP and ISSTA 2021
Fri 16 Jul 2021 08:00 - 08:20 at ECOOP 1 - Empirical Studies / Parallelism (time band 3) Chair(s): Hakjoo Oh
Fri 16 Jul 2021 19:00 - 19:20 at ECOOP 1 - Potpourri (time band 1) Chair(s): Omer Tripp

Analyzing massive code bases has become a staple of modern software engineering research. This has happened as a welcome side-effect of the advent of public large-scale software repositories such as GitHub. Yet, finding which projects to analyze is a labor-intensive process that can lead to biased analysis results if the selection is not representative. The search interfaces exposed by mainstream software repositories do not allow researchers to formulate anything but very basic queries. This paper reports on CodeDJ, an infrastructure designed to assist researchers in querying such repositories and identifying projects of interest. The infrastructure is composed of two subsystems: a persistent datastore that is constantly updated with information acquired from its target large-scale software repository (in our case GitHub), and an in-memory database with a query interface written in Rust and designed to follow popular data science API principles. Our infrastructure has built-in support for reproducibility. Users can formulate historical queries that are answered deterministically using historical states of the datastore; thus researchers can always reproduce published results. To illustrate the benefits of the proposed system, we revisit a paper aiming to establish a correlation between programming languages and software defects. Using CodeDJ, we identify biases in the dataset used in the original paper. By repeating the analysis performed by the original authors with new data, we demonstrate that the results of the paper are highly sensitive to the choice of projects.
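The idea of a historical query can be illustrated with a minimal sketch. This is not CodeDJ's actual API (the abstract says its query interface is written in Rust); the names `Snapshot`, `Datastore`, `as_of`, and `select_projects` are hypothetical, and the sketch assumes an append-only store in which every record carries the time it was observed. A query pinned to a cutoff time then only sees records at or before that cutoff, so its answer does not change as newer data arrives.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of a project's state at one point in time."""
    seen_at: int   # when this state was recorded (Unix seconds)
    language: str
    commits: int


class Datastore:
    """Append-only store: snapshots are added, never overwritten or deleted."""

    def __init__(self):
        self._history = defaultdict(list)  # project name -> list of Snapshot

    def record(self, project, snapshot):
        self._history[project].append(snapshot)

    def as_of(self, cutoff):
        """Latest known state of each project at `cutoff`, ignoring later records."""
        view = {}
        for project in sorted(self._history):  # sorted for a stable result order
            visible = [s for s in self._history[project] if s.seen_at <= cutoff]
            if visible:
                view[project] = max(visible, key=lambda s: s.seen_at)
        return view


def select_projects(store, cutoff, language, min_commits):
    """Names of projects in `language` with at least `min_commits` commits as of `cutoff`."""
    return [
        name
        for name, snap in store.as_of(cutoff).items()
        if snap.language == language and snap.commits >= min_commits
    ]


store = Datastore()
store.record("alpha", Snapshot(seen_at=100, language="Rust", commits=5))
store.record("beta", Snapshot(seen_at=120, language="Rust", commits=50))
before = select_projects(store, 150, "Rust", 10)

# New data arrives after the cutoff; the historical answer must not change.
store.record("alpha", Snapshot(seen_at=200, language="Rust", commits=40))
after = select_projects(store, 150, "Rust", 10)
latest = select_projects(store, 250, "Rust", 10)
```

Here `before` and `after` are identical because both queries are pinned to the same cutoff, while `latest` reflects the newer record. A real system would also need to handle deleted or renamed projects, which this sketch ignores.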

Fri 16 Jul

Displayed time zone: Brussels, Copenhagen, Madrid, Paris

08:00 - 09:00
Empirical Studies / Parallelism (time band 3)
ECOOP Technical Papers at ECOOP 1
Chair(s): Hakjoo Oh Korea University
08:00
20m
Talk
CodeDJ: Reproducible Queries over Large-Scale Software Repositories
ECOOP Technical Papers
Petr Maj Czech Technical University, Konrad Siek Czech Technical University in Prague, Jan Vitek Northeastern University / Czech Technical University, Alexander Kovalenko Czech Technical University in Prague
08:20
20m
Talk
Enabling Additional Parallelism in Asynchronous JavaScript Applications
ECOOP Technical Papers
Ellen Arteca Northeastern University, Frank Tip Northeastern University, Max Schaefer GitHub, Inc.
08:40
20m
Talk
Do Bugs Propagate? An Empirical Analysis of Temporal Correlations among Software Bugs
ECOOP Technical Papers
Xiaodong Gu Shanghai Jiao Tong University, China, Sunghun Kim Hong Kong University of Science and Technology, Yo-Sub Han Yonsei University, Hongyu Zhang University of Newcastle
19:00 - 20:00
Potpourri (time band 1)
ECOOP Technical Papers at ECOOP 1
Chair(s): Omer Tripp Amazon
19:00
20m
Talk
CodeDJ: Reproducible Queries over Large-Scale Software Repositories
ECOOP Technical Papers
Petr Maj Czech Technical University, Konrad Siek Czech Technical University in Prague, Jan Vitek Northeastern University / Czech Technical University, Alexander Kovalenko Czech Technical University in Prague
19:20
20m
Talk
Differential Privacy for Coverage Analysis of Software Traces
ECOOP Technical Papers
Yu Hao Ohio State University, Sufian Latif Ohio State University, Hailong Zhang Fordham University, Raef Bassily Ohio State University, Atanas Rountev Ohio State University
19:40
20m
Talk
Dealing with Variability in API Misuse Specification
ECOOP Technical Papers
Rodrigo Bonifácio Computer Science Department - University of Brasília, Stefan Krüger Independent Researcher, Krishna Narasimhan TU Darmstadt, Eric Bodden University of Paderborn; Fraunhofer IEM, Mira Mezini TU Darmstadt, Germany