CI-EN: RUI: Collaborative Research: TraceLab Community Infrastructure for Replication, Collaboration, and Innovation


Period of Performance: 06/01/2015 - 05/31/2018


The challenge of experimental reproducibility extends across almost every science and engineering discipline. Recently, a widely reported study conducted by the biotech firm Amgen revealed that of 53 previously published landmark papers fundamental to its development plan, only six could be reproduced. While this does not mean that the other studies were fraudulent, it does mean that they provided insufficient information for others to reproduce them. Recent studies have unearthed similar problems across a diverse set of Software Engineering domains, including, but not limited to, software traceability, feature location, and compiler optimization. Reproducibility is often undermined by a lack of publicly available datasets, obsolete or unavailable tools, insufficient details about the experiment, and undocumented decisions about the way various metrics are computed.


Efforts to establish public benchmarks and datasets in repositories such as PROMISE, Eclipse Bug Data, and SIR (for testing) partially address the problem by supporting the evaluation of different techniques against the same datasets. Frameworks such as Weka and RapidMiner provide common algorithmic solutions for research in fields such as information retrieval and machine learning, but are not primarily intended for software engineers. In contrast, TraceLab (funded under an NSF Major Research Instrumentation grant) provides libraries of components targeted at Software Engineering researchers and is well suited to supporting reproducibility and sharing of experiments. In this proposal, we outline a plan for using TraceLab to reproduce a significant selection of landmark experiments in traceability, feature location, impact analysis, program summarization, testing, and requirements engineering. Our work will (1) provide a baseline against which future work can be evaluated, (2) create publicly available libraries of TraceLab components implementing important algorithms in each domain, and (3) serve as an exemplar for future reproducibility of results.



G1. Augment the TraceLab Framework. In addition to the more innovative aspects of our proposal, we need to provide ongoing support for TraceLab by adding and maintaining features as requested by our user base.

G2. Reproduce landmark experiments. A core focus of our proposed work will involve identifying and reproducing landmark experiments across multiple domains. We define a landmark experiment as “an experiment which represents an important advancement in the field.” The reproduced experiments will provide a baseline against which to compare new techniques, as well as an extensive set of reusable components that new researchers can use to equip their research environments and to support innovative experimentation. The identified domains include traceability (all), feature location (Poshyvanyk), impact analysis (Poshyvanyk), program summarization (McMillan), testing (Hayes), and requirements engineering (Cleland-Huang). For each of these domains we will deliver downloadable experiments, reusable TraceLab components, and challenges using TraceLab’s challenge feature. This thrust of our research represents a significant emphasis of this proposal.

G3. Build general purpose libraries. Empirical Software Engineering experiments from multiple domains share numerous underlying algorithms and data structures. We will therefore identify cross-cutting concerns from the various research domains and augment TraceLab’s existing set of general purpose libraries. Examples of existing libraries include natural language processing, source code analysis, metrics computations, and statistical analysis.

G4. Build and maintain a self-sustaining open source community. TraceLab is already publicly released and used by research groups at more than 12 universities. However, it is vitally important that TraceLab be transitioned into a sustainable open source community. Given the tremendous support for TraceLab from both industry and academia (please see support letters) and the active group of TraceLab users, we plan to continue our outreach efforts by demonstrating the use of TraceLab for reproducibility purposes at workshops, at conferences, and through journals. We will also provide ongoing support to new adopters through training materials and mentoring.

G5. Develop pedagogical materials. The TraceLab environment provides an ideal context for students to learn to design and execute experiments in traceability, testing, feature location, requirements analysis, and other software engineering areas. We plan to augment TraceLab components and the reproduced experiments with training materials describing underlying research questions, research methods, and algorithms. These materials will be disseminated via and other appropriate venues.

G6. Outreach to industry. In order to impact practice, we will make specific efforts to reach out to industrial partners and to engage in technology transfer initiatives.


Exemplary Products:

1. Jane Huffman Hayes, Wenbin Li, Tingting Yu, Xue Han, Mark Hays and Clinton Woodson, “Measuring Requirement Quality to Predict Testability,” in Workshop on Artificial Intelligence in Requirements Engineering (AIRE), at IEEE International Requirements Engineering (RE) Conference, Ottawa, Canada, 2015.

2. Tingting Yu, Wei Wen, Xue Han and Jane Huffman Hayes, “Predicting Testability of Concurrent Programs,” in Proceedings of the Int’l Conference on Software Testing, Verification and Validation (ICST), 2016.

3. Universal importer - TraceLab component - Purpose: Importer component for numerous data types. Available upon request (to hayes at

4. Text artifact classifier - TraceLab component - Purpose: Build models to classify text artifacts (such as bug reports, requirements, research papers). Available upon request (to hayes at