Enhanced Westermo dataset - Transformed and Modified for Test case Selection and Prioritization in the context of Continuous Integration and Reinforcement Learning.

  • Aicha Moussaid (Creator)
  • Ricardo Chavez Tapia (Creator)
  • Saad Waseem (Creator)
  • S M Zahid Hasan (Creator)
  • Sepinoud Azimi (Creator)
  • Sebastien Lafond (Creator)
  • Ivan Porres Paltor (Creator)

Dataset

Description

Overview

This repository contains a modified version of the recently published Westermo dataset. The original dataset was gathered at Westermo Network Technologies AB in Västerås, Sweden. It comprises over 1 million verdicts obtained from testing embedded systems, collected over more than 500 consecutive days of nightly testing. The dataset has been transformed and tailored for the research community, particularly for addressing challenges such as regression test selection, identification of flaky tests, and visualization of test results. The original dataset can be accessed through the reference provided in [1].

The Westermo dataset offers historical information about the execution of test cases and their corresponding results. It serves as a resource for evaluating and comparing different Test case Selection and Prioritization (TSP) techniques, enabling researchers to identify test cases that are more likely to fail during subsequent executions. Test cases in the dataset are characterized by attributes such as execution duration, time of the last execution, and the results of their recent executions.

Table 1: Dataset Overview

  Test Cases    1,855
  CI Cycles     15,197
  Verdicts      1,036,818
  Failed        5.03%

However, the diversity and multitude of the features in the dataset can be irrelevant to some TSP approaches.
This led us to perform a dataset conversion, in which we customized Westermo to have the same features as Paint Control and IOF/ROL, two widely used datasets in Reinforcement Learning based TSP approaches. This conversion required combining multiple variables and generating the target ones. Generating the “LastResults” and “Cycle” values required further analysis, and the data handling needed an in-depth understanding of how the nightly testing was conducted. This led us to investigate what a CI cycle is in their context, and we followed their definition of a session, stating that “a session is when we run a suite of tests on one test system with a certain software version and testware version”. By splitting the data according to the 9 different systems used, we were able to generate 9 different sub-sets that fit the CI context.

File Format

The compressed .zip file contains 9 files, one for each of the 9 systems. The datasets are available in CSV format, with the semicolon (;) serving as the delimiter. The columns are listed in the table below along with their descriptions.

Table 2: Parameters of the dataset

  Column Name     Content
  jid             Job id; together with the system name, the pair (jid, system) forms a unique key for a test session
  System          Name of the test system
  Name            Unique numeric identifier of the test case
  Verdict         Test verdict of this test execution (Failed: 1, Passed: 0)
  Duration        Approximated runtime of the test case
  Cycle           The number of the CI cycle this test execution belongs to
  Group           The group the test case belongs to
  LastRun         Previous last execution of the test case as date-time string (format: YYYY-MM-DD HH:ii)
  Id              Unique numeric identifier of the test execution
  CalcPrio        Priority of the test case, calculated by the prioritization algorithm (output column, initially 0)
  result_array    List of previous test results (Failed: 1, Passed: 0), ordered by ascending age. Lists are delimited by [ ].
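As an illustration of the file format described in Table 2, the following Python sketch reads one semicolon-delimited file and groups the test executions by CI cycle. It is a minimal example under stated assumptions, not the authors' tooling: the sample rows are invented, the presence of a header row with exactly the column names from Table 2 is assumed, and the entries of result_array are assumed to be comma-separated inside the brackets. The helper names load_executions and failure_rate_per_cycle are hypothetical.

```python
import csv
import io
import json
from collections import defaultdict

# Invented sample rows for illustration only; they are not taken from the dataset.
# Assumption: each file starts with a header row using the column names of Table 2.
SAMPLE = """jid;System;Name;Verdict;Duration;Cycle;Group;LastRun;Id;CalcPrio;result_array
101;sys_a;7;0;12.5;1;g1;2019-12-31 02:00;1;0;[]
101;sys_a;8;1;3.0;1;g1;2019-12-31 02:00;2;0;[]
102;sys_a;7;0;12.1;2;g1;2020-01-01 02:00;3;0;[0]
102;sys_a;8;0;3.2;2;g1;2020-01-01 02:00;4;0;[1]
"""


def load_executions(handle):
    """Read semicolon-delimited rows and convert the numeric columns."""
    rows = []
    for raw in csv.DictReader(handle, delimiter=";"):
        row = dict(raw)
        row["Verdict"] = int(raw["Verdict"])      # Failed: 1, Passed: 0
        row["Duration"] = float(raw["Duration"])
        row["Cycle"] = int(raw["Cycle"])
        # Assumption: the bracketed list uses commas, e.g. "[1, 0, 0]",
        # which makes it parseable as a JSON array.
        row["result_array"] = json.loads(raw["result_array"])
        rows.append(row)
    return rows


def failure_rate_per_cycle(rows):
    """Return {cycle number: share of failed executions in that cycle}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        totals[row["Cycle"]] += 1
        failures[row["Cycle"]] += row["Verdict"]
    return {cycle: failures[cycle] / totals[cycle] for cycle in sorted(totals)}


rows = load_executions(io.StringIO(SAMPLE))
rates = failure_rate_per_cycle(rows)
print(rates)
```

To read one of the extracted files instead of the inline sample, the same function can be given an open file handle, for example open(path, newline="", encoding="utf-8"), where path points to one of the 9 CSV files; the encoding is also an assumption.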
The implications of this conversion are important: it can help authors of previous work reassess their approaches with more data for training and testing, and it opens a broader data spectrum for future researchers in this field, who can find ready-to-use, rich datasets on which to evaluate their approaches and contribute to the TSP community. This also addresses the limitations in the field discussed in the systematic literature review [2], which states that future research on TSP techniques should focus on collecting data from more recent subjects in a CI context with varying failure rates and larger execution times, as reproducible studies with appropriate datasets are needed to develop a usable body of knowledge regarding TSP over time. We believe that this conversion of the Westermo dataset is our contribution to narrowing this gap for RL-based approaches. The original dataset can be found here.
Date made available: 23 May 2023
Publisher: Zenodo
