SARCoS research objectives

Objective 1 – Build a DSML-Framework for SARCoS Test Models

A key objective during the first six months of the project is to define a set of domain-specific modeling languages (DSMLs) to capture the important semantic information that is embedded in the communication sequences from the SUT observations. The goal is to define a DSML for each SUT from our industry partners that captures the important semantic information of
communications for that SUT. The design of each DSML will be done in conjunction with test engineers from that industry partner, to capture their knowledge about the semantics of SUT interactions and their testing goals.

Although we will be creating at least four different DSMLs (one for each industry partner, or more if they have multiple testing goals), there will be some common features such as communication packets (records with typed fields, and ranges of permissible values for each field), and sequencing of packets, including timing constraints on the sequences. To ease tool development, we will use a DSL framework (for example, Xtext) to design these DSMLs and to develop our DSML tools for parsing, editing, printing and manipulating communication sequences.
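
To make the common "communication packet" feature concrete, the following minimal Python sketch shows one possible encoding of a packet type as a record with typed fields and ranges of permissible values. The field names and ranges are invented for illustration and are not taken from any partner SUT.

```python
from dataclasses import dataclass

@dataclass
class FieldSpec:
    """One typed field of a communication packet, with a permissible range."""
    name: str
    lo: int
    hi: int

    def valid(self, value: int) -> bool:
        return self.lo <= value <= self.hi

# Hypothetical packet type for an imagined heartbeat message.
HEARTBEAT = [FieldSpec("seq", 0, 65535), FieldSpec("status", 0, 3)]

def validate(packet: dict, spec: list) -> bool:
    """Check that every field is present and within its permitted range."""
    return all(f.name in packet and f.valid(packet[f.name]) for f in spec)

print(validate({"seq": 42, "status": 1}, HEARTBEAT))     # in range
print(validate({"seq": 70000, "status": 1}, HEARTBEAT))  # seq out of range
```

A real DSML generated via a framework such as Xtext would express these declarations in concrete syntax rather than Python, but the underlying record-with-ranges structure is the same.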

Once a DSML has been defined, this will enable us to take a large set of SUT observations, parse each observation as a sequence of communication packets with timing information, abstract away the low-level details to retain the important semantic information, and then record that abstract communication sequence (ACS) as an instance of the DSML grammar. Conversely, we will also build tools for
taking an instance of the DSML grammar and mapping it back into a concrete test sequence, filling in the missing details via lookup tables, default values, and probabilistic value generation. Each DSML thus defines a state space of abstract communication sequences, targeted at testing a particular aspect of a particular SUT.
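
The concretization step can be sketched as follows, assuming a hypothetical field set: missing fields are filled from a lookup table first, then from defaults, then by probabilistic generation. All table contents and field names here are invented for illustration.

```python
import random

LOOKUP = {"device_id": "DEV-001"}                     # assumed fixed per test bench
DEFAULTS = {"flags": 0}
GENERATORS = {"payload_len": lambda rng: rng.randint(1, 64)}

def concretize(abstract_seq, required_fields, rng=None):
    """Map an abstract communication sequence to a concrete test sequence,
    filling missing fields via lookup tables, defaults, then generation."""
    rng = rng or random.Random(0)                     # seeded for reproducibility
    concrete = []
    for packet in abstract_seq:
        filled = dict(packet)
        for field in required_fields:
            if field in filled:
                continue
            if field in LOOKUP:
                filled[field] = LOOKUP[field]
            elif field in DEFAULTS:
                filled[field] = DEFAULTS[field]
            elif field in GENERATORS:
                filled[field] = GENERATORS[field](rng)
        concrete.append(filled)
    return concrete

acs = [{"msg": "CONNECT"}, {"msg": "DATA"}]
print(concretize(acs, ["device_id", "flags", "payload_len"]))
```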

Performance Indicators (due month 6): 4 DSMLs designed (for the 4 targeted domains); at least 3
sets of SUT observations collected; SARCoS DSML general framework specified.

Objective 2 – Infer DSML Test Models

This Phase 2 objective (months 6-18) will apply existing model inference and language semantics research to infer DSML test models from an input set of abstract communication sequences (ACSs) for a given SUT. Starting with simple test-model skeleton generation, the model inference techniques will be improved iteratively during the project until data-driven test model inference can be successfully applied in each application domain.
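
As a minimal sketch of what "test model skeleton generation" could look like, the code below builds a prefix-tree acceptor over message types from a set of ACSs; real inference techniques (state-merging algorithms such as k-tails, for instance) would go considerably further. The message names are illustrative only.

```python
def infer_skeleton(sequences):
    """Build a prefix-tree acceptor: states are ints (0 is initial),
    transitions map (state, message_type) -> next state."""
    transitions = {}
    fresh = 1                         # next unused state number
    for seq in sequences:
        state = 0
        for symbol in seq:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = fresh
                fresh += 1
            state = transitions[key]  # follow existing or new transition
    return transitions

model = infer_skeleton([["CONNECT", "DATA", "ACK"],
                        ["CONNECT", "PING"]])
print(sorted(model.items()))
```

Sequences sharing a prefix (here, "CONNECT") share states, so the skeleton already generalizes slightly beyond the raw observations.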

Performance Indicators (due month 18): Report on model inference techniques and tools chosen for
learning DSMLs; two DSML test model inference tools implemented and applied to at least 2 sets of SUT observations.

Objective 3 – Security testing at the business-logic level for simulating false data injection

Security tests typically require test selection criteria to be supplied by security experts who are knowledgeable in the SUT application domain. This objective will provide ways for those security experts to express their SUT knowledge by writing false data attack scenarios at the business-logic level, which can then be used as test selection criteria to drive model-based generation of the desired security tests. This objective will be completed during Phase 2 (months 6-18) of the project, and the resulting tooling alpha-tested during Phase 3 (months 18-28) with the industry partners. Further refinement of the security scenarios and the MBT security test generation is expected during Phase 3, with final industry testing occurring in Phase 4 (months 28-36).
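
One simple way to picture a business-logic-level false data scenario is "at step k, replace field f with attacker-chosen value v", applied to a valid abstract sequence to derive a security test. The sketch below uses invented field names and values; actual scenarios would be written in the DSML, not Python.

```python
def apply_false_data(valid_seq, scenario):
    """Derive a security test by injecting false but well-formed data
    into one step of a valid abstract communication sequence."""
    attacked = [dict(p) for p in valid_seq]   # copy, keep original intact
    step, field, value = scenario["step"], scenario["field"], scenario["value"]
    attacked[step][field] = value
    return attacked

valid = [{"msg": "READING", "temp": 21}, {"msg": "READING", "temp": 22}]
scenario = {"step": 1, "field": "temp", "value": 999}   # plausible-looking lie
print(apply_false_data(valid, scenario))
```

The point is that the injected value is syntactically valid, so the attack can only be caught by checks at the business-logic level.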

Performance Indicators (due month 28): User manual documenting dynamic test selection criteria for security
tests; MBT tool for security tests; report on fault-detection effectiveness of generated security tests.

Objective 4 – Robustness testing using behavioral fuzzing, boundary testing, and exception testing

There are many kinds of robustness testing that can be used to stress-test the reliability of connected SUTs, ranging from simple 'monkey testing' (random input sequences) to sophisticated generation of long test sequences with subtle input errors. Our DSML test models learned in Phase 2, which capture semantic knowledge about the SUT's normal communication behavior, provide an excellent basis for generating these kinds of robustness tests.

We will evaluate the effectiveness of generating robustness tests using a variety of strategies, including:

  • exception tests, where just one value is outside normal bounds;
  • boundary tests, which test the allowable limit values of each field in a communication packet;
  • behavioral fuzzing, where all values are valid, but the test sequence contains inconsistent values.
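
The three strategies above can be sketched over a hypothetical packet with a single ranged field ("level", assumed range 0-100); the field name, range, and sequence structure are invented for illustration.

```python
LO, HI = 0, 100   # assumed permissible range for the "level" field

def exception_test(packet):
    """Exception test: push just one value outside its normal bounds."""
    bad = dict(packet)
    bad["level"] = HI + 1
    return bad

def boundary_tests(packet):
    """Boundary tests: probe the allowable limit values of the field."""
    return [dict(packet, level=v) for v in (LO, HI)]

def behavioral_fuzz(seq):
    """Behavioral fuzzing: every field value stays valid, but swapping
    adjacent packets makes the sequence itself inconsistent
    (e.g. out-of-order sequence numbers)."""
    fuzzed = list(seq)
    if len(fuzzed) >= 2:
        fuzzed[0], fuzzed[1] = fuzzed[1], fuzzed[0]
    return fuzzed

packet = {"msg": "SET", "level": 50}
print(exception_test(packet))
print(boundary_tests(packet))
print(behavioral_fuzz([{"seq": 1}, {"seq": 2}, {"seq": 3}]))
```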

During Phase 3 (months 18-28), these techniques and others will be used to generate a wide variety of robustness tests, and applied to the industry SUTs to measure their fault-detection effectiveness.

Performance Indicators (due month 28): MBT tool for generating robustness tests (implementing the three
strategies above); report on fault-detection effectiveness of these three strategies.

Objective 5 – Select and prioritize test scenarios by using online learning models

Modern agile development practices involve continuous build, test, and deploy cycles for the software system. These cycles create opportunities for tuning the testing process towards the most relevant parts of the SUT. Online learning is a conceptual process in which results from previous test runs are used to guide the selection and prioritization of test cases for the current run. Typically, a test scenario that has exhibited a failure in a given run should be given higher priority in subsequent runs. By leveraging online learning models based on reinforcement learning, our approach has the potential to substantially improve test selection and prioritization. However, designing a reward function and an appropriate memory model that capture the most relevant aspects of the test results is difficult and requires sustained exploratory experiments. Fortunately, the availability of advanced machine-learning platforms such as Weka in Java or Scikit-learn in Python makes these experiments much easier for the non-expert analyst. In Phase 4 (months 28-36) we will evaluate the practicality and utility of online learning models for selecting and prioritizing test scenarios in the continuous integration processes of the industrial use cases of the project.
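
A minimal stand-in for the reward function and memory model discussed above: each scenario keeps a failure score that decays across runs, and the next run executes scenarios in descending score order. The decay constant and scenario names are illustrative assumptions, not a committed design.

```python
DECAY = 0.5   # assumed memory parameter: how fast old failures fade

def update_scores(scores, results):
    """results maps scenario name -> True if it failed in the last run.
    Each scenario's score is its decayed history plus a reward of 1.0
    for a fresh failure."""
    for scenario, failed in results.items():
        prev = scores.get(scenario, 0.0)
        scores[scenario] = DECAY * prev + (1.0 if failed else 0.0)
    return scores

def prioritize(scores):
    """Order scenarios for the next run, highest score first."""
    return sorted(scores, key=scores.get, reverse=True)

scores = {}
update_scores(scores, {"t1": False, "t2": True, "t3": False})   # run 1
update_scores(scores, {"t1": True, "t2": False, "t3": False})   # run 2
print(prioritize(scores))   # → ['t1', 't2', 't3']
```

Note how t1's fresh failure outranks t2's decayed one; tuning exactly this trade-off is the exploratory work the objective describes.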

Performance Indicators (due month 36): Report on evaluating the effectiveness of online learning models in 4
industrial case studies

Objective 6 – Smart analytics of test execution results

Using automated MBT, it is easy to generate and execute very large numbers of tests on the SUT, but the sheer volume can make it difficult to prioritize and understand the failures that arise from those test executions. Anomaly detection is therefore an important topic in this context, and providing smart analytics of test execution results is crucial to help validation engineers focus on the most error-prone parts of the SUT. This Phase 3 (months 18-28) objective will develop unsupervised machine-learning models based on clustering (hierarchical or flat clustering, depending on the nature of the chosen distance function) to group test failures and sort them by priority level. Intelligent visualization of test results, which highlights anomalies and abstracts elements drawn from the test artifacts (ACS traces, DSML models), will further improve the overall testing process. These intelligent test-failure visualizations will be evaluated with our industry partners and further refined during Phase 4.
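
As an illustration of flat clustering of test failures, the sketch below groups failure messages whose token sets are close under a Jaccard distance. Both the distance function and the threshold are placeholder assumptions; the real choice of distance function is exactly what determines hierarchical versus flat clustering above.

```python
def jaccard_distance(a, b):
    """Token-set Jaccard distance between two failure messages."""
    ta, tb = set(a.split()), set(b.split())
    return 1.0 - len(ta & tb) / len(ta | tb)

def cluster_failures(messages, threshold=0.5):
    """Greedy flat clustering: put each message into the first cluster
    whose representative (first member) is within the threshold."""
    clusters = []
    for msg in messages:
        for cluster in clusters:
            if jaccard_distance(msg, cluster[0]) <= threshold:
                cluster.append(msg)
                break
        else:
            clusters.append([msg])
    return clusters

failures = ["timeout on CONNECT", "timeout on DATA",
            "checksum mismatch in ACK"]
print(cluster_failures(failures))
```

The two timeout failures land in one cluster and the checksum failure in another, so an engineer triages two groups instead of three individual reports.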

Performance Indicators (due month 28): Test-failure intelligent visualization implementing anomaly
clustering; (Due month 36): Report evaluating usability of test-failure intelligent visualization.