Comparing Strategies for Improving Precision When Checking Safe Evolution of Software Product Lines

 

Jefferson Almeida – jra@cin.ufpe.br

 

Python Script

 

Given a range of commits in which we truly have a product line, this script randomly selects a revision within that interval. The script then ensures that the selected revision has not been selected before and creates a temporary branch to execute the experiment, which we call the source product line. In the second step, the script selects another revision, three commits after the first one, and creates the second branch, which we call the target product line. To do that, we use the SVN log file with all commits made during TaRGeT development.
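The selection procedure described above can be sketched as follows. This is only a minimal illustration, not the script used in the experiment: the function name, the revision numbers, and the reading of "three commits after" as three positions later in the commit log are assumptions, and branch creation is left out.

```python
import random


def pick_revision_pair(revisions, used, step=3, rng=random):
    """Pick a (source, target) pair of revisions from an ordered commit list.

    `revisions` is the ordered list of revision ids taken from the log,
    `used` is the set of source revisions selected in earlier runs.
    All names here are hypothetical.
    """
    # A revision is eligible as source only if it was not selected before
    # and a revision `step` commits later still exists in the range.
    candidates = [
        i for i in range(len(revisions) - step)
        if revisions[i] not in used
    ]
    if not candidates:
        raise ValueError("no unused revision left in the range")
    i = rng.choice(candidates)
    source, target = revisions[i], revisions[i + step]
    used.add(source)  # remember the choice so it is never selected again
    return source, target


# Hypothetical revision ids; a seeded generator makes the run repeatable.
already_used = set()
pair = pick_revision_pair([100, 103, 107, 110, 115, 120], already_used,
                          rng=random.Random(0))
```

Filtering out used revisions before drawing is equivalent to drawing and re-drawing on a repeat, but it terminates cleanly when the range is exhausted.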

____________________________________________________________________________________________________

 

We repeated the execution of each approach five times for each evolution step. In our results section, we show only the performance data of Chart05, because the performance results are very similar across all executions. Below, however, we provide all charts containing performance outcomes.

 

Chart01

Chart02

Chart03

Chart04

Chart05

____________________________________________________________________________________________________

 

EvoSuite includes the total length of a test suite as a secondary optimization goal. Because the stopping conditions are based on achieved coverage, EvoSuite minimizes test suites as a post-processing step. This feature improves test readability and performance: the test suite is smaller, so the techniques spend less time compiling and running tests.
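The idea of minimizing a suite without losing coverage can be illustrated with a small greedy sketch. This is not EvoSuite's actual minimization algorithm; the function name, the test names, and the representation of coverage as sets of branch labels are all assumptions made for illustration.

```python
def minimize_suite(coverage):
    """Drop tests whose covered goals are also covered by the remaining tests.

    `coverage` maps a (hypothetical) test name to the set of goals it covers.
    """
    required = set().union(*coverage.values())
    kept = dict(coverage)
    # Visit the tests that cover the fewest goals first: they are the most
    # likely to be redundant.
    for name in sorted(coverage, key=lambda n: len(coverage[n])):
        remaining = set().union(*(g for t, g in kept.items() if t != name))
        if remaining == required:
            del kept[name]  # the suite covers the same goals without it
    return kept


# Hypothetical suite: three of the four tests are redundant.
suite = {
    "test1": {"b1", "b2"},
    "test2": {"b2"},
    "test3": {"b3"},
    "test4": {"b1", "b2", "b3"},
}
minimized = minimize_suite(suite)
```

A smaller suite with the same coverage is what makes the later compile and run steps cheaper.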

 

In our experimental sample, the test suites generated by Randoop were 99.93% bigger than those generated by EvoSuite.

 

 

Tests per second (average)

Randoop     37
EvoSuite    0.02

 

To confirm these results, we conducted a Wilcoxon statistical test. The table below presents the outcomes.

There is enough evidence that Randoop produces larger test suites than EvoSuite.

 

Alternative Hypothesis                        Wilcoxon rank sum test with continuity correction

IC-Randoop is greater than IC-EvoSuite        W = 1067.5, p-value = 4.529e-08
IC-Randoop is greater than EIC-EvoSuite       W = 1056.5, p-value = 9.193e-08
EIC-Randoop is greater than IC-EvoSuite       W = 1112, p-value = 2.211e-09
EIC-Randoop is greater than EIC-EvoSuite      W = 1111, p-value = 2.375e-09

Confidence level: 95%
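For readers who want to see how a W statistic and a one-sided p-value of this kind are obtained, the sketch below computes the Wilcoxon rank sum statistic with midranks for ties and the normal approximation with continuity correction. It is an illustration only: the function name and the sample data are assumptions, and it is not the script used to produce the table above.

```python
import math
from collections import Counter


def rank_sum_greater(x, y):
    """One-sided Wilcoxon rank sum test; alternative: x is greater than y.

    Returns (W, p_value) using the normal approximation with continuity
    correction. `x` and `y` are lists of numbers.
    """
    counts = Counter(x + y)
    # Midranks: tied values share the average of the ranks they occupy.
    rank, position = {}, 0
    for value in sorted(counts):
        ties = counts[value]
        rank[value] = position + (ties + 1) / 2
        position += ties
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    w = sum(rank[v] for v in x) - n_x * (n_x + 1) / 2
    # Variance of W, reduced when the pooled sample contains ties.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n_x * n_y / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        raise ValueError("all observations are identical")
    z = (w - n_x * n_y / 2 - 0.5) / sigma  # 0.5 is the continuity correction
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the normal
    return w, p_value


# Hypothetical suite sizes for two tools.
w, p = rank_sum_greater([10, 12, 14, 16], [1, 2, 3, 4])
```

With a 95% confidence level, the alternative hypothesis is accepted when the p-value is below 0.05.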

____________________________________________________________________________________________________

We now present and detail some TaRGeT evolution steps that are not described in our results section.

Evolution Step

 

In this evolution step, the developer modifies a method responsible for comparing the similarity between the test cases generated by the TaRGeT product line. As the figure below shows, the developer removes a code branch, changing the method's behavior. To perceive the behavior change, our approaches need to generate a test that exercises the removed branch. Running the same test against the method after the evolution, which no longer has this branch, exposes the behavior change. For this evolution, only IC-EvoSuite was able to identify the behavior difference. IC-Randoop generated tests that covered a significant number of branches, but it was unable to notice the behavioral incompatibility, because it could not create a test oracle that attests the method's behavior. This happens because Randoop does not directly aim at code coverage and has no heuristic to guide its search for tests. The tests generated by IC-Randoop and IC-EvoSuite are shown below, respectively. EIC was unable to perceive the behavior change with either EvoSuite or Randoop, because the extended impacted classes have many forward dependencies, and the time available to both tools was insufficient to exercise all these dependencies and reveal the behavioral divergence between the source and target methods.

Tests

Randoop test

EvoSuite test
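The reason why only a test that exercises the removed branch can expose this change can be sketched as follows. The two similarity functions are hypothetical stand-ins written in Python, not the TaRGeT method; the only point is that the source and target versions agree on every input except those that reach the removed branch.

```python
def similarity_source(a, b):
    """Hypothetical source version: has a special branch for empty inputs."""
    if not a or not b:  # this branch is removed in the target version
        return 0.0
    return len(set(a) & set(b)) / len(set(a) | set(b))


def similarity_target(a, b):
    """Hypothetical target version: the special branch is gone."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 1.0


def behavior_differences(source_fn, target_fn, inputs):
    """Run the same inputs on both versions and collect the mismatches."""
    differences = []
    for args in inputs:
        expected = source_fn(*args)  # the oracle is the source behavior
        actual = target_fn(*args)
        if expected != actual:
            differences.append((args, expected, actual))
    return differences


# Inputs that never reach the removed branch reveal nothing ...
ordinary = [(["a", "b"], ["b", "c"]), (["a"], ["a"])]
no_diff = behavior_differences(similarity_source, similarity_target, ordinary)
# ... while an input that reaches it exposes the behavior change.
diff = behavior_differences(similarity_source, similarity_target, [([], [])])
```

High branch coverage alone is therefore not sufficient: the generated suite must reach the specific branch and carry an assertion on the returned value.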

____________________________________________________________________________________________________

Evolution Step

 

In this scenario, we have two evolutions in methods that receive a string as a parameter and return another string. These methods have several conditional structures that evaluate to true depending on the format of the string passed as a parameter. Therefore, to cover all branches and perceive behavior incompatibilities, our approaches need to generate strings in specific formats to use as input to the method. This issue leads to the problem of String input generation, as we explain in our dissertation. As EvoSuite is driven by branch coverage, it changes the generated string used as a parameter several times until it covers the required branch or exceeds the time available for test generation. That is why IC-EvoSuite could identify the behavioral change. Below, we provide the tests produced by EvoSuite, which exposed the behavior divergence. We cannot say the same for Randoop: its best generated strings were not enough to perceive the changes. Note, however, that we are not saying that EvoSuite is better than Randoop at String input generation. Neither tool handles the problem well, but for this subject EvoSuite did better than Randoop. EIC, on the other hand, could not perceive the behavior difference between the source and target methods. This evolution requires generating test inputs that satisfy the properties of the kind of XLS documents that the product line processes. Hence, although the generated tests are good, they are not enough to perceive the behavior incompatibilities, because they cannot pass an XLS document in such a format.

Tests

EvoSuite test

____________________________________________________________________________________________________

Evolution Step

 

In this evolution step, the developer changes a method responsible for extracting text between parentheses. Before the change, if the string passed as a parameter has no parentheses, the method returns an empty string. After the evolution, in the same situation, the method returns the string passed as a parameter itself. Therefore, there is clearly a behavior incompatibility between these methods: the behavior observed by the user will not be the same after the evolution. Below, we show the tests generated by IC-Randoop and IC-EvoSuite, respectively. EIC could not perceive the behavior difference between the source and target methods. This evolution requires generating test inputs that satisfy the properties of the kind of XML documents that the program processes. Hence, although the generated tests are good, they are not enough to perceive the fault, because they cannot pass an XML document in such a format. This XML represents a list of Phone Documents.

Tests

Randoop test

EvoSuite test
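The behavior change in this evolution step can be reproduced with a small regression check. The two functions below are hypothetical Python stand-ins that follow only the description given above (empty string before, the input itself after); they are not the TaRGeT code, and the helper name is an assumption.

```python
def between_parentheses_source(text):
    """Before the evolution: no parentheses means an empty result."""
    start, end = text.find("("), text.rfind(")")
    if start == -1 or end < start:
        return ""
    return text[start + 1:end]


def between_parentheses_target(text):
    """After the evolution: no parentheses means the input is returned."""
    start, end = text.find("("), text.rfind(")")
    if start == -1 or end < start:
        return text
    return text[start + 1:end]


def failing_inputs(inputs):
    """Record the source outputs, then replay the inputs on the target."""
    recorded = {text: between_parentheses_source(text) for text in inputs}
    return [text for text, expected in recorded.items()
            if between_parentheses_target(text) != expected]


failures = failing_inputs(["press (OK) button", "no parentheses here"])
```

Only the input without parentheses fails, which matches the incompatibility described for this step.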

____________________________________________________________________________________________________

Evolution Step

 

 

In this evolution step, the developer modifies the constructor of the class VerbTerm. Before the evolution, all strings passed as parameters to the constructor are assigned unchanged to the class properties. After the evolution, these strings are converted to lower case. This code modification altered the behavior of other methods of the class. IC-Randoop, IC-EvoSuite, and EIC-EvoSuite could identify these behavior changes. Below, we show the generated tests that expose the behavior incompatibilities.

Tests

IC-Randoop test

IC-EvoSuite tests

EIC-EvoSuite
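How a change in the constructor propagates to other methods can be sketched as follows. Only the class name VerbTerm and the lower-casing come from the description above; the single `verb` property and the `get_verb` and `matches` methods are assumptions made for illustration, written in Python rather than in the language of the subject.

```python
class VerbTermSource:
    """Hypothetical source version: stores the string as received."""

    def __init__(self, verb):
        self.verb = verb

    def get_verb(self):
        return self.verb

    def matches(self, word):
        return self.verb == word


class VerbTermTarget(VerbTermSource):
    """Hypothetical target version: the constructor lower-cases its input."""

    def __init__(self, verb):
        super().__init__(verb.lower())


# The methods are untouched, yet their observable behavior changes.
before = VerbTermSource("Click")
after = VerbTermTarget("Click")
getter_changed = before.get_verb() != after.get_verb()
matches_changed = before.matches("Click") != after.matches("Click")
```

Any generated test that constructs the object with an upper-case character and asserts on a getter or a comparison is enough to expose the change, which is consistent with several approaches detecting it.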

The evolution steps demonstrated in the dissertation and here in the online appendix are representative enough. Therefore, it is not necessary to present all evolution steps investigated in our experiment.

____________________________________________________________________________________________________

Below, we provide the links to all the tools we use in our work.

Randoop

 

https://code.google.com/p/randoop/

 

EvoSuite

 

http://www.evosuite.org/

 

SafeRefactor

 

http://www.dsc.ufcg.edu.br/~spg/saferefactor/

 

The TaRGeT source code is available at the SPG SVN repository. See the following link:

 

https://svn.cin.ufpe.br/svn/cinspg/TaRGeT/

 

The MobileMedia source code is available at SourceForge. See the following link:

 

http://sourceforge.net/projects/mobilemedia/

 

Approaches Implementation

 

https://github.com/JeffersonAlmeida/SafeEvolution