Comparing Strategies for Improving Precision When Checking Safe Evolution of Software Product Lines
Jefferson Almeida – jra@cin.ufpe.br
Given a range of commits in which we truly have a product line, the script randomly
selects a revision within this interval. It then ensures that the selected revision
has not been chosen before and creates a temporary branch to execute the experiment,
which, for didactic purposes, we call the source product line. In the second step,
the script selects another revision, three commits after the first one, and creates
the second branch, which we call the target product line. To do that, we use
the SVN log file with all commits made during TaRGeT development.
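The selection procedure can be sketched as follows. This is a minimal illustration and not the script used in the experiment: the function name `pick_evolution_pair`, the `used` set, and the revision numbers are hypothetical, and we assume the SVN log has already been parsed into an ordered list of revision numbers restricted to the range where a product line exists.

```python
import random


def pick_evolution_pair(revisions, used, distance=3, rng=random):
    """Pick an unused source revision and the revision `distance` commits after it.

    `revisions` is the ordered list of revision numbers taken from the SVN log
    (hypothetical values here). `used` collects the sources already selected.
    """
    # Only revisions that still have `distance` successors can be a source.
    candidates = [i for i in range(len(revisions) - distance)
                  if revisions[i] not in used]
    if not candidates:
        raise ValueError("no unused revision left in the range")
    i = rng.choice(candidates)
    source, target = revisions[i], revisions[i + distance]
    used.add(source)  # make sure this revision is not selected again
    return source, target


# Usage with made-up revision numbers.
revisions = [101, 104, 105, 109, 112, 118, 120]
used = set()
source, target = pick_evolution_pair(revisions, used, rng=random.Random(0))
print(f"source product line: r{source}, target product line: r{target}")
```

In a real run, each returned revision would then be checked out into its own temporary branch.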
____________________________________________________________________________________________________
We repeated the execution of each approach five times for each evolution step. In
our results section, we show only the performance data of Chart05, because the
performance results are very similar in all executions. However, below we provide
all charts containing performance outcomes.
____________________________________________________________________________________________________
EvoSuite includes the total length of a test suite as a secondary optimization goal.
Because its stopping conditions are based on coverage achievement, EvoSuite minimizes
test suites as a post-processing step. This feature improves test readability and
performance: the test suite is smaller, and accordingly the techniques
spend less time compiling and running tests.
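The idea of minimizing a suite after generation can be illustrated with a small sketch. It is a simplified greedy procedure under our own assumptions, not EvoSuite's actual algorithm: each test is represented only by the set of hypothetical branch identifiers it covers, and a test is dropped when removing it leaves the overall coverage unchanged.

```python
def minimize_suite(suite):
    """Greedily drop tests that do not contribute to the suite's coverage.

    `suite` maps a test name to the set of branch ids it covers
    (hypothetical data, not measured coverage).
    """
    def coverage(tests):
        covered = set()
        for name in tests:
            covered |= suite[name]
        return covered

    kept = list(suite)
    goal = coverage(kept)
    # Try to remove the tests that cover the fewest branches first.
    for name in sorted(suite, key=lambda n: len(suite[n])):
        remaining = [t for t in kept if t != name]
        if coverage(remaining) == goal:
            kept = remaining
    return kept


# Usage with a made-up suite: test0 and test1 are subsumed by test2.
example = {
    "test0": {1, 2},
    "test1": {2},
    "test2": {1, 2, 3},
    "test3": {4},
}
print(minimize_suite(example))
```

The minimized suite covers the same branches with fewer tests, which is why compilation and execution take less time.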
In our experimental sample, the test suites generated by Randoop
were 99.93% larger than those generated by EvoSuite.
Tool | Tests per second (average)
Randoop | 37
EvoSuite | 0.02
To confirm these results, we conducted a Wilcoxon rank sum test. Below, we
present a table with the outcomes.
There is enough evidence that Randoop produces larger test suites than EvoSuite.
Alternative Hypothesis | Wilcoxon rank sum test with continuity correction
IC-Randoop is greater than IC-EvoSuite | W = 1067.5, p-value = 4.529e-08
IC-Randoop is greater than EIC-EvoSuite | W = 1056.5, p-value = 9.193e-08
EIC-Randoop is greater than IC-EvoSuite | W = 1112, p-value = 2.211e-09
EIC-Randoop is greater than EIC-EvoSuite | W = 1111, p-value = 2.375e-09
95% confidence level
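For readers who want to see how such a W statistic and one-sided p-value are obtained, here is a minimal sketch. It assumes the usual definition of W (the rank sum of the first sample minus its smallest possible value) and always uses the normal approximation with continuity correction and tie correction. The function name and the sample values are made up for illustration; they are not the suite sizes measured in our experiment.

```python
import math


def wilcoxon_rank_sum_greater(x, y):
    """One-sided rank sum test of the alternative 'x is greater than y'.

    Returns (W, p_value). Assumes the pooled sample is not constant,
    otherwise the variance below would be zero.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks = {}
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # midrank of positions i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    w = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean - 0.5) / math.sqrt(var)  # 0.5 is the continuity correction
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the normal
    return w, p_value


# Usage with toy samples (not experimental data).
sample_a = [5, 6, 7, 8]
sample_b = [1, 2, 3, 4]
w, p = wilcoxon_rank_sum_greater(sample_a, sample_b)
print(f"W = {w}, p-value = {p:.4f}")
```

A p-value below 0.05 rejects the null hypothesis at the 95% confidence level used in the table.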
____________________________________________________________________________________________________
Now, we present and detail some TaRGeT evolution steps not described in our results section.
Evolution Step
In this evolution, the developer modifies a method responsible for comparing the
similarity between the test cases generated by the TaRGeT product line. As you can
see in the figure below, the developer removes a code branch, changing the method's
behavior. To perceive the behavior change, our approaches need to generate a test
that exercises the removed branch. When the same test runs against the method after
the evolution, which no longer has this branch, the behavior change is exposed. For
this evolution, only IC-EvoSuite was able to identify the behavior difference.
IC-Randoop generated tests that covered a significant number of branches, but it was
unable to notice the behavioral incompatibility because it could not create a test
oracle to attest the method's behavior. This happens because Randoop does not
directly aim at code coverage and does not have a heuristic to guide its search for
tests. The tests generated by IC-Randoop and IC-EvoSuite are shown below,
respectively. EIC was unable to perceive the behavior change with either EvoSuite or
Randoop, because the extended impacted classes have many forward dependencies and the
time available to both tools was insufficient to exercise all these dependencies and
reveal the behavioral divergence between the source and target methods.
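The effect of a removed branch can be illustrated with a small sketch. The two functions below are hypothetical stand-ins for the similarity method before and after the evolution; they are not the TaRGeT code. Only an input that reaches the removed branch and produces a different result reveals the change, while other inputs behave identically in both versions.

```python
def similarity_source(a, b):
    """Hypothetical version before the evolution, with a special-case branch."""
    if a == b:  # branch removed by the evolution
        return 100.0
    union = set(a) | set(b)
    return 100.0 * len(set(a) & set(b)) / max(len(union), 1)


def similarity_target(a, b):
    """Hypothetical version after the evolution, without the branch."""
    union = set(a) | set(b)
    return 100.0 * len(set(a) & set(b)) / max(len(union), 1)


def find_divergences(source, target, inputs):
    """Return the inputs on which source and target disagree."""
    return [args for args in inputs if source(*args) != target(*args)]


# Usage: three candidate test inputs (lists of made-up step identifiers).
inputs = [
    (["s1", "s2"], ["s1", "s3"]),  # branch not taken, same result
    (["s1"], ["s1"]),              # branch taken, but result still equal
    ([], []),                      # branch taken and result differs
]
print(find_divergences(similarity_source, similarity_target, inputs))
```

The second input shows that covering the branch is not enough by itself: the test must also observe a result that differs between the two versions.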
Tests
Randoop test
EvoSuite test
____________________________________________________________________________________________________
Evolution Step
In this scenario, we have two evolutions in methods that receive a string as a
parameter and return another string. These methods have several conditional
structures that evaluate to true depending on the format of the string passed as a
parameter. Therefore, to cover all branches and perceive behavior incompatibilities,
our approaches need to generate strings in specific formats to be used as input to
the methods. This issue leads to the problem of String input generation, as we
explain in our dissertation. As EvoSuite is driven by branch coverage, it changes the
generated string used as a parameter several times until it covers the required
branch or exceeds the time available for test generation. That is why IC-EvoSuite
could identify the behavioral change. Below, we provide the tests produced by
EvoSuite, which exposed the behavior divergence. In contrast, we cannot say the same
for Randoop: its best generated strings were not enough to perceive the changes.
Nevertheless, notice that we are not saying that EvoSuite is better than Randoop
regarding String input generation. Neither tool treats the problem well; however, on
this subject EvoSuite did better than Randoop.
On the other hand, EIC could not perceive the behavior difference between the source and target methods. This evolution requires generating test inputs that
satisfy the properties of the kind of XLS documents that the product line processes.
Hence, the generated tests are good but not sufficient to perceive the behavior
incompatibilities, because they cannot pass an XLS document in such a format.
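The String input generation problem can be made concrete with a sketch. Everything in it is an assumption made for illustration: the `UC_<digits>` format, the two `normalize_*` functions, and the uninformed random string generator, which does not reproduce how Randoop or EvoSuite build strings. The point is only that a branch guarded by a string format stays unreachable unless the generator produces a string in that format.

```python
import random
import re
import string

STEP_ID = re.compile(r"UC_\d+")  # hypothetical format checked by the method


def normalize_source(text):
    """Hypothetical method before the evolution."""
    if STEP_ID.fullmatch(text):  # format-dependent branch
        return text.lower()
    return text


def normalize_target(text):
    """Hypothetical method after the evolution: the branch body changed."""
    if STEP_ID.fullmatch(text):
        return text.replace("_", "-")
    return text


def random_strings(count, length=6, seed=0):
    """Uninformed inputs: random lowercase strings."""
    rng = random.Random(seed)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(length))
            for _ in range(count)]


def count_divergences(inputs):
    """Number of inputs on which the two versions disagree."""
    return sum(1 for s in inputs if normalize_source(s) != normalize_target(s))


# Usage: random lowercase strings never match the format, a targeted one does.
print(count_divergences(random_strings(1000)))
print(count_divergences(["UC_01"]))
```

A coverage-driven search keeps changing the string until the guarded branch is reached, which is the advantage described above.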
Tests
EvoSuite test
____________________________________________________________________________________________________
Evolution Step
In this evolution step, the developer changes a method responsible for extracting
text between parentheses. Before the change, if the string passed as a parameter
has no parentheses, the method returns an empty string. After the evolution, in the
same situation the method returns the string passed as a parameter itself.
Therefore, it is clear that there is a behavior incompatibility between these
methods: the behavior observed by the user will not be the same after the
evolution. Below, we show the tests generated by IC-Randoop and IC-EvoSuite,
respectively. EIC could not perceive the behavior difference between the source
and target methods. This evolution requires generating test inputs that satisfy the
properties of the kind of XML documents that the program processes. Hence, the
generated tests are good but not sufficient to perceive the fault, because they
cannot pass an XML document in such a format. This XML represents a list of Phone Documents.
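The change described in this step can be sketched with two small functions. They are hypothetical reconstructions written for illustration, not the TaRGeT methods; in particular, the handling of unbalanced parentheses is our assumption. A regression assertion recorded on the source version fails on the target version for any input without parentheses.

```python
def extract_source(text):
    """Before the evolution: empty string when there are no parentheses."""
    start, end = text.find("("), text.rfind(")")
    if start == -1 or end < start:
        return ""
    return text[start + 1:end]


def extract_target(text):
    """After the evolution: the input itself when there are no parentheses."""
    start, end = text.find("("), text.rfind(")")
    if start == -1 or end < start:
        return text
    return text[start + 1:end]


# Usage: both versions agree when parentheses are present...
assert extract_source("call (home)") == extract_target("call (home)") == "home"

# ...and diverge when they are absent, which is what a generated test
# must observe to expose the behavior incompatibility.
print(repr(extract_source("no parentheses")))
print(repr(extract_target("no parentheses")))
```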
Tests
Randoop test
EvoSuite test
____________________________________________________________________________________________________
Evolution Step
In this evolution step, the developer modifies the constructor of the class VerbTerm.
Before the evolution, all strings passed as parameters to the constructor are assigned
to the class properties as they are. After the evolution, these strings are converted to lower case. This code modification altered the
behavior of other methods of the class. IC-Randoop, IC-EvoSuite and EIC-EvoSuite could identify these changes. Below, we show the generated
tests that expose the behavior incompatibilities.
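A small sketch shows how a change in a constructor propagates to other methods. The classes below are hypothetical stand-ins for VerbTerm before and after the evolution: the property names `verb` and `participle` and the method `matches` are invented for illustration and are not taken from the TaRGeT code.

```python
class VerbTermSource:
    """Hypothetical VerbTerm before the evolution: strings stored as given."""

    def __init__(self, verb, participle):
        self.verb = verb
        self.participle = participle

    def matches(self, word):
        # Depends on the stored properties, so it inherits any change
        # made to them in the constructor.
        return word in (self.verb, self.participle)


class VerbTermTarget(VerbTermSource):
    """Hypothetical VerbTerm after the evolution: strings lower-cased."""

    def __init__(self, verb, participle):
        super().__init__(verb.lower(), participle.lower())


# Usage: the same test input yields different observable behavior.
before = VerbTermSource("Select", "Selected")
after = VerbTermTarget("Select", "Selected")
print(before.verb, before.matches("Select"))
print(after.verb, after.matches("Select"))
```

Because the difference is visible through any method that reads the properties, even tests that never inspect the constructor directly can expose it.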
Tests
IC-Randoop test
IC-EvoSuite tests
EIC-EvoSuite test
The evolution steps presented in the dissertation and in this online appendix are
representative. Therefore, it is not necessary to present all evolution steps
investigated in our experiment.
____________________________________________________________________________________________________
Below, we provide the links to all the
tools we use in our work.
Randoop
https://code.google.com/p/randoop/
EvoSuite
SafeRefactor
http://www.dsc.ufcg.edu.br/~spg/saferefactor/
The TaRGeT source code is available
in the SPG SVN repository. See the following link:
https://svn.cin.ufpe.br/svn/
The MobileMedia source code is
available at SourceForge. See the following link:
http://sourceforge.net/projects/mobilemedia/
Approaches Implementation
https://github.com/JeffersonAlmeida/SafeEvolution