Precise Symbolic State Merging
Why we need nondeterministic value summaries for state merging, and a time dimension to reconstruct execution traces
Summary
State merging is a standard technique for mitigating path explosion in symbolic execution. Nondeterministic value summaries are guarded sets of values with overlapping guards. Without those, we cannot perform fully precise state merging in all situations (do-while loops, unstructured code, nondeterministic programs). Reconstructing symbolic execution traces from states merged with nondeterministic value summaries requires some notion of time or origin.
Table of Contents
Symbolic Execution and Path Explosion
Testing and debugging are techniques appreciated by any reasonable programmer. They help to rule out major problems in an implementation and require little specialized knowledge. After the project runs for most inputs and passes the test suite, those techniques, however, reach their limits, as they can only ever show the presence, but not the absence of bugs; in other words: It’s likely you miss some border case that can make your program crash or more generally fail.
Automatic test case generation and, quite prominently, fuzzing techniques help to cover many missed inputs without user assistance. This is still testing, though. In general, it’s still possible (and for complex programs quite likely) that some equivalence class of inputs remains untested.
This is where static analysis enters the stage. If high confidence levels into
the program under test need to be established, the program has to be analyzed
independently of concrete inputs. One such analysis technique is Symbolic
Execution. The principles are simple: Instead of calling a method with
concrete inputs like 17 and 42, call it with symbolic inputs like
Symbolic execution maintains Symbolic (Execution) States consisting of a path
condition updated by case distinctions and a symbolic store updated by
assignments. Starting from an empty state i = 17;
updates this state to
if (k>0) {
i=2*k;
} else {
i=42;
}
induces a case distinction (since the truth value of the symbolic guard is
unknown at this point) and produces two output states:
Stopping Explosion: Symbolic State Merging
From there on, these branches are traditionally followed independently, forming a tree structure. Any code after the if statement is thus executed twice, which leads to an exponential blow-up1. A well-known approach to avoid this explosion is Symbolic State Merging2 3, which brings back together branches after a split. The resulting data structure resembles the CFG of the executed code although usually, cycles are broken up by introducing abstraction (we get a directed acyclic graph).
For example, after the if statement above, we know that i attains the value
Using conditional expressions for state merging is an example of fully
precise state merging. Alternatively, one could use data abstractions as in
Abstract Interpretation5. For example, since k is positive in the then
branch, we know that the final value of i after the if statement is also
positive. A possible, abstracted symbolic state would thus be
The Limits of Conditionals
The strength of symbolic execution is its high precision. Ideally, one would only introduce abstractions if really necessary (as for loop invariants). Furthermore, merging with conditional expressions works fully automatically, while abstract domains for data abstractions have to be chosen upfront.
There is however one limitation to using conditional expressions. Consider the following code with a loop:
do {
i--;
} while (i > 0)
Assume that we want to repeatedly execute the loop symbolically, maybe with
the aim to discover an invariant. When first entering the do loop, we’re in
the symbolic state
Can we merge those states using conditionals?
An empty path condition is equivalent to truth (the state is reached by all paths), so the result when merging the first and the second states is
This is, though, equivalent to
This cannot be what we want, and does of course not meet soundness criteria of symbolic execution (“exhaustiveness” and “precision”4). Sound state merging is always symmetric (and associative). Indeed, the merge method using conditional expressions may only be applied if the input states are “distinguishable”4, i.e., if only ever one path condition can hold true at the same time.
Design Options for Precise Merging
Of course, we could resort to abstraction—or use a different kind of
summation operator. Value Summaries have been proposed by Sen et
al.2 for precise symbolic state merging. A value summary is a guarded
collection of values:
This restriction, however, is not necessary. The full strength of summaries
becomes apparent when relaxing them to Nondeterministic Value Summaries: If
more than one guard is satisfiable by an initial assignment, the summary can
attain either of the guarded values. In the extreme case, nondeterministic
value summaries are sets:
Applied to our loop example from above, we can summarize the initial state and the state after one iteration to
or equivalently
using a nondeterministic value summary.
This strategy is symmetric and sound. It can be applied without restrictions, and is yet equivalent to merging with conditionals when path conditions are distinguishable. Furthermore, repeated merging yields “flatter” terms which are nicer to look at.
It is not strictly required to add value summaries as a new data type for right-hand sides of symbolic stores; alternatively, they could be encoded using fresh constants constrained by path conditions4, similarly to how we proceeded for data abstraction above. The downsides of this approach are, among others, the introduction of the constants, which can be troublesome for some constraint solvers, and the fact that path conditions no longer only consist of decisions taken during execution, but also of “artificial” constraints on the store.
The Problem with the Traces and the Time Dimension
Yet, how can I call this nondeterministic summary “fully precise”? Indeed,
when regarding the resulting state in isolation, we do not know if i attains
its original value or that value decreased by one, even if we know that the
initial value is strictly positive. The notion “precise” stems from the fact
that exactly all concrete states represented by the merged inputs are still
represented by the output with the summary. So, the “problem” is a different
one: If we’re in the (symbolic) initial state where
By adding a “time dimension”, we can rescue this capability. Assume that a variable t tracks the number of times a statement has been the execution target. Adding this to the guards of the value summary yields
The resulting states are distinguishable again, such that we even could use a conditional for merging. To reconstruct symbolic traces when using state merging with nondeterministic value summaries, we—generally—need some (abstract) notion of time. Usually, a properly used loop counter suffices.
Implementations of Value Summaries
Value summaries (used deterministically) have been proposed2 and implemented6 by Sen et al. The implementation does not seem to be maintained any more. More recently, value summaries are used for fully precise, automatic symbolic state merging by the pluggabl tool7. pluggabl is a symbolic execution engine for JVM bytecode which automatically performs state merging at all merge points, aligning the analysis result to the CFG structure.
Precise merging with conditional expressions has been implemented in the KeY tool8. State merging in KeY is not done automatically, but requires either point-and-click interaction with the symbolic execution tree or merge annotations in source code.
Conclusion
Value summaries are guarded sets of values. In deterministic value summaries, all constraints are mutually exclusive (“distinguishable”); nondeterministic summaries allow for overlapping guards. They are needed in certain situations, like do-while loops, where path conditions of input states are not necessarily distinguishable. This is also useful for unstructured programming languages or nondeterministic code. To be able to perform fully precise symbolic state merging in all situations, nondeterministic value summaries are indispensable, be it as special data types or encoded into path conditions. Their nondeterministic nature only becomes a problem if we want to be able to reconstruct execution traces from merged states. Then, we need a notion of time (for deterministic programs) or some general notion of origin (for nondeterministic programs with several entry points).
-
Cristian Cadar, Koushik Sen (2013): Symbolic Execution for Software Testing: Three Decades Later. Commun. ACM 56(2), 82–90 ↩︎
-
Koushik Sen, George C. Necula, Liang Gong, Wontae Choi (2015): MultiSE: Multi-Path Symbolic Execution Using Value Summaries. ESEC/SIGSOFT FSE: 842-853 ↩︎
-
A General Lattice Model for Merging Symbolic Execution Branches. Proc. 18th Intern. Conf. on Formal Engineering Methods (ICFEM).(2016). ↩︎
-
Ever Change a Running System: Structured Software Reengineering Using Automatically Proven-Correct Transformation Rules. Ernst Denert Award for Software Engineering 2020: Practice Meets Foundations.(2020). ↩︎
-
Patrick Cousot, Radhia Cousot (1977): Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. POPL 1977: 238-252 ↩︎
-
https://www.key-project.org/2017/05/03/new-feature-state-merging-in-key/ ↩︎