Grammar-based fuzzers are highly efficient in producing syntactically valid system inputs. However, as context-free grammars cannot capture semantic input features, generated inputs will often be rejected as semantically invalid by a target program. We propose ISLa, a declarative specification language for context-sensitive properties of structured system inputs based on context-free grammars. With ISLa, it is possible to specify input constraints like “a variable has to be defined before it is used,” “the length of the ‘file name’ block in a TAR file has to equal 100 bytes,” or “the number of columns in all CSV rows must be identical.”
ISLa constraints can be used for parsing or validation (“Does an input meet the expected constraint?”) as well as for fuzzing, since we provide both an evaluation and input generation component. ISLa embeds SMT formulas as an island language, leveraging the power of modern solvers like Z3 to solve atomic semantic constraints. On top, it adds universal and existential quantifiers over the structure of derivation trees from a grammar, and structural (“X occurs before Y”) and semantic (“X is the checksum of Y”) predicates.
ISLa constraints can be specified manually, but also mined from existing input samples. For this, our ISLearn prototype uses a catalog of common patterns (such as the ones above), instantiates these over input elements, and retains those candidates that hold for the inputs observed and whose instantiations are fully accepted by input-processing programs. The resulting constraints can then again be used for fuzzing and parsing.
In our evaluation, we show that a few ISLa constraints suffice to produce inputs that are 100% semantically valid while still maintaining input diversity. Furthermore, we confirm that ISLearn mines useful constraints about definition-use relationships and (implications between) the existence of “magic constants”, e.g., for programming languages and network packets.