09.06.14 - 13.06.14, Seminar 14241

Challenges in Analysing Executables: Scalability, Self-Modifying Code and Synergy

The following text appeared on our web pages prior to the seminar, and was included as part of the invitation.


The seminar "Challenges in Analyzing Executables: Scalability, Self-Modifying Code and Synergy" addresses the analysis of executable code and unites people from a multitude of backgrounds such as auditing, verification, transformation, malware detection and other areas. The analysis of executables becomes increasingly popular as it poses new challenges to the academic world and addresses a pressing need in industry. The seminar is motivated by the earlier Dagstuhl seminar 12051 and addresses three major challenges, namely: the scalability of analyses, the ability to handle self-modifying code and how to create synergy between different communities by combining each other's analyses to create more powerful tools.


The translation from byte sequences that represent the code of a program to the instruction semantics poses particular scalability issues over the analysis of the high-level source code: A single line of source code translates to several assembler instructions. Each instruction, in turn, is then translated to a semantic description. This description is usually expressed by a small intermediate language (IL) that requires the effects of a single assembler instruction to be expressed using several IL statements. Overall, a single line of high-level code may turn into tens of IL instructions that have to be analyzed. Other, more subtle, forms of performance issues exist. Identifying and addressing these issues is the "Scalability" challenge.

Self-Modifying Code

In contrast to a source code analysis, elementary program concepts such as the control flow graph, loops, local variables, stack frames of functions, etc. are no longer available and therefore have to be recovered from the code. For instance, the reconstruction of the control flow graph (CFG) is non-trivial as instructions may change during execution and jumps and calls to computed addresses can only be resolved by estimating which values the computation may yield. The latter requires both a model of code that may change its shape and structure at run-time as well as a value analysis which is commonly expressed as a fix-point computation on the control-flow graph. This chicken-and-egg problem has been addressed by several authors but more challenges are as-of-yet unresolved: many programs, especially malicious ones, contain themselves interpreters, JIT compilers, and more generic forms of code generation that blurs the concept of the code that is to be analyzed. We call this challenge "Self-Modifying Code".


The previous Dagstuhl Seminar 12051 on the analysis of executable code brought together researchers and practitioners working on executable programs. Many participants were surprised by the diversity of tasks that can only or best be addressed at the binary level. Some of these tasks were: the verification of worst-case execution time, proving the absence of run-time errors, reverse engineering of legacy software for code re-use, identifying which security issues are addressed by a software update by performing graph matching on the CFG of the previous and the new version, summarizing sequences of basic blocks for better analysis speed and precision, devising techniques to manage integer overflows. Indeed, the seminar brought together several research and industrial communities that face common problems. One vision of this new seminar is to create "Synergy" and collaboration between these communities by asking how a tool of one community can be used in the context of another community.