Challenges in Analysing Executables: Scalability, Self-Modifying Code and Synergy
( 09. Jun – 13. Jun, 2014 )
- Roberto Giacobazzi (University of Verona, IT)
- Axel Simon (TU München, DE)
- Sarah Zennou (Airbus Group - Suresnes, FR)
- Susanne Bach-Bernhard (for administrative matters)
The seminar "Challenges in Analysing Executables: Scalability, Self-Modifying Code and Synergy" addresses the analysis of executable code and unites people from a multitude of backgrounds such as auditing, verification, transformation and malware detection. The analysis of executables is becoming increasingly popular as it poses new challenges to the academic world and addresses a pressing need in industry. The seminar is motivated by the earlier Dagstuhl Seminar 12051 and addresses three major challenges: the scalability of analyses, the ability to handle self-modifying code, and how to create synergy between different communities by combining each other's analyses into more powerful tools.
The translation from byte sequences that represent the code of a program to the instruction semantics poses particular scalability issues compared with the analysis of high-level source code: a single line of source code translates to several assembler instructions. Each instruction, in turn, is then translated to a semantic description. This description is usually expressed in a small intermediate language (IL), in which the effects of a single assembler instruction require several IL statements. Overall, a single line of high-level code may turn into tens of IL statements that have to be analyzed. Other, more subtle performance issues also exist. Identifying and addressing these issues is the "Scalability" challenge.
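The blow-up described above can be illustrated with a minimal sketch. The IL syntax, the temporaries `t0`/`t1`, the flag names and the three-instruction program are all hypothetical and chosen only for illustration; they are not taken from any toolkit presented at the seminar.

```python
from __future__ import annotations

# Toy lifter from a tiny assembler subset to a made-up IL.
# The IL notation below is an assumption, not that of a real tool.


def lift(insn: str) -> list[str]:
    """Translate one instruction such as 'add eax, ebx' into IL statements."""
    mnemonic, operands = insn.split(maxsplit=1)
    dst, src = [op.strip() for op in operands.split(",")]
    if mnemonic == "mov":
        # A plain move has no side effects on flags: one IL statement.
        return [f"{dst} := {src}"]
    if mnemonic == "add":
        # A 32-bit addition also updates the status flags, so one
        # instruction expands into several IL statements.
        return [
            f"t0 := {dst}",
            f"t1 := {src}",
            f"{dst} := (t0 + t1) mod 2^32",
            f"CF := {dst} <u t0",
            f"ZF := {dst} == 0",
            f"SF := {dst} >>u 31",
            f"OF := ((t0 xor {dst}) and (t1 xor {dst})) >>u 31",
        ]
    raise ValueError(f"unsupported instruction: {insn}")


# A source line such as `x = x + y;` might compile to these instructions.
program = ["mov eax, [x]", "add eax, [y]", "mov [x], eax"]
il = [stmt for insn in program for stmt in lift(insn)]

for stmt in il:
    print(stmt)
```

Even in this small sketch, one source line becomes three instructions and nine IL statements, most of which only describe flag updates that a later analysis may never need.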
In contrast to source code analysis, elementary program concepts such as the control flow graph, loops, local variables and the stack frames of functions are no longer available and therefore have to be recovered from the code. For instance, the reconstruction of the control flow graph (CFG) is non-trivial, as instructions may change during execution, and jumps and calls to computed addresses can only be resolved by estimating which values the computation may yield. The latter requires both a model of code that may change its shape and structure at run-time and a value analysis, which is commonly expressed as a fixpoint computation on the control flow graph. This chicken-and-egg problem has been addressed by several authors, but further challenges remain unresolved: many programs, especially malicious ones, themselves contain interpreters, JIT compilers and more generic forms of code generation that blur the concept of the code that is to be analyzed. We call this challenge "Self-Modifying Code".
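The chicken-and-egg relationship between CFG reconstruction and value analysis can be sketched as a single worklist loop. This is a minimal illustration under strong assumptions: the instruction set (`mov`, `cjmp`, `jmp_reg`, `halt`), the program and the set-of-constants domain are hypothetical, the code is assumed not to modify itself, and every computed target is assumed to be a valid address.

```python
from __future__ import annotations

# Toy program: address -> instruction (hypothetical instruction set).
PROGRAM = {
    0: ("mov", "r0", 4),     # r0 := 4
    1: ("cjmp", 3),          # conditional branch: fall through or go to 3
    2: ("mov", "r0", 5),     # r0 := 5
    3: ("jmp_reg", "r0"),    # indirect jump; targets depend on r0
    4: ("halt",),
    5: ("halt",),
}


def join(a, b):
    """Pointwise union of two abstract states (register -> set of constants)."""
    return {r: a.get(r, frozenset()) | b.get(r, frozenset())
            for r in a.keys() | b.keys()}


def recover_cfg(program, entry=0):
    """Interleave value analysis and edge discovery until a fixpoint."""
    states = {entry: {}}     # abstract state on entry to each address
    edges = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        insn = program[addr]
        out = dict(states[addr])
        if insn[0] == "mov":
            out[insn[1]] = frozenset([insn[2]])
            succs = [addr + 1]
        elif insn[0] == "cjmp":
            succs = [addr + 1, insn[1]]
        elif insn[0] == "jmp_reg":
            # Successors are whatever values the analysis currently allows.
            succs = sorted(out.get(insn[1], frozenset()))
        else:                # halt
            succs = []
        for s in succs:
            edges.add((addr, s))
            new = join(states[s], out) if s in states else out
            if s not in states or new != states[s]:
                states[s] = new
                worklist.append(s)   # state grew: re-analyze the successor
    return edges, states


edges, states = recover_cfg(PROGRAM)
print(sorted(edges))
```

In this sketch the edge from address 3 to address 5 is only discovered after the analysis has propagated `r0 = 5` to the indirect jump, which is why the CFG and the value analysis have to be computed together rather than one after the other.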
The previous Dagstuhl Seminar 12051 on the analysis of executable code brought together researchers and practitioners working on executable programs. Many participants were surprised by the diversity of tasks that can only or best be addressed at the binary level. Some of these tasks were: the verification of worst-case execution time, proving the absence of run-time errors, reverse engineering of legacy software for code re-use, identifying which security issues are addressed by a software update by performing graph matching on the CFGs of the previous and the new version, summarizing sequences of basic blocks for better analysis speed and precision, and devising techniques to manage integer overflows. Indeed, the seminar brought together several research and industrial communities that face common problems. One vision of this new seminar is to create "Synergy" and collaboration between these communities by asking how a tool of one community can be used in the context of another community.
As this seminar was a follow-up to the previous Dagstuhl Seminar 12051 on the analysis of binaries, interest in attending was very high. In the end, fewer than half of the people whom we considered inviting could attend, namely 44 people. In contrast to the previous seminar, which ran for five days, this seminar ran for four days due to a bank holiday Monday. With the talks arranged by topic, these four days split into two days on the analysis of binaries and (nearly) two days on obfuscation techniques.
The challenges in the realm of general binary analysis have not changed considerably since the last gathering. However, new analysis ideas and new technologies (e.g. SMT solving) continuously advance the state of the art, and the presentations were a reflection thereof. With an even greater participation of people from industry, the participants could enjoy a broader view of the problems and opportunities that occur in practice. Given the tight focus on binary code (rather than e.g. Java byte code), a more detailed and informed discussion ensued. Indeed, the different groups seemed to focus less on promoting their own tools than on seeking collaboration and an exchange of experiences and approaches. In this light, the seminar met its ambition on synergy. It became clear that creating synergy by combining various tools is not something that can be achieved within a Dagstuhl Seminar itself. However, the collaborative mood and the interaction between various groups give hope that this will be a follow-on effect.
The second strand that crystallized during the seminar was the practical and theoretical interest in code obfuscation. Here, malware creators and analysts play an ongoing cat-and-mouse game. A theoretical understanding of why the analysts cannot win this game outright helps the search for analyses that are effective on present-day obfuscations. In practice, a full understanding of some obfuscated code may be unobtainable, but a classification is still possible and useful. The variety of possible obfuscations creates many orthogonal directions of research. Indeed, it was suggested to hold a Dagstuhl Seminar on the sole topic of obfuscation.
One tangible outcome of the previous Dagstuhl Seminar is our GDSL toolkit, which was presented by Julian Kranz. We believe that other collaborations will ensue from this Dagstuhl Seminar, as the feedback was again very positive and many long discussions were held in the beautiful surroundings of the Dagstuhl grounds. The following abstracts therefore cannot convey the sense of community that this seminar created. Please note that not all people who presented have submitted their abstracts, due to the sensitive nature of the content and/or the organization that the participants work for.
- Davide Balzarotti (EURECOM - Biot, FR)
- Sébastien Bardin (CEA LIST, FR)
- Frédéric Besson (IRISA - Rennes, FR)
- Sandrine Blazy (IRISA - Rennes, FR)
- Juan Caballero (IMDEA Software - Madrid, ES)
- Lorenzo Cavallaro (RHUL - London, GB)
- Aziem Chawdhary (University of Kent, GB)
- Cory Cohen (Software Engineering Institute - Pittsburgh, US)
- Mila Dalla Preda (University of Verona, IT)
- Bjorn De Sutter (Ghent University, BE)
- Saumya K. Debray (University of Arizona - Tucson, US)
- David Delmas (Airbus S.A.S. - Toulouse, FR)
- Thomas Dullien (Google Switzerland, CH)
- Emmanuel Fleury (University of Bordeaux, FR)
- Anthony Fox (University of Cambridge, GB)
- Roberto Giacobazzi (University of Verona, IT)
- Kathryn E. Gray (University of Cambridge, GB)
- Paul Irofti (Bucharest, RO)
- Yan Ivnitskiy (Trail of Bits Inc. - New York, US)
- Andy M. King (University of Kent, GB)
- Tim Kornau-von Bock und Polach (Google Switzerland, CH)
- Julian Kranz (TU München, DE)
- Colas Le Guernic (Direction Generale de l'Armement, FR)
- Junghee Lim (GrammaTech Inc. - Ithaca, US)
- Alexey Loginov (GrammaTech Inc. - Ithaca, US)
- Federico Maggi (Polytechnic University of Milan, IT)
- Jean-Yves Marion (LORIA - Nancy, FR)
- Florian Martin (AbsInt - Saarbrücken, DE)
- Isabella Mastroeni (University of Verona, IT)
- Bogdan Mihaila (TU München, DE)
- Magnus Myreen (University of Cambridge, GB)
- Gerald Point (University of Bordeaux, FR)
- Edward Robbins (University of Kent, GB)
- Bastian Schlich (ABB AG Forschungszentrum Deutschland - Ladenburg, DE)
- Alexander Sepp (TU München, DE)
- Axel Simon (TU München, DE)
- Aditya Thakur (University of Wisconsin - Madison, US)
- Axel Tillequin (Airbus Group - Suresnes, FR)
- Franck Védrine (CEA - Gif sur Yvette, FR)
- Aymeric Vincent (University of Bordeaux, FR)
- Xueguang Wu (TU München, DE)
- Brecht Wyseur (NAGRA Kudelski Group SA - Cheseaux, CH)
- Stefano Zanero (Polytechnic University of Milan, IT)
- Sarah Zennou (Airbus Group - Suresnes, FR)
- programming languages / compiler
- semantics / formal methods
- verification / logic
- executable analysis
- reverse engineering
- self-modifying code
- malware analysis