This Dagstuhl Seminar addresses open challenges in developing a holistic, generally applicable methodology and tool support to model and evaluate the strength of software protections as defenses against man-at-the-end attacks such as reverse-engineering and software tampering. Such a methodology and supporting tools are necessary to (partially) automate the selection and deployment of techniques that protect the confidentiality and integrity of various types of assets embedded in software.
The seminar unites academic and industrial experts from multiple backgrounds such as hacking and penetration testing, experimental software engineering, software complexity theory, game theory, machine learning, formal modeling, software analysis (tools), software protection, (white-box) cryptography, and information security economics. Bringing these experts together is necessary to ensure that the proposed models and evaluation techniques can be experimentally validated, that they can be automated in a user-friendly, trustworthy, and scalable manner, and that they cover a wide range of real-world attack methods, available protections, assets, and security requirements.
Metrics
Some qualitative and quantitative features of software have been put forward to evaluate relevant aspects of software protections, such as the potency, resilience, and stealth of obfuscations. Most proposed metrics are ad hoc, however: they have not been validated against real-world attacks and are not generally applicable. A wider set of metrics is needed that can be validated, that can be computed by supporting tools, and that covers the widest possible range of protection techniques and attacks. Such metrics and tools can facilitate decision support, and they can also provide researchers with a standardized toolbox to demonstrate the value of the novel protections they develop.
Attack and protection modeling
Knowledge bases, attack graphs, Petri nets, ontologies, and relational models have been proposed to reason about attacks and the impact of protections on them. However, it has not been demonstrated that relevant, complete model instances can be generated automatically for real-world use cases. Nor is it clear which abstraction level is best for letting the models support automated comparison and selection of protections. The seminar aims at refining, extending, and augmenting modeling and reasoning approaches, borrowing from other security domains and adversarial settings where possible, to enable quantitative and qualitative assessment of the impact of protections on the relevant attack paths on concrete software assets.
Decision Support
Only limited approaches have been presented for automating the selection of protections given concrete assets and their security requirements, the available protections, and software development lifecycle requirements. Those approaches are anything but ready for real-world deployment. For example, they currently cannot handle synergies or composability issues of protection combinations well. In this seminar, we will combine existing knowledge in this area with expertise in game theory, machine learning, and security economics to develop new approaches for exploring the protection design space that build on the aforementioned metrics, models, and reasoning approaches to assist defenders of assets.
Validation
To provide trustworthy decision support, the underlying metrics and models need to be validated. Validation experiments (controlled or not) are expensive, however, and extremely hard to design and execute correctly. Moreover, it is hard to extrapolate from pointwise validation samples. In this seminar, we will combine experience in penetration testing, in experimental software engineering, and in the industrial use of protection techniques to draft best practices for validation experiments and a roadmap for useful validation strategies that can set us on a path towards trustworthy decision support.
Overview and Motivation
The area of Man-At-The-End (MATE) software protection is an evolving battlefield on which attackers execute white-box attacks: they control the devices and environments and use a range of tools to inspect, analyze, and alter software and its assets. Their tools include disassemblers, code browsers, debuggers, emulators, instrumentation tools, fuzzers, symbolic execution engines, customized OS features, pattern matchers, etc. To meet the security requirements of assets embedded in software, i.e., valuable data and code, many protections need to be composed. Those requirements include the confidentiality of secret keys and software IP (novel algorithms, novel deep learning models, ...), and the integrity of license checking code and anti-copy protections. Attackers target these assets through reverse engineering and tampering, using the aforementioned tools, and they can often afford to spend considerable time and effort on many highly complex, time-consuming manual and automated analyses. The need for composing many protections follows from the fact that advanced attackers can use all the mentioned tools and try many different approaches. In other words, to be effective, the deployed protections need to protect against all possible attack vectors.
As all protections come with overhead, and as many of them have downsides that complicate various aspects of the software development life cycle (SDLC), the users of a software protection tool cannot simply deploy all available protections. Instead, they have to select the protections and their parameters for every single asset in a program, taking into account non-functional requirements for the whole program and its SDLC.
The organizers of this seminar, and many experts in their network, consider the lack of automated decision support for selecting the best protections, and the lack of a generally accepted, broadly applicable methodology to evaluate and quantify the strength of a selected combination, to be the biggest challenges in the domain of software protection. As a result, the deployment of software protection is most often untrustworthy, error-prone, unmeasurable, and extremely expensive, because it requires experts who need a lot of time, which increases the time to market.
This situation is becoming ever more problematic. For example, connected intelligent vehicles are quickly being deployed in the market now, and autonomous vehicles are expected to be deployed within 3–5 years. Research and development on software protection evaluation and measurement must keep up with that pace to provide sufficient technological support for controllable, scientific methods to manage the quality of automotive security as a key part of vehicle reliability and safety. There is hence a huge need to make progress w.r.t. software protection decision support and evaluation methodologies, the topic of the proposed seminar.
Goals of the Seminar
Following a pre-seminar survey among the registered participants to focus the seminar and to select the highest-priority objectives among the many possible ones, the primary goal of the seminar was determined to be laying the foundations of a white paper on software protection evaluation methodologies, to be used as a best practices guideline by researchers and practitioners when they evaluate (combinations of) defensive and/or offensive techniques in the domain of MATE software protection. This white paper can also serve as a guideline for reviewers of submitted journal and conference papers in which novel techniques are proposed and evaluated. A secondary goal was the establishment of good benchmarking practices, including the choice of suitable benchmarks and their selection and generation for use in future research in MATE software protection. A third goal was to collect feedback and ideas on how to push the state of the art in decision support systems.
Prior to the seminar, the organizers set up a survey to collect the necessary information for a seminar bundle that provided background information about all participants to all participants. Moreover, they collected information regarding the potential outcomes that participants were most interested in, those to which they could likely contribute, and those on which they considered progress most likely. Furthermore, a reading list was presented to the participants with the goal of getting everyone on the same page as much and as soon as possible [1–8].
Whereas the schedule for the first two days was mostly fixed a priori, the schedule for later days was more dynamic, as it was adapted to the feedback obtained by the organizers during the early days, and to the outcomes of different sessions.
The first day was devoted to setting the scope of the seminar, and clarifying the seminar goals, strategy, and plan. In the morning, three overviews were presented of man-at-the-end software protection techniques in the scope of the seminar, as well as some attacks on them. These presentations focused on obfuscation vs. static analysis, (anti-)tampering in online games, and additional protections beyond the ones discussed in the first two presentations.
In the early afternoon, four deeper technical introductions were presented of four more concrete classes of defensive and corresponding offensive techniques that would serve as case studies throughout the seminar: 1) virtual machine obfuscation, 2) (anti-)disassembly, 3) trace semantics based attacks, and 4) data obfuscation. The strategy for the week was to brainstorm about these concrete techniques first, in particular on how the strength of these techniques is supposed to be evaluated, e.g., in papers that present novel (combinations of) techniques, or in penetration tests. Later, the concrete results for the individual case studies would then be generalized into best practices and guidelines for software protection evaluation methodologies.
Whereas the morning presentations and most of the case studies focused mostly on defensive techniques, three presentations in the afternoon provided complementary insights about offensive techniques, ranging from academic semantics-based attack techniques, over an industrial case study on the deobfuscation of compile-time obfuscation, to offensive techniques in binary analysis.
Thus, the scene was set in terms of both defensive and offensive techniques, and all participants to a large degree spoke the same language before starting the brainstorm sessions in the rest of the week.
Tuesday focused mostly on the seminar track of software protection evaluation methodologies.
In the early morning, additional input was provided on existing, already studied aspects relevant to such methodologies. This included software protection metrics, empirical experiments to assess protections, and security economics. These presentations provided useful hooks for the next session, which consisted of parallel, small break-out brainstorm sessions (three groups per case study) on the first two case studies. In these brainstorm sessions, the goal was to provide answers to questions such as the following:
- What would a document similar to the SIGPLAN empirical evaluation checklist look like for papers presenting new VM-based protections?
- Which requirements or recommendations can we put forward with respect to the protected objects (i.e., benchmarks) and their treatment (i.e., how they are created, compiled, ...) for the evaluation?
- Which aspects of the attack models and which assumptions should be made explicit, and which ones should be justified, e.g., regarding attacker goals and attacker activities?
- How should sensitivity to different inputs (e.g., random generator seeds, configuration options, features of code samples, ...) be evaluated and discussed?
- What threats to validity should be discussed?
- What aspects of the protection should be evaluated (potency, resilience, learnability, usability, stealth, renewability, different forms of costs, ...)?
- Under what conditions would you consider the protection to be "real-world" applicable?
- What flaws (e.g., unrealistic assumptions) have you seen in existing papers that should be avoided?
- What are (minimal) requirements / recommendations regarding reproducibility?
- What pitfalls can you list that we should share with people?
After the independent brainstorms in small groups and following lunch, the three groups per case study came together to merge the results of their brainstorms, after which the merged results were shared in a plenary session.
Later in the afternoon, additional ideas were presented on topics relevant for software protection evaluation methodologies. The covered topics were benchmark generation, security activities in protected software product life cycles, the resilience of software integrity protection (work in progress), and a (unified) measure theory for potency. These topics were presented after the initial brainstorms so as not to bias those brainstorms. They were more forward-looking in nature, covering a number of open challenges as well as potential directions for future research. They offered the speakers a sounding board for feedback and could serve as starting points for informal discussions later in the seminar.
While the practice is discouraged by the Dagstuhl administration, we still decided to organize an evening session on Tuesday. Afterwards, we realized that this made the seminar a bit too dense, but it did serve the useful purpose of introducing the participants to the seminar track on decision support tools for software protection early enough in the seminar to allow enough time for informal discussions with and between researchers active on this topic during the remainder of the week. This was especially useful to allow those academic researchers to check the validity of some of their assumptions about real-world aspects with the present practitioners from industry and with researchers from other domains.
Besides an overview of an existing design and implementation of a software protection decision support system, a hands-on walk-through of a practical attack on a virtual machine protection (as in one of the case studies) was presented, as well as some ideas to make such protection stronger.
Early on Wednesday morning, the focus shifted towards decision support tools, with three presentations by practitioners in companies that provide software protection solutions. These presentations focused on the support they provide to help their customers use their tools.
Later in the morning, case studies 3 and 4 were discussed in another round of parallel, small group break-out brainstorm sessions.
In the afternoon, the social outing took place, which consisted of a visit to Trier and a wine tasting at a winery where we also had dinner.
On Thursday morning, another round of break-out sessions was organized to structure the outcomes of the first round. Based on inputs collected during the first three days, the organizers drafted a structure for a white paper on software protection methodologies. In 4 parallel sessions, the participants brainstormed on how to fit the results of the first round (i.e., bullet points with concrete guidelines and considerations for each case study) into that structure, and which parts of those results could be generalized beyond the individual case studies. In a plenary session, the results of these break-outs were then presented.
In addition, the specific topic of benchmarking was discussed, focusing on questions regarding the required features of benchmarks (e.g., whether or not they should contain actual security-sensitive assets) as well as potential strategies to get from the situation today, in which very few benchmarks used in papers are available for reproducing the results, to a situation in which a standard set of benchmarks is available and effectively used in studies.
In the afternoon, several demonstrations of practical tools were given, including the already mentioned decision support system whose concepts had been presented on Tuesday evening, and the Binary Ninja disassembler, which is rapidly gaining popularity. Two presentations were also given, on usable security and on the challenges and capabilities of modern static analysis of obfuscated code. These provided additional insights useful for designers of both decision support tools and evaluation methodologies.
The last morning started off with a potpourri of interesting topics that did not fit well in the main tracks of general evaluation methodologies and decision support on the one hand, and benchmarking on the other. Given the availability of many experts in the domain of software protection, we decided that everyone who wanted to launch new ideas or collect feedback on them in the broad domain of the seminar should have that chance. So the day started with short presentations on the protection of machine learning as a specific new type of application, on security levels for white-box cryptography, and on hardware/software binding using DRAM.
Later in the morning, the seminar was wrapped up with a discussion of the outcomes so far, and an agreement on plans to continue the work on the software protection evaluation methodology white paper and the assembly of a benchmark collection.
- S. Schrittwieser, S. Katzenbeisser, J. Kinder, G. Merzdovnik, and E. Weippl: Protecting software through obfuscation: Can it keep pace with progress in code analysis? ACM Computing Surveys, 49(1), 2016.
- M. Ceccato, P. Tonella, C. Basile, P. Falcarin, M. Torchiano, B. Coppens, and B. De Sutter: Understanding the behaviour of hackers while performing attack tasks in a professional setting and in a public challenge. Empirical Software Engineering, 24(1):240–286, 2018.
- C. Basile, D. Canavese, L. Regano, P. Falcarin, and B. De Sutter: A meta-model for software protections and reverse engineering attacks. Journal of Systems and Software, 150:3–21, 2019.
- B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray: A generic approach to automatic deobfuscation of executable code. Proc. IEEE Symposium on Security and Privacy, pp. 674–691, 2015.
- T. Blazytko, M. Contag, C. Aschermann, and T. Holz: Syntia: Synthesizing the semantics of obfuscated code. Proc. 26th USENIX Security Symposium (SEC'17), pp. 643–659, 2017.
- S. Banescu, C. Collberg, and A. Pretschner: Predicting the resilience of obfuscated code against symbolic execution attacks via machine learning. Proc. 26th USENIX Security Symposium (SEC'17), pp. 661–678, 2017.
- C. Basile et al.: D5.11 ASPIRE Framework Report. Technical report, ASPIRE project. https://aspire-fp7.eu/sites/default/files/D5.11-ASPIRE-Framework-Report.pdf
- M. Ceccato et al.: D4.06 ASPIRE Security Evaluation Methodology – Security Evaluation. Technical report, ASPIRE project. https://aspire-fp7.eu/sites/default/files/D4.06-ASPIRE-Security-Evaluation-Methodology.pdf
- Mohsen Ahmadvand (TU München, DE)
- Sébastien Bardin (CEA LIST, FR)
- Cataldo Basile (Polytechnic University of Torino, IT)
- Tim Blazytko (Ruhr-Universität Bochum, DE)
- Richard Bonichon (CEA LIST - Nano-INNOV, FR)
- Richard Clayton (University of Cambridge, GB)
- Christian Collberg (University of Arizona - Tucson, US)
- Moritz Contag (Ruhr-Universität Bochum, DE)
- Bart Coppens (Ghent University, BE)
- Jorge R. Cuéllar (Siemens AG - München, DE)
- Mila Dalla Preda (University of Verona, IT)
- Bjorn De Sutter (Ghent University, BE)
- Laurent Dore (EDSI - Cesson-Sevigne, FR)
- Ninon Eyrolles (Paris, FR)
- Roberto Giacobazzi (University of Verona, IT)
- Yuan Xiang Gu (Irdeto - Ottawa, CA)
- Christophe Hauser (USC - Marina del Rey, US)
- Stefan Katzenbeisser (Universität Passau, DE)
- Eric Lafortune (Guardsquare - Leuven, BE)
- Peter Lafosse (Vector 35 - Melbourne, US)
- Patrik Marcacci (Kudelski Security - Cheseaux, CH)
- J. Todd McDonald (University of South Alabama - Mobile, US)
- Christian Mönch (Conax - Oslo, NO)
- Leon Moonen (Simula Research Laboratory - Lysaker, NO)
- Jan Newger (Google Switzerland - Zürich, CH)
- Katharina Pfeffer (SBA Research - Wien, AT)
- Yannik Potdevin (Universität Kiel, DE)
- Uwe Resas (QuBalt GmbH, DE)
- Rolf Rolles (Mobius Strip Reverse Engineering - San Francisco, US)
- Sebastian Schrittwieser (FH - St. Pölten, AT)
- Bahman Sistany (Irdeto - Ottawa, CA)
- Natalia Stakhanova (University of Saskatchewan - Saskatoon, CA)
- Atis Straujums (whiteCryption - Riga, LV)
- Stijn Volckaert (KU Leuven - Ghent, BE)
- John Wagner (Vector 35 - Melbourne, US)
- Andreas Weber (Gemalto - München, DE)
- Brecht Wyseur (NAGRA Kudelski Group SA - Cheseaux, CH)
- Michael Zunke (SFNT Germany GmbH - München, DE)
- security / cryptology
- man-at-the-end attacks
- software protection
- predictive models
- reverse engineering and tampering