August 11 – 16, 2019, Dagstuhl Seminar 19331

Software Protection Decision Support and Evaluation Methodologies


Christian Collberg (University of Arizona – Tucson, US)
Mila Dalla Preda (University of Verona, IT)
Bjorn De Sutter (Ghent University, BE)
Brecht Wyseur (NAGRA Kudelski Group SA – Cheseaux, CH)


Dagstuhl Report, Volume 9, Issue 8


Overview and Motivation

The area of Man-At-The-End (MATE) software protection is an evolving battlefield on which attackers execute white-box attacks: they control the devices and environments and use a range of tools to inspect, analyze, and alter software and its assets. Their tools include disassemblers, code browsers, debuggers, emulators, instrumentation tools, fuzzers, symbolic execution engines, customized OS features, pattern matchers, etc. To meet the security requirements of assets embedded in software, i.e., valuable data and code, many protections need to be composed. Those requirements include the confidentiality of secret keys and software IP (novel algorithms, novel deep learning models, ...), and the integrity of license checking code and anti-copy protections. Attackers attack them through reverse engineering and tampering, for which they use the aforementioned tools and for which they can often afford to spend time and effort on executing many highly complex and time-consuming manual and automated analyses. The need for composing many protections follows from the fact that advanced attackers can use all the mentioned tools and try many different approaches. In other words, to be effective, the deployed protections need to protect against all possible attack vectors.

As all protections come with overhead, and as many of them have downsides that complicate various aspects of the software development life cycle (SDLC), the users of a software protection tool cannot simply deploy all available protections. Instead, they have to select the protections and their parameters for every single asset in a program, taking into account non-functional requirements for the whole program and its SDLC.
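The selection problem described above can be illustrated with a minimal sketch. It assumes that every candidate protection has a single estimated strength score and a single overhead cost, which is a strong simplification, and it searches exhaustively for one option per asset within an overhead budget. The asset names, protection names, scores, and the function `select_protections` are all hypothetical placeholders, not part of any tool discussed at the seminar.

```python
from itertools import product

# Hypothetical catalog: per asset, candidate (protection, strength, overhead %).
# All names and numbers are illustrative assumptions, not measured values.
CANDIDATES = {
    "license_check": [
        ("none", 0, 0),
        ("anti_tampering", 5, 8),
        ("vm_obfuscation", 8, 30),
    ],
    "secret_key": [
        ("none", 0, 0),
        ("data_obfuscation", 4, 5),
        ("white_box_crypto", 9, 40),
    ],
}


def select_protections(candidates, overhead_budget):
    """Pick one option per asset by exhaustive search.

    Among combinations whose summed overhead fits the budget, prefer the one
    with the strongest weakest asset, then the highest total strength, then
    the lowest overhead. Returns None if no combination fits.
    """
    assets = sorted(candidates)
    best_key, best_choice = None, None
    for combo in product(*(candidates[a] for a in assets)):
        overhead = sum(option[2] for option in combo)
        if overhead > overhead_budget:
            continue  # violates the non-functional requirement
        strengths = [option[1] for option in combo]
        key = (min(strengths), sum(strengths), -overhead)
        if best_key is None or key > best_key:
            best_key = key
            best_choice = {a: option[0] for a, option in zip(assets, combo)}
    return best_choice


if __name__ == "__main__":
    print(select_protections(CANDIDATES, overhead_budget=40))
```

Exhaustive search only works for a handful of assets and options. A realistic decision support system would also have to model interactions between protections and SDLC constraints, which this sketch ignores.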

The organizers of this workshop, and many experts in their network, consider the lack of automated decision support for selecting the best protections, and the lack of a generally accepted, broadly applicable methodology to evaluate and quantify the strength of a selected combination, the biggest challenges in the domain of software protection. As a result, the deployment of software protection is most often untrustworthy, error-prone, unmeasurable, and extremely expensive, because experts are needed and they need a lot of time, which increases the time to market.

This situation is becoming ever more problematic. For example, connected intelligent vehicles are quickly being deployed in the market now and autonomous vehicles are going to be deployed in 3-5 years. Research and development in software protection evaluation and measurement must keep up with that pace to provide enough technology support for controllable and scientific methods to manage the quality of automotive security as a key part of vehicle reliability and safety. There is hence a huge need to make progress w.r.t. software protection decision support and evaluation methodologies, the topic of this seminar.

Goals of the Seminar

Following a pre-seminar survey among the registered participants to focus the seminar and to select the highest priority objectives among the many possible ones, the primary goal of the seminar was determined to be the foundations of a white paper on software protection evaluation methodologies, to be used as a best practices guideline by researchers and practitioners when they evaluate (combinations of) defensive and/or offensive techniques in the domain of MATE software protection. This can also serve as a guideline to reviewers of submitted journal and conference papers in which novel techniques are proposed and evaluated. A secondary goal was the establishment of good benchmarking practices, including the choice of suitable benchmarks and the selection and generation thereof for use in future research in MATE software protection. A third goal was to collect feedback and ideas on how to push the state of the art in decision support systems.

Week Overview


Prior to the seminar, the organizers set up a survey to collect the necessary information for a seminar bundle that provided all participants with background information about one another. Moreover, they collected information regarding the potential outcomes that participants were most interested in, to which ones they could likely contribute, and which potential outcomes they considered most likely to make progress on. Furthermore, a reading list was presented to the participants with the goal of getting everyone on the same page as much and as soon as possible [1–8].

Whereas the schedule for the first two days was mostly fixed a priori, the schedule for later days was more dynamic, as it was adapted to the feedback obtained by the organizers during the early days, and to the outcomes of different sessions.


The first day was devoted to setting the scope of the seminar, and clarifying the seminar goals, strategy, and plan. In the morning, three overviews were presented of man-at-the-end software protection techniques in the scope of the seminar, as well as some attacks on them. These presentations focused on obfuscation vs. static analysis, (anti-)tampering in online games, and additional protections beyond the ones discussed in the first two presentations.

In the early afternoon, four deeper technical introductions were presented of four more concrete classes of defensive and corresponding offensive techniques that would serve as case studies throughout the seminar: 1) virtual machine obfuscation, 2) (anti-)disassembly, 3) trace semantics based attacks, and 4) data obfuscation. The strategy for the week was to brainstorm about these concrete techniques first, in particular on how the strength of these techniques is supposed to be evaluated, e.g., in papers that present novel (combinations of) techniques, or in penetration tests. Later, the concrete results for the individual case studies would then be generalized into best practices and guidelines for software protection evaluation methodologies.

Whereas the morning presentations and most of the case studies focused mostly on defensive techniques, three presentations in the afternoon provided complementary insights about offensive techniques, ranging from more academic semantics-based attack techniques, over an industrial case study of deobfuscation of compile-time obfuscation, to offensive techniques in binary analysis.

Thus, the scene was set in terms of both defensive and offensive techniques, and all participants to a large degree spoke the same language before starting the brainstorm sessions in the rest of the week.


Tuesday focused mostly on the seminar track of software protection evaluation methodologies.

In the early morning, additional input was provided on existing, already studied aspects relevant to such methodologies. This included software protection metrics, empirical experiments to assess protections, and security economics. These presentations provided useful hooks for the next session, which consisted of parallel, small break-out brainstorm sessions (three groups per case study) on the first two case studies. In these brainstorm sessions, the goal was to provide answers to questions such as the following:

  • What would a document similar to the SIGPLAN empirical evaluation checklist look like for papers presenting new VM-based protections?
  • Which requirements or recommendations can we put forward with respect to the protected objects (i.e., benchmarks) and their treatment (i.e., how they are created, compiled, ...) for the evaluation?
  • Which aspects of the attack models and which assumptions should be made explicit, and which ones should be justified, e.g., regarding attacker goals and attacker activities?
  • How should sensitivity to different inputs (e.g., random generator seeds, configuration options, features of code samples, ...) be evaluated and discussed?
  • What threats to validity should be discussed?
  • What aspects of the protection should be evaluated (potency, resilience, learnability, usability, stealth, renewability, different forms of costs, ...)?
  • Under what conditions would you consider the protection to be "real world" applicable?
  • What flaws (e.g., unrealistic assumptions) have you seen in existing papers that should be avoided?
  • What are (minimal) requirements / recommendations regarding reproducibility?
  • What pitfalls can you list that we should share with people?
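As one illustration of the question on sensitivity to inputs such as random generator seeds, the following minimal sketch repeats a measurement over several seeds and reports the spread rather than a single number. The function `measure_attack_effort` is a hypothetical stand-in that returns synthetic placeholder data. In a real evaluation it would protect a benchmark with the given seed and measure, for example, the time an automated attack needs.

```python
import random
import statistics


def measure_attack_effort(seed):
    """Hypothetical stand-in for one experiment run.

    Returns synthetic placeholder data derived from the seed; it is not a
    real measurement and only makes the sketch runnable.
    """
    rng = random.Random(seed)
    return 100.0 + rng.gauss(0.0, 15.0)


def summarize_over_seeds(measure, seeds):
    """Run the measurement once per seed and summarize the distribution."""
    values = [measure(seed) for seed in seeds]
    return {
        "runs": len(values),
        "mean": statistics.mean(values),
        # The sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


summary = summarize_over_seeds(measure_attack_effort, range(30))
print(summary)
```

Reporting the minimum and maximum alongside the mean and standard deviation makes it visible when a few seeds yield much weaker protected programs than the average suggests.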

After the independent brainstorms in small groups and following lunch, the three groups per case study came together to merge the results of their brainstorms, after which the merged results were shared in a plenary session.

Later in the afternoon, additional ideas were presented on topics relevant for software protection evaluation methodologies. The covered topics were benchmark generation, security activities in protected software product life cycles, the resilience of software integrity protection (work in progress), and a (unified) measure theory for potency. These topics were presented after the initial brainstorms so as not to bias those brainstorms. Their nature was more forward looking, covering a number of open challenges as well as potential directions for future research. They offered the speakers a sounding board to get feedback and could serve as the starting point of informal discussions later in the seminar.

While the practice is discouraged by the Dagstuhl administration, we still decided to organize an evening session on Tuesday. Afterwards, we realized that this made the seminar a bit too dense, but it did serve the useful purpose of introducing the participants to the seminar track on decision support tools for software protection early enough in the seminar to allow enough time for informal discussions with and between researchers active on this topic during the remainder of the week. This was especially useful to allow those academic researchers to check the validity of some of their assumptions about real-world aspects with the present practitioners from industry and with researchers from other domains.

Besides an overview of an existing design and implementation of a software protection decision support system, a hands-on walkthrough of a practical attack on a virtual machine protection (as in one of the case studies) was presented, as well as some ideas to make such protection stronger.


Early on Wednesday morning, the focus shifted towards decision support tools, with three presentations by practitioners in companies that provide software protection solutions. These presentations focused on the support they provide to help their customers use their tools.

Later in the morning, case studies 3 and 4 were discussed in another round of parallel, small group break-out brainstorm sessions.

In the afternoon, the social outing took place, which consisted of a visit to Trier and a wine tasting at a winery where we also had dinner.


On Thursday morning, another round of break-out sessions was organized to structure the outcomes of the first round. Based on inputs collected during the first three days, the organizers drafted a structure for a white paper on software protection evaluation methodologies. In four parallel sessions, the participants brainstormed on how to fit the results of the first round (i.e., bullet points with concrete guidelines and considerations for each case study) into that structure, and which parts of those results could be generalized beyond the individual case studies. In a plenary session, the results of these break-outs were then presented.

In addition, the specific topic of benchmarking was discussed, focusing on questions regarding the required features of benchmarks (e.g., should they or should they not contain actual security-sensitive assets) as well as potential strategies to get from the situation today, in which very few benchmarks used in papers are available for reproducing the results, to a situation in which a standard set of benchmarks is available and effectively used in studies.

In the afternoon, several demonstrations of practical tools were given, including the already mentioned decision support system, of which the concepts had been presented on Tuesday evening, and the Binary Ninja disassembler, which is rapidly gaining popularity. Two presentations were also given, on usable security and on the challenges and capabilities of modern static analysis of obfuscated code. These provided additional insights useful for designers of both decision support tools and evaluation methodologies.


The last morning started off with a potpourri of interesting topics that did not fit well in the main tracks of general evaluation methodologies and decision support on the one hand, and benchmarking on the other. Given the availability of many experts in the domain of software protection, we decided that everyone who wanted to launch new ideas or collect feedback on them in the broad domain of the seminar should have that chance. So the day started with short presentations on the protection of machine learning as a specific new type of application, on security levels for white-box cryptography, and on hardware/software binding using DRAM.

Later in the morning, the seminar was wrapped up with a discussion of the outcomes so far, and an agreement on plans to continue the work on the software protection evaluation methodology white paper and the assembly of a benchmark collection.


References

  1. S. Schrittwieser, S. Katzenbeisser, J. Kinder, G. Merzdovnik, and E. Weippl: Protecting software through obfuscation: Can it keep pace with progress in code analysis? ACM Comput. Surv., 49(1), 2016.
  2. M. Ceccato, P. Tonella, C. Basile, P. Falcarin, M. Torchiano, B. Coppens, and B. De Sutter: Understanding the behaviour of hackers while performing attack tasks in a professional setting and in a public challenge. Empirical Software Engineering, 24(1):240–286, 2018.
  3. C. Basile, D. Canavese, L. Regano, P. Falcarin, and B. De Sutter: A Meta-model for Software Protections and Reverse Engineering Attacks. Journal of Systems and Software, 150:3–21, 2019.
  4. B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray: A generic approach to automatic deobfuscation of executable code. Proc. IEEE Symposium on Security and Privacy, pp. 674–691, 2015.
  5. T. Blazytko, M. Contag, C. Aschermann, and T. Holz: Syntia: synthesizing the semantics of obfuscated code. Proc. of the 26th USENIX Security Symposium (SEC'17), pp. 643–659, 2017.
  6. S. Banescu, C. Collberg, and A. Pretschner: Predicting the Resilience of Obfuscated Code Against Symbolic Execution Attacks via Machine Learning. Proc. of the 26th USENIX Security Symposium (SEC'17), pp. 661–678, 2017.
  7. C. Basile et al.: D5.11 ASPIRE Framework Report. Technical Report ASPIRE project.
  8. M. Ceccato et al.: D4.06 ASPIRE Security Evaluation Methodology – Security Evaluation. Technical Report ASPIRE project. 06-ASPIRE-Security-Evaluation-Methodology.pdf
Summary text license
  Creative Commons BY 3.0 Unported license
  Christian Collberg, Mila Dalla Preda, Bjorn De Sutter, and Brecht Wyseur


Classification

  • Security / Cryptology


Keywords

  • Man-at-the-end attacks
  • Software protection
  • Predictive models
  • Metrics
  • Reverse engineering and tampering


