Accepted Research Papers

Tuesday, 24th, 11:00-12:30, R1: Best Paper Session

Session Chair: Bojan Cukic 

Chizpurfle: A Gray-Box Android Fuzzer for Vendor Service Customizations. Antonio Ken Iannillo, Roberto Natella, Domenico Cotroneo and Cristina Nita-Rotaru.

Reflection Analysis for Java: Uncovering More Reflective Targets Precisely. Jie Liu, Yue Li, Tian Tan and Jingling Xue.

GEMS: An Extract Method Refactoring Recommender. Sihan Xu, Aishwarya Sivaraman, Siau Cheng Khoo and Jing Xu.

Tuesday, 24th, 02:00-03:30, R2: Modelling

Session Chair: Peter Popov

A Generalized Bivariate Modeling Framework of Fault Detection and Correction Processes. Hiroyuki Okamura and Tadashi Dohi.

A stochastic modeling approach for an efficient dependability evaluation of large systems with non-anonymous interconnected components. Giulio Masetti, Silvano Chiaradonna and Felicita Di Giandomenico.

Understanding the Impacts of Influencing Factors on Time to a DataRace Software Failure. Kun Qiu, Zheng Zheng, Kishor Trivedi and Beibei Yin.

Tuesday, 24th, 02:00-03:30, R3: Faults and Failures Analysis (1)

Session Chair: Marco Vieira

An Exploratory Study of Field Failures. Luca Gazzola, Leonardo Mariani, Fabrizio Pastore and Mauro Pezzè.

Learning from Imbalanced Data for Predicting the Number of Software Defects. Xiao Yu, Jin Liu, Zijiang Yang and Xiangyang Jia.

A fault correlation approach to detect performance anomalies in Virtual Network Function chains. Domenico Cotroneo, Roberto Natella and Stefano Rosiello.

Tuesday, 24th, 04:00-05:30, R4: WAP and PER

Session Chair: Alexander Romanovsky

Experience Report: On the Impact of Software Faults in the Privileged Virtual Machine. Frederico Cerveira, Raul Barbosa and Henrique Madeira.

WAP: SAT-based Computation of Minimal Cut Sets. Weilin Luo and Ou Wei.

Experience Report: Security Vulnerability Profiles of Mission Critical Software: Empirical Analysis of Security Related Bug Reports. Katerina Goseva-Popstojanova and Jacob Tyo.

(WAP) Does Reviewers’ Age Affect the Performance of Code Review? Yukasa Murakami, Masateru Tsunoda and Hidetake Uwano.

Tuesday, 24th, 04:00-05:30, R5: Faults and Failures Analysis (2)

Session Chair: Elena Troubitsyna

Experience Report: Fault Triggers in Linux Operating System: From Evolution Perspective. Guanping Xiao, Zheng Zheng, Beibei Yin and Kishor Trivedi.

Simultaneous Fault Injection for the Generation of Efficient Error Detection Mechanisms. Matthew Leeke.

Which Packages Would be Affected by This Bug Report? Qiao Huang, David Lo, Xin Xia, Qingye Wang and Shanping Li.

Wednesday, 25th, 02:00-03:30, R6: Dynamic and Static Analysis

Session Chair: Jérémie Guiochet

Practical Evaluation of Static Code Analysis Tools for Cryptography: Benchmarking Method and Case Study. Alexandre Braga, Ricardo Dahab, Nuno Antunes, Nuno Laranjeiro and Marco Vieira.

Interactive Runtime Verification - When Interactive Debugging Meets Runtime Verification. Raphaël Jakse, Ylies Falcone, Jean-François Mehaut and Kevin Pouget.

Formal Development of Policing Functions for Intelligent Systems. Chris Bogdiukiewicz, Michael Butler, Thai Son Hoang, Martin Paxton, James Snook, Xanthippe Waldron and Toby Wilkinson.

Wednesday, 25th, 02:00-03:30, R7: Security Modeling and Empirical Studies

Session Chair: Katinka Wolter

Experience Report: Study of Vulnerabilities of Enterprise Operating Systems. Anatoliy Gorbenko, Alexander Romanovsky, Olga Tarasyuk and Oleksandr Biloborodov.

Software Metrics as Indicators of Security Vulnerabilities. Nádia Medeiros, Naghmeh Ivaki, Pedro Costa and Marco Vieira.

Models of reliability of fault-tolerant software under cyber-attacks. Peter Popov.

Wednesday, 25th, 04:00-05:30, R8: Security Assessment and Quality Assurance

Session Chair: Domenico Cotroneo

Experience Report: How to Design Web-Based Competitions for Legal Proceedings: Lessons from a Court Case. Aad Van Moorsel, Matthew Forshaw and Francisco Rocha.

Perman: Fine-grained Permission Management for Android Applications. Jiaojiao Fu, Yangfan Zhou, Yu Kang, Huan Liu and Xin Wang.

AQAT: The Architecture Quality Assurance Tool for Critical Embedded Systems. Andreas Johnsen, Kristina Lundqvist, Kaj Hänninen and Paul Pettersson.

Wednesday, 25th, 04:00-05:30, R9: PER: Reliability

Session Chair: Michael Lyu

Experience Report: Evaluating Fault Detection Effectiveness and Resource Efficiency of the Architecture Quality Assurance Framework and Tool. Andreas Johnsen, Kristina Lundqvist, Kaj Hänninen, Paul Pettersson and Martin Torelm.

Experience Report: Log-based Behavioral Differencing. Maayan Goldstein, Dan Raz and Itai Segall.

Experience Report: Verifying MPI Java Programs using Software Model Checking. Muhammad Sohaib Ayub, Waqas Ur Rehman and Junaid Haroon Siddiqui.

Thursday, 26th, 09:00-10:30, R10: Testing

Session Chair: Karama Kanoun

FSM-based Testing: An Empirical Study on Complete Round-Trip Paths Versus Transition Trees Testing. Hoda Khalil and Yvan Labiche.

Cocoon: Crowdsourced Testing Quality Maximization Under Context Coverage Constraint. Miao Xie, Qing Wang, Guowei Yang and Mingshu Li.

Toward rigorous object-code coverage criteria. Taejoon Byun, Vaibhav Sharma, Sanjai Rayadurgam, Stephen McCamant and Mats Heimdahl.

Thursday, 26th, 11:00-12:30, R11: Machine Learning for Reliability and Security

Session Chair: Mohamed Kaaniche

Automatically Repairing Web Application Firewalls Based on Successful SQL Injection Attacks. Dennis Appelt, Annibale Panichella and Lionel Briand.

Log Mining using Natural Language Processing: Application to Stress Detection. Christophe Bertero, Matthieu Roy, Carla Sauvanaud and Gilles Tredan.

Learning Feature Representations from Change Dependency Graphs for Defect Prediction. Pablo Loyola and Yutaka Matsuo.


Abstracts of Accepted Research Papers


 

Aad Van Moorsel, Matthew Forshaw and Francisco Rocha.

How to Design Web-Based Competitions for Legal Proceedings: Lessons from a Court Case

Abstract: In this practical experience report we discuss a court case in which one of the authors was an expert witness. The court case considered possible fraud in an online game, with participants being denied prizes because they were considered to have cheated. The discussion in this paper aims to provide a practice-led perspective on the link between technology and legal issues in the design of online games and web applications. The paper discusses the various questions the expert witness was asked to address and provides a synopsis of the analysis of the web server log data presented to the court. The paper reflects on the ensuing issues at the intersection of legal and technical concerns. Based on the insights gained, we present guidelines for the design of online competitions and for the client-server web applications implementing them. The aim of this paper is not to discuss the specifics, merit or outcomes of the case, but to report on a practical expert witness case and to distill lessons for the design of online games and web applications.


Anatoliy Gorbenko, Alexander Romanovsky, Olga Tarasyuk and Oleksandr Biloborodov.

Study of Vulnerabilities of Enterprise Operating Systems

Abstract: This paper analyses security problems of modern computer systems caused by vulnerabilities in their operating systems. An aggregated vulnerability database has been developed by joining vulnerability records from two publicly available vulnerability databases: the Common Vulnerabilities and Exposures (CVE) system and the National Vulnerability Database (NVD). The aggregated data allow us to investigate the stages of the vulnerability life cycle, vulnerability disclosure, and the elimination statistics for different operating systems. The specific technical areas the paper covers are the quantitative assessment of vulnerabilities discovered and fixed in operating systems, the estimation of the time that vendors spend on issuing patches, the analysis of vulnerability criticality, and the identification of vulnerabilities common to different operating systems.


Frederico Cerveira, Raul Barbosa and Henrique Madeira.

Experience Report: On the Impact of Software Faults in the Privileged Virtual Machine

Abstract: Cloud computing is revolutionizing how organizations treat computing resources. The privileged virtual machine is a key component in systems that use virtualization, but it poses a dependability risk for several reasons. The activation of the residual software faults that exist in every software project is a real threat and can impact the correct operation of the entire virtualized system. To study this question, we begin by performing a detailed analysis of the privileged virtual machine and its components, followed by software fault injection campaigns that target two of those important components: the toolstack and a device driver. The obstacles faced during this experimental phase, and how they were overcome, are described with practitioners in mind. The results show that software faults in those components can either have no impact or lead to drastic failures (observed for 9.1% of the faults), showing that the privileged virtual machine is a single point of failure that must be protected. Most of the failures are detectable by monitoring basic functionalities, but some faults caused inconsistent states that manifest later on. No silent data failures (SDF) have been observed, but the number of faults injected so far only allows us to conclude that SDF are not very frequent.


Alexandre Braga, Ricardo Dahab, Nuno Antunes, Nuno Laranjeiro and Marco Vieira.

Practical Evaluation of Static Code Analysis Tools for Cryptography: Benchmarking Method and Case Study

Abstract: The incorrect use of cryptography is a common source of critical software vulnerabilities. As developers lack knowledge in applied cryptography and support from experts is scarce, this situation is frequently addressed by adopting static code analysis tools to automatically detect cryptography misuse during coding and reviews, even if the effectiveness of such tools is far from being well understood. This paper proposes a method for benchmarking static code analysis tools for the detection of cryptography misuse, and evaluates the method in a case study, with the goal of selecting the most adequate tools for specific development contexts. Our method classifies cryptography misuse in nine categories recognized by developers (weak cryptography, poor key management, bad randomness, etc.) and provides the workload, metrics and procedure needed for a fair assessment and comparison of tools. We found that all evaluated tools together detected only 35% of cryptography misuses in our tests. Furthermore, none of the evaluated tools detected insecure elliptic curves, weak parameters in key agreement, and most insecure configurations for RSA and ECDSA. This suggests cryptography misuse is underestimated by tool builders. Despite that, we show it is possible to benefit from an adequate tool selection during the development of cryptographic software.


Luca Gazzola, Leonardo Mariani, Fabrizio Pastore and Mauro Pezzè.

An Exploratory Study of Field Failures

Abstract: Field failures, that is, failures caused by faults that escape the testing phase and lead to failures in the field, are unavoidable. Improving verification and validation activities before deployment can identify and promptly remove many, but not all, faults, and users may still experience a number of annoying problems while using their software systems. This paper investigates the nature of field failures, to understand to what extent further improving in-house verification and validation activities can reduce the number of failures in the field, and frames the need for new approaches that operate in the field. We report the results of the analysis of the bug reports of five applications belonging to three different ecosystems, propose a taxonomy of field failures, and discuss the reasons why failures belonging to the identified classes cannot be detected at design time but must be addressed at runtime. We observe that many faults (70%) are intrinsically hard to detect at design time.


Weilin Luo and Ou Wei.

WAP: SAT-based Computation of Minimal Cut Sets

Abstract: Fault tree analysis (FTA) is a prominent reliability analysis method widely used in safety-critical industries. Computing minimal cut sets (MCSs), i.e., finding all the smallest combinations of basic events that result in the top-level event, plays a fundamental role in FTA. Classical methods have been proposed based on the manipulation of boolean expressions of fault trees and on Binary Decision Diagrams. However, given the inherent intractability of computing MCSs, developing new methods over different paradigms remains an interesting research direction. In this paper, motivated by recent progress on modern SAT solvers, we present a new method for computing MCSs based on SAT solving. Specifically, given a fault tree, we iteratively search for a cut set based on the DPLL framework. By exploiting local failure propagation paths in the fault tree, we provide efficient algorithms for extracting an MCS from the cut set. The information of a new MCS is learned as a blocking clause for SAT solving, which helps to prune the search space and ensures completeness of the results. We compare our method with a popular commercial FTA tool on practical fault trees. Preliminary results show that our method exhibits better performance in time and memory usage.
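
To make the blocking idea concrete, here is a minimal, self-contained Python sketch on a toy fault tree (a brute-force stand-in for illustration, not the authors' DPLL-based procedure): candidate event sets are enumerated in order of increasing size, and any superset of an already-found MCS is "blocked", mirroring how learned blocking clauses prune the SAT search.

```python
from itertools import combinations

# Toy fault tree: TOP = (A AND B) OR C, over basic events A, B, C.
EVENTS = ["A", "B", "C"]

def top_event(failed):
    # failed: set of basic events that have occurred
    return ("A" in failed and "B" in failed) or ("C" in failed)

def minimal_cut_sets(events, top):
    mcs = []
    # Enumerate candidates by increasing size, so any cut set that
    # contains no previously found MCS is itself minimal.
    for k in range(1, len(events) + 1):
        for cand in combinations(events, k):
            cand = frozenset(cand)
            if any(m <= cand for m in mcs):
                continue  # "blocked": a known MCS is contained in it
            if top(cand):
                mcs.append(cand)
    return mcs

print([sorted(m) for m in minimal_cut_sets(EVENTS, top_event)])
# -> [['C'], ['A', 'B']]
```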


Xiao Yu, Jin Liu, Zijiang Yang and Xiangyang Jia.

Learning from Imbalanced Data for Predicting the Number of Software Defects

Abstract: Predicting the number of defects in software modules can be especially helpful when testing resources are limited. The highly imbalanced distribution of the target variable values (i.e., the number of defects) degrades the performance of models for predicting the number of defects. As a first in-depth study, this paper explores the potential of using resampling techniques and ensemble learning techniques to learn from imbalanced defect data for predicting the number of defects. We study the use of two resampling strategies extended to regression problems (i.e., SMOTE and RUS) and an ensemble learning technique (i.e., the AdaBoost.R2 algorithm) to handle imbalanced defect data for predicting the number of defects. We refer to the extensions of SMOTE and RUS for predicting the number of defects as SmoteND and RusND, respectively. Experimental results on 6 datasets with two performance measures show that these approaches are effective in handling imbalanced defect data. To further improve the performance of these approaches, we propose two novel hybrid resampling/boosting algorithms, called SmoteNDBoost and RusNDBoost, which introduce SmoteND and RusND into the AdaBoost.R2 algorithm, respectively. Experimental results show that SmoteNDBoost and RusNDBoost both outperform their individual components (i.e., SmoteND, RusND and AdaBoost.R2).
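
As an illustration of the ingredients, the sketch below pairs a simplified SMOTE-style oversampler for regression (a hypothetical stand-in for SmoteND that interpolates between rare high-defect samples; not the paper's algorithm) with scikit-learn's AdaBoostRegressor, whose "linear" loss implements AdaBoost.R2. Data are synthetic.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.default_rng(0)

# Synthetic imbalanced defect data: ~90% of modules have 0-1 defects.
X = rng.normal(size=(200, 5))
y = np.where(rng.random(200) < 0.9,
             rng.integers(0, 2, 200),
             rng.integers(5, 15, 200)).astype(float)

def smote_nd_like(X, y, threshold=2, n_new=100):
    """Simplified SMOTE-style oversampling for regression: interpolate
    between random pairs of rare (high-defect) samples."""
    rare = np.where(y > threshold)[0]
    Xs, ys = [X], [y]
    for _ in range(n_new):
        i, j = rng.choice(rare, 2)
        lam = rng.random()
        Xs.append((lam * X[i] + (1 - lam) * X[j])[None, :])
        ys.append([lam * y[i] + (1 - lam) * y[j]])
    return np.vstack(Xs), np.concatenate(ys)

X_res, y_res = smote_nd_like(X, y)
# scikit-learn's AdaBoostRegressor with the 'linear' loss is AdaBoost.R2.
model = AdaBoostRegressor(loss="linear", random_state=0).fit(X_res, y_res)
print(model.predict(X[:5]))
```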


Hiroyuki Okamura and Tadashi Dohi.

A Generalized Bivariate Modeling Framework of Fault Detection and Correction Processes

Abstract: This paper presents a generalized modeling framework for fault detection and correction processes with bivariate distributions. The presented framework includes almost all existing software reliability growth models (SRGMs) that handle both fault detection and correction processes. In our framework, the time dependency of fault correction time corresponds to the correlation between fault detection and correction times. Moreover, we propose a new fault detection and correction process model with hyper-Erlang distributions, and develop the model parameter estimation algorithm via the EM (expectation-maximization) algorithm. In numerical examples, we demonstrate the data-fitting ability of the hyper-Erlang model on actual fault detection and correction data from open source projects.


Nádia Medeiros, Naghmeh Ivaki, Pedro Costa and Marco Vieira.

Software Metrics as Indicators of Security Vulnerabilities

Abstract: Detecting software security vulnerabilities and distinguishing vulnerable from non-vulnerable code is anything but simple. Most of the time, vulnerabilities remain undisclosed until they are exposed, for instance, by an attack during the software operational phase. Software metrics are widely-used indicators of software quality, but the question is whether they can be used to distinguish vulnerable software units from non-vulnerable ones during development. In this paper, we perform an exploratory study on software metrics, their interdependency, and their relation with security vulnerabilities. We aim at understanding: i) the correlation between software architectural characteristics, represented in the form of software metrics, and the number of vulnerabilities; and ii) which metrics are the most informative and discriminative for identifying vulnerable units of code. To achieve these goals, we use, respectively, correlation coefficients and heuristic search techniques. Our analysis is carried out on a dataset that includes metrics and reported security vulnerabilities, exposed by security attacks, for all functions, classes, and files of five widely used projects. Results show: i) a strong correlation between several project-level metrics and the number of vulnerabilities, and ii) the possibility of using a group of metrics, at both file and function levels, to distinguish vulnerable and non-vulnerable code with a high level of accuracy.


Muhammad Sohaib Ayub, Waqas Ur Rehman and Junaid Haroon Siddiqui.

Experience Report: Verifying MPI Java Programs using Software Model Checking

Abstract: Parallel and distributed computing have enabled the development of much more scalable software. However, developing concurrent software requires the programmer to be aware of non-determinism, data races, and deadlocks. MPI (message passing interface) is a popular standard for writing message-oriented distributed applications. Some messages in MPI systems can be processed by one of many machines and in many possible orders. This non-determinism can affect the result of an MPI application. The alternate results may or may not be correct. To verify MPI applications, we need to check all these possible orderings and use an application-specific oracle to decide if these orderings give correct output. MPJ Express is an open source Java implementation of the MPI standard. Model checking of MPI Java programs is a challenging task due to their parallel nature. We developed a Java-based model of MPJ Express, where processes are modeled as threads, and which can run unmodified MPI Java programs on a single system. This model enabled us to adapt the Java PathFinder explicit-state software model checker (JPF) using a custom listener to verify our model running real MPI Java programs. The evaluation of our approach shows that model checking reveals incorrect system behavior that results in very intricate message orderings.


Maayan Goldstein, Dan Raz and Itai Segall.

Experience Report: Log-based Behavioral Differencing

Abstract: Monitoring systems and ensuring the required service level is an important operations task. However, doing this based on externally visible data, such as system logs, is very difficult, since it is very hard to extract from the logged data the exact state of the system and the root cause of the actions it takes. Yet, identifying behavioral changes of a complex system can be used for early identification of problems and allows proactive corrective measures. Since it is practically impossible to perform this task manually, there is a critical need for a methodology that can analyze logs, automatically create a behavioral model, and compare the behavior to the expected behavior. In this paper we propose a novel approach for comparing service executions as exhibited in their log files. The behavior is captured by Finite State Automaton (FSA) models, enhanced with performance-related data, both mined from the logs. Our tool then computes the difference between the current model and behavioral models created when the service was known to operate well. A visual framework that graphically presents and emphasizes the changes in the behavior is then used to trace their root cause. We evaluate our approach on real telecommunication logs.
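
The differencing idea can be conveyed with a much simpler model than the paper's performance-annotated FSAs. The hypothetical sketch below mines event-transition frequencies from baseline and current log traces (made-up event sequences) and reports transitions whose relative frequency has shifted:

```python
from collections import Counter

def mine_edges(traces):
    """Count (event, next_event) transitions, with START/END markers."""
    edges = Counter()
    for trace in traces:
        for a, b in zip(["START"] + trace, trace + ["END"]):
            edges[(a, b)] += 1
    return edges

baseline = mine_edges([["recv", "auth", "serve", "close"]] * 90 +
                      [["recv", "auth", "retry", "serve", "close"]] * 10)
current = mine_edges([["recv", "auth", "serve", "close"]] * 60 +
                     [["recv", "auth", "retry", "serve", "close"]] * 40)

# Report transitions whose relative frequency shifted notably,
# e.g. the sudden rise of "auth -> retry" in the current model.
for edge in set(baseline) | set(current):
    b = baseline[edge] / sum(baseline.values())
    c = current[edge] / sum(current.values())
    if abs(b - c) > 0.02:
        print(edge, f"baseline={b:.3f} current={c:.3f}")
```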


Giulio Masetti, Silvano Chiaradonna and Felicita Di Giandomenico.

A stochastic modeling approach for an efficient dependability evaluation of large systems with non-anonymous interconnected components

Abstract: This paper addresses the generation of stochastic models for dependability and performability analysis of complex systems, through automatic replication of template models. The proposed solution is tailored to systems composed of large populations of similar non-anonymous components, interconnected with each other according to a variety of topologies. A new efficient replication technique is presented and its implementation is discussed. The goal is to improve the performance of simulation solvers with respect to standard approaches, when employed in the modeling of the addressed class of systems, in particular for loosely interconnected system components (as typically encountered in the electrical or transportation sectors). Simulation results and the time overheads induced by our new technique are presented and discussed for a representative case study.


Domenico Cotroneo, Roberto Natella and Stefano Rosiello.

A fault correlation approach to detect performance anomalies in Virtual Network Function chains

Abstract: Network Function Virtualization (NFV) is an emerging paradigm that allows the creation, at the software level, of complex network services by composing simpler ones. However, this paradigm shift exposes network services to faults and bottlenecks in the complex software virtualization infrastructure they rely on. Thus, NFV services require effective anomaly detection systems to detect the occurrence of network problems. This paper proposes a novel approach to ease the adoption of anomaly detection in production NFV services, by avoiding the need to train a model or to calibrate a threshold. The approach infers the service health status by collecting metrics from multiple elements in the NFV service chain, and by analyzing their (lack of) correlation over time. We validate this approach on an NFV-oriented Interactive Multimedia System, detecting problems that affect the quality of service, such as overload, component crashes, avalanche restarts, and physical resource contention.


Kun Qiu, Zheng Zheng, Kishor Trivedi and Beibei Yin.

Understanding the Impacts of Influencing Factors on Time to a DataRace Software Failure

Abstract: Dataraces are a common problem on shared-memory parallel computers, including multicores. Because a datarace depends on the thread scheduling scheme of its execution environment, the time to a datarace failure is usually very long. How to accelerate the occurrence of a datarace failure, and further estimate the mean time to failure (MTTF), is an important topic to be studied. In this paper, the influencing factors for failures triggered by datarace bugs are explored, and their influences on the time to datarace failure, including the relationship with the MTTF, are empirically studied. Experiments are conducted on real programs that suffer from dataraces to verify the factors and their influences. Empirical results show that the influencing factors do influence the time to datarace failure of the subjects. They can be used to accelerate the occurrence of datarace failures and to accurately estimate the MTTF.
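
For intuition, a minimal sketch of the estimation step, with made-up failure times: under an exponential time-to-failure assumption, the maximum-likelihood MTTF estimate from repeated (accelerated) runs is simply the sample mean.

```python
import statistics

# Hypothetical times (seconds) to a datarace failure from repeated
# runs under an accelerating thread schedule (illustrative values).
times = [312.0, 98.5, 507.2, 221.9, 143.3, 389.6, 76.1, 264.8]

# Sample-mean MTTF estimate, with a crude normal-approximation CI.
mttf = statistics.mean(times)
stderr = statistics.stdev(times) / len(times) ** 0.5
print(f"Estimated MTTF: {mttf:.1f} s (+/- {1.96 * stderr:.1f} s, ~95% CI)")
```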


Antonio Ken Iannillo, Roberto Natella, Domenico Cotroneo and Cristina Nita-Rotaru.

Chizpurfle: A Gray-Box Android Fuzzer for Vendor Service Customizations

Abstract: Android has become the most popular mobile OS, as it enables device manufacturers to introduce customizations to compete with value-added services. However, customizations make the OS less dependable and secure, since they can introduce software flaws. Such flaws can be found by using fuzzing, a popular testing technique among security researchers. This paper presents Chizpurfle, a novel "gray-box" fuzzing tool for vendor-specific Android services. Testing these services is challenging for existing tools, since vendors do not provide source code and the services cannot be run on a device emulator. Chizpurfle has been designed to run on an unmodified Android OS on an actual device. The tool automatically discovers, fuzzes, and profiles proprietary services. This work evaluates the applicability and performance of Chizpurfle on the Samsung Galaxy S6 Edge, and discusses software bugs found in privileged vendor services.


Hoda Khalil and Yvan Labiche.

FSM-based Testing: An Empirical Study on Complete Round-Trip Paths Versus Transition Trees Testing

Abstract: Finite state machines, being intuitively understandable and suitable for modeling in many domains, are adopted by many software designers. Therefore, testing systems that are modeled with state machines has received genuine attention. Among the studied testing strategies are complete round-trip paths and transition trees that cover round-trip paths in a piecewise manner. We present an empirical study that aims at comparing the effectiveness of complete round-trip path test suites to transition tree test suites on the one hand, and comparing the effectiveness of the different techniques used to generate transition trees (breadth-first traversal, depth-first traversal, and random traversal) on the other hand. We also compare the effectiveness of all the testing trees generated using each single traversal criterion. This is done through an empirical evaluation using four case studies from different domains. Effectiveness is evaluated with mutants. Experimental results are presented and analyzed.
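
A transition tree can be grown from an FSM by expanding states breadth-first and ending a branch as soon as it reaches a state already present in the tree. The toy Python sketch below (one plausible construction on a hypothetical FSM, not the authors' tooling) returns the resulting piecewise test paths:

```python
from collections import deque

# Toy FSM: labeled transitions (state, input) -> next state.
FSM = {
    ("S0", "a"): "S1", ("S0", "b"): "S2",
    ("S1", "a"): "S2", ("S1", "b"): "S0",
    ("S2", "a"): "S0",
}

def transition_tree_paths(initial):
    """Breadth-first transition tree: a branch becomes a test path as
    soon as it reaches a state already in the tree."""
    paths, seen = [], {initial}
    queue = deque([(initial, [])])  # (current state, input sequence)
    while queue:
        state, path = queue.popleft()
        for (src, inp), dst in FSM.items():
            if src != state:
                continue
            if dst in seen:
                paths.append(path + [inp])       # leaf of the tree
            else:
                seen.add(dst)
                queue.append((dst, path + [inp]))
    return paths

print(transition_tree_paths("S0"))  # [['a', 'a'], ['a', 'b'], ['b', 'a']]
```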


Jie Liu, Yue Li, Tian Tan and Jingling Xue.

Reflection Analysis for Java: Uncovering More Reflective Targets Precisely

Abstract: Reflection, which is widely used in practice and abused by many security exploits, poses a significant obstacle to program analysis. Reflective calls can be analyzed statically or dynamically. Static analysis is more sound but also more imprecise (introducing many false reflective targets and thus affecting its scalability). Dynamic analysis can be precise but often misses many true reflective targets due to low code coverage. We introduce MIRROR, the first automatic reflection analysis for Java that significantly increases the code coverage of dynamic analysis while keeping the number of false reflective targets low. In its static analysis, a novel reflection-oriented slicing technique is applied to identify a small number of small path-based slices for a reflective call, so that different reflective targets are likely to be exercised along these different paths. This preserves the soundness of pure static reflection analysis as much as possible, improves its scalability, and substantially reduces its false positive rate. In its dynamic analysis, these slices are executed with automatically generated test cases to report the reflective targets accessed. This significantly improves the code coverage of pure dynamic analysis. We evaluate MIRROR against a state-of-the-art dynamic reflection analysis tool, TAMIFLEX, on 10 large real-world Java applications. MIRROR detects 12.5%–933.3% more reflective targets efficiently (in 362.8 seconds on average) without producing any false positives. These new targets enable 5–174,949 call graph edges to be reachable in the application code.


Miao Xie, Qing Wang, Guowei Yang and Mingshu Li.

Cocoon: Crowdsourced Testing Quality Maximization Under Context Coverage Constraint

Abstract: Mobile app testing is challenging, since each test needs to be executed in a variety of operating contexts, including heterogeneous devices, various wireless networks, and different locations. Crowdsourcing enables a mobile app test to be distributed as a crowdsourced task that leverages crowd workers to accomplish the test. However, high test quality and the expected test context coverage are difficult to achieve in crowdsourced testing. Upon distributing a test task, mobile app providers neither know who will participate nor can they predict whether all the expected test contexts will be covered by the task. To address this problem, we put forward a novel research problem called Crowdsourced Testing Quality Maximization Under Context Coverage Constraint (Cocoon). Given a mobile app test task, our objective is to recommend a set of workers, from the available crowd workers, such that the expected test context coverage and a high test quality can be achieved. We prove that the Cocoon problem is NP-complete and then introduce two greedy approaches. Based on a real dataset from the largest Chinese crowdsourced testing platform, our evaluation shows the effectiveness and efficiency of the two approaches, which can potentially be used as online services in practice.
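
One plausible greedy heuristic for this kind of problem (illustrative only, with made-up workers and contexts; the paper introduces its own two greedy approaches) repeatedly picks the worker with the best quality-weighted new context coverage:

```python
# Hypothetical crowd workers: quality score and test contexts covered.
workers = {
    "w1": (0.9, {"wifi", "android8"}),
    "w2": (0.6, {"4g", "android7", "gps"}),
    "w3": (0.8, {"wifi", "gps"}),
    "w4": (0.5, {"android8", "4g"}),
}
required = {"wifi", "4g", "gps", "android7", "android8"}

def greedy_select(workers, required):
    chosen, covered = [], set()
    pool = dict(workers)
    while covered < required and pool:
        # Pick the worker with the best quality-weighted new coverage.
        best = max(pool, key=lambda w: pool[w][0] * len(pool[w][1] - covered))
        quality, contexts = pool.pop(best)
        if not contexts - covered:
            break  # no remaining worker adds coverage
        chosen.append(best)
        covered |= contexts
    return chosen, covered

print(greedy_select(workers, required))  # (['w1', 'w2'], all 5 contexts)
```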


Raphaël Jakse, Ylies Falcone, Jean-François Mehaut and Kevin Pouget.

Interactive Runtime Verification - When Interactive Debugging Meets Runtime Verification

Abstract: Runtime verification consists in verifying programs at runtime, looking for input and output events to discover, check, or enforce behavioral properties. Interactive debugging consists in studying programs at runtime in order to discover and understand bugs and fix them, interactively inspecting their internal state. Interactive Runtime Verification (i-RV) adds the formal aspects of runtime verification to interactive debugging. We define an efficient and convenient way to automatically check behavioral properties on a program using a debugger, without impacting its executions in undesirable ways. We define the notion of a scenario that guides the developer in the debugging session and limits the tedious aspects of debugging by automatically triggering actions and adding instrumentation when the state of the verified property changes. Combined with existing checkpointing techniques, i-RV facilitates exploring the execution and discovering bugs, and leverages interactivity to understand and fix bugs. We implemented i-RV in a tool used to conduct experiments that validate the effectiveness of i-RV in different situations.


Dennis Appelt, Annibale Panichella and Lionel Briand.

Automatically Repairing Web Application Firewalls Based on Successful SQL Injection Attacks

Abstract: Testing and fixing Web Application Firewalls (WAFs) are two relevant and complementary challenges for security analysts. Automated testing helps to cost-effectively detect vulnerabilities in a WAF by generating effective test cases, i.e., attacks. Once vulnerabilities have been identified, the WAF needs to be fixed by augmenting its rule set to filter attacks without blocking legitimate requests. However, existing research suggests that rule sets are very difficult to understand and too complex to be manually fixed. In this paper, we formalise the problem of fixing vulnerable WAFs as a combinatorial optimisation problem. To solve it, we propose an automated approach that combines machine learning with multi-objective genetic algorithms. Given a set of legitimate requests and bypassing SQL injection attacks, our approach automatically infers regular expressions that, when added to the WAF's rule set, prevent many attacks while letting legitimate requests go through. Our empirical evaluation based on both open-source and proprietary WAFs shows that the generated filter rules are effective at blocking previously identified and successful SQL injection attacks (recall between 54.6% and 98.3%), while triggering in most cases no or few false positives (false positive rate between 0% and 1%).
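
The two objectives such a search trades off are easy to state concretely. The hypothetical sketch below (made-up traffic and candidate rules) scores each candidate filter rule by recall on bypassing attacks and false positive rate on legitimate requests:

```python
import re

# Hypothetical traffic: bypassing attacks and legitimate requests.
attacks = ["id=1' OR '1'='1", "name=x'; DROP TABLE users;--",
           "q=1 UNION SELECT pw"]
legit = ["id=42", "name=O'Brien", "q=union station schedule"]

# Candidate filter rules, e.g. produced by a learning/search procedure.
candidates = [r"UNION\s+SELECT", r"'\s*OR\s*'", r"DROP\s+TABLE", r"'"]

def score(rule):
    r = re.compile(rule, re.IGNORECASE)
    recall = sum(bool(r.search(a)) for a in attacks) / len(attacks)
    fpr = sum(bool(r.search(l)) for l in legit) / len(legit)
    return recall, fpr

for rule in candidates:
    recall, fpr = score(rule)
    print(f"{rule!r}: recall={recall:.2f}, FPR={fpr:.2f}")
# A multi-objective search keeps rules with high recall and low FPR;
# the bare "'" rule blocks attacks but also the legitimate "O'Brien".
```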


Sihan Xu, Aishwarya Sivaraman, Siau Cheng Khoo and Jing Xu.

GEMS: An Extract Method Refactoring Recommender

Abstract: Extract method is a widely used refactoring operation to improve method comprehension and maintenance. Much research has been done to identify extractable code fragments within another piece of code, such as a method body, to form a new method. The criteria used for identifying extractable code are usually centered around degrees of cohesiveness, coupling, and the length of the method. However, automatic method extraction techniques have not been highly successful, since it can be hard to concretize the criteria used in identifying extractable code. In this work, we present a novel system that learns these criteria for extract method refactoring opportunities from open source repositories. We encode the concepts of cohesion, coupling, and other attributes as features in our learning model, and train it to extract suitable code fragments from a given method. Our tool, GEMS, recommends a ranked list of code fragments with high accuracy and great speed. We evaluated our approach on several open source repositories and compared it against three state-of-the-art approaches, i.e., SEMI, JExtract and JDeodorant. The results on these open-source data show the superiority of our machine-learning-based approach in terms of effectiveness. We developed GEMS as an Eclipse plugin, with the intention of supporting software reliability through method extraction.


Jiaojiao Fu, Yangfan Zhou, Yu Kang, Huan Liu and Xin Wang.

Perman: Fine-grained Permission Management for Android Applications

Abstract: Third-party libraries (3PLs) are widely introduced into Android apps, and they typically request permissions for their own functionalities. Current Android systems manage permissions at process (app) granularity. Hence, the host app and the 3PLs share the same permission set. 3PL-apps may therefore introduce security risks. Separating the permission sets of the 3PLs from those of the host app is critical to alleviate such security risks. In this paper, we provide Perman, a tool that allows users to manage the permissions of different modules (i.e., a 3PL or the host app) of an app at runtime. Perman relies on dynamic code instrumentation to intercept permission requests and accordingly provides policy-based permission control. Unlike existing tools, which generally require redesigning 3PL-apps, it can thus be applied to existing apps in the market. We evaluate Perman on real-world apps. The experimental results verify its effectiveness in fine-grained permission management.


Guanping Xiao, Zheng Zheng, Beibei Yin and Kishor Trivedi.

Fault Triggers in Linux Operating System: From Evolution Perspective

Abstract: The Linux operating system is a complex system that is prone to suffer failures during usage, and its complexity increases the difficulty of fixing bugs. Different testing strategies and fault mitigation methods can be developed and applied based on different types of bugs, which makes a deep understanding of the nature of bugs in Linux necessary. In this paper, an empirical study is carried out on 5,741 bug reports of the Linux kernel from an evolution perspective. A bug classification is conducted based on fault triggering conditions, followed by an analysis of the evolution of bug type proportions over versions and time, and their comparison across versions, products, and regression bugs. Moreover, the relationship between bug type proportions and the clustering coefficient, as well as the relation between bug types and time to fix, are presented. This paper reveals 13 interesting findings based on the empirical results and further provides guidance for developers and users based on these findings.


Katerina Goseva-Popstojanova and Jacob Tyo.

Experience Report: Security Vulnerability Profiles of Mission Critical Software: Empirical Analysis of Security Related Bug Reports

Abstract: While some prior research work exists on the characteristics of software faults (i.e., bugs) and failures, very little work has been published on the analysis of software application vulnerabilities. This paper aims to contribute towards filling that gap by presenting an empirical investigation of application vulnerabilities. The results are based on data extracted from the issue tracking systems of two NASA missions. These data were organized in three datasets: Ground mission IV&V issues, Flight mission IV&V issues, and Flight mission Developers issues. In each dataset, we identified the security-related software bugs and classified them into specific vulnerability classes. Then, we created the vulnerability profiles, i.e., determined where and when the security vulnerabilities were introduced and what the dominating vulnerability classes are. Our main findings include: (1) In the IV&V issues datasets, the majority of vulnerabilities were code related and were introduced in the implementation phase. (2) For all datasets, around 90% of the vulnerabilities were located in two to four subsystems. (3) Out of 21 primary classes, five dominated: Exception Management, Memory Access, Other, Risky Values, and Unused Entities. Together, they contributed around 80% to 90% of the vulnerabilities in each dataset.


Andreas Johnsen, Kristina Lundqvist, Kaj Hänninen and Paul Pettersson.

AQAT: The Architecture Quality Assurance Tool for Critical Embedded Systems

Abstract: Architectural engineering of embedded systems comprehensively affects both the development processes and the abilities of the systems. Verification of architectural engineering is consequently essential in the development of safety- and mission-critical embedded systems to avoid costly and hazardous faults. In this paper, we present the Architecture Quality Assurance Tool (AQAT), an application program developed to provide a holistic, formal, and automatic verification process for the architectural engineering of critical embedded systems. AQAT includes architectural model checking, model-based testing, and selective regression verification features to effectively and efficiently detect design faults, implementation faults, and faults created by maintenance modifications. Furthermore, the tool includes a feature that analyzes architectural dependencies, which, in addition to providing essential information for impact analyses of architectural design changes, may be used for hazard analysis, such as potential error propagations, common cause failures, and single points of failure. Overviews of both the graphical user interface and the back-end processes of AQAT are presented with a sensor-to-actuator system example.


Andreas Johnsen, Kristina Lundqvist, Kaj Hänninen, Paul Pettersson and Martin Torelm.

Experience Report: Evaluating Fault Detection Effectiveness and Resource Efficiency of the Architecture Quality Assurance Framework and Tool

Abstract: The Architecture Quality Assurance Framework (AQAF) is a theory developed to provide a holistic and formal verification process for architecture engineering of critical embedded systems. AQAF encompasses integrated architecture model checking, model-based testing, and selective regression verification techniques to achieve this goal. The Architecture Quality Assurance Tool (AQAT) implements the theory of AQAF and enables automated application of the framework. In this paper, we present an evaluation of AQAT and the underlying AQAF theory by means of an industrial case study, where resource efficiency and fault detection effectiveness are the targeted properties of evaluation. The method of fault injection is utilized to guarantee coverage of fault types and to generate a data sample size adequate for statistical analysis. We discovered important areas of improvement in this study, which required further development of the framework before satisfactory results could be achieved. The final results present a 100% fault detection rate at the design level, a 98.5% fault detection rate at the implementation level, and an average increased efficiency of 6.4% with the aid of the selective regression verification technique.


Matthew Leeke.

Simultaneous Fault Injection for the Generation of Efficient Error Detection Mechanisms

Abstract: The application of machine learning to software fault injection data has been shown to be an effective approach for the generation of efficient error detection mechanisms (EDMs) at arbitrary locations. However, such approaches to the design of EDMs have invariably adopted a fault model with a single-fault assumption, limiting the practical relevance of the detectors and their evaluation. Software containing more than a single fault is commonplace, and prominent safety standards recognise that critical failures are often the result of unlikely or unforeseen combinations of faults. This paper addresses this shortcoming, demonstrating that it is possible to generate similarly efficient EDMs under more realistic fault models. In particular, it is shown that (i) efficient EDMs can be designed using fault data collected under models accounting for the occurrence of simultaneous faults, (ii) exhaustive fault injection under a simultaneous bit-flip model can yield improvements to true EDM efficiency, and (iii) exhaustive fault injection under a simultaneous bit-flip model can be made non-exhaustive, thereby reducing the resource costs of experimentation to practicable levels, without sacrificing the efficiency of the resultant EDMs.
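
The move from a single-fault to a simultaneous-fault model is essentially combinatorial, as the minimal sketch below illustrates for bit flips on a 32-bit value: an exhaustive two-bit campaign needs C(32, 2) = 496 experiments instead of 32, which is why sampling the pairs (making the campaign non-exhaustive) matters for cost.

```python
from itertools import combinations

def inject(value, positions, width=32):
    """Flip the given bit positions of an integer-encoded program state."""
    for p in positions:
        value ^= 1 << p
    return value & ((1 << width) - 1)

original = 0x0000_00FF

# Single-fault model: one bit flipped per experiment (32 experiments).
single = [inject(original, (p,)) for p in range(32)]

# Simultaneous two-bit model: C(32, 2) = 496 experiments; sampling a
# subset of these pairs keeps the campaign practicable.
double = [inject(original, pair) for pair in combinations(range(32), 2)]

print(len(single), len(double))  # 32 496
```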


Taejoon Byun, Vaibhav Sharma, Sanjai Rayadurgam, Stephen McCamant and Mats Heimdahl.

Toward rigorous object-code coverage criteria

Abstract: Object-branch coverage (OBC) is often used as a measure of the thoroughness of test suites, augmenting or substituting source-code-based structural criteria such as branch coverage and modified condition/decision coverage (MC/DC). In addition, with the increasing use of third-party components for which source-code access may be unavailable, robust object-code coverage criteria are essential to assess how well the components are exercised during testing. While OBC has the advantage of being programming-language independent and is amenable to non-intrusive coverage measurement techniques, variations in compilers and the optimizations they perform can substantially change the structure of the generated code and the instructions used to represent branches. To address the need for a robust object coverage criterion, this paper proposes a rigorous definition of OBC that captures well the semantics of source code branches for a given instruction set architecture. We report an empirical assessment of these criteria for the Intel x86 instruction set on several examples from embedded control systems software. Preliminary results indicate that object-code coverage can be made robust to compilation variations and is comparable in its bug-finding efficacy to source-level MC/DC.


Christophe Bertero, Matthieu Roy, Carla Sauvanaud and Gilles Tredan.

Log Mining using Natural Language Processing: Application to Stress Detection

Abstract: Event logging is a key source of information on a system's state. Reading logs provides insight into its activity, helps assess its state, and allows problems to be diagnosed. However, reading does not scale: with the number of machines constantly rising and the growing complexity of systems, the task of auditing systems' health based on logfiles is becoming overwhelming for system administrators. This observation has led to many proposals for automating the processing of logs. However, most of these proposals still require some human intervention, for instance by tagging logs, parsing the source files generating the logs, etc. In this work, we target minimal human intervention for logfile processing and propose a new approach that considers logs as regular text (as opposed to related works that seek to best exploit the little structure imposed by log formatting). This approach allows us to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: logfiles' words are mapped to a high-dimensional metric space, which we then exploit as a feature space using standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. We validate our approach by seeking stress patterns on an experimental platform. Results show a strong predictive performance (around 90% accuracy) using three out-of-the-box classifiers.
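
A minimal version of such a pipeline, assuming the gensim and scikit-learn libraries and made-up labelled log lines (not the paper's data or exact setup): embed words with word2vec, represent each line by the mean of its word vectors, and train an off-the-shelf classifier.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Hypothetical log lines, labelled 1 under injected stress, 0 otherwise.
logs = [
    ("kernel: Out of memory: kill process 4211", 1),
    ("sshd: accepted password for admin", 0),
    ("kernel: page allocation failure: order:4", 1),
    ("cron: job finished successfully", 0),
] * 25  # repeated to give the classifier enough samples

sentences = [line.lower().split() for line, _ in logs]
labels = np.array([lab for _, lab in logs])

# Treat each log line as plain text: embed its words with word2vec,
# then represent the line by the mean of its word vectors.
w2v = Word2Vec(sentences, vector_size=32, window=5, min_count=1, seed=0)
X = np.array([np.mean([w2v.wv[w] for w in s], axis=0) for s in sentences])

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.score(X, labels))  # training accuracy, illustrative only
```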


Pablo Loyola and Yutaka Matsuo.

Learning Feature Representations from Change Dependency Graphs for Defect Prediction

Abstract: Given the heterogeneity of the data that can be extracted from the software development process, defect prediction techniques have focused on associating different sources of data with the introduction of faulty code, usually relying on hand-crafted features. While these efforts have generated considerable progress over the years, little attention has been given to the fact that the performance of any predictive model depends heavily on the representation of the data used, and that different representations can lead to different results. We consider this a relevant problem, as it could be directly affecting the efforts towards generating safer software systems. Therefore, we propose to study the impact of the representation of the data in defect prediction models. For this study, we focus on the use of developer activity data, from which we structure dependency graphs. Then, instead of manually generating features, such as network metrics, we propose two models inspired by recent advances in representation learning, which are able to automatically generate representations from graph data. These new representations are compared against manually crafted features for defect prediction in real-world software projects. Our results show that automatically learned features are competitive, reaching increases in prediction performance of up to 13%.


Peter Popov.

Models of reliability of fault-tolerant software under cyber-attacks

Abstract: This paper offers a new approach to modelling the effect of cyber-attacks on the reliability of software used in industrial control applications. The model is based on the view that successful cyber-attacks introduce failure regions that are not present in non-compromised software. The model is then extended to cover a fault-tolerant architecture, such as 1-out-of-2 software, popular for building industrial protection systems. The model is used to study the effectiveness of software maintenance policies such as patching and “cleansing” under different adversary models, ranging from independent attacks to sophisticated synchronized attacks on the channels. We demonstrate that the effect of attacks on the reliability of diverse software depends significantly on the adversary model. Under synchronized attacks, system reliability may be more than an order of magnitude worse than under independent attacks on the channels. These findings, although not surprising, highlight the importance of using an adequate adversary model in the assessment of how effective various cyber-security controls are.
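
The headline effect can be reproduced with a small Monte Carlo sketch (illustrative parameters, not the paper's model): a 1-out-of-2 system fails on demand only when both channels fail, so correlating the attacks correlates the channel failures and inflates the system's probability of failure on demand.

```python
import random

random.seed(1)
N = 1_000_000
P_ATTACK = 0.02  # prob. a channel has been compromised by an attack
P_FAIL = 0.01    # per-demand failure prob. of a healthy channel
P_FAIL_C = 0.5   # per-demand failure prob. of a compromised channel

def channel_fails(compromised):
    return random.random() < (P_FAIL_C if compromised else P_FAIL)

def pfd_1oo2(synchronized):
    """Estimate the prob. of failure on demand of a 1-out-of-2 system."""
    failures = 0
    for _ in range(N):
        if synchronized:                   # both channels hit together
            c1 = c2 = random.random() < P_ATTACK
        else:                              # channels attacked independently
            c1 = random.random() < P_ATTACK
            c2 = random.random() < P_ATTACK
        failures += channel_fails(c1) and channel_fails(c2)
    return failures / N

print("independent attacks :", pfd_1oo2(False))  # ~4e-4
print("synchronized attacks:", pfd_1oo2(True))   # ~5e-3, roughly 13x worse
```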


Qiao Huang, David Lo, Xin Xia, Qingye Wang and Shanping Li.

Which Packages Would be Affected by This Bug Report?

Abstract: A large project (e.g., Ubuntu) usually contains a large number of software packages. Sometimes the same bug report in such a project affects multiple packages, and the developers of different packages need to collaborate with one another to fix the bug. Unfortunately, the total number of packages involved in a project like Ubuntu is relatively large, which makes it time-consuming to manually identify the packages that are affected by a bug report. In this paper, we propose an approach named PkgRec that consists of 2 components: a name matching component and an ensemble learning component. In the name matching component, we assign a confidence score to a package if it is mentioned by a bug report. In the ensemble learning component, we divide the training dataset into n subsets and build a sub-classifier on each subset. Then we automatically determine an appropriate weight for each sub-classifier and combine them to predict the confidence score of a package being affected by a new bug report. Finally, PkgRec combines the name matching component and the ensemble learning component to assign a final confidence score to each potential package. A list of the top-k packages with the highest confidence scores is then recommended. We evaluate PkgRec on 3 datasets, including Ubuntu, OpenStack, and GNOME, with a total of 42,094 bug reports. We show that PkgRec achieves recall@5 and recall@10 scores of 0.511-0.737 and 0.614-0.785, respectively. We also compare PkgRec with other state-of-the-art approaches, namely LDA-KL and MLkNN. The experimental results show that PkgRec on average improves the recall@5 and recall@10 scores of LDA-KL by 47% and 31%, and of MLkNN by 52% and 37%, respectively.


Chris Bogdiukiewicz, Michael Butler, Thai Son Hoang, Martin Paxton, James Snook, Xanthippe Waldron and Toby Wilkinson.

Formal Development of Policing Functions for Intelligent Systems

Abstract: We present an approach for ensuring safety properties of autonomous systems. Our contribution is a system architecture where a policing function validating system safety properties at runtime is separated from the system's intelligent planning function. The policing function is developed formally by a correct-by-construction method. The separation of concerns enables the possibility of replacing and adapting the intelligent planning function without changing the validation approach. We validate our approach on the example of a multi-UAV system managing route generation. Our prototype runtime validator has been integrated and evaluated with an industrial UAV synthetic environment.


Yukasa Murakami, Masateru Tsunoda and Hidetake Uwano.

(WAP) Does Reviewers’ Age Affect the Performance of Code Review?

Abstract: We focus on developers' performance in code review, and analyze whether subjects' age affects the efficiency and precision of code review. Generally, older developers have more abundant coding experience, so age could be expected to affect code review positively. However, in our past study, code understanding speed was relatively slow when the age of subjects was high, and memory is needed to understand a program. Similarly, in code review, subjects' age may affect efficiency (e.g., the number of indications per unit time). In the experiment, subjects reviewed source code, referring to mini specification documents, and when the code did not follow the document, the subjects indicated the point. We classified subjects into a senior group and a younger group. In the analysis, we stratified the results based on age, and used correlation coefficients and multiple linear regression to clarify the relationship between age and review performance. As a result, age did not affect the efficiency or correctness of code review. Also, the subjects' software development experience did not relate strongly to their performance.