• This work presents some characteristics of MoNet, a digital platform for the modeling and visualization of complex systems. Emphasis is on the ideas that allowed the successful progressive development of this modeling platform, which goes along with the implementation of applications to the modeling of several studied systems. The platform can represent different aspects of systems modeled at different observation scales. This tool offers advantages in the sense of favoring the perception of the phenomenon of the emergence of information, associated with changes of scale. This paper also includes some criteria used for the construction of this modeling platform. The power of current computers has made practical representing graphic resources such as shapes, line thickness, overlaying-text tags, colors, and transparencies, in the graphical modeling of systems. By visualizing diagrams conveniently designed to highlight contrasts, these modeling platforms allow the recognition of patterns that drive our understanding of systems and their structure. Graphs reflecting the benefits of the tool regarding the visualization of systems at different scales of observation are presented to illustrate the application of the platform.
  • Ubiquitous technology platforms have been created to track and improve health and fitness; similar technologies can help individuals monitor and reduce their carbon footprints. This paper proposes CarbonKit, a platform combining technology, markets, and incentives to empower and reward people for reducing their carbon footprint. We argue that a goal-and-reward behavioral feedback loop can be combined with the Big Data available from tracked activities, apps, and social media to make CarbonKit an integral part of individuals daily lives. CarbonKit comprises five modules that link personal carbon tracking, health and fitness, social media, and economic incentives. Protocols for safeguarding security, privacy and individuals control over their own data are essential to the design of the CarbonKit. Initially CarbonKit would operate on a voluntary basis, but such a system can also serve as part of a mandatory region-wide initiative. We use the example of the British Columbia to illustrate the regulatory framework and participating stakeholders that would be required to support the CarbonKit in specific jurisdictions.
  • We propose an input/output conformance testing theory utilizing Modal Interface Automata with Input Refusals (IR-MIA) as novel behavioral formalism for both the specification and the implementation under test. A modal refinement relation on IR-MIA allows distinguishing between obligatory and allowed output behaviors, as well as between implicitly underspecified and explicitly forbidden input behaviors. The theory therefore supports positive and negative conformance testing with optimistic and pessimistic environmental assumptions. We further show that the resulting conformance relation on IR-MIA, called modal-irioco, enjoys many desirable properties concerning component-based behaviors. First, modal-irioco is preserved under modal refinement and constitutes a preorder under certain restrictions which can be ensured by a canonical input completion for IR-MIA. Second, under the same restrictions, modal-irioco is compositional with respect to parallel composition of IR-MIA with multi-cast and hiding. Finally, the quotient operator on IR-MIA, as the inverse to parallel composition, facilitates decompositionality in conformance testing to solve the unknown-component problem.
  • The considerable effort of writing requirements is only worthwhile if the result meets two conditions: the requirements reflect stakeholders' needs, and the implementation satisfies them. In usual approaches, the use of different notations for requirements (often natural language) and implementations (a programming language) makes both conditions elusive. AutoReq, presented in this article, takes a different approach to both the writing of requirements and their verification. Applying the approach to a well-documented example, a landing gear system, allowed for a mechanical proof of consistency and uncovered an error in a published discussion of the problem.
  • To ensure fulfilling stakeholder wishes, it is crucial to validate the documented requirements. This is often complicated by the fact that the wishes and intentions of different stakeholders are somewhat contradictory, which manifests itself in inconsistent requirements. To aid requirements engineers in identifying and resolving inconsistent requirements, we investigated the usefulness for manual reviews of two different model-based representation formats for inconsistent requirements; one that represent the inconsistent requirements in separate diagrams and one that represents them integrated into one diagram using annotations. The results from a controlled experiment show that the use of such integrated review diagrams can significantly increase efficiency of manual reviews, without sacrificing effectiveness.
  • In the field of automated program repair, the redundancy assumption claims large programs contain the seeds of their own repair. However, most redundancy-based program repair techniques do not reason about the repair ingredients---the code that is reused to craft a patch. We aim to reason about the repair ingredients by using code similarities to prioritize and transform statements in a codebase for patch generation. Our approach, DeepRepair, relies on deep learning to reason about code similarities. Code fragments at well-defined levels of granularity in a codebase can be sorted according to their similarity to suspicious elements (i.e., code elements that contain suspicious statements) and statements can be transformed by mapping out-of-scope identifiers to similar identifiers in scope. We examined these new search strategies for patch generation with respect to effectiveness from the viewpoint of a software maintainer. Our comparative experiments were executed on six open-source Java projects including 374 buggy program revisions and consisted of 19,949 trials spanning 2,616 days of computation time. DeepRepair's search strategy using code similarities generally found compilable ingredients faster than the baseline, jGenProg, but this improvement neither yielded test-adequate patches in fewer attempts (on average) nor found significantly more patches than the baseline. Although the patch counts were not statistically different, there were notable differences between the nature of DeepRepair patches and baseline patches. The results demonstrate that our learning-based approach finds patches that cannot be found by existing redundancy-based repair techniques.
  • Reliability prediction is an important task in software reliability engineering, which has been widely studied in the last decades. However, modelling and predicting user-perceived reliability of black-box services remain an open research problem. Software services, such as Web services and Web APIs, generally provide black-box functionalities to users through the Internet, thus leading to a lack of their internal information for reliability analysis. Furthermore, the user-perceived service reliability depends not only on the service itself, but also heavily on the invocation context (e.g., service workloads, network conditions), whereby traditional reliability models become ineffective and inappropriate. To address these new challenges posed by blackbox services, in this paper, we propose CARP, a new contextaware reliability prediction approach, which leverages historical usage data from users to construct context-aware reliability models and further provides online reliability prediction results to users. Through context-aware reliability modelling, CARP is able to alleviate the data sparsity problem that heavily limits the prediction accuracy of other existing approaches. The preliminary evaluation results show that CARP can make a significant improvement in reliability prediction accuracy, e.g., about 41% in MAE and 38% in RMSE when only 5% of the data are available.
  • A few works address the challenge of automating software diversification, and they all share one core idea: using automated test suites to drive diversification. However, there is is lack of solid understanding of how test suites, programs and transformations interact one with another in this process. We explore this intricate interplay in the context of a specific diversification technique called "sosiefication". Sosiefication generates sosie programs, i.e., variants of a program in which some statements are deleted, added or replaced but still pass the test suite of the original program. Our investigation of the influence of test suites on sosiefication exploits the following observation: test suites cover the different regions of programs in very unequal ways. Hence, we hypothesize that sosie synthesis has different performances on a statement that is covered by one hundred test case and on a statement that is covered by a single test case. We synthesize 24583 sosies on 6 popular open-source Java programs. Our results show that there are two dimensions for diversification. The first one lies in the specification: the more test cases cover a statement, the more difficult it is to synthesize sosies. Yet, to our surprise, we are also able to synthesize sosies on highly tested statements (up to 600 test cases), which indicates an intrinsic property of the programs we study. The second dimension is in the code: we manually explore dozens of sosies and characterize new types of forgiving code regions that are prone to diversification.
  • Cyber-physical systems (CPS), such as automotive systems, are starting to include sophisticated machine learning (ML) components. Their correctness, therefore, depends on properties of the inner ML modules. While learning algorithms aim to generalize from examples, they are only as good as the examples provided, and recent efforts have shown that they can produce inconsistent output under small adversarial perturbations. This raises the question: can the output from learning components can lead to a failure of the entire CPS? In this work, we address this question by formulating it as a problem of falsifying signal temporal logic (STL) specifications for CPS with ML components. We propose a compositional falsification framework where a temporal logic falsifier and a machine learning analyzer cooperate with the aim of finding falsifying executions of the considered model. The efficacy of the proposed technique is shown on an automatic emergency braking system model with a perception component based on deep neural networks.
  • Precisely modeling complex systems like cyber-physical systems is challenging, which often render model-based system verification techniques like model checking infeasible. To overcome this challenge, we propose a method called LAR to automatically `verify' such complex systems through a combination of learning, abstraction and refinement from a set of system log traces. We assume that log traces and sampling frequency are adequate to capture `enough' behaviour of the system. Given a safety property and the concrete system log traces as input, LAR automatically learns and refines system models, and produces two kinds of outputs. One is a counterexample with a bounded probability of being spurious. The other is a probabilistic model based on which the given property is `verified'. The model can be viewed as a proof obligation, i.e., the property is verified if the model is correct. It can also be used for subsequent system analysis activities like runtime monitoring or model-based testing. Our method has been implemented as a self-contained software toolkit. The evaluation on multiple benchmark systems as well as a real-world water treatment system shows promising results.
  • We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state-of-the-art, our methods provide substantial improvement in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms rely on a novel idea of "coreset caching" that systematically reuses coresets (summaries of data) computed for recent queries in answering the current clustering query. We present both theoretical analysis and detailed experiments demonstrating their correctness and efficiency
  • The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We present AutoPerf - a novel approach to automate regression testing that utilizes three core techniques: (i) zero-positive learning, (ii) autoencoders, and (iii) hardware telemetry. We demonstrate AutoPerf's generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs. On average, AutoPerf exhibits 4% profiling overhead and accurately diagnoses more performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has produced no false negatives.
  • System development often involves decisions about how a high-level design is to be implemented using primitives from a low-level platform. Certain decisions, however, may introduce undesirable behavior into the resulting implementation, possibly leading to a violation of a desired property that has already been established at the design level. In this paper, we introduce the problem of synthesizing a property-preserving platform mapping: A set of implementation decisions ensuring that a desired property is preserved from a high-level design into a low-level platform implementation. We provide a formalization of the synthesis problem and propose a technique for synthesizing a mapping based on symbolic constraint search. We describe our prototype implementation, and a real-world case study demonstrating the application of our technique to synthesizing secure mappings for the popular web authorization protocols OAuth 1.0 and 2.0.
  • Building on concepts drawn from control theory, self-adaptive software handles environmental and internal uncertainties by dynamically adjusting its architecture and parameters in response to events such as workload changes and component failures. Self-adaptive software is increasingly expected to meet strict functional and non-functional requirements in applications from areas as diverse as manufacturing, healthcare and finance. To address this need, we introduce a methodology for the systematic ENgineering of TRUstworthy Self-adaptive sofTware (ENTRUST). ENTRUST uses a combination of (1) design-time and runtime modelling and verification, and (2) industry-adopted assurance processes to develop trustworthy self-adaptive software and assurance cases arguing the suitability of the software for its intended application. To evaluate the effectiveness of our methodology, we present a tool-supported instance of ENTRUST and its use to develop proof-of-concept self-adaptive software for embedded and service-based systems from the oceanic monitoring and e-finance domains, respectively. The experimental results show that ENTRUST can be used to engineer self-adaptive software systems in different application domains and to generate dynamic assurance cases for these systems.
  • Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel synchronization mechanism that runs low-overhead readers concurrently with updaters. Production-quality RCU implementations for multi-core systems are decidedly non-trivial. Giving the ubiquity of Linux, a rare "million-year" bug can occur several times per day across the installed base. Stringent validation of RCU's complex behaviors is thus critically important. Exhaustive testing is infeasible due to the exponential number of possible executions, which suggests use of formal verification. Previous verification efforts on RCU either focus on simple implementations or use modeling languages, the latter requiring error-prone manual translation that must be repeated frequently due to regular changes in the Linux kernel's RCU implementation. In this paper, we first describe the implementation of Tree RCU in the Linux kernel. We then discuss how to construct a model directly from Tree RCU's source code in C, and use the CBMC model checker to verify its safety and liveness properties. To our best knowledge, this is the first verification of a significant part of RCU's source code, and is an important step towards integration of formal verification into the Linux kernel's regression test suite.
  • Literature reviews are essential for any researcher trying to keep up to date with the burgeoning software engineering literature. FAST$^2$ is a novel tool for reducing the effort required for conducting literature reviews by assisting the researchers to find the next promising paper to read (among a set of unread papers). This paper describes FAST$^2$ and tests it on four large software engineering literature reviews conducted by Wahono (2015), Hall (2012), Radjenovi\'c (2013) and Kitchenham (2017). We find that FAST$^2$ is a faster and robust tool to assist researcher finding relevant SE papers which can compensate for the errors made by humans during the review process. The effectiveness of FAST$^2$ can be attributed to three key innovations: (1) a novel way of applying external domain knowledge (a simple two or three keyword search) to guide the initial selection of papers---which helps to find relevant research papers faster with less variances; (2) an estimator of the number of remaining relevant papers yet to be found---which in practical settings can be used to decide if the reviewing process needs to be terminated; (3) a novel self-correcting classification algorithm---automatically corrects itself, in cases where the researcher wrongly classifies a paper.
  • One source of software project challenges and failures is the systematic errors introduced by human cognitive biases. Although extensively explored in cognitive psychology, investigations concerning cognitive biases have only recently gained popularity in software engineering (SE) research. This paper therefore systematically maps, aggregates and synthesizes the literature on cognitive biases in software engineering to generate a comprehensive body of knowledge, understand state of the art research and provide guidelines for future research and practise. Focusing on bias antecedents, effects and mitigation techniques, we identified 65 articles, which investigate 37 cognitive biases, published between 1990 and 2016. Despite strong and increasing interest, the results reveal a scarcity of research on mitigation techniques and poor theoretical foundations in understanding and interpreting cognitive biases. Although bias-related research has generated many new insights in the software engineering community, specific bias mitigation techniques are still needed for software professionals to overcome the deleterious effects of cognitive biases on their work.
  • Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers and in the second phase of our method, we conducted a more extensive semi-automatic classification of papers spanning the years 2001--2015 and 5,196 papers. Results from both review steps was used to: i) identify and analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in usage of specific statistical methods (e.g., nonparametric tests and effect size measures) and, ii) develop a conceptual model for a statistical analysis workflow with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard to report practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and in the practitioner's context.
  • Good parameter settings are crucial to achieve high performance in many areas of artificial intelligence (AI), such as propositional satisfiability solving, AI planning, scheduling, and machine learning (in particular deep learning). Automated algorithm configuration methods have recently received much attention in the AI community since they replace tedious, irreproducible and error-prone manual parameter tuning and can lead to new state-of-the-art performance. However, practical applications of algorithm configuration are prone to several (often subtle) pitfalls in the experimental design that can render the procedure ineffective. We identify several common issues and propose best practices for avoiding them. As one possibility for automatically handling as many of these as possible, we also propose a tool called GenericWrapper4AC.
  • A Webview embeds a full-fledged browser in a mobile application and allows the application to expose a custom interface to JavaScript code. This is a popular technique to build so-called hybrid applications, but it circumvents the usual security model of the browser: any malicious JavaScript code injected into the Webview gains access to the interface and can use it to manipulate the device or exfiltrate sensitive data. In this paper, we present an approach to systematically evaluate the possible impact of code injection attacks against Webviews using static information flow analysis. Our key idea is that we can make reasoning about JavaScript semantics unnecessary by instrumenting the application with a model of possible attacker behavior -- the BabelView. We evaluate our approach on 11,648 apps from various Android marketplaces, finding 2,677 vulnerabilities in 1,663 apps. Taken together, the apps reported as vulnerable have over 835 million installations worldwide. We manually validated a random sample of 66 apps and estimate that our fully automated analysis achieves a precision of 90% at a recall of 66%.
  • This paper explores the structure of research papers in software engineering. Using text mining, we study 35,391 software engineering (SE) papers from 34 leading SE venues over the last 25 years. These venues were divided, nearly evenly, between conferences and journals. An important aspect of this analysis is that it is fully automated and repeatable. To achieve that automation, we used a stable topic modeling technique called LDADE that fully automates parameter tuning in LDA. Using LDADE, we mine 11 topics that represent much of the structure of contemporary SE. The 11 topics presented here should not be "set in stone" as the only topics worthy of study in SE. Rather our goal is to report that (a) text mining methods can detect large scale trends within our community; (b) those topic change with time; so (c) it is important to have automatic agents that can update our understanding of our community whenever new data arrives.
  • The amount of aspect-oriented software development techniques and tools have been increasing for the last years but still they have not enough maturity and are not sufficiently spread to be included in a project leader's box of tools. Software development projects have to deal with many risks, and the main function of project leaders is to minimize them. The use of immature technologies, tools newcomers to the market, techniques that have not been tested enough, etc., would be very risky decisions to take by who has the responsibility to carry out a successful software development project. On the other hand, the availability of well-known tools and techniques and the adherence to standards and best practices will help professionals to make good estimates and to take better decisions. There is many proposals on aspect-oriented techniques, notations, tools, etc., but they have not yet unified on a common body of knowledge and none of them has become the most popular approach. We are interested in the early phases of the aspect-oriented SDLC which include the phases from the beginning of the life cycle until architecture design. We are interested in portraying the state of the art of aspect-oriented techniques and tools, in the identification of the standards they employ. Our goal is to collect all the available evidence, analyze it, and study the possibility of applying these techniques, tools and standards in real projects, taking advantage of the benefits of the aspect-oriented paradigm, by identifying the standard and widespread approaches, techniques, notations and tools present in the scientific literature and to verify if they were applied in the industry, by means of a systematic mapping of the literature published between 1995 and 2019.
  • Context: A Multivocal Literature Review (MLR) is a form of a Systematic Literature Review (SLR) which includes the grey literature (e.g., blog posts and white papers) in addition to the published (formal) literature (e.g., journal and conference papers). MLRs are useful for both researchers and practitioners since they provide summaries both the state-of-the art and -practice in a given area. Objective: There are several guidelines to conduct SLR studies in SE. However, given the facts that several phases of MLRs differ from those of traditional SLRs, for instance with respect to the search process and source quality assessment. Therefore, SLR guidelines are only partially useful for conducting MLR studies. Our goal in this paper is to present guidelines on how to conduct MLR studies in SE. Method: To develop the MLR guidelines, we benefit from three inputs: (1) existing SLR guidelines in SE, (2), a literature survey of MLR guidelines and experience papers in other fields, and (3) our own experiences in conducting several MLRs in SE. All derived guidelines are discussed in the context of three examples MLRs as running examples (two from SE and one MLR from the medical sciences). Results: The resulting guidelines cover all phases of conducting and reporting MLRs in SE from the planning phase, over conducting the review to the final reporting of the review. In particular, we believe that incorporating and adopting a vast set of recommendations from MLR guidelines and experience papers in other fields have enabled us to propose a set of guidelines with solid foundations. Conclusion: Having been developed on the basis of three types of solid experience and evidence, the provided MLR guidelines support researchers to effectively and efficiently conduct new MLRs in any area of SE.
  • Due to the difficulty of repairing defect, many research efforts have been devoted into automatic defect repair. Given a buggy program that fails some test cases, a typical automatic repair technique tries to modify the program to make all tests pass. However, since the test suites in real world projects are usually insufficient, aiming at passing the test suites often leads to incorrect patches. In this paper we aim to produce precise patches, that is, any patch we produce has a relatively high probability to be correct. More concretely, we focus on condition synthesis, which was shown to be able to repair more than half of the defects in existing approaches. Our key insight is threefold. First, it is important to know what variables in a local context should be used in an "if" condition, and we propose a sorting method based on the dependency relations between variables. Second, we observe that the API document can be used to guide the repair process, and propose document analysis technique to further filter the variables. Third, it is important to know what predicates should be performed on the set of variables, and we propose to mine a set of frequently used predicates in similar contexts from existing projects. We develop a novel program repair system, ACS, that could generate precise conditions at faulty locations. Furthermore, given the generated conditions are very precise, we can perform a repair operation that is previously deemed to be too overfitting: directly returning the test oracle to repair the defect. Using our approach, we successfully repaired 17 defects on four projects of Defects4J, which is the largest number of fully automatically repaired defects reported on the dataset so far. More importantly, the precision of our approach in the evaluation is 73.9%, which is significantly higher than previous approaches, which are usually less than 40%.
  • When learning to use an Application Programming Interface (API), programmers need to understand the inputs and outputs (I/O) of the API functions. Current documentation tools automatically document the static information of I/O, such as parameter types and names. What is missing from these tools is dynamic information, such as I/O examples---actual valid values of inputs that produce certain outputs. In this paper, we demonstrate a prototype toolset we built to generate I/O examples. Our tool logs I/O values when API functions are executed, for example in running test suites. Then, the tool puts I/O values into API documents as I/O examples. Our tool has three programs: 1) funcWatch, which collects I/O values when API developers run test suites, 2) ioSelect, which selects one I/O example from a set of I/O values, and 3) ioPresent, which embeds the I/O examples into documents. In a preliminary evaluation, we used our tool to generate four hundred I/O examples for three C libraries: ffmpeg, libssh, and protobuf-c.