-
This work presents some characteristics of MoNet, a digital platform for the
modeling and visualization of complex systems. Emphasis is on the ideas that
allowed the successful progressive development of this modeling platform, which
goes along with the implementation of applications to the modeling of several
studied systems. The platform can represent different aspects of systems
modeled at different observation scales. This tool offers advantages in the
sense of favoring the perception of the phenomenon of the emergence of
information, associated with changes of scale. This paper also includes some
criteria used for the construction of this modeling platform. The power of
current computers has made practical representing graphic resources such as
shapes, line thickness, overlaying-text tags, colors, and transparencies, in
the graphical modeling of systems. By visualizing diagrams conveniently
designed to highlight contrasts, these modeling platforms allow the recognition
of patterns that drive our understanding of systems and their structure. Graphs
reflecting the benefits of the tool regarding the visualization of systems at
different scales of observation are presented to illustrate the application of
the platform.
-
Ubiquitous technology platforms have been created to track and improve health
and fitness; similar technologies can help individuals monitor and reduce their
carbon footprints. This paper proposes CarbonKit, a platform combining
technology, markets, and incentives to empower and reward people for reducing
their carbon footprint. We argue that a goal-and-reward behavioral feedback
loop can be combined with the Big Data available from tracked activities, apps,
and social media to make CarbonKit an integral part of individuals daily lives.
CarbonKit comprises five modules that link personal carbon tracking, health and
fitness, social media, and economic incentives. Protocols for safeguarding
security, privacy and individuals control over their own data are essential to
the design of the CarbonKit. Initially CarbonKit would operate on a voluntary
basis, but such a system can also serve as part of a mandatory region-wide
initiative. We use the example of the British Columbia to illustrate the
regulatory framework and participating stakeholders that would be required to
support the CarbonKit in specific jurisdictions.
-
We propose an input/output conformance testing theory utilizing Modal
Interface Automata with Input Refusals (IR-MIA) as novel behavioral formalism
for both the specification and the implementation under test. A modal
refinement relation on IR-MIA allows distinguishing between obligatory and
allowed output behaviors, as well as between implicitly underspecified and
explicitly forbidden input behaviors. The theory therefore supports positive
and negative conformance testing with optimistic and pessimistic environmental
assumptions. We further show that the resulting conformance relation on IR-MIA,
called modal-irioco, enjoys many desirable properties concerning
component-based behaviors. First, modal-irioco is preserved under modal
refinement and constitutes a preorder under certain restrictions which can be
ensured by a canonical input completion for IR-MIA. Second, under the same
restrictions, modal-irioco is compositional with respect to parallel
composition of IR-MIA with multi-cast and hiding. Finally, the quotient
operator on IR-MIA, as the inverse to parallel composition, facilitates
decompositionality in conformance testing to solve the unknown-component
problem.
-
The considerable effort of writing requirements is only worthwhile if the
result meets two conditions: the requirements reflect stakeholders' needs, and
the implementation satisfies them. In usual approaches, the use of different
notations for requirements (often natural language) and implementations (a
programming language) makes both conditions elusive. AutoReq, presented in this
article, takes a different approach to both the writing of requirements and
their verification. Applying the approach to a well-documented example, a
landing gear system, allowed for a mechanical proof of consistency and
uncovered an error in a published discussion of the problem.
-
To ensure fulfilling stakeholder wishes, it is crucial to validate the
documented requirements. This is often complicated by the fact that the wishes
and intentions of different stakeholders are somewhat contradictory, which
manifests itself in inconsistent requirements. To aid requirements engineers in
identifying and resolving inconsistent requirements, we investigated the
usefulness for manual reviews of two different model-based representation
formats for inconsistent requirements; one that represent the inconsistent
requirements in separate diagrams and one that represents them integrated into
one diagram using annotations. The results from a controlled experiment show
that the use of such integrated review diagrams can significantly increase
efficiency of manual reviews, without sacrificing effectiveness.
-
In the field of automated program repair, the redundancy assumption claims
large programs contain the seeds of their own repair. However, most
redundancy-based program repair techniques do not reason about the repair
ingredients---the code that is reused to craft a patch. We aim to reason about
the repair ingredients by using code similarities to prioritize and transform
statements in a codebase for patch generation. Our approach, DeepRepair, relies
on deep learning to reason about code similarities. Code fragments at
well-defined levels of granularity in a codebase can be sorted according to
their similarity to suspicious elements (i.e., code elements that contain
suspicious statements) and statements can be transformed by mapping
out-of-scope identifiers to similar identifiers in scope. We examined these new
search strategies for patch generation with respect to effectiveness from the
viewpoint of a software maintainer. Our comparative experiments were executed
on six open-source Java projects including 374 buggy program revisions and
consisted of 19,949 trials spanning 2,616 days of computation time.
DeepRepair's search strategy using code similarities generally found compilable
ingredients faster than the baseline, jGenProg, but this improvement neither
yielded test-adequate patches in fewer attempts (on average) nor found
significantly more patches than the baseline. Although the patch counts were
not statistically different, there were notable differences between the nature
of DeepRepair patches and baseline patches. The results demonstrate that our
learning-based approach finds patches that cannot be found by existing
redundancy-based repair techniques.
-
Reliability prediction is an important task in software reliability
engineering, which has been widely studied in the last decades. However,
modelling and predicting user-perceived reliability of black-box services
remain an open research problem. Software services, such as Web services and
Web APIs, generally provide black-box functionalities to users through the
Internet, thus leading to a lack of their internal information for reliability
analysis. Furthermore, the user-perceived service reliability depends not only
on the service itself, but also heavily on the invocation context (e.g.,
service workloads, network conditions), whereby traditional reliability models
become ineffective and inappropriate. To address these new challenges posed by
blackbox services, in this paper, we propose CARP, a new contextaware
reliability prediction approach, which leverages historical usage data from
users to construct context-aware reliability models and further provides online
reliability prediction results to users. Through context-aware reliability
modelling, CARP is able to alleviate the data sparsity problem that heavily
limits the prediction accuracy of other existing approaches. The preliminary
evaluation results show that CARP can make a significant improvement in
reliability prediction accuracy, e.g., about 41% in MAE and 38% in RMSE when
only 5% of the data are available.
-
A few works address the challenge of automating software diversification, and
they all share one core idea: using automated test suites to drive
diversification. However, there is is lack of solid understanding of how test
suites, programs and transformations interact one with another in this process.
We explore this intricate interplay in the context of a specific
diversification technique called "sosiefication". Sosiefication generates sosie
programs, i.e., variants of a program in which some statements are deleted,
added or replaced but still pass the test suite of the original program. Our
investigation of the influence of test suites on sosiefication exploits the
following observation: test suites cover the different regions of programs in
very unequal ways. Hence, we hypothesize that sosie synthesis has different
performances on a statement that is covered by one hundred test case and on a
statement that is covered by a single test case. We synthesize 24583 sosies on
6 popular open-source Java programs. Our results show that there are two
dimensions for diversification. The first one lies in the specification: the
more test cases cover a statement, the more difficult it is to synthesize
sosies. Yet, to our surprise, we are also able to synthesize sosies on highly
tested statements (up to 600 test cases), which indicates an intrinsic property
of the programs we study. The second dimension is in the code: we manually
explore dozens of sosies and characterize new types of forgiving code regions
that are prone to diversification.
-
Cyber-physical systems (CPS), such as automotive systems, are starting to
include sophisticated machine learning (ML) components. Their correctness,
therefore, depends on properties of the inner ML modules. While learning
algorithms aim to generalize from examples, they are only as good as the
examples provided, and recent efforts have shown that they can produce
inconsistent output under small adversarial perturbations. This raises the
question: can the output from learning components can lead to a failure of the
entire CPS? In this work, we address this question by formulating it as a
problem of falsifying signal temporal logic (STL) specifications for CPS with
ML components. We propose a compositional falsification framework where a
temporal logic falsifier and a machine learning analyzer cooperate with the aim
of finding falsifying executions of the considered model. The efficacy of the
proposed technique is shown on an automatic emergency braking system model with
a perception component based on deep neural networks.
-
Precisely modeling complex systems like cyber-physical systems is
challenging, which often render model-based system verification techniques like
model checking infeasible. To overcome this challenge, we propose a method
called LAR to automatically `verify' such complex systems through a combination
of learning, abstraction and refinement from a set of system log traces. We
assume that log traces and sampling frequency are adequate to capture `enough'
behaviour of the system. Given a safety property and the concrete system log
traces as input, LAR automatically learns and refines system models, and
produces two kinds of outputs. One is a counterexample with a bounded
probability of being spurious. The other is a probabilistic model based on
which the given property is `verified'. The model can be viewed as a proof
obligation, i.e., the property is verified if the model is correct. It can also
be used for subsequent system analysis activities like runtime monitoring or
model-based testing. Our method has been implemented as a self-contained
software toolkit. The evaluation on multiple benchmark systems as well as a
real-world water treatment system shows promising results.
-
We present methods for k-means clustering on a stream with a focus on
providing fast responses to clustering queries. Compared to the current
state-of-the-art, our methods provide substantial improvement in the query time
for cluster centers while retaining the desirable properties of provably small
approximation error and low space usage. Our algorithms rely on a novel idea of
"coreset caching" that systematically reuses coresets (summaries of data)
computed for recent queries in answering the current clustering query. We
present both theoretical analysis and detailed experiments demonstrating their
correctness and efficiency
-
The field of machine programming (MP), the automation of the development of
software, is making notable research advances. This is, in part, due to the
emergence of a wide range of novel techniques in machine learning. In this
paper, we apply MP to the automation of software performance regression
testing. A performance regression is a software performance degradation caused
by a code change. We present AutoPerf - a novel approach to automate regression
testing that utilizes three core techniques: (i) zero-positive learning, (ii)
autoencoders, and (iii) hardware telemetry. We demonstrate AutoPerf's
generality and efficacy against 3 types of performance regressions across 10
real performance bugs in 7 benchmark and open-source programs. On average,
AutoPerf exhibits 4% profiling overhead and accurately diagnoses more
performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has
produced no false negatives.
-
System development often involves decisions about how a high-level design is
to be implemented using primitives from a low-level platform. Certain
decisions, however, may introduce undesirable behavior into the resulting
implementation, possibly leading to a violation of a desired property that has
already been established at the design level. In this paper, we introduce the
problem of synthesizing a property-preserving platform mapping: A set of
implementation decisions ensuring that a desired property is preserved from a
high-level design into a low-level platform implementation. We provide a
formalization of the synthesis problem and propose a technique for synthesizing
a mapping based on symbolic constraint search. We describe our prototype
implementation, and a real-world case study demonstrating the application of
our technique to synthesizing secure mappings for the popular web authorization
protocols OAuth 1.0 and 2.0.
-
Building on concepts drawn from control theory, self-adaptive software
handles environmental and internal uncertainties by dynamically adjusting its
architecture and parameters in response to events such as workload changes and
component failures. Self-adaptive software is increasingly expected to meet
strict functional and non-functional requirements in applications from areas as
diverse as manufacturing, healthcare and finance. To address this need, we
introduce a methodology for the systematic ENgineering of TRUstworthy
Self-adaptive sofTware (ENTRUST). ENTRUST uses a combination of (1) design-time
and runtime modelling and verification, and (2) industry-adopted assurance
processes to develop trustworthy self-adaptive software and assurance cases
arguing the suitability of the software for its intended application. To
evaluate the effectiveness of our methodology, we present a tool-supported
instance of ENTRUST and its use to develop proof-of-concept self-adaptive
software for embedded and service-based systems from the oceanic monitoring and
e-finance domains, respectively. The experimental results show that ENTRUST can
be used to engineer self-adaptive software systems in different application
domains and to generate dynamic assurance cases for these systems.
-
Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel
synchronization mechanism that runs low-overhead readers concurrently with
updaters. Production-quality RCU implementations for multi-core systems are
decidedly non-trivial. Giving the ubiquity of Linux, a rare "million-year" bug
can occur several times per day across the installed base. Stringent validation
of RCU's complex behaviors is thus critically important. Exhaustive testing is
infeasible due to the exponential number of possible executions, which suggests
use of formal verification.
Previous verification efforts on RCU either focus on simple implementations
or use modeling languages, the latter requiring error-prone manual translation
that must be repeated frequently due to regular changes in the Linux kernel's
RCU implementation. In this paper, we first describe the implementation of Tree
RCU in the Linux kernel. We then discuss how to construct a model directly from
Tree RCU's source code in C, and use the CBMC model checker to verify its
safety and liveness properties. To our best knowledge, this is the first
verification of a significant part of RCU's source code, and is an important
step towards integration of formal verification into the Linux kernel's
regression test suite.
-
Literature reviews are essential for any researcher trying to keep up to date
with the burgeoning software engineering literature. FAST$^2$ is a novel tool
for reducing the effort required for conducting literature reviews by assisting
the researchers to find the next promising paper to read (among a set of unread
papers). This paper describes FAST$^2$ and tests it on four large software
engineering literature reviews conducted by Wahono (2015), Hall (2012),
Radjenovi\'c (2013) and Kitchenham (2017). We find that FAST$^2$ is a faster
and robust tool to assist researcher finding relevant SE papers which can
compensate for the errors made by humans during the review process. The
effectiveness of FAST$^2$ can be attributed to three key innovations: (1) a
novel way of applying external domain knowledge (a simple two or three keyword
search) to guide the initial selection of papers---which helps to find relevant
research papers faster with less variances; (2) an estimator of the number of
remaining relevant papers yet to be found---which in practical settings can be
used to decide if the reviewing process needs to be terminated; (3) a novel
self-correcting classification algorithm---automatically corrects itself, in
cases where the researcher wrongly classifies a paper.
-
One source of software project challenges and failures is the systematic
errors introduced by human cognitive biases. Although extensively explored in
cognitive psychology, investigations concerning cognitive biases have only
recently gained popularity in software engineering (SE) research. This paper
therefore systematically maps, aggregates and synthesizes the literature on
cognitive biases in software engineering to generate a comprehensive body of
knowledge, understand state of the art research and provide guidelines for
future research and practise. Focusing on bias antecedents, effects and
mitigation techniques, we identified 65 articles, which investigate 37
cognitive biases, published between 1990 and 2016. Despite strong and
increasing interest, the results reveal a scarcity of research on mitigation
techniques and poor theoretical foundations in understanding and interpreting
cognitive biases. Although bias-related research has generated many new
insights in the software engineering community, specific bias mitigation
techniques are still needed for software professionals to overcome the
deleterious effects of cognitive biases on their work.
-
Software engineering research is evolving and papers are increasingly based
on empirical data from a multitude of sources, using statistical tests to
determine if and to what degree empirical evidence supports their hypotheses.
To investigate the practices and trends of statistical analysis in empirical
software engineering (ESE), this paper presents a review of a large pool of
papers from top-ranked software engineering journals. First, we manually
reviewed 161 papers and in the second phase of our method, we conducted a more
extensive semi-automatic classification of papers spanning the years 2001--2015
and 5,196 papers. Results from both review steps was used to: i) identify and
analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well
as relevant trends in usage of specific statistical methods (e.g.,
nonparametric tests and effect size measures) and, ii) develop a conceptual
model for a statistical analysis workflow with suggestions on how to apply
different statistical methods as well as guidelines to avoid pitfalls. Lastly,
we confirm existing claims that current ESE practices lack a standard to report
practical significance of results. We illustrate how practical significance can
be discussed in terms of both the statistical analysis and in the
practitioner's context.
-
Good parameter settings are crucial to achieve high performance in many areas
of artificial intelligence (AI), such as propositional satisfiability solving,
AI planning, scheduling, and machine learning (in particular deep learning).
Automated algorithm configuration methods have recently received much attention
in the AI community since they replace tedious, irreproducible and error-prone
manual parameter tuning and can lead to new state-of-the-art performance.
However, practical applications of algorithm configuration are prone to several
(often subtle) pitfalls in the experimental design that can render the
procedure ineffective. We identify several common issues and propose best
practices for avoiding them. As one possibility for automatically handling as
many of these as possible, we also propose a tool called GenericWrapper4AC.
-
A Webview embeds a full-fledged browser in a mobile application and allows
the application to expose a custom interface to JavaScript code. This is a
popular technique to build so-called hybrid applications, but it circumvents
the usual security model of the browser: any malicious JavaScript code injected
into the Webview gains access to the interface and can use it to manipulate the
device or exfiltrate sensitive data. In this paper, we present an approach to
systematically evaluate the possible impact of code injection attacks against
Webviews using static information flow analysis. Our key idea is that we can
make reasoning about JavaScript semantics unnecessary by instrumenting the
application with a model of possible attacker behavior -- the BabelView. We
evaluate our approach on 11,648 apps from various Android marketplaces, finding
2,677 vulnerabilities in 1,663 apps. Taken together, the apps reported as
vulnerable have over 835 million installations worldwide. We manually validated
a random sample of 66 apps and estimate that our fully automated analysis
achieves a precision of 90% at a recall of 66%.
-
This paper explores the structure of research papers in software engineering.
Using text mining, we study 35,391 software engineering (SE) papers from 34
leading SE venues over the last 25 years. These venues were divided, nearly
evenly, between conferences and journals. An important aspect of this analysis
is that it is fully automated and repeatable. To achieve that automation, we
used a stable topic modeling technique called LDADE that fully automates
parameter tuning in LDA. Using LDADE, we mine 11 topics that represent much of
the structure of contemporary SE. The 11 topics presented here should not be
"set in stone" as the only topics worthy of study in SE. Rather our goal is to
report that (a) text mining methods can detect large scale trends within our
community; (b) those topic change with time; so (c) it is important to have
automatic agents that can update our understanding of our community whenever
new data arrives.
-
The amount of aspect-oriented software development techniques and tools have
been increasing for the last years but still they have not enough maturity and
are not sufficiently spread to be included in a project leader's box of tools.
Software development projects have to deal with many risks, and the main
function of project leaders is to minimize them. The use of immature
technologies, tools newcomers to the market, techniques that have not been
tested enough, etc., would be very risky decisions to take by who has the
responsibility to carry out a successful software development project. On the
other hand, the availability of well-known tools and techniques and the
adherence to standards and best practices will help professionals to make good
estimates and to take better decisions. There is many proposals on
aspect-oriented techniques, notations, tools, etc., but they have not yet
unified on a common body of knowledge and none of them has become the most
popular approach. We are interested in the early phases of the aspect-oriented
SDLC which include the phases from the beginning of the life cycle until
architecture design. We are interested in portraying the state of the art of
aspect-oriented techniques and tools, in the identification of the standards
they employ. Our goal is to collect all the available evidence, analyze it, and
study the possibility of applying these techniques, tools and standards in real
projects, taking advantage of the benefits of the aspect-oriented paradigm, by
identifying the standard and widespread approaches, techniques, notations and
tools present in the scientific literature and to verify if they were applied
in the industry, by means of a systematic mapping of the literature published
between 1995 and 2019.
-
Context: A Multivocal Literature Review (MLR) is a form of a Systematic
Literature Review (SLR) which includes the grey literature (e.g., blog posts
and white papers) in addition to the published (formal) literature (e.g.,
journal and conference papers). MLRs are useful for both researchers and
practitioners since they provide summaries both the state-of-the art and
-practice in a given area. Objective: There are several guidelines to conduct
SLR studies in SE. However, given the facts that several phases of MLRs differ
from those of traditional SLRs, for instance with respect to the search process
and source quality assessment. Therefore, SLR guidelines are only partially
useful for conducting MLR studies. Our goal in this paper is to present
guidelines on how to conduct MLR studies in SE. Method: To develop the MLR
guidelines, we benefit from three inputs: (1) existing SLR guidelines in SE,
(2), a literature survey of MLR guidelines and experience papers in other
fields, and (3) our own experiences in conducting several MLRs in SE. All
derived guidelines are discussed in the context of three examples MLRs as
running examples (two from SE and one MLR from the medical sciences). Results:
The resulting guidelines cover all phases of conducting and reporting MLRs in
SE from the planning phase, over conducting the review to the final reporting
of the review. In particular, we believe that incorporating and adopting a vast
set of recommendations from MLR guidelines and experience papers in other
fields have enabled us to propose a set of guidelines with solid foundations.
Conclusion: Having been developed on the basis of three types of solid
experience and evidence, the provided MLR guidelines support researchers to
effectively and efficiently conduct new MLRs in any area of SE.
-
Due to the difficulty of repairing defect, many research efforts have been
devoted into automatic defect repair. Given a buggy program that fails some
test cases, a typical automatic repair technique tries to modify the program to
make all tests pass. However, since the test suites in real world projects are
usually insufficient, aiming at passing the test suites often leads to
incorrect patches.
In this paper we aim to produce precise patches, that is, any patch we
produce has a relatively high probability to be correct. More concretely, we
focus on condition synthesis, which was shown to be able to repair more than
half of the defects in existing approaches. Our key insight is threefold.
First, it is important to know what variables in a local context should be used
in an "if" condition, and we propose a sorting method based on the dependency
relations between variables. Second, we observe that the API document can be
used to guide the repair process, and propose document analysis technique to
further filter the variables. Third, it is important to know what predicates
should be performed on the set of variables, and we propose to mine a set of
frequently used predicates in similar contexts from existing projects.
We develop a novel program repair system, ACS, that could generate precise
conditions at faulty locations. Furthermore, given the generated conditions are
very precise, we can perform a repair operation that is previously deemed to be
too overfitting: directly returning the test oracle to repair the defect. Using
our approach, we successfully repaired 17 defects on four projects of
Defects4J, which is the largest number of fully automatically repaired defects
reported on the dataset so far. More importantly, the precision of our approach
in the evaluation is 73.9%, which is significantly higher than previous
approaches, which are usually less than 40%.
-
When learning to use an Application Programming Interface (API), programmers
need to understand the inputs and outputs (I/O) of the API functions. Current
documentation tools automatically document the static information of I/O, such
as parameter types and names. What is missing from these tools is dynamic
information, such as I/O examples---actual valid values of inputs that produce
certain outputs. In this paper, we demonstrate a prototype toolset we built to
generate I/O examples. Our tool logs I/O values when API functions are
executed, for example in running test suites. Then, the tool puts I/O values
into API documents as I/O examples. Our tool has three programs: 1) funcWatch,
which collects I/O values when API developers run test suites, 2) ioSelect,
which selects one I/O example from a set of I/O values, and 3) ioPresent, which
embeds the I/O examples into documents. In a preliminary evaluation, we used
our tool to generate four hundred I/O examples for three C libraries: ffmpeg,
libssh, and protobuf-c.