• Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets. While existing approaches can test whether two properties are related, they often require unfeasibly large sample sizes in real data scenarios, and do not provide any insight into how or why the procedure reached its decision. Our approach, "Multiscale Graph Correlation" (MGC), is a dependence test that juxtaposes previously disparate data science techniques, including k-nearest neighbors, kernel methods (such as support vector machines), and multiscale analysis (such as wavelets). Other methods typically require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, spanning polynomial (linear, quadratic, cubic), trigonometric (sinusoidal, circular, ellipsoidal, spiral), geometric (square, diamond, W-shape), and other functions, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely provides a simple and elegant characterization of the potentially complex latent geometry underlying the relationship, providing insight while maintaining computational efficiency. In several real data applications, including brain imaging and cancer genetics, MGC is the only method that can both detect the presence of a dependency and provide specific guidance for the next experiment and/or analysis to conduct.
  • Trade-offs such as "how much testing is enough" are critical yet challenging project decisions in software engineering. Most existing approaches adopt risk-driven or value-based analysis to prioritize test cases and minimize test runs. However, none of these is applicable to the emerging crowdtesting paradigm, where task requesters typically have no control over online crowdworkers' dynamic behavior and uncertain performance. In current practice, deciding when to close a crowdtesting task is largely done by guesswork due to a lack of decision support. This paper intends to fill this gap by introducing automated decision support for monitoring crowdtesting tasks and determining the appropriate time to close them. First, this paper investigates the necessity and feasibility of close prediction for crowdtesting tasks based on an industrial dataset. Then, it designs 8 methods for close prediction, based on various models including the bug trend, the bug arrival model, and the capture-recapture model. Finally, the evaluation is conducted on 218 crowdtesting tasks from one of the largest crowdtesting platforms in China, and the results show that a median of 91% of bugs can be detected with 49% saved cost.
  • Crowdtesting has grown to be an effective alternative to traditional testing, especially for mobile apps. However, crowdtesting is inherently hard to manage. Given the complexity of mobile applications and the unpredictability of the distributed, parallel crowdtesting process, it is difficult to estimate (a) the remaining number of bugs as yet undetected or (b) the required cost to find those bugs. Experience-based decisions may result in an ineffective crowdtesting process. This paper aims at exploring automated decision support to effectively manage the crowdtesting process. The proposed ISENSE applies an incremental sampling technique to process crowdtesting reports arriving in chronological order, organizes them into fixed-size groups as dynamic inputs, and predicts two test completion indicators in an incremental manner. The two indicators are: 1) the total number of bugs, predicted with a Capture-ReCapture (CRC) model, and 2) the required test cost for achieving certain test objectives, predicted with an AutoRegressive Integrated Moving Average (ARIMA) model. We assess ISENSE using 46,434 reports of 218 crowdtesting tasks from one of the largest crowdtesting platforms in China. Its effectiveness is demonstrated through two applications for automating crowdtesting management, i.e., automation of the task closing decision, and semi-automation of task closing trade-off analysis. The results show that decision automation using ISENSE will provide managers with greater opportunities to achieve cost-effectiveness gains in crowdtesting. Specifically, a median of 100% of bugs can be detected with 30% saved cost based on the automated close prediction.
  • Crowdtesting is effective especially when it comes to feedback on GUI systems, or subjective opinions about features. Despite this, we find crowdtesting reports are highly replicated, i.e., 82% of them are replicates of others. Hence automatically detecting replicate reports could help reduce triaging efforts. Most of the existing approaches mainly adopt textual information for replicate detection, and suffer from low accuracy because of the expression gap. Our observation on real industrial crowdtesting data found that crowdtesting reports of GUI systems are often accompanied by images, i.e., screenshots of the app. We assume the screenshot to be valuable for replicate crowdtesting report detection because it reflects the real scenario of the failure and is not affected by the variety of natural languages. In this work, we propose a replicate detection approach, TSDetector, which combines information from the screenshots and the textual descriptions to detect replicate crowdtesting reports. We extract four types of features to characterize the screenshots and the textual descriptions, and design an algorithm to detect replicates based on four similarity scores derived from the four different features respectively. We investigate the effectiveness and advantage of TSDetector on 15 commercial projects with 4,172 reports from one of the largest Chinese crowdtesting platforms. Results show that TSDetector can outperform existing state-of-the-art approaches significantly. In addition, we also evaluate its usefulness using real-world case studies. The feedback from real-world testers demonstrates its practical value.
  • Neuroscientists are now able to acquire data at staggering rates across spatiotemporal scales. However, our ability to capitalize on existing datasets, tools, and intellectual capacities is hampered by technical challenges. The key barriers to accelerating scientific discovery correspond to the FAIR data principles: findability, global access to data, software interoperability, and reproducibility/re-usability. We conducted a hackathon dedicated to making strides in those areas. This manuscript is a technical report summarizing these achievements, and we hope it serves as an example of the effectiveness of focused, deliberate hackathons towards the advancement of our quickly-evolving field.
  • Let $A=\mathbb{C}[t_1^{\pm1},t_2^{\pm1}]$ be the algebra of Laurent polynomials in two variables and $B$ be the set of skew derivations of $A$. Let $L$ be the universal central extension of the derived Lie subalgebra of the Lie algebra $A\rtimes B$. Set $\widetilde{L}=L\oplus\mathbb{C} d_1\oplus\mathbb{C} d_2$, where $d_1$, $d_2$ are two degree derivations. A Harish-Chandra module is defined as an irreducible weight module with finite dimensional weight spaces. In this paper, we prove that a Harish-Chandra module of the Lie algebra $\widetilde{L}$ is a uniformly bounded module or a generalized highest weight (GHW for short) module. Furthermore, we prove that the nonzero level Harish-Chandra modules of $\widetilde{L}$ are GHW modules. Finally, we classify all the GHW Harish-Chandra modules of $\widetilde{L}$.
  • We develop a framework for deriving Dyson-Schwinger Equations (DSEs) and the Bethe-Salpeter Equation (BSE) in QCD in the large-$N_c$ limit. The starting point is a modified form (with auxiliary fields) of the QCD generating functional. This framework provides a natural order-by-order truncation scheme for DSEs and the BSE, and the kernels of the equations up to any order are explicitly given. Chiral symmetry (in the chiral limit) is preserved at any order of truncation, so the framework exemplifies a symmetry-preserving truncation scheme. It provides a method to study DSEs and the BSE beyond the Rainbow-Ladder truncation, and is especially useful for studying contributions from non-Abelian dynamics (those arising from gluon self-interactions). We also derive the equation for the quark-ghost scattering kernel, and discuss the Slavnov-Taylor identity connecting the quark-gluon vertex, the quark propagator and the quark-ghost scattering kernel.
  • Destination IP prefix-based routing protocols are core to Internet routing today. Internet autonomous systems (AS) possess fixed IP prefixes, while packets carry the intended destination AS's prefix in their headers, in clear text. As a result, network communications can be easily identified using IP addresses and become targets of a wide variety of attacks, such as DNS/IP filtering, distributed Denial-of-Service (DDoS) attacks, man-in-the-middle (MITM) attacks, etc. In this work, we explore an alternative network architecture that fundamentally removes such vulnerabilities by removing the association between IP prefixes and destination networks, and by allowing any end-to-end communication session to have dynamic, short-lived, and pseudo-random IP addresses drawn from a range of IP prefixes rather than one. The concept is seemingly impossible to realize in today's Internet. We demonstrate how this is doable today with three different strategies using software defined networking (SDN), and how this can be done at scale to transform the Internet addressing and routing paradigms with the novel concept of a distributed software defined Internet exchange (SDX). The solution works with both IPv4 and IPv6, with the latter providing higher degrees of IP addressing freedom. Prototypes based on Open vSwitch (OVS) have been implemented for experimentation across the PEERING BGP testbed. The SDX solution not only provides a technically sustainable pathway towards large-scale traffic-analysis-resistant network (TARN) support, it also unveils a new business model for customer-driven, customizable and trustable end-to-end network services.
  • We establish a natural connection of the $q$-Virasoro algebra $D_{q}$ introduced by Belov and Chaltikian with affine Kac-Moody Lie algebras. More specifically, for each abelian group $S$ together with a one-to-one linear character $\chi$, we define an infinite-dimensional Lie algebra $D_{S}$ which reduces to $D_{q}$ when $S=\mathbb{Z}$. Guided by the theory of equivariant quasi modules for vertex algebras, we introduce another Lie algebra ${\mathfrak{g}}_{S}$ with $S$ as an automorphism group and we prove that $D_{S}$ is isomorphic to the $S$-covariant algebra of the affine Lie algebra $\widehat{{\mathfrak{g}}_{S}}$. We then relate restricted $D_{S}$-modules of level $\ell\in \mathbb{C}$ to equivariant quasi modules for the vertex algebra $V_{\widehat{\mathfrak{g}_{S}}}(\ell,0)$ associated to $\widehat{{\mathfrak{g}}_{S}}$ with level $\ell$. Furthermore, we show that if $S$ is a finite abelian group of order $2l+1$, $D_{S}$ is isomorphic to the affine Kac-Moody algebra of type $B^{(1)}_{l}$.
  • Online interactive recommender systems strive to promptly suggest to consumers appropriate items (e.g., movies, news articles) according to the current context, including both the consumer and item content information. However, such context information is often unavailable in practice for the recommendation, where only the users' interaction data on items can be utilized. Moreover, the lack of interaction records, especially for new users and items, further worsens recommendation performance. To address these issues, collaborative filtering (CF), one of the recommendation techniques relying on the interaction data only, as well as online multi-armed bandit mechanisms, capable of balancing exploitation and exploration, are adopted in online interactive recommendation settings, under the assumption of independent items (i.e., arms). Nonetheless, the assumption rarely holds in reality, since real-world items tend to be correlated with each other (e.g., two articles with similar topics). In this paper, we study online interactive collaborative filtering problems by considering the dependencies among items. We explicitly formulate the item dependencies as clusters on arms, where the arms within a single cluster share similar latent topics. Drawing on topic modeling techniques, we propose a generative model to generate the items from their underlying topics. Furthermore, an efficient online algorithm based on particle learning is developed for inferring both latent parameters and states of our model. Additionally, our inferred model can be naturally integrated with existing multi-armed selection strategies in the online interactive collaborative filtering setting. Empirical studies on two real-world applications, online recommendations of movies and news, demonstrate both the effectiveness and efficiency of the proposed approach.
  • Quantitative understanding of relationships between students' behavioral patterns and academic performances is a significant step towards personalized education. In contrast to previous studies that were mainly based on questionnaire surveys, in this paper, we collect behavioral records from 18,960 undergraduate students' smart cards and propose a novel metric, called orderness, which measures the regularity of campus daily life (e.g., meals and showers) of each student. Empirical analysis demonstrates that academic performance (GPA) is strongly correlated with orderness. Furthermore, we show that orderness is an important feature for predicting academic performance, which remarkably improves the prediction accuracy even in the presence of students' diligence. Based on these analyses, education administrators could better guide students' campus lives and implement effective interventions at an early stage when necessary.
  • We define a programming language independent controller TaCtl for multi-level transactions and an operator $TA$, which when applied to concurrent programs with multi-level shared locations containing hierarchically structured complex values, turns their behavior with respect to some abstract termination criterion into a transactional behavior. We prove the correctness property that concurrent runs under the transaction controller are serialisable, assuming an Inverse Operation Postulate to guarantee recoverability. For its applicability to a wide range of programs we specify the transaction controller TaCtl and the operator $TA$ in terms of Abstract State Machines (ASMs). This allows us to model concurrent updates at different levels of nested locations in a precise yet simple manner, namely in terms of partial ASM updates. It also provides the possibility to use the controller TaCtl and the operator $TA$ as a plug-in when specifying concurrent system components in terms of sequential ASMs.
  • In this paper, we associate quantum vertex algebras to a certain family of associative algebras $\widetilde{\A}(g)$ which are essentially Ding-Iohara algebras. To do this, we introduce another closely related family of associative algebras $\A(h)$. The associated quantum vertex algebras are based on the vacuum modules for $\A(h)$, whereas $\phi$-coordinated modules for these quantum vertex algebras are associated to $\widetilde{A}(g)$-modules. Furthermore, we classify their irreducible $\phi$-coordinated modules.
  • We develop a logic which enables reasoning about single steps of non-deterministic parallel Abstract State Machines (ASMs). Our logic builds upon the unifying logic introduced by Nanchen and St\"ark for reasoning about hierarchical (parallel) ASMs. Our main contribution in this regard is the handling of non-determinism (both bounded and unbounded) within the logical formalism. Moreover, we do this without sacrificing the completeness of the logic for statements about single steps of non-deterministic parallel ASMs, such as invariants of rules, consistency conditions for rules, or step-by-step equivalence of rules.
  • In database theory, the term $\textit{database transformation}$ was used to refer to a unifying treatment for computable queries and updates. Recently, it was shown that non-deterministic database transformations can be captured exactly by a variant of ASMs, the so-called Database Abstract State Machines (DB-ASMs). In this article we present a logic for DB-ASMs, extending the logic of Nanchen and St\"ark for ASMs. In particular, we develop a rigorous proof system for the logic for DB-ASMs, which is proven to be sound and complete. The most difficult challenge to be handled by the extension is a proper formalisation capturing non-determinism of database transformations and all its related features such as consistency, update sets or multisets associated with DB-ASM rules. As the database part of a state of database transformations is a finite structure and DB-ASMs are restricted by allowing quantifiers only over the database part of a state, we resolve this problem by taking update sets explicitly into the logic, i.e. by using an additional modal operator $[X]$, where $X$ is interpreted as an update set $\Delta$ generated by a DB-ASM rule. The DB-ASM logic provides a powerful verification tool to study properties of database transformations.
  • For more than a century, artificial lighting has served mainly for illumination. Only recently have we started to transform our lighting infrastructure to provide new services such as indoor localization and network connectivity. These innovative advancements rely on two key requirements: the ability to modulate light sources (for data transmission) and the presence of photodetectors on objects (for data reception). But not all lights can be modulated and most objects do not have photodetectors. To overcome these limitations, researchers are developing novel sensing and communication methods that exploit passive light sources, such as the sun, and that leverage the external surfaces of objects, such as fingers and car roofs, to create a new generation of cyber-physical systems based on visible light. In this article we propose a taxonomy to analyze these novel contributions. Our taxonomy allows us to identify the overarching principles, challenges and opportunities of this newly emerging area.
  • The influence of the noncommutativity on the average speed of a relativistic electron interacting with a uniform magnetic field within the minimum evolution time is investigated. We find that it is possible for the wave packet of the electron to travel faster than the speed of light in vacuum because of the noncommutativity. It suggests that due to the noncommutativity, Lorentz invariance is violated in the relativistic quantum mechanics region.
  • We report the first result on Ge-76 neutrinoless double beta decay from the CDEX-1 experiment at the China Jinping Underground Laboratory. A p-type point-contact high-purity germanium detector with a mass of 994 g has been installed to search for neutrinoless double beta decay events, as well as to directly detect dark matter particles. An exposure of 304 kg·day has been analyzed. The wideband spectrum from 500 keV to 3 MeV was obtained, and the average event rate in the energy range around 2.039 MeV is about 0.012 counts per keV per kg per day. The half-life of Ge-76 neutrinoless double beta decay has been derived based on this result as $T_{1/2} > 6.4\times10^{22}$ yr (90% C.L.). An upper limit on the effective Majorana-neutrino mass of 5.0 eV has been achieved. Possible methods to further decrease the background level are discussed and will be pursued in the next stage of the CDEX experiment.
  • A previous formal derivation of the effective chiral Lagrangian for low-lying pseudoscalar mesons from first-principles QCD without approximations [Wang et al., Phys. Rev. D61, (2000) 54011] is generalized to further include scalar, vector, and axial-vector mesons. In the large Nc limit and with an Abelian approximation, we show that the properties of the newly added mesons in our formalism are determined by the corresponding underlying fundamental homogeneous Bethe--Salpeter equation in the ladder approximation, which yields the equations of motion for the scalar, vector, and axial-vector meson fields at the level of an effective chiral Lagrangian. The masses appearing in the equations of motion of the meson fields are those determined by the corresponding Bethe--Salpeter equation.
  • Collective classification of vertices is the task of assigning categories to each vertex in a graph based on both vertex attributes and link structure. Nevertheless, some existing approaches do not use the features of neighbouring vertices properly, due to the noise introduced by these features. In this paper, we propose a graph-based recursive neural network framework for collective vertex classification. In this framework, we generate hidden representations from both attributes of vertices and representations of neighbouring vertices via recursive neural networks. Under this framework, we explore two types of recursive neural units: the naive recursive neural unit and the long short-term memory unit. We have conducted experiments on four real-world network datasets. The experimental results show that our framework with the long short-term memory model achieves better results and outperforms several competitive baseline methods.
  • In this work, we propose a new communication system for illuminated areas, indoors and outdoors. Light sources in our environments (such as light bulbs or even the sun) are our signal emitters, but we do not modulate data at the light source. We instead propose that the environment itself modulates the ambient light signals: if mobile elements 'wear' patterns consisting of distinctive reflecting surfaces, a single photodiode could decode the disturbed light signals to read passive information. Achieving this vision requires a deep understanding of a new type of communication channel. Many parameters can affect the performance of passive communication based on visible light: the size of reflective surfaces, the surrounding light intensity, the speed of mobile objects, and the field-of-view of the receiver, to name a few. In this paper, we present our vision for a passive communication channel with visible light, the design challenges and the evaluation of an outdoor application where our receiver decodes information from a car moving at 18 km/h.
  • In this paper, we study a new kind of vertex operator algebra related to the twisted Heisenberg-Virasoro algebra, which we call the twisted Heisenberg-Virasoro vertex operator algebra, and its modules. Specifically, we present some results concerning the relationship between the restricted module categories of twisted Heisenberg-Virasoro algebras of rank one and rank two and several different kinds of module categories of their corresponding vertex algebras. We also study fully the structures of the twisted Heisenberg-Virasoro vertex operator algebra, give a characterization of it as a tensor product of two well-known vertex operator algebras, and solve the commutant problem.
  • With the help of information and communication technologies, studies on overall social networks have been extensively reported recently. However, investigations on directed Ego Communication Networks (ECNs) remain insufficient, where an ECN stands for a sub-network composed of a central individual and his/her direct contacts. In this paper, the directed ECNs are built on Call Detail Records (CDRs), which cover more than 7 million people of a provincial capital city in China over half a year. Results show that there is a critical ECN size at about 150, above which the average emotional closeness between ego and alters drops, the balanced relationship between ego and network collapses, and the proportion of strong ties decreases. This paper not only demonstrates the significance of ECN size in affecting its properties, but also shows accordance with "Dunbar's Number". These results can be viewed as cross-cultural supporting evidence for the well-known Social Brain Hypothesis (SBH).
  • [Background]: Systematic Literature Review (SLR) has become an important software engineering research method but requires tremendous effort. [Aim]: This paper proposes an approach that leverages an empirically evolved ontology to support automating key SLR activities. [Method]: First, we propose an ontology, SLRONT, built on SLR experiences and best practices as groundwork to capture common terminologies and their relationships during SLR processes; second, we present an extended version of SLRONT, the COSONT, and instantiate it with the knowledge and concepts extracted from structured abstracts. Case studies illustrate the details of applying it to support SLR steps. [Results]: Results show that by using COSONT, we reach the same conclusions as with purely manual work, but the effort involved is significantly reduced. [Conclusions]: The approach of using an ontology could effectively and efficiently support the conduct of systematic literature reviews.
  • Occlusion is one of the most challenging problems in depth estimation. Previous work has modeled single-occluder occlusion in light fields and obtained good results; however, it is still difficult to obtain accurate depth for multi-occluder occlusion. In this paper, we explore the multi-occluder occlusion model in light fields, and derive the occluder-consistency between the spatial and angular space, which is used as guidance to select the un-occluded views for each candidate occlusion point. Then an anti-occlusion energy function is built to regularize the depth map. The experimental results on public light field datasets demonstrate the advantages of the proposed algorithm compared with other state-of-the-art light field depth estimation algorithms, especially in multi-occluder areas.
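
As an illustration of the capture-recapture idea mentioned in the crowdtesting abstracts above, the following is a minimal sketch of how a total bug count might be estimated from two overlapping samples of reports. It assumes the two-sample Chapman estimator, a standard textbook form, and is not necessarily the CRC variant used in the cited work; the function names and the sample bug IDs are hypothetical.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's two-sample capture-recapture estimate of population size.

    n1: distinct bugs found in the first sample of reports
    n2: distinct bugs found in the second sample of reports
    m:  bugs found in both samples (the "recaptured" ones)
    """
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("need 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def estimate_total_bugs(first_sample: set, second_sample: set) -> float:
    """Apply the estimator to two sets of (hypothetical) bug identifiers."""
    overlap = len(first_sample & second_sample)
    return chapman_estimate(len(first_sample), len(second_sample), overlap)


if __name__ == "__main__":
    # Hypothetical data: bug IDs seen in two consecutive groups of reports.
    group_a = {"B1", "B2", "B3", "B4", "B5"}
    group_b = {"B4", "B5", "B6", "B7"}
    total = estimate_total_bugs(group_a, group_b)
    found = len(group_a | group_b)
    print(f"estimated total bugs: {total:.1f}, found so far: {found}")
```

A small overlap between the two samples pushes the estimate up (many bugs likely remain), while a large overlap pulls it towards the number already found; a closing rule could compare the bugs found so far against such an estimate.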