• ### Human experts vs. machines in taxa recognition (1708.06899)

May 17, 2019 q-bio.QM, cs.LG, stat.ML
The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate the differences between taxonomic experts and machines in both accuracy and identification logic. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of the taxonomic hierarchy in detail. Our results reveal that human experts using actual specimens yield the lowest classification error ($\overline{CE}=6.1\%$). However, a much faster, automated approach using deep Convolutional Neural Nets comes close to human accuracy ($\overline{CE}=11.4\%$). Contrary to previous findings in the literature, we find that, for machines, following the typical flat classification approach commonly used in machine learning performs better than being forced to adopt the hierarchical, local per-parent-node approach used by human taxonomic experts. Finally, we publicly share our unique dataset to serve as a benchmark in this field.
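As a rough illustration of how the predictions of a flat classifier can still be scored at several taxonomic ranks, the sketch below maps each true and predicted leaf label to its ancestor at a given rank and computes the classification error there. The taxonomy, the label names and the helpers `label_at_rank` and `rank_error` are hypothetical and are not taken from the study or its dataset.

```python
# Hypothetical three-level taxonomy: species -> (genus, family).
TAXONOMY = {
    "sp_a": ("genus_1", "family_x"),
    "sp_b": ("genus_1", "family_x"),
    "sp_c": ("genus_2", "family_x"),
    "sp_d": ("genus_3", "family_y"),
}
RANKS = ("species", "genus", "family")


def label_at_rank(species, rank):
    """Map a leaf (species) label to its ancestor at the requested rank."""
    if rank == "species":
        return species
    genus, family = TAXONOMY[species]
    return genus if rank == "genus" else family


def rank_error(y_true, y_pred, rank):
    """Fraction of samples whose labels disagree at the given rank."""
    wrong = sum(
        label_at_rank(t, rank) != label_at_rank(p, rank)
        for t, p in zip(y_true, y_pred)
    )
    return wrong / len(y_true)


# Made-up flat (species-level) predictions for four specimens.
y_true = ["sp_a", "sp_b", "sp_c", "sp_d"]
y_pred = ["sp_b", "sp_b", "sp_a", "sp_d"]

errors = {rank: rank_error(y_true, y_pred, rank) for rank in RANKS}
print(errors)
```

In this toy example the error shrinks at higher ranks because confusing two species of the same genus no longer counts as a mistake once labels are compared at genus or family level.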
• ### Outlier Edge Detection Using Random Graph Generation Models and Applications (1606.06447)

June 21, 2016 physics.soc-ph, cs.SI
Outliers are samples that are generated by different mechanisms from other, normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are created by scammers, by malicious programs, or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused only on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains the two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information for detecting outlier edges. We evaluated the proposed algorithms by injecting outlier edges into real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. Furthermore, the proposed algorithms are not limited to the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques.
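The edge-ego-network defined in the abstract can be sketched directly from that definition. The code below is a minimal illustration on a made-up undirected graph stored as an adjacency dictionary of sets; the function name `edge_ego_network` and the graph are assumptions, and the paper's scoring of edges with random graph generation models is not reproduced here.

```python
def edge_ego_network(adj, u, v):
    """Return (nodes, edges) of the subgraph induced by u, v and their neighbors.

    `adj` maps each node to the set of its neighbors (undirected graph).
    Node labels are assumed to be orderable so each edge can be stored once.
    """
    nodes = {u, v} | adj[u] | adj[v]
    edges = {
        (a, b)
        for a in nodes
        for b in adj[a]
        if b in nodes and a < b  # keep each undirected edge once
    }
    return nodes, edges


# Hypothetical graph: a triangle 1-2-3 with a tail 2-4-5-6.
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2},
    4: {2, 5},
    5: {4, 6},
    6: {5},
}

nodes, edges = edge_ego_network(adj, 1, 2)
print(sorted(nodes), sorted(edges))
```

Only nodes within one hop of either end node are kept, so the edge 4-5 is excluded even though node 4 belongs to the ego-network of edge (1, 2).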
• ### Limited Random Walk Algorithm for Big Graph Data Clustering (1606.06450)

June 21, 2016 physics.soc-ph, cs.SI
Graph clustering is an important technique for understanding the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attracting vertices in a graph and use them as features to cluster the vertices. According to the experimental results on simulated graph data and real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method uses the embarrassingly parallel paradigm, it can be efficiently implemented and embedded in any parallel computing environment such as a MapReduce framework. Given enough computing resources, the method can cluster graphs with millions of vertices and hundreds of millions of edges in a reasonable time.
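The idea of restricting a walk with an inflation step followed by normalization can be illustrated as follows. This is only a sketch under stated assumptions: the abstract does not specify the inflation or normalization functions, so an element-wise power and division by the sum are used here as stand-ins, and the function names and the toy graph are hypothetical.

```python
def limited_walk_step(adj, x, exponent=2.0):
    """One step: spread probability to neighbors, inflate, then normalize."""
    spread = {}
    for node, mass in x.items():
        share = mass / len(adj[node])  # uniform transition to each neighbor
        for nb in adj[node]:
            spread[nb] = spread.get(nb, 0.0) + share
    # Assumed inflation: element-wise power, which favors large entries.
    inflated = {n: m ** exponent for n, m in spread.items()}
    # Assumed normalization: rescale so the entries sum to one again.
    total = sum(inflated.values())
    return {n: m / total for n, m in inflated.items()}


def limited_random_walk(adj, seed, steps=3, exponent=2.0):
    """Run a few limited steps starting with all probability on `seed`."""
    x = {seed: 1.0}
    for _ in range(steps):
        x = limited_walk_step(adj, x, exponent)
    return x


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

x = limited_random_walk(adj, seed=0, steps=3)
mass_near_seed = sum(x.get(n, 0.0) for n in (0, 1, 2))
print(round(mass_near_seed, 3))
```

In this toy graph most of the probability stays inside the seed's triangle, because the inflation step suppresses the small amount of mass that leaks across the bridging edge. Each seed vertex can be processed independently, which is consistent with the embarrassingly parallel design mentioned in the abstract.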
• ### ShakeMe: Key Generation From Shared Motion (1507.06353)

Sept. 13, 2015 cs.CR
Devices equipped with accelerometer sensors, such as today's mobile devices, can make use of motion to exchange information. A typical example of shared motion is the shaking of two devices that are held together in one hand. Deriving a shared secret (key) from shared motion, e.g. for device pairing, is an obvious application. Only the keys need to be exchanged between the peers; neither the motion data nor the features extracted from it need to be. This makes the pairing fast and easy. To this end, each device generates an information signal (key) independently of the other, and the two keys must be identical for the devices to pair. The key is essentially derived by quantizing certain highly discriminative features extracted from the accelerometer data after an implicit synchronization. In this paper, we aim at finding a small set of effective features which enable a significantly simpler quantization procedure than the prior art. Our tentative results with authentic accelerometer data show that this is possible with a competent accuracy ($76\%$) and key strength (entropy approximately $15$ bits).
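A minimal sketch of deriving a key by quantizing features is shown below. It assumes uniform quantization with a fixed step and a fixed number of bits per feature; the feature values, the step size and the function `quantize_to_key` are hypothetical and do not reflect the features or the quantizer studied in the paper.

```python
def quantize_to_key(features, step=1.0, bits_per_feature=3):
    """Turn a list of feature values into a bit string by uniform quantization."""
    levels = 2 ** bits_per_feature
    key = ""
    for value in features:
        # Index of the quantization bin, wrapped into the available levels.
        level = int(value // step) % levels
        key += format(level, "0{}b".format(bits_per_feature))
    return key


# Made-up feature vectors from two devices shaken together: similar, not equal.
device_a = [2.31, 5.12, 0.48, 7.77, 3.40]
device_b = [2.42, 5.05, 0.55, 7.69, 3.47]

key_a = quantize_to_key(device_a)
key_b = quantize_to_key(device_b)
print(key_a, key_b, key_a == key_b)
```

Here five features at three bits each give a 15-bit string, which is only an upper bound on the entropy of the key. A value that falls close to a bin boundary on one device can land in the neighboring bin on the other, in which case the keys differ and pairing fails; this is one reason why the choice of features and the synchronization matter.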