
The step of expert taxa recognition currently slows down the response time of
many bioassessments. Shifting to quicker and cheaper state-of-the-art machine
learning approaches is still met with expert scepticism towards the ability and
logic of machines. In our study, we investigate differences in both the
accuracy and the identification logic of taxonomic experts and machines. We
propose a systematic approach utilizing deep Convolutional Neural Nets with the
transfer learning paradigm and extensively evaluate it over a multi-pose
taxonomic dataset with hierarchical labels specifically created for this
comparison. We also study the prediction accuracy on different ranks of
taxonomic hierarchy in detail. Our results revealed that human experts using
actual specimens yield the lowest classification error ($\overline{CE}=6.1\%$).
However, a much faster, automated approach using deep Convolutional Neural Nets
comes close to human accuracy ($\overline{CE}=11.4\%$). Contrary to previous
findings in the literature, we find that machines following the flat
classification approach commonly used in machine learning perform better than
machines forced to adopt the hierarchical, local-per-parent-node approach used
by human taxonomic experts. Finally, we publicly share our unique dataset to
serve as a benchmark in this field.
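
The contrast between the flat and the local-per-parent-node strategies, and the idea of measuring error at different taxonomic ranks, can be sketched as follows. This is a minimal illustration only: the taxon names (`A`, `A1`, `a`, ...), the scores, and the function names are hypothetical placeholders, and the per-node scores stand in for whatever classifier outputs would be available in practice.

```python
from typing import Dict, Sequence, Tuple

Label = Tuple[str, ...]  # path from the top rank down to the leaf taxon


def flat_predict(leaf_scores: Dict[Label, float]) -> Label:
    """Flat approach: pick the best-scoring leaf taxon directly."""
    return max(leaf_scores, key=leaf_scores.get)


def hierarchical_predict(local_scores: Dict[Label, Dict[str, float]]) -> Label:
    """Local-per-parent-node approach: descend the tree one rank at a time,
    choosing the best-scoring child of the current node at every step."""
    path: Label = ()
    while path in local_scores:
        children = local_scores[path]
        path = path + (max(children, key=children.get),)
    return path


def error_at_rank(true: Sequence[Label], pred: Sequence[Label], depth: int) -> float:
    """Fraction of samples whose labels disagree within the first `depth` ranks."""
    wrong = sum(t[:depth] != p[:depth] for t, p in zip(true, pred))
    return wrong / len(true)


# Hypothetical scores for one specimen (placeholder taxa, not from the dataset).
leaf_scores = {
    ("A", "A1", "a"): 0.40,
    ("A", "A2", "b"): 0.25,
    ("B", "B1", "c"): 0.35,
}
local_scores = {
    (): {"A": 0.45, "B": 0.55},
    ("A",): {"A1": 0.6, "A2": 0.4},
    ("B",): {"B1": 1.0},
    ("A", "A1"): {"a": 1.0},
    ("A", "A2"): {"b": 1.0},
    ("B", "B1"): {"c": 1.0},
}

flat_label = flat_predict(leaf_scores)
hier_label = hierarchical_predict(local_scores)
```

With these made-up numbers the two strategies disagree: the top-down route commits to branch `B` at the first rank and cannot recover, whereas the flat route selects a leaf under `A`. The example only shows that the two procedures can differ; it does not reproduce the reported results.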

Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has merely focused
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains the two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experimental results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.
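
The edge-ego-network defined above can be extracted with a few set operations. The sketch below assumes an undirected graph without self-loops, stored as a dictionary of neighbor sets; the function names are hypothetical. The `common_neighbor_support` helper is a deliberately simple stand-in and is not the scoring rule derived from the random graph generation models.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets, no self-loops


def edge_ego_network(
    adj: Graph, u: Hashable, v: Hashable
) -> Tuple[Set[Hashable], Set[FrozenSet[Hashable]]]:
    """Induced subgraph on the two end nodes of edge (u, v) and their neighbors."""
    if v not in adj[u]:
        raise ValueError("(u, v) is not an edge of the graph")
    nodes = {u, v} | adj[u] | adj[v]
    # Keep every edge whose two endpoints both lie inside the node set.
    edges = {frozenset((a, b)) for a in nodes for b in adj[a] if b in nodes}
    return nodes, edges


def common_neighbor_support(adj: Graph, u: Hashable, v: Hashable) -> int:
    """Illustrative stand-in score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


# Toy graph: a triangle 1-2-3 with a tail 3-4-5.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}

triangle_nodes, triangle_edges = edge_ego_network(toy, 1, 2)
bridge_nodes, bridge_edges = edge_ego_network(toy, 3, 4)
```

In this toy graph the edge (1, 2) sits inside a triangle and has one shared neighbor, while the edge (3, 4) has none, which hints at why the neighborhood of an edge carries information about how well the edge fits its surroundings.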

Graph clustering is an important technique to understand the relationships
between the vertices in a big graph. In this paper, we propose a novel
random-walk-based graph clustering method. The proposed method restricts the
reach of the walking agent using an inflation function and a normalization
function. We analyze the behavior of the limited random walk procedure and
propose a novel algorithm for both global and local graph clustering problems.
Previous random-walk-based algorithms depend on the chosen fitness function to
find the clusters around a seed vertex. The proposed algorithm tackles the
problem in an entirely different manner. We use the limited random walk
procedure to find attracting vertices in a graph and use them as features to
cluster the vertices. According to the experimental results on the simulated
graph data and the real-world big graph data, the proposed method is superior
to the state-of-the-art methods in solving graph clustering problems. Since the
proposed method uses the embarrassingly parallel paradigm, it can be
efficiently implemented and embedded in any parallel computing environment such
as a MapReduce framework. Given enough computing resources, we are capable of
clustering graphs with millions of vertices and hundreds of millions of edges in a
reasonable time.
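
One way to picture a walk whose reach is limited by an inflation function and a normalization function is sketched below. Several details are assumptions made for illustration and are not taken from the text: the inflation is an element-wise power with exponent `r`, each vertex is given an implicit self-loop, and iteration stops when the probability vector changes by less than `tol`.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def limited_random_walk(
    adj: Graph, seed: Hashable, r: float = 2.0, max_iter: int = 100, tol: float = 1e-9
) -> Dict[Hashable, float]:
    """Iterate: one random-walk step, inflation (power r), normalization."""
    x = {n: 0.0 for n in adj}
    x[seed] = 1.0
    for _ in range(max_iter):
        # Random-walk step: spread each vertex's mass uniformly over
        # itself and its neighbors (self-loop is an assumption here).
        y = {n: 0.0 for n in adj}
        for n, mass in x.items():
            if mass == 0.0:
                continue
            targets = adj[n] | {n}
            share = mass / len(targets)
            for m in targets:
                y[m] += share
        # Inflation: boost large probabilities relative to small ones.
        y = {n: p ** r for n, p in y.items()}
        # Normalization: rescale so the vector is a distribution again.
        total = sum(y.values())
        y = {n: p / total for n, p in y.items()}
        converged = max(abs(y[n] - x[n]) for n in adj) < tol
        x = y
        if converged:
            break
    return x


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
from_left = limited_random_walk(two_triangles, seed=0)
from_right = limited_random_walk(two_triangles, seed=5)
```

On this toy graph the walk started at vertex 0 keeps most of its probability mass inside the left triangle, and the walk started at vertex 5 stays in the right one. Because each seed is processed independently, the walks can be run in parallel, and the resulting vectors could serve as per-vertex features for a subsequent clustering step.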

Devices equipped with accelerometer sensors, such as today's mobile devices,
can make use of motion to exchange information. A typical example of shared
motion is the shaking of two devices that are held together in one hand. Deriving
a shared secret (key) from shared motion, e.g. for device pairing, is an
obvious application for this. Only the keys need to be exchanged between the
peers, not the motion data or the features extracted from it. This
makes the pairing fast and easy. To this end, each device generates an information
signal (key) independently of the other, and the two keys must be identical for
pairing to succeed. The key is essentially derived by quantizing certain highly
discriminative features extracted from the accelerometer data after an implicit
synchronization. In this paper, we aim at finding a small set of effective
features which enable a significantly simpler quantization procedure than the
prior art. Our tentative results with authentic accelerometer data show that
this is possible with a competent accuracy ($76\%$) and key strength (entropy
approximately $15$ bits).
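
The idea of deriving a key by quantizing features, and of expressing key strength as entropy, can be illustrated with a small sketch. Everything specific here is hypothetical: the feature values, the thresholds, the one-bit-per-feature quantizer, and the function names are placeholders chosen for illustration and are not the feature set or quantization procedure of the paper.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def quantize_to_key(features: Sequence[float], thresholds: Sequence[float]) -> str:
    """One bit per feature: '1' if the feature exceeds its threshold, else '0'."""
    if len(features) != len(thresholds):
        raise ValueError("features and thresholds must have the same length")
    return "".join("1" if f > t else "0" for f, t in zip(features, thresholds))


def empirical_entropy_bits(keys: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the observed distribution of keys."""
    counts = Counter(keys)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical feature vectors computed independently on two devices
# that were shaken together; values and thresholds are made up.
thresholds = [0.5, 0.0, 1.0]
key_a = quantize_to_key([0.82, -0.10, 1.40], thresholds)
key_b = quantize_to_key([0.79, -0.05, 1.35], thresholds)
pairing_succeeds = key_a == key_b
```

Pairing succeeds only when both devices arrive at the same bit string, so features that sit close to a threshold are a likely source of mismatches. The entropy helper is a plug-in estimate over observed keys, which tends to underestimate the true entropy when only a few keys have been collected.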