• Hand keypoint detection and pose estimation have numerous applications in computer vision, but remain unsolved problems in many respects. One application of hand keypoint detection is performing cognitive assessments of a subject by observing that subject's performance in physical tasks involving rapid finger motion. As part of this work, we introduce a novel hand keypoint benchmark dataset that consists of hand gestures recorded specifically for cognitive behavior monitoring. We explore state-of-the-art methods in hand keypoint detection and provide quantitative evaluations of their performance on our dataset. In the future, these results and our dataset can serve as a useful benchmark for hand keypoint recognition for rapid finger movements.
  • SSD is one of the state-of-the-art object detection algorithms, combining high detection accuracy with real-time speed. However, it is widely recognized that SSD is less accurate at detecting small objects than large objects, because it ignores the context outside the proposal boxes. In this paper, we present CSSD, short for context-aware single-shot multibox object detector. CSSD is built on top of SSD, with additional layers that model multi-scale contexts. We describe two variants of CSSD, which differ in their context layers: one uses dilated convolution layers (DiCSSD) and the other uses deconvolution layers (DeCSSD). The experimental results show that the multi-scale context modeling significantly improves the detection accuracy. In addition, we study the relationship between effective receptive fields (ERFs) and theoretical receptive fields (TRFs), particularly on a VGGNet. The empirical results further strengthen our conclusion that SSD coupled with context layers achieves better detection results, especially for small objects ($+3.2\% {\rm AP}_{@0.5}$ on MS-COCO compared to the newest SSD), while maintaining comparable runtime performance.
  • HTKS is a game-like cognitive assessment method designed for children between four and eight years of age. During the HTKS assessment, a child responds to a sequence of requests, such as "touch your head" or "touch your toes". The cognitive challenge stems from the fact that the children are instructed to interpret these requests not literally, but by touching a different body part than the one stated. In prior work, we developed the CogniLearn system, which captures data from subjects performing the HTKS game and analyzes the motion of the subjects. In this paper, we propose specific improvements that make the motion analysis module more accurate. As a result of these improvements, the accuracy in recognizing cases where subjects touch their toes has gone from 76.46% in our previous work to 97.19% in this paper.
  • The database community has long recognized the importance of graphical query interfaces to the usability of data management systems. Yet relatively little has been done. We present Orion, a visual interface for querying ultra-heterogeneous graphs. It iteratively assists users in query graph construction by making suggestions via machine learning methods. In its active mode, Orion automatically suggests top-k edges to be added to a query graph. In its passive mode, the user adds a new edge manually, and Orion suggests a ranked list of labels for the edge. Orion's edge ranking algorithm, Random Decision Paths (RDP), makes use of a query log to rank candidate edges by how likely they are to match the user's query intent. Extensive user studies using Freebase demonstrated that Orion users have a 70% success rate in constructing complex query graphs, a significant improvement over the 58% success rate of users of a baseline system that resembles existing visual query builders. Furthermore, using active mode only, the RDP algorithm was compared with several methods adapting other machine learning algorithms, such as random forests and naive Bayes classifiers, as well as class association rules and recommendation systems based on singular value decomposition. On average, RDP required 40 suggestions to correctly reach a target query graph, while other methods required 1.5--4 times as many suggestions.
  • Supervised learning of convolutional neural networks (CNNs) can require very large amounts of labeled data. Labeling thousands or millions of training examples can be extremely time-consuming and costly. One direction towards addressing this problem is to create features from unlabeled data. In this paper, we propose a new method for training a CNN with no need for labeled instances. This method for unsupervised feature learning is then successfully applied to a challenging object recognition task. The proposed algorithm is relatively simple, but attains accuracy comparable to that of more sophisticated methods. The proposed method is significantly easier to train than existing CNN methods and places fewer requirements on manually labeled training data. It is also shown to be resistant to overfitting. We provide results on some well-known datasets, namely STL-10, CIFAR-10, and CIFAR-100. The results show that our method provides competitive performance compared with existing alternative methods. Selective Convolutional Neural Network (S-CNN) is a simple and fast algorithm that introduces a new way to do unsupervised feature learning and provides discriminative features which generalize well.
  • Human body pose estimation and hand detection are two important tasks for systems that perform computer vision-based sign language recognition (SLR). However, both tasks are challenging, especially when the input is color video with no depth information. Many algorithms have been proposed in the literature for these tasks, and some of the most successful recent algorithms are based on deep learning. In this paper, we introduce a dataset for human pose estimation in the SLR domain. We evaluate the performance of two deep-learning-based pose estimation methods by performing user-independent experiments on our dataset. We also perform transfer learning, and we obtain results demonstrating that transfer learning can improve pose estimation accuracy. The dataset and the results from these methods can serve as a useful baseline for future work.
  • This paper introduces principal motion components (PMC), a new method for one-shot gesture recognition. In the considered scenario, a single training video is available for each gesture to be recognized, which limits the application of traditional techniques (e.g., HMMs). In PMC, a 2D map of motion energy is obtained for each pair of consecutive frames in a video. The motion maps associated with a video are processed to obtain a PCA model, which is used for recognition under a reconstruction-error approach. The main benefits of the proposed approach are its simplicity, ease of implementation, competitive performance, and efficiency. We report experimental results in one-shot gesture recognition using the ChaLearn Gesture Dataset, a benchmark comprising more than 50,000 gestures, recorded as both RGB and depth video with a Kinect camera. Results obtained with PMC are competitive with alternative methods proposed for the same data set.
  • This paper proposes a general framework for matching similar subsequences in both time series and string databases. The matching results are pairs of query subsequences and database subsequences. The framework finds all possible pairs of similar subsequences if the distance measure satisfies the "consistency" property, which is a property introduced in this paper. We show that most popular distance functions, such as the Euclidean distance, DTW, ERP, and the Frechet distance for time series, and the Hamming distance and Levenshtein distance for strings, are all "consistent". We also propose a generic index structure for metric spaces named "reference net". The reference net occupies O(n) space, where n is the size of the dataset, and is optimized to work well with our framework. The experiments demonstrate the ability of our method to improve retrieval performance when combined with diverse distance measures. The experiments also illustrate that the reference net scales well in terms of space overhead and query time.
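
The principal motion components (PMC) abstract above outlines a pipeline: compute motion maps from consecutive frames, fit one PCA model per training video, and label a test video by the model with the lowest reconstruction error. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that motion energy is the absolute difference between consecutive grayscale frames (the abstract does not specify this), and it runs on small synthetic videos. All function names (`motion_maps`, `fit_pca`, `reconstruction_error`, `recognize`, `make_video`) and the choice of 8 components are hypothetical.

```python
import numpy as np


def motion_maps(video):
    """Return one flattened motion map per pair of consecutive frames.

    Assumption: motion energy is the absolute frame difference.
    video has shape (T, H, W); the result has shape (T - 1, H * W).
    """
    video = np.asarray(video, dtype=np.float64)
    diffs = np.abs(np.diff(video, axis=0))
    return diffs.reshape(diffs.shape[0], -1)


def fit_pca(maps, n_components):
    """Fit a PCA model (mean and top principal directions) via SVD."""
    mean = maps.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(maps - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(maps, model):
    """Mean squared reconstruction error of the maps under a PCA model."""
    mean, components = model
    centered = maps - mean
    reconstructed = centered @ components.T @ components
    return float(np.mean(np.sum((centered - reconstructed) ** 2, axis=1)))


def recognize(video, models):
    """Return the label whose PCA model reconstructs the video best."""
    maps = motion_maps(video)
    errors = {label: reconstruction_error(maps, model)
              for label, model in models.items()}
    return min(errors, key=errors.get)


def make_video(direction, seed, n_frames=12, size=16, noise=0.01):
    """Synthetic 'gesture': a 4x4 block sliding one pixel per frame."""
    rng = np.random.default_rng(seed)
    video = np.zeros((n_frames, size, size))
    for t in range(n_frames):
        if direction == "horizontal":
            video[t, 6:10, t:t + 4] = 1.0
        else:
            video[t, t:t + 4, 6:10] = 1.0
    return video + rng.normal(0.0, noise, video.shape)


# One-shot training: a single video per gesture class.
models = {
    label: fit_pca(motion_maps(make_video(label, seed=i)), n_components=8)
    for i, label in enumerate(["horizontal", "vertical"])
}

# Classify an unseen, noisy video of one of the gestures.
print(recognize(make_video("vertical", seed=42), models))
```

Keeping a separate PCA model per gesture means a new gesture class can be added from a single video without refitting the existing models, which fits the one-shot setting described above.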