• We introduce a neural architecture for navigation in novel environments. Our proposed architecture learns to map from first-person views and plans a sequence of actions towards goals in the environment. The Cognitive Mapper and Planner (CMP) is based on two key ideas: a) a unified joint architecture for mapping and planning, such that the mapping is driven by the needs of the task, and b) a spatial memory with the ability to plan given an incomplete set of observations about the world. CMP constructs a top-down belief map of the world and applies a differentiable neural net planner to produce the next action at each time step. The accumulated belief of the world enables the agent to track visited regions of the environment. We train and test CMP on navigation problems in simulation environments derived from scans of real-world buildings. Our experiments demonstrate that CMP outperforms alternate learning-based architectures, as well as classical mapping and path planning approaches, in many cases. Furthermore, it naturally extends to semantically specified goals, such as 'going to a chair'. We also deploy CMP on physical robots in indoor environments, where it achieves reasonable performance, even though it is trained entirely in simulation.
  • The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose. We propose a convolutional neural network-based approach to predict this representation and benchmark it on a large dataset of indoor scenes. Our experiments evaluate a number of practical design questions, demonstrate that we can infer this representation, and quantitatively and qualitatively demonstrate its merits compared to alternate representations.
  • We exploit the techniques of the Bonora-Tonin superfield formalism to derive the off-shell nilpotent and absolutely anticommuting (anti-)BRST as well as (anti-)co-BRST symmetry transformations for the (1+1)-dimensional (2D) bosonized vector Schwinger model. In deriving the above symmetries, we invoke the (dual-)horizontality conditions as well as gauge and (anti-)co-BRST invariant restrictions on the superfields that are defined on the (2, 2)-dimensional supermanifold. We provide a geometrical interpretation of the above nilpotent symmetries (and their corresponding charges). We also express the nilpotency and absolute anticommutativity of the (anti-)BRST and (anti-)co-BRST charges within the framework of the augmented superfield formalism.
  • We analyze the constraints for a system of anti self-dual Yang-Mills (ASDYM) equations by means of the modified Faddeev-Jackiw method in K and J gauges à la Yang. We also establish the Hamiltonian flow for the ASDYM system through the hidden BRS invariance in both gauges. Finally, we remark on the bi-Hamiltonian nature of ASDYM and the compatibility of the symplectic structures therein.
  • The origin of small mixing among the quarks and large mixing among the neutrinos has been an open question in particle physics. In order to answer this question, we postulate general relations among the quark and leptonic mixing angles at a high scale, which could be the scale of Grand Unified Theories. The central idea of these relations is that the quark and leptonic mixing angles can be unified at some high scale, either due to some quark-lepton symmetry or some other underlying mechanism, and as a consequence the mixing angles of the leptonic sector are proportional to those of the quark sector. We investigate the phenomenology of the possible relations where the leptonic mixing angles are proportional to the quark mixing angles at the unification scale by taking into account the latest experimental constraints from the neutrino sector. These relations are able to explain the pattern of leptonic mixing at the low scale and thereby hint that they could be signatures of a quark-lepton symmetry or some other underlying quark-lepton mixing unification mechanism at some high scale linked to Grand Unified Theories.
  • In this paper we explore two ways of using context for object detection. The first model focusses on people and the objects they commonly interact with, such as fashion and sports accessories. The second model considers more general object detection and uses the spatial relationships between objects and between objects and scenes. Our models are able to capture precise spatial relationships between the context and the object of interest, and make effective use of the appearance of the contextual region. On the newly released COCO dataset, our models provide relative improvements of up to 5% over CNN-based state-of-the-art detectors, with the gains concentrated on hard cases such as small objects (10% relative improvement).
  • In this work we propose a technique that transfers supervision between images from different modalities. We use learned representations from a large labeled modality as a supervisory signal for training representations for a new unlabeled paired modality. Our method enables learning of rich representations for unlabeled modalities and can be used as a pre-training procedure for new modalities with limited labeled data. We show experimental results where we transfer supervision from labeled RGB images to unlabeled depth and optical flow images and demonstrate large improvements for both these cross modal supervision transfers. Code, data and pre-trained models are available at https://github.com/s-gupta/fast-rcnn/tree/distillation
  • Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.
  • We derive the complete set of off-shell nilpotent (s^2_{(a)b} = 0) and absolutely anticommuting (s_b s_{ab} + s_{ab} s_b = 0) Becchi-Rouet-Stora-Tyutin (BRST) (s_b) as well as anti-BRST symmetry transformations (s_{ab}) corresponding to the combined Yang-Mills and non-Yang-Mills symmetries of the (2 + 1)-dimensional Jackiw-Pi model within the framework of the augmented superfield formalism. The absolute anticommutativity of the (anti-)BRST symmetries is ensured by the existence of two sets of Curci-Ferrari (CF) type conditions which emerge naturally in this formalism. The presence of CF conditions enables us to derive the coupled but equivalent Lagrangian densities. We also capture the (anti-)BRST invariance of the coupled Lagrangian densities in the superfield formalism. The derivation of the (anti-)BRST transformations of the auxiliary field \rho is one of the key findings, as these can be generated neither by the nilpotent (anti-)BRST charges nor by the requirements of the nilpotency and/or absolute anticommutativity of the (anti-)BRST transformations. Finally, we provide a bird's-eye view of the role of the auxiliary field for various massive models and point out a few striking similarities and some glaring differences among them.
  • We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the "consensus" of the set of candidate captions gathered from the nearest neighbor images. When measured by automatic evaluation metrics on the MS COCO caption evaluation server, these approaches perform as well as many recent approaches that generate novel captions. However, human studies show that a method that generates novel captions is still preferred over the nearest neighbor approach.
  • In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction. Classical approaches to action recognition either study the task of action classification at the image or video clip level or at best produce a bounding box around the person doing the action. We believe such an output is inadequate and a complete understanding can only come when we are able to associate objects in the scene to the different semantic roles of the action. To enable progress towards this goal, we annotate a dataset of 16K people instances in 10K images with actions they are doing and associate objects in the scene with different semantic roles for each action. Finally, we provide a set of baseline algorithms for this task and analyze error modes providing directions for future work.
  • Starting with the high scale mixing unification hypothesis, we investigate the renormalization group evolution of mixing parameters and masses for Dirac type neutrinos. Following this hypothesis, the PMNS mixing angles and phase are taken to be identical to the CKM ones at a unifying high scale. Then, they are evolved to a low scale using renormalization-group equations. The notable feature of this hypothesis is that renormalization group evolution with a quasi-degenerate mass pattern can explain the largeness of leptonic mixing angles even for Dirac neutrinos. The renormalization group evolution "naturally" results in a non-zero and small value of the leptonic mixing angle $\theta_{13}$. One of the important predictions of this work is that the mixing angle $\theta_{23}$ is non-maximal and lies only in the second octant. We also derive constraints on the allowed parameter range for the SUSY breaking and unification scales for which this hypothesis works. The results are novel and can be tested by present and future experiments.
  • This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.
  • In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.
  • The goal of this work is to replace objects in an RGB-D scene with corresponding 3D models from a library. We approach this problem by first detecting and segmenting object instances in the scene using the approach from Gupta et al. [13]. We use a convolutional neural network (CNN) to predict the pose of the object. This CNN is trained using pixel normals in images containing rendered synthetic objects. When tested on real data, it outperforms alternative algorithms trained on real data. We then use this coarse pose estimate along with the inferred pixel support to align a small number of prototypical models to the data, and place the model that fits the best into the scene. We observe a 48% relative improvement in performance at the task of 3D detection over the current state-of-the-art [33], while being an order of magnitude faster at the same time.
  • We derive the off-shell nilpotent and absolutely anticommuting Becchi-Rouet-Stora-Tyutin (BRST) as well as anti-BRST transformations s_{(a)b} corresponding to the Yang-Mills gauge transformations of the 3D Jackiw-Pi model by exploiting the "augmented" superfield formalism. We also show that the Curci-Ferrari restriction, which is a hallmark of any non-Abelian 1-form gauge theory, emerges naturally within this formalism and plays an instrumental role in providing the proof of absolute anticommutativity of s_{(a)b}.
  • The flamelet approach offers a viable framework for combustion modeling of homogeneous charge compression ignition (HCCI) engines under stratified mixture conditions. Scalar dissipation rate acts as a key parameter in flamelet-based combustion models which connects the physical mixing space to the reactive space. The aim of this paper is to gain fundamental insights into turbulent mixing in low temperature combustion (LTC) engines and investigate the modeling of scalar dissipation rate. Three direct numerical simulation (DNS) test cases of two-dimensional turbulent auto-ignition of a hydrogen-air mixture with different correlations of temperature and mixture fraction are considered, which are representative of different ignition regimes. The existing models of mean and conditional scalar dissipation rates, and probability density functions (PDFs) of mixture fraction and total enthalpy are a priori validated against the DNS data.
  • Cycle-to-cycle variability (CCV) of intake-jet flow in an optical engine was measured using particle image velocimetry (PIV), revealing the possibility of two different flow patterns. A phase-dependent proper orthogonal decomposition (POD) analysis showed that one or the other flow pattern would appear in the average flow, sampled from test to test or sub-sampled within a single test; each data set contained individual cycles showing one flow pattern or the other. Three-dimensional velocity data from a large-eddy simulation (LES) of the engine showed that the PIV plane was cutting through a region of high shear between the intake jet and another large flow structure. Rotating the measurement plane 10° revealed one or the other flow structure observed in the PIV measurements. Thus, it was hypothesized that cycle-to-cycle variations in the swirl ratio result in the two different flow patterns in the PIV plane. Having an unambiguous metric to reveal large-scale flow CCV, causes for this variability were examined within the possible sources present in the available testing. In particular, variations in intake-port and cylinder pressure, lateral valve oscillations, and engine RPM were examined as potential causes for the cycle-to-cycle flow variations using the phase-dependent POD coefficients. No direct correlation was seen between the intake port pressure, or the pressure drop across the intake valve, and the in-cylinder flow pattern. A correlation was observed between dominant flow pattern and cycle-to-cycle variations in intake valve horizontal position. RPM values and in-cylinder flow patterns did not correlate directly. However, a shift in flow pattern was observed between early and late cycles in a 2900-cycle test after an approximately 5 rpm engine speed perturbation.
  • In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.
  • We investigate the renormalization group evolution of masses and mixing angles of Majorana neutrinos under the 'High Scale Mixing Unification' hypothesis. Assuming the unification of quark-lepton mixing angles at a high scale, we show that all the experimentally observed neutrino oscillation parameters can be obtained, within the 3-$\sigma$ range, through the running of the corresponding renormalization group equations, provided neutrinos have the same CP parity and are quasi-degenerate. One of the novel results of our analysis is that $\theta_{23}$ turns out to be non-maximal and lies in the second octant. Furthermore, we derive new constraints on the allowed parameter space for the unification scale, SUSY breaking scale and $\tan \beta$, for which the 'High Scale Mixing Unification' hypothesis works.
  • We derive nilpotent and absolutely anticommuting (anti-)co-BRST symmetry transformations for the bosonized version of the (1+1)-dimensional (2D) vector Schwinger model. These symmetry transformations turn out to be the analogue of the co-exterior derivative of differential geometry, as the total gauge-fixing term remains invariant under them. The exterior derivative is realized in terms of the (anti-)BRST symmetry transformations of the theory, whereas the bosonic symmetries find their analogue in the Laplacian operator. The algebra obeyed by these symmetry transformations turns out to be exactly the same as the algebra obeyed by the de Rham cohomological operators of differential geometry.
  • High-throughput DNA sequencers are becoming indispensable in our understanding of diseases at the molecular level, in marker-assisted selection in agriculture and in microbial genetics research. These sequencing instruments produce enormous amounts of data (often terabytes of raw data in a month) that require efficient analysis, management and interpretation. The most commonly used sequencing instrument today produces billions of short reads (up to 150 bases) from each run. The first step in the data analysis is alignment of these short reads to the reference genome of choice. There are different open source algorithms available for sequence alignment to the reference genome. These tools normally have a high computational overhead, both in terms of number of processors and memory. Here, we propose a hybrid-computing environment called MUSIC (Mapping USIng hybrid Computing) for one of the most popular open source sequence alignment algorithms, BWA, using accelerators that show significant improvement in speed over the serial code.
  • We report SInC (SNV, Indel and CNV) simulator and read generator, an open-source tool capable of simulating biological variants taking into account a platform-specific error model. SInC is capable of simulating and generating single- and paired-end reads with user-defined insert size with high efficiency compared to other existing tools. SInC, due to its multi-threaded capability during read generation, has a low time footprint. SInC is currently optimised to work in a limited infrastructure setup and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes. SInC can be downloaded from https://sourceforge.net/projects/sincsimulator/.
  • We derive the off-shell nilpotent and absolutely anticommuting Becchi-Rouet-Stora-Tyutin (BRST) as well as anti-BRST symmetry transformations corresponding to the non-Yang-Mills symmetry transformations of the (2 + 1)-dimensional Jackiw-Pi (JP) model within the framework of the "augmented" superfield formalism. The Curci-Ferrari restriction, which is a hallmark of non-Abelian 1-form gauge theories, does not appear in this case. One of the novel features of our present investigation is the derivation of proper (anti-)BRST symmetry transformations corresponding to the auxiliary field \rho that cannot be derived by any conventional means.
  • We derive the canonical (anti-)commutation relations amongst the creation and annihilation operators of the various basic fields present in the four (3 + 1)-dimensional (4D) free Abelian 2-form gauge theory, with the help of continuous symmetry transformations within the framework of the Becchi-Rouet-Stora-Tyutin (BRST) formalism. We show that all six continuous symmetries of the theory lead to exactly the same non-vanishing (anti-)commutators amongst the creation and annihilation operators of the normal mode expansion of the basic fields of the theory.