-
This work presents stochastic optimization methods targeted at least-squares
problems involving Monte Carlo integration. While the most common approach to
solving these problems is to apply stochastic gradient descent (SGD) or similar
methods such as AdaGrad and Adam, which involve estimating a stochastic
gradient from a small number of Monte Carlo samples computed at each iteration,
we show that for this category of problems it is possible to achieve faster
asymptotic convergence rates using an increasing number of samples per
iteration instead, a strategy we call increasing precision (IP). We then
improve pre-asymptotic convergence by introducing a hybrid approach that
combines the qualities of increasing precision and otherwise "constant"
precision, resulting in methods such as the IP-SGD hybrid and IP-AdaGrad
hybrid, essentially by modifying their gradient estimators to have an
equivalent effect to increasing precision. Finally, we observe that, in some
problems, incorporating a Gauss-Newton preconditioner into the IP-SGD hybrid
method can provide much better convergence than employing a Quasi-Newton
approach or covariance-preconditioning as in AdaGrad or Adam.
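
The contrast between constant and increasing precision can be sketched on a
toy problem. This is a minimal illustration, not the authors' method: the
one-dimensional objective 0.5 * (theta - E[xi])^2, the linear schedule of
4 * k samples at iteration k, the step size, and the names
estimate_gradient and run_sgd are all assumptions chosen for brevity.

```python
import random

def estimate_gradient(theta, n_samples, rng, true_mean=2.0):
    """Monte Carlo estimate of the gradient of 0.5 * (theta - E[xi])**2,
    where xi ~ N(true_mean, 1) and E[xi] is replaced by a sample mean."""
    sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n_samples)) / n_samples
    return theta - sample_mean

def run_sgd(n_iters, samples_at, step=0.5, seed=0):
    """Plain SGD; samples_at(k) gives the sample count used at iteration k."""
    rng = random.Random(seed)
    theta = 0.0
    for k in range(1, n_iters + 1):
        theta -= step * estimate_gradient(theta, samples_at(k), rng)
    return theta

# Constant precision: 4 samples at every iteration (hypothetical setting).
theta_constant = run_sgd(200, lambda k: 4)
# Increasing precision: the sample count grows with the iteration index.
theta_increasing = run_sgd(200, lambda k: 4 * k)
```

With a fixed step size, the constant-precision iterate keeps fluctuating
around the minimizer at a noise level set by the 4-sample estimate, whereas
the increasing-precision iterate sees progressively less gradient noise.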
-
We present a new task that predicts future locations of people observed in
first-person videos. Consider a first-person video stream continuously recorded
by a wearable camera. Given a short clip of a person that is extracted from the
complete stream, we aim to predict that person's location in future frames. To
facilitate this future person localization ability, we make the following three
key observations: a) First-person videos typically involve significant
ego-motion which greatly affects the location of the target person in future
frames; b) Scales of the target person act as a salient cue to estimate a
perspective effect in first-person videos; c) First-person videos often capture
people up-close, making it easier to leverage target poses (e.g., where they
look) for predicting their future locations. We incorporate these three
observations into a prediction framework with a multi-stream
convolution-deconvolution architecture. Experimental results reveal our method
to be effective on our new dataset as well as on a public social interaction
dataset.
-
We present a new computational model for gaze prediction in egocentric videos
by exploring patterns in temporal shift of gaze fixations (attention
transition) that are dependent on egocentric manipulation tasks. Our assumption
is that the high-level context of how a task is completed in a certain way has
a strong influence on attention transition and should be modeled for gaze
prediction in natural dynamic scenes. Specifically, we propose a hybrid model
based on deep neural networks which integrates task-dependent attention
transition with bottom-up saliency prediction. In particular, the
task-dependent attention transition is learned with a recurrent neural network
to exploit the temporal context of gaze fixations, e.g. looking at a cup after
moving gaze away from a grasped bottle. Experiments on public egocentric
activity datasets show that our model significantly outperforms
state-of-the-art gaze prediction methods and is able to learn meaningful
transitions of human attention.
-
We present an accurate stereo matching method using local expansion moves
based on graph cuts. This new move-making scheme is used to efficiently infer
per-pixel 3D plane labels on a pairwise Markov random field (MRF) that
effectively combines recently proposed slanted patch matching and curvature
regularization terms. The local expansion moves are presented as many
alpha-expansions defined for small grid regions. The local expansion moves
extend traditional expansion moves in two ways: localization and spatial
propagation. By localization, we use different candidate alpha-labels according
to the locations of local alpha-expansions. By spatial propagation, we design
our local alpha-expansions to propagate currently assigned labels for nearby
regions. With this localization and spatial propagation, our method can
efficiently infer MRF models with a continuous label space using randomized
search. Our method has several advantages over previous approaches that are
based on fusion moves or belief propagation: it produces submodular moves
that yield subproblem optimality; it helps find good, smooth, piecewise linear
disparity maps; it is suitable for parallelization; and it can use cost-volume
filtering techniques to accelerate the matching cost computations. Even
using a simple pairwise MRF, our method is shown to have the best performance
on the Middlebury stereo benchmark V2 and V3.
-
We propose a privacy-preserving framework for learning visual classifiers by
leveraging distributed private image data. This framework is designed to
aggregate multiple classifiers updated locally using private data and to ensure
that no private information about the data is exposed during and after its
learning procedure. We utilize a homomorphic cryptosystem that can aggregate
the local classifiers while they are encrypted and thus kept secret. To
overcome the high computational cost of homomorphic encryption of
high-dimensional classifiers, we (1) impose sparsity constraints on local
classifier updates and (2) propose a novel efficient encryption scheme named
doubly-permuted homomorphic encryption (DPHE) which is tailored to sparse
high-dimensional data. DPHE (i) decomposes sparse data into its constituent
non-zero values and their corresponding support indices, (ii) applies
homomorphic encryption only to the non-zero values, and (iii) employs double
permutations on the support indices to make them secret. Our experimental
evaluation on several public datasets shows that the proposed approach achieves
performance comparable to state-of-the-art visual recognition methods while
preserving privacy, and significantly outperforms other privacy-preserving
methods.
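
Step (i) of DPHE, splitting a sparse update into its non-zero values and
their support indices, can be sketched as follows. This is only an
illustration of the decomposition: the homomorphic encryption of the values
and the double permutation of the indices are deliberately omitted because
the abstract does not specify them, and the names decompose and reconstruct
are hypothetical.

```python
def decompose(sparse_vector):
    """Split a sparse vector into its non-zero values and support indices."""
    support = [i for i, v in enumerate(sparse_vector) if v != 0]
    values = [sparse_vector[i] for i in support]
    return values, support

def reconstruct(values, support, length):
    """Rebuild the dense vector from values and support indices."""
    dense = [0.0] * length
    for v, i in zip(values, support):
        dense[i] = v
    return dense

# A made-up sparse classifier update with two non-zero entries.
update = [0.0, 0.0, 1.5, 0.0, -0.25, 0.0]
values, support = decompose(update)
restored = reconstruct(values, support, len(update))
```

Only the short values list would need to be encrypted in such a scheme,
which is where the saving over encrypting every coordinate would come from.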
-
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks (stereo, optical flow, visual odometry and
motion segmentation), leading to overall higher accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame, which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.
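
The idea of flagging regions inconsistent with the estimated camera motion
can be sketched per pixel. This is a simplified illustration under stated
assumptions, not the paper's procedure: the camera is a pinhole with a
hypothetical focal length, the motion is a pure translation rather than a
full 6-DOF pose, pixel coordinates are taken relative to the principal
point, and the threshold and function names are invented for the example.

```python
def rigid_flow(u, v, depth, translation, focal=500.0):
    """Flow induced at pixel (u, v) by camera translation for a static point."""
    tx, ty, tz = translation
    # Back-project the pixel to a 3D point in the first camera frame.
    x, y, z = u * depth / focal, v * depth / focal, depth
    # Express the same static point in the translated camera frame.
    xn, yn, zn = x - tx, y - ty, z - tz
    # Re-project and subtract the original pixel position.
    return focal * xn / zn - u, focal * yn / zn - v

def is_inconsistent(observed_flow, u, v, depth, translation, threshold=1.0):
    """True if the observed flow deviates from the rigid flow by more than
    `threshold` pixels, suggesting an independently moving object."""
    fu, fv = rigid_flow(u, v, depth, translation)
    du, dv = observed_flow[0] - fu, observed_flow[1] - fv
    return (du * du + dv * dv) ** 0.5 > threshold

# Camera moves 1 unit forward; the point is 10 units away.
forward = (0.0, 0.0, 1.0)
static_flow = rigid_flow(100.0, 50.0, 10.0, forward)
moving_flow = (static_flow[0] + 5.0, static_flow[1])  # extra object motion
static_flag = is_inconsistent(static_flow, 100.0, 50.0, 10.0, forward)
moving_flag = is_inconsistent(moving_flow, 100.0, 50.0, 10.0, forward)
```

Restricting the more expensive per-pixel optical flow to flagged pixels is
the kind of saving the abstract attributes to the unified framework.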
-
Describing the color and textural information of a person image is one of the
most crucial aspects of person re-identification (re-id). In this paper, we
present novel meta-descriptors based on a hierarchical distribution of pixel
features. Although hierarchical covariance descriptors have been successfully
applied to image classification, the mean information of pixel features, which
is absent from the covariance, tends to be the major discriminative information
for person re-id. To solve this problem, we describe a local region in an image
via a hierarchical Gaussian distribution whose parameters include both means
and covariances. More specifically, the region is modeled as a set
of multiple Gaussian distributions in which each Gaussian represents the
appearance of a local patch. The characteristics of the set of Gaussians are
again described by another Gaussian distribution. In both steps, we embed the
parameters of the Gaussian into a point on the Symmetric Positive Definite
(SPD) matrix manifold. By varying how mean information is handled in this
embedding, we develop two hierarchical Gaussian descriptors. Additionally, we
develop feature norm normalization methods that alleviate the biased trends
present in the descriptors. Experimental results
on five public datasets indicate that the proposed descriptors achieve
remarkably high performance on person re-id.
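
One way to place a Gaussian's mean and covariance into a single SPD matrix
is sketched below. The abstract does not state which embedding is used, so
the block form [[Sigma + mu mu^T, mu], [mu^T, 1]] and the omission of any
scaling factor are assumptions, as are the function names and the toy 2-D
pixel features. The second level of the hierarchy is not shown.

```python
def mean_and_covariance(points):
    """Sample mean and unbiased sample covariance of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def embed_gaussian(mean, cov):
    """Build the (d+1) x (d+1) matrix [[cov + mean mean^T, mean], [mean^T, 1]].
    Its Schur complement with respect to the bottom-right 1 is cov, so the
    result is positive definite whenever cov is."""
    d = len(mean)
    top = [[cov[i][j] + mean[i] * mean[j] for j in range(d)] + [mean[i]]
           for i in range(d)]
    return top + [list(mean) + [1.0]]

# Made-up 2-D pixel features from one local patch.
patch_features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
mean, cov = mean_and_covariance(patch_features)
spd = embed_gaussian(mean, cov)
```

Unlike a covariance-only descriptor, this matrix changes when the mean
changes, which is the property the abstract argues matters for re-id.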
-
We envision a future time when wearable cameras are worn by the masses,
recording first-person point-of-view (POV) videos of everyday life. While these
cameras can enable new assistive technologies and novel research challenges,
they also raise serious privacy concerns. For example, first-person videos
passively recorded by wearable cameras will necessarily include anyone who
comes into the view of a camera -- with or without consent. Motivated by these
benefits and risks, we developed a self-search technique tailored to
first-person videos. The key observation of our work is that the egocentric
head motion of a target person (i.e., the self) is observed in both the
target's and the observer's POV videos. The motion correlation between the target
person's video and the observer's video can then be used to identify instances
of the self uniquely. We incorporate this feature into the proposed approach
that computes the motion correlation over densely-sampled trajectories to
search for a target in observer videos. Our approach significantly improves
self-search performance over several well-known face detectors and recognizers.
Furthermore, we show how our approach can enable several practical applications
such as privacy filtering, target video retrieval, and social group clustering.
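
The motion-correlation idea can be sketched with a plain Pearson correlation
between two one-dimensional motion signals. This is an illustrative
assumption rather than the paper's formulation: the signals are made-up
per-frame motion magnitudes, and the names pearson and
best_matching_trajectory are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_matching_trajectory(target_motion, observer_trajectories):
    """Index of the observer trajectory most correlated with the target's
    own head motion, together with all correlation scores."""
    scores = [pearson(target_motion, t) for t in observer_trajectories]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Head motion seen in the target's own video (made-up values).
target_motion = [0.0, 1.0, 0.5, -1.0, -0.5, 0.2]
# Two trajectories in the observer's video: background, then the target
# as seen from outside (a scaled and shifted copy of the head motion).
observer_trajectories = [
    [0.3, 0.1, 0.4, 0.2, 0.3, 0.1],
    [2.0 * m + 1.0 for m in target_motion],
]
best, scores = best_matching_trajectory(target_motion, observer_trajectories)
```

Because the correlation is invariant to scale and offset, the trajectory
belonging to the target stands out even though its raw values differ from
the head-motion signal.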
-
The joint JAXA/NASA ASTRO-H mission is the sixth in a series of highly
successful X-ray missions developed by the Institute of Space and Astronautical
Science (ISAS), with a planned launch in 2015. The ASTRO-H mission is equipped
with a suite of sensitive instruments with the highest energy resolution ever
achieved at E > 3 keV and a wide energy range spanning four decades in energy
from soft X-rays to gamma-rays. The simultaneous broad bandpass, coupled with
the high spectral resolution of Delta E < 7 eV of the micro-calorimeter, will
enable a wide variety of important science themes to be pursued. ASTRO-H is
expected to provide breakthrough results in scientific areas as diverse as the
large-scale structure of the Universe and its evolution, the behavior of matter
in the gravitational strong field regime, the physical conditions in sites of
cosmic-ray acceleration, and the distribution of dark matter in galaxy clusters
at different redshifts.
-
The joint JAXA/NASA ASTRO-H mission is the sixth in a series of highly
successful X-ray missions initiated by the Institute of Space and Astronautical
Science (ISAS). ASTRO-H will investigate the physics of the high-energy
universe via a suite of four instruments, covering a very wide energy range,
from 0.3 keV to 600 keV. These instruments include a high-resolution,
high-throughput spectrometer sensitive over 0.3-2 keV with high spectral
resolution of Delta E < 7 eV, enabled by a micro-calorimeter array located in
the focal plane of thin-foil X-ray optics; hard X-ray imaging spectrometers
covering 5-80 keV, located in the focal plane of multilayer-coated, focusing
hard X-ray mirrors; a wide-field imaging spectrometer sensitive over 0.4-12
keV, with an X-ray CCD camera in the focal plane of a soft X-ray telescope; and
a non-focusing Compton-camera type soft gamma-ray detector, sensitive in the
40-600 keV band. The simultaneous broad bandpass, coupled with high spectral
resolution, will enable the pursuit of a wide variety of important science
themes.
-
The joint JAXA/NASA ASTRO-H mission is the sixth in a series of highly
successful X-ray missions initiated by the Institute of Space and Astronautical
Science (ISAS). ASTRO-H will investigate the physics of the high-energy
universe by performing high-resolution, high-throughput spectroscopy with
moderate angular resolution. ASTRO-H covers a very wide energy range, from 0.3
keV to 600 keV. ASTRO-H allows a combination of wide-band X-ray spectroscopy
(5-80 keV), provided by multilayer-coated, focusing hard X-ray mirrors and hard
X-ray imaging detectors, and high energy-resolution soft X-ray spectroscopy
(0.3-12 keV), provided by thin-foil X-ray optics and a micro-calorimeter array.
The mission will also carry an X-ray CCD camera as a focal plane detector for a
soft X-ray telescope (0.4-12 keV) and a non-focusing soft gamma-ray detector
(40-600 keV). The micro-calorimeter system is developed by an international
collaboration led by ISAS/JAXA and NASA. The simultaneous broad bandpass,
coupled with high spectral resolution of Delta E ~7 eV provided by the
micro-calorimeter, will enable a wide variety of important science themes to be
pursued.