We present an automatic moment capture system that runs in real-time on
mobile cameras. The system is designed to run in viewfinder mode and
capture a burst sequence of frames before and after the shutter is pressed. For
each frame, the system predicts a "goodness" score in real time, based on which
the best moment in the burst can be selected immediately after the shutter is
released, without any user intervention. To solve the problem, we develop a
highly efficient deep neural network ranking model, which implicitly learns a
"latent relative attribute" space to capture subtle visual differences within a
sequence of burst images. The overall goodness is then computed as a linear
aggregation of the goodness scores of all latent attributes. The latent relative
attributes and the aggregation function can be seamlessly integrated in one
fully convolutional network and trained in an end-to-end fashion. To obtain a
compact model that can run in real time on mobile devices, we explored
and evaluated a wide range of network design choices, taking into account the
constraints of model size, computational cost, and accuracy. Extensive studies
show that the best frame predicted by our model matches the user's top-1 choice
(out of 11 frames on average) in $64.1\%$ of cases and one of the user's top-3
choices in $86.2\%$ of cases.
Moreover, the model (only 0.47 MB) runs in real time on mobile devices,
e.g., only 13 ms per frame prediction on an iPhone 7.
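The scoring, selection, and evaluation steps described above can be sketched as follows. This is a minimal illustration, not the authors' network: it assumes each frame has already been mapped to a small vector of latent-attribute scores (in the real system a fully convolutional network produces these), and the weights, the toy burst, and all function names (`goodness`, `pick_best`, `pairwise_hinge`, `top_k_hit`) are hypothetical. The abstract does not state the training loss, so the margin-based pairwise hinge shown here is only one plausible choice for a ranking model.

```python
from typing import List, Sequence


def goodness(attr_scores: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Overall goodness as a linear aggregation of per-attribute goodness scores."""
    return bias + sum(w * s for w, s in zip(weights, attr_scores))


def pick_best(burst: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Return the index of the frame with the highest aggregated goodness."""
    scores = [goodness(frame, weights) for frame in burst]
    return max(range(len(scores)), key=lambda i: scores[i])


def pairwise_hinge(s_better: float, s_worse: float, margin: float = 1.0) -> float:
    """Assumed ranking loss: penalize a pair whose score gap is below `margin`."""
    return max(0.0, margin - (s_better - s_worse))


def top_k_hit(predicted: int, user_ranking: List[int], k: int) -> bool:
    """True if the predicted best frame is among the user's top-k choices."""
    return predicted in user_ranking[:k]


# Hypothetical burst of 4 frames, each described by 3 latent-attribute scores.
weights = [0.5, 0.3, 0.2]
burst = [
    [0.2, 0.9, 0.1],  # aggregated goodness 0.39
    [0.8, 0.6, 0.5],  # aggregated goodness 0.68
    [0.7, 0.2, 0.9],  # aggregated goodness 0.59
    [0.1, 0.1, 0.3],  # aggregated goodness 0.14
]
best = pick_best(burst, weights)
user_ranking = [2, 1, 0, 3]  # hypothetical user preference, best first
print(best, top_k_hit(best, user_ranking, 1), top_k_hit(best, user_ranking, 3))
# prints: 1 False True
```

Averaging `top_k_hit` over many bursts with k = 1 and k = 3 yields top-1 and top-3 hit rates of the kind reported above.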

Optical flow estimation is a well-known problem in computer vision; the concept
of optical flow was introduced by J. J. Gibson (1950) to describe human visual
perception of stimulus objects. Optical flow can be estimated by solving for the
motion vectors of a region of interest across frames taken at different times.
In this paper, we assume a nearly uniform change in velocity between two
nearby frames and solve the optical flow problem with the classical
Lucas-Kanade method (1981). This method minimizes the error between the
template and the target frame warped back onto the template. Solving this
minimization requires an optimization method, and such methods differ in
convergence rate and error. We explore first- and second-order optimization
methods and compare their results with the Gauss-Newton method used in
Lucas-Kanade. We generated 105 videos (10,500 frames) of synthetic objects, as
well as 10 videos (1,000 frames) from real-world footage. Our experimental
results can serve as a guide for tuning the parameters of the Lucas-Kanade method.
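To make the comparison between first- and second-order updates concrete, here is a minimal sketch of forward-additive Lucas-Kanade for a translation-only warp on a 1-D signal. Everything in it is an illustrative assumption: the abstract does not say which first-order method was used, so plain gradient descent with a fixed step size `lr` stands in for it, and the Gaussian test signal, the linear interpolation, the region of interest, and all function names are hypothetical. Both methods compute the same steepest-descent term $\sum g\,e$; Gauss-Newton divides it by the approximate Hessian $\sum g^2$, while gradient descent scales it by the fixed step `lr`.

```python
import math


def sample(signal, x):
    """Linearly interpolate a 1-D signal at real-valued position x (clamped to the ends)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(math.floor(x)), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]


def gradient(signal, x, h=0.5):
    """Central-difference derivative of the interpolated signal at x."""
    return (sample(signal, x + h) - sample(signal, x - h)) / (2.0 * h)


def lucas_kanade_1d(template, image, roi, method="gauss_newton", lr=1.0, iters=20):
    """Estimate a shift p so that image(x + p) ~= template[x] for x in roi.

    Minimizes E(p) = 1/2 * sum_x (template[x] - image(x + p))**2.
    """
    p = 0.0
    for _ in range(iters):
        steepest = 0.0  # sum of g * e, equal to -dE/dp
        hessian = 0.0   # sum of g * g, Gauss-Newton approximation of d2E/dp2
        for x in roi:
            g = gradient(image, x + p)
            e = template[x] - sample(image, x + p)
            steepest += g * e
            hessian += g * g
        if method == "gauss_newton":
            p += steepest / hessian  # second-order step
        elif method == "gradient_descent":
            p += lr * steepest       # first-order step along -dE/dp
        else:
            raise ValueError(f"unknown method: {method}")
    return p


def bump(center, n=64, sigma=5.0):
    """Hypothetical synthetic signal: a sampled Gaussian bump."""
    return [math.exp(-((k - center) ** 2) / (2.0 * sigma ** 2)) for k in range(n)]


template = bump(30.0)
image = bump(33.0)  # the template shifted by 3 samples
roi = range(15, 46)

p_gn = lucas_kanade_1d(template, image, roi, "gauss_newton")
p_gd = lucas_kanade_1d(template, image, roi, "gradient_descent", lr=1.0)
print(f"Gauss-Newton: {p_gn:.4f}, gradient descent: {p_gd:.4f} (true shift 3)")
```

On this toy signal Gauss-Newton should reach the true shift within a few iterations, while gradient descent with `lr=1.0` is still approaching it after 20; a larger `lr` converges faster but can overshoot or diverge, which is the kind of step-size sensitivity such a comparison has to tune.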