• In contrast with traditional video, omnidirectional video enables spherical viewing direction with support for head-mounted displays, providing an interactive and immersive experience. Unfortunately, to the best of our knowledge, there are few visual quality assessment (VQA) methods, either subjective or objective, for omnidirectional video coding. This paper proposes both subjective and objective methods for assessing quality loss in encoding omnidirectional video. Specifically, we first present a new database, which includes the viewing direction data from several subjects watching omnidirectional video sequences. Then, from our database, we find a high consistency in viewing directions across different subjects. The viewing directions are normally distributed in the center of the front regions, but they sometimes fall into other regions, related to video content. Given this finding, we present a subjective VQA method for measuring difference mean opinion score (DMOS) of the whole and regional omnidirectional video, in terms of overall DMOS (O-DMOS) and vectorized DMOS (V-DMOS), respectively. Moreover, we propose two objective VQA methods for encoded omnidirectional video, in light of human perception characteristics of omnidirectional video. One method weighs the distortion of pixels with regard to their distances to the center of front regions, which considers human preference in a panorama. The other method predicts viewing directions according to video content, and then the predicted viewing directions are leveraged to allocate weights to the distortion of each pixel in our objective VQA method. Finally, our experimental results verify that both the subjective and objective methods proposed in this paper advance state-of-the-art VQA for omnidirectional video.
  • Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames. Therefore, we develop a two-layer convolutional long short-term memory (2C-LSTM) network in our DNN-based method, using the extracted features of OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can be generated, which consider the transition of attention across video frames. Finally, the experimental results show that our method advances the state-of-the-art in video saliency prediction.
  • The latest High Efficiency Video Coding (HEVC) standard has been increasingly applied to generate video streams over the Internet. However, HEVC compressed videos may incur severe quality degradation, particularly at low bit-rates. Thus, it is necessary to enhance the visual quality of HEVC videos at the decoder side. To this end, this paper proposes a Quality Enhancement Convolutional Neural Network (QE-CNN) method that does not require any modification of the encoder to achieve quality enhancement for HEVC. In particular, our QE-CNN method learns QE-CNN-I and QE-CNN-P models to reduce the distortion of HEVC I and P frames, respectively. The proposed method differs from the existing CNN-based quality enhancement approaches, which only handle intra-coding distortion and are thus not suitable for P frames. Our experimental results validate that our QE-CNN method is effective in enhancing quality for both I and P frames of HEVC videos. To apply our QE-CNN method in time-constrained scenarios, we further propose a Time-constrained Quality Enhancement Optimization (TQEO) scheme. Our TQEO scheme controls the computational time of QE-CNN to meet a target while maximizing the quality enhancement. The experimental results further demonstrate the effectiveness of our TQEO scheme in terms of time control accuracy and quality enhancement under different time constraints. Finally, we design a prototype to implement our TQEO scheme in a real-time scenario.
  • High Efficiency Video Coding (HEVC) significantly reduces bit-rates over the preceding H.264 standard but at the expense of extremely high encoding complexity. In HEVC, the quad-tree partition of the coding unit (CU) consumes a large proportion of the encoding complexity, due to the brute-force search for rate-distortion optimization (RDO). Therefore, this paper proposes a deep learning approach, based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, to predict the CU partition and thereby reduce HEVC complexity in both intra- and inter-modes. First, we establish a large-scale database including substantial CU partition data for HEVC intra- and inter-modes. This enables deep learning on the CU partition. Second, we represent the CU partition of an entire coding tree unit (CTU) in the form of a hierarchical CU partition map (HCPM). Then, we propose an early-terminated hierarchical CNN (ETH-CNN) for learning to predict the HCPM. Consequently, the encoding complexity of intra-mode HEVC can be drastically reduced by replacing the brute-force search with ETH-CNN to decide the CU partition. Third, an early-terminated hierarchical LSTM (ETH-LSTM) is proposed to learn the temporal correlation of the CU partition. Then, we combine ETH-LSTM and ETH-CNN to predict the CU partition for reducing the HEVC complexity in inter-mode. Finally, experimental results show that our approach outperforms other state-of-the-art approaches in reducing the HEVC complexity in both intra- and inter-modes.
  • The past few years have witnessed great success in applying deep learning to enhance the quality of compressed image/video. The existing approaches mainly focus on enhancing the quality of a single frame, ignoring the similarity between consecutive frames. In this paper, we observe that heavy quality fluctuation exists across compressed video frames, and thus low quality frames can be enhanced using the neighboring high quality frames, which we refer to as Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as a first attempt in this direction. In our approach, we first develop a Support Vector Machine (SVM) based detector to locate Peak Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed to enhance the quality of compressed video, in which the non-PQF and its nearest two PQFs serve as the input. The MF-CNN compensates motion between the non-PQF and PQFs through the Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement subnet (QE-subnet) reduces compression artifacts of the non-PQF with the help of its nearest PQFs. Finally, the experiments validate the effectiveness and generality of our MFQE approach in advancing the state-of-the-art quality enhancement of compressed video. The code of our MFQE approach is available at https://github.com/ryangBUAA/MFQE.git
  • The latest High Efficiency Video Coding (HEVC) standard significantly improves coding efficiency over previous video coding standards. The expense of such improvement is enormous computational complexity, at both the encoding and decoding sides. Since computational capability and power capacity are diverse across portable devices, it is necessary to reduce decoding complexity to a target with tolerable quality loss, which is known as complexity control. This paper proposes a Saliency-Guided Complexity Control (SGCC) approach for HEVC decoding, which reduces the decoding complexity to the target with minimal perceptual quality loss. First, we establish the SGCC formulation to minimize perceptual quality loss under the constraint of reduced decoding complexity, which is achieved via disabling the Deblocking Filter (DF) and simplifying Motion Compensation (MC) of some non-salient Coding Tree Units (CTUs). One important component in this formulation is the modelled relationship between decoding complexity reduction and DF disabling/MC simplification, which determines the control accuracy of our approach. Another component is the modelled relationship between quality loss and DF disabling/MC simplification, which is responsible for optimizing perceptual quality. By solving the SGCC formulation for a given target complexity, we can obtain the DF and MC settings of each CTU, and then decoding complexity can be reduced to the target. Finally, the experimental results validate the effectiveness of our SGCC approach in terms of control performance, complexity-distortion performance, fluctuation of quality loss and subjective quality.
  • Panoramic video provides immersive and interactive experience by enabling humans to control the field of view (FoV) through head movement (HM). Thus, HM plays a key role in modeling human attention on panoramic video. This paper establishes a database collecting subjects' HM positions on panoramic video sequences. From this database, we find that the HM data are highly consistent across subjects. Furthermore, we find that deep reinforcement learning (DRL) can be applied to predict HM positions, via maximizing the reward of imitating human HM scanpaths through the agent's actions. Based on our findings, we propose a DRL based HM prediction (DHP) approach with offline and online versions, called offline-DHP and online-DHP. In offline-DHP, multiple DRL workflows are run to determine potential HM positions at each panoramic frame. Then, a heat map of the potential HM positions, named the HM map, is generated as the output of offline-DHP. In online-DHP, the next HM position of one subject is estimated given the currently observed HM position, which is achieved by developing a DRL algorithm upon the learned offline-DHP model. Finally, the experimental results validate that our approach is effective in offline and online prediction of HM positions for panoramic video, and that the learned offline-DHP model can improve the performance of online-DHP.
  • This paper investigates secrecy rate optimization for a multicasting network, in which a transmitter broadcasts the same information to multiple legitimate users in the presence of multiple eavesdroppers. In order to improve the achievable secrecy rates, private jammers are employed to generate interference to confuse the eavesdroppers. These private jammers charge the legitimate transmitter for their jamming services based on the amount of interference received at the eavesdroppers. Therefore, this secrecy rate maximization problem is formulated as a Stackelberg game, in which the private jammers and the transmitter are the leaders and the follower of the game, respectively. A fixed interference price scenario is considered first, in which a closed-form solution is derived for the optimal amount of interference generated by the jammers to maximize the revenue of the legitimate transmitter. Based on this solution, the Stackelberg equilibrium of the proposed game, at which both legitimate transmitter and the private jammers achieve their maximum revenues, is then derived. Simulation results are also provided to validate these theoretical derivations.
  • This paper proposes a novel approach, based on unequal error protection (UEP), to enhance rateless codes with progressive recovery for layered multimedia delivery. With a parallel encoding structure, the proposed Progressive Rateless codes (PRC) assign unequal redundancy to each layer in accordance with its importance. Each output symbol contains information from all layers, and thus the stream layers can be recovered progressively at the expected received ratios of output symbols. Furthermore, the dependency between layers is naturally considered. The performance of the PRC is evaluated and compared with some related UEP approaches. Results show that our PRC approach provides better recovery performance with lower overhead both theoretically and numerically.
  • This paper proposes a novel scheme, based on progressive fountain codes, for broadcasting JPEG 2000 multimedia. In this scheme, the progressive resolution levels of images/video are unequally protected when transmitted using the proposed progressive fountain codes. With progressive fountain codes applied in the broadcast scheme, the resolution of the images (JPEG 2000) or video (MJPEG 2000) received by each user adapts automatically to that user's channel quality, i.e., users with good channel quality can receive high-resolution images/video, while users with poor channel quality may receive low-resolution images/video. Finally, the performance of the proposed scheme is evaluated with the MJPEG 2000 broadcast prototype.
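
The last two abstracts rely on unequal protection of layers (or resolution levels) in a rateless/fountain code. The sketch below illustrates the general idea only; it is not the PRC or progressive fountain code construction from those papers, whose details the abstracts do not give. It assumes a simplified LT-style encoder with a fixed degree (a real LT code would draw the degree from a robust soliton distribution) and biases source-symbol selection toward more important layers. The function name `uep_lt_encode` and the parameters `weights`, `degree`, and `seed` are hypothetical.

```python
import random


def uep_lt_encode(layers, weights, n_out, degree=3, seed=0):
    """Generate n_out output symbols from layered source symbols.

    layers  : list of lists of ints, one list of source symbols per layer
    weights : relative probability of drawing from each layer
              (a larger weight means stronger protection; an assumption
              made for this sketch)
    Returns a list of (neighbors, value) pairs, where neighbors is a
    sorted list of (layer, position) tuples and value is their XOR.
    """
    total = sum(len(layer) for layer in layers)
    if degree > total:
        raise ValueError("degree exceeds the number of source symbols")
    rng = random.Random(seed)
    layer_ids = range(len(layers))
    out = []
    for _ in range(n_out):
        # Draw `degree` distinct source symbols, layer chosen by weight.
        chosen = set()
        while len(chosen) < degree:
            layer = rng.choices(layer_ids, weights=weights)[0]
            pos = rng.randrange(len(layers[layer]))
            chosen.add((layer, pos))
        # The output symbol is the XOR of the chosen source symbols.
        value = 0
        for layer, pos in chosen:
            value ^= layers[layer][pos]
        out.append((sorted(chosen), value))
    return out


if __name__ == "__main__":
    base_layer = [1, 2, 3, 4]         # e.g. lowest resolution level
    enhancement_layer = [5, 6, 7, 8]  # e.g. a higher resolution level
    symbols = uep_lt_encode([base_layer, enhancement_layer], [3, 1], 5)
    for neighbors, value in symbols:
        print(neighbors, value)
```

Because base-layer symbols appear in more output symbols, a receiver that collects only a few output symbols is more likely to recover the base layer first, which is the qualitative behaviour that unequal protection aims for.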