Highlights Extraction



Emotion-based highlights extraction is useful for drama video retrieval and automatic trailer generation because the emotionally rich parts of a drama video are often what attracts the viewer most. We formulate highlights extraction as a regression problem: the system extracts highlight segments and predicts how strongly each video segment would evoke the viewer's emotions. Unlike conventional rule-based approaches that rely on heuristics, the proposed system learns the relation between drama highlights and audiovisual features from data. We also examine the special characteristics of drama video and propose human face, music emotion, shot duration, and motion magnitude as feature sets for highlights extraction.
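The regression formulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the published system: each video segment is assumed to be summarized by one number per feature set (face, music emotion, shot duration, motion magnitude), closed-form ridge regression stands in for whichever regressor the authors used, and the names fit_ridge, predict_scores, and top_highlights are hypothetical.

```python
import numpy as np

# Hypothetical per-segment feature columns, one row per video segment:
# [face, music_emotion, shot_duration, motion_magnitude]

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized bias term."""
    Xb = np.column_stack([X, np.ones(len(X))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not shrink the bias
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict_scores(X, w):
    """Predicted emotion strength (highlight score) for each segment."""
    return np.column_stack([X, np.ones(len(X))]) @ w

def top_highlights(X, w, k):
    """Indices of the k segments with the highest predicted scores, best first."""
    return np.argsort(predict_scores(X, w))[::-1][:k]
```

Ranking segments by predicted score and keeping the top k is one simple way to turn regression outputs into highlight segments; a trailer generator could then concatenate the selected segments in temporal order.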

Selected Publications:
[1]
K.-S. Lin, A. Lee, Y.-H. Yang, C.-T. Lee, and H. H. Chen, “Automatic highlights extraction for drama video using music emotion and human face features,” Proc. IEEE Int. Workshop on Multimedia Signal Processing, Hangzhou, China, Oct. 2011 (Top 10% Paper Award)
[2]
K.-S. Lin, A. Lee, Y.-H. Yang, and H. H. Chen, “Rule-based automatic highlights extraction for drama video using music emotion and human face features,” Neurocomputing (accepted), 2012

Piano Music Transcription



Pitch, together with other mid-level music features such as rhythm and timbre, holds the promise of bridging the semantic gap between low-level features and high-level semantics for music understanding. We investigate pitch estimation for piano music signals by exemplar-based sparse representation. A note exemplar is a segment of a piano note stored in a dictionary. We first describe how to represent a segment of the piano music signal as a linear combination of a small number of note exemplars from a large note exemplar dictionary, and then show how the sparse representation problem can be solved by regularized minimization. Unlike previous approaches, the proposed approach does not require retraining for a new piano; only a dozen notes of the new piano are needed. This is computationally attractive and avoids labor-intensive manual labeling.
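The sparse-representation step can be illustrated with a short sketch, assuming each dictionary column is the magnitude spectrum of one note exemplar and the regularizer is a nonnegative L1 penalty minimized by projected iterative shrinkage-thresholding (ISTA). The solver, the parameter values, and the function name nonneg_sparse_code are illustrative assumptions; the published method may use a different regularizer or optimizer.

```python
import numpy as np

def nonneg_sparse_code(D, y, lam=0.01, n_iter=2000):
    """Solve min_x 0.5 * ||y - D x||^2 + lam * sum(x) subject to x >= 0.

    D: (bins, exemplars) dictionary whose columns are note-exemplar magnitude spectra.
    y: (bins,) magnitude spectrum of one segment of the piano recording.
    For x >= 0 the L1 norm equals sum(x), so the proximal step is a shift plus clipping.
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        # Gradient step, soft-threshold by step * lam, then project onto x >= 0.
        x = np.maximum(0.0, x - step * (grad + lam))
    return x
```

In this sketch, adapting to another piano only changes the columns of D; there are no other trained parameters to update.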

Selected Publications:
[1]
C.-T. Lee, Y.-H. Yang, and H. H. Chen, “Automatic transcription of piano music by sparse representation of magnitude spectra,” IEEE Int. Conf. Multimedia Expo., Barcelona, Spain, Jul. 2011
[2]
C.-T. Lee, Y.-H. Yang, and H. H. Chen, “Multipitch estimation of piano music by exemplar-based sparse representation,” IEEE Trans. Multimedia, vol. 14, no. 3, pp. 608-618, Jun. 2012

Automatic Accompaniment Generation


We present a system that automatically generates, for a given melody, an accompaniment that evokes a specified emotion. In particular, we propose harmony progression and onset rate as two key features for emotion-based accompaniment generation. The former refers to the progression of chords, and the latter refers to the number of music events (such as notes and drum hits) per unit time. The target emotion is expressed by valence and arousal parameters: the harmony progression is altered according to valence, and the onset rate according to arousal.
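The valence-to-harmony and arousal-to-onset-rate mapping can be sketched as follows. Everything concrete here is a hypothetical assumption for illustration: the two chord templates, the use of a major versus a minor progression to express valence, the linear arousal-to-rate mapping with its 1 to 8 events-per-second range, and the function names.

```python
# Hypothetical chord templates; the system described above may use different ones.
MAJOR_PROGRESSION = ["C", "G", "Am", "F"]    # I-V-vi-IV in C major
MINOR_PROGRESSION = ["Am", "Dm", "E", "Am"]  # i-iv-V-i in A minor

def onset_rate(arousal, low=1.0, high=8.0):
    """Linearly map arousal in [-1, 1] to music events per second (assumed range)."""
    a = min(max(arousal, -1.0), 1.0)
    return low + (a + 1.0) / 2.0 * (high - low)

def generate_accompaniment(valence, arousal, seconds_per_chord=2.0):
    """Return (onset_time_seconds, chord) events for one pass through the progression.

    Valence selects the harmony progression; arousal sets how many events
    are placed under each chord.
    """
    progression = MAJOR_PROGRESSION if valence >= 0 else MINOR_PROGRESSION
    events_per_chord = max(1, round(onset_rate(arousal) * seconds_per_chord))
    events = []
    for i, chord in enumerate(progression):
        start = i * seconds_per_chord
        for j in range(events_per_chord):
            events.append((start + j * seconds_per_chord / events_per_chord, chord))
    return events
```

Keeping the two controls independent mirrors the separation described above: changing arousal alters only the event density, and changing valence alters only the chords.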

Programmable Aperture Photography



We present a system that includes a novel component, called the programmable aperture, and two associated post-processing algorithms for high-quality light field acquisition. The shape of the programmable aperture can be adjusted and used to capture the light field at full sensor resolution through multiple exposures, without any additional optics and without moving the camera. High acquisition efficiency is achieved by employing an optimal multiplexing scheme, and high-quality data is obtained by using the two post-processing algorithms, designed for self-calibration of photometric distortion and for multi-view depth estimation. The view-dependent depth maps thus generated help boost the angular resolution of the light field. Various post-exposure photographic effects demonstrate the effectiveness of the system and the quality of the captured light field.
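Multiplexed acquisition can be illustrated with a small linear-algebra sketch. It assumes the aperture is divided into n blocks, that each exposure opens a subset of them so the captured image is the sum of the corresponding sub-aperture views, and that the patterns come from an S-matrix, a classic multiplexing code used here as a stand-in; the paper's optimal multiplexing scheme, its noise model, and the photometric calibration step are not reproduced. Function names are hypothetical.

```python
import numpy as np

def sylvester_hadamard(order):
    """Hadamard matrix by Sylvester's construction; order must be a power of two."""
    assert order >= 1 and order & (order - 1) == 0
    H = np.array([[1]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H

def s_matrix(n):
    """Binary S-matrix of size n = 2**k - 1: row i marks the aperture blocks open in shot i."""
    H = sylvester_hadamard(n + 1)
    return (1 - H[1:, 1:]) // 2  # +1 -> 0 (closed), -1 -> 1 (open)

def multiplex(S, views):
    """Each shot sums the sub-aperture views whose blocks are open (noise-free model)."""
    return np.tensordot(S, views, axes=1)

def demultiplex(S, shots):
    """Recover the individual views by solving the linear system S @ views = shots."""
    n = S.shape[0]
    flat = np.linalg.solve(S.astype(float), shots.reshape(n, -1))
    return flat.reshape(shots.shape)
```

Opening about half of the aperture in every shot gathers more light per exposure than opening one block at a time, which is the usual motivation for multiplexing.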

Selected Publications:
[1]
C.-K. Liang, T.-H. Lin, B.-Y. Weng, C. Liu, and H. H. Chen, “Programmable aperture photography: Multiplexed light field acquisition,” ACM Trans. Graph. (Proc. SIGGRAPH 2008), vol. 27, no. 3, pp. 55:1-55:10, Aug. 2008
[2]
C.-K. Liang, G. Liu, and H. H. Chen, “Light field acquisition using programmable aperture camera,” IEEE Int. Conf. Image Process., pp. 233-236, San Antonio, TX, Sept. 2007

Image Enhancement for Mobile Devices



Dimming the LCD backlight reduces the power consumption of a portable device, but it also decreases the contrast and brightness of the displayed images. Previous approaches adjust the backlight level frame by frame to reach a specified image quality level but do not optimize it. In contrast, the proposed method adjusts the backlight to meet a target power level while maintaining image quality. This is achieved by applying brightness compensation and local contrast enhancement that depend on the given backlight level. Experimental results show that the proposed algorithm outperforms previous methods.
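The brightness-compensation idea can be sketched with a simplified display model. The sketch assumes displayed luminance equals the backlight factor times the pixel value raised to a gamma of 2.2, and it shows only a global gain; the local contrast enhancement described above, and the JND-based and anchoring methods in the publications, are not modeled. Function names are hypothetical.

```python
import numpy as np

def displayed_luminance(pixels, backlight, gamma=2.2):
    """Simplified display model (assumption): luminance = backlight * pixel**gamma."""
    return backlight * np.power(pixels, gamma)

def compensate_brightness(pixels, backlight, gamma=2.2):
    """Boost pixel values so the dimmed display shows the original luminance where possible.

    pixels: array in [0, 1]; backlight: dimming factor in (0, 1].
    The gain backlight**(-1/gamma) exactly cancels the dimming in this model,
    but pixels pushed above 1.0 must be clipped.
    """
    gain = backlight ** (-1.0 / gamma)
    return np.clip(pixels * gain, 0.0, 1.0)
```

Pixels that the gain pushes above 1.0 are clipped, so highlight detail is lost; this is the kind of degradation that local contrast enhancement is meant to counter.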

Selected Publications:
[1]
T.-H. Huang, C.-K. Liang, S.-L. Yeh, and H. H. Chen, “JND-based enhancement of perceptibility for dim images,” IEEE Int. Conf. Image Process., San Diego, Oct. 2008
[2]
P.-S. Tsai, C.-K. Liang, T.-H. Huang, and H. H. Chen, “Image enhancement for backlight-scaled TFT-LCD displays,” IEEE Trans. Circuits Syst. Video Technol. (accepted), 2008
[3]
K.-T. Shih, T.-H. Huang, and H. H. Chen, “An anchoring method for color enhancement of images illuminated with dim backlight,” IEEE Int. Conf. Image Process. (submitted), 2012

Rolling Shutter Distortion



The electronic rolling shutter found in most low-end CMOS image sensors collects image data row by row, analogous to an open slit that scans over the image sequentially. Each row integrates light when the slit passes over it; therefore, the scanlines of an image are not exposed at the same time. For moving objects, this sensor architecture creates a geometric distortion known as the rolling shutter effect. We analyze this effect and compensate for it by using digital image processing techniques.
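The row-by-row exposure and the resulting skew can be simulated in a few lines. This sketch assumes a known, constant horizontal velocity and integer pixel shifts, which is far simpler than estimating the motion from the video itself; the toy scene moving_bar and the other names are hypothetical.

```python
import numpy as np

def rolling_shutter_capture(scene_at, line_delay):
    """Simulate an electronic rolling shutter.

    scene_at(t) returns the full scene (H x W array) at time t; row r is sampled
    at time r * line_delay, so different rows see the scene at different instants.
    """
    first = scene_at(0.0)
    frame = np.empty_like(first)
    for r in range(first.shape[0]):
        frame[r] = scene_at(r * line_delay)[r]
    return frame

def compensate_constant_motion(frame, velocity, line_delay):
    """Undo the skew caused by a known constant horizontal velocity (pixels per unit time).

    Row r is shifted back by velocity * r * line_delay pixels so every row shows
    the scene as it was at t = 0. np.roll wraps pixels around the border.
    """
    out = np.empty_like(frame)
    for r in range(frame.shape[0]):
        shift = int(round(velocity * r * line_delay))
        out[r] = np.roll(frame[r], -shift)
    return out

def moving_bar(t, height=8, width=32, x0=4, velocity=1.0):
    """Toy scene: a one-pixel-wide vertical bar moving right at constant speed."""
    scene = np.zeros((height, width))
    scene[:, int(round(x0 + velocity * t))] = 1.0
    return scene
```

In the captured frame the vertical bar appears slanted, one pixel further right on each successive row; the compensation straightens it back to its position at the start of readout.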

Selected Publications:
[1]
C.-K. Liang, L.-W. Chang, and H. H. Chen, “Analysis and compensation of rolling shutter effect,” IEEE Trans. Image Process., vol. 17, no. 8, pp. 1323-1330, Aug. 2008

Digital Image Stabilization


Digital image processing is an important technique for consumer video electronics and other video capture devices, and digital image stabilization is an especially important component in applications such as security surveillance, military reconnaissance, and digital cameras. Image sequence stabilization is accomplished by estimating and then compensating the global motion to remove involuntary image movement caused by, for example, hand shake or vibration. Our research aims to develop fast and cost-effective image stabilization algorithms for embedded systems such as digital cameras.
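The estimate-then-compensate structure of stabilization can be sketched as follows. As assumptions, global motion is modeled as a pure integer translation estimated by phase correlation, and the intended camera path is obtained by smoothing the accumulated trajectory with a moving average; the difference between the smoothed and actual trajectories is the correction that removes jitter. Neither choice is claimed to be the group's algorithm, and the function names are hypothetical.

```python
import numpy as np

def estimate_translation(prev, curr):
    """Phase correlation: integer (dy, dx) with curr ~= np.roll(prev, (dy, dx), axis=(0, 1))."""
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    cross /= np.abs(cross) + 1e-12  # keep only the phase difference
    corr = np.fft.ifft2(cross).real  # peaks at the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    if dy > h // 2:  # map wrapped peak positions to negative shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def stabilizing_corrections(motions, window=5):
    """Per-frame corrections that remove jitter but keep the intended camera path.

    motions: (T-1, 2) array of frame-to-frame global translations (dy, dx).
    Returns a (T, 2) array; shifting frame t by corrections[t] stabilizes it.
    """
    assert window % 2 == 1, "use an odd window so the output length matches"
    trajectory = np.vstack([np.zeros((1, 2)), np.cumsum(motions, axis=0)])
    pad = window // 2
    padded = np.pad(trajectory, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    smooth = np.column_stack(
        [np.convolve(padded[:, i], kernel, mode="valid") for i in range(2)]
    )
    return smooth - trajectory
```

A steady pan produces a linear trajectory, which the moving average leaves unchanged away from the sequence ends, so intentional motion yields near-zero corrections while frame-to-frame jitter does not.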

Selected Publications:
[1]
H. H. Chen, C.-K. Liang, Y.-C. Peng, and H.-A. Chang, “Integration of digital stabilizer with video codec for digital video cameras,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 7, pp. 801-813, Jul. 2007 (2008 IEEE Circuits and Systems Society CSVT Transactions Best Paper Award)

Digital Home Technology


Our goal is to research multimedia signal processing for home gateways in order to provide digital multimedia access in the home with excellent quality and complete digital rights management. The first-year milestone is research on content-aware QoS and a digital rights management system. The second-year milestone is research on complexity-aware streaming and digital rights management on MHP and mobile devices. The final-year milestone is research on error-resilient video streaming for H.264, R-D optimization for a complexity-aware encoder, system integration, and performance evaluation.

Selected Publications:
[1]
M.-T. Lu, J.-C. Wu, K.-J. Peng, P. Huang, J. J. Yao, and H. H. Chen, “Design and evaluation of a P2P IPTV system for heterogeneous networks,” IEEE Trans. Multimedia, Dec. 2007

Music Emotion



Music plays an important role in human history, even more so in the digital age. Never before has such a large collection of music been created and accessed daily by people. As the amount of content continues to explode, the way music information is organized has to evolve to meet the ever-increasing demand for easy and effective information access. Music classification and retrieval by emotion is a plausible approach because it is content-centric and functionally powerful.

Selected Publications:
[1]
Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen, “A regression approach to music emotion recognition,” IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 2, pp. 448-457, Feb. 2008
[2]
Y.-H. Yang, Y.-C. Lin, H.-T. Cheng, and H. H. Chen, “Mr. Emo: Music retrieval in the emotion plane,” ACM Multimedia Technical Demonstrations (accepted), Vancouver, Canada, Oct. 2008

Video Codec


Video coding has become a key technology for a wide range of applications, from personal computers to television, because it makes video storage and transmission feasible and efficient. The recently developed H.264 video coding standard has emerged as a popular technology for both multimedia communications and consumer electronics. Our research topics on H.264 encoder optimization include fast motion estimation and mode decision, rate control, and rate-distortion optimization.
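Motion estimation is the usual target of encoder speed-ups, so a reference point helps. The sketch below is plain integer-pel full-search block matching with the sum of absolute differences (SAD); it evaluates all (2 * search + 1)^2 candidate positions, and fast motion estimation methods try to reach a similar match while checking far fewer. Block size, search range, and the function name are illustrative assumptions, not the group's encoder settings.

```python
import numpy as np

def full_search(ref, cur, top, left, block=8, search=4):
    """Integer-pel full-search block matching with the SAD criterion.

    Returns ((dy, dx), sad): the displacement from the current block at
    (top, left) to its best match in the reference frame.
    """
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block would fall outside the reference frame
            cand = ref[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad
```

With search=4 this checks 81 positions per block; a fast method that examines, say, a small diamond pattern around a predicted vector trades a possible loss in match quality for far fewer SAD evaluations.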

Selected Publications:
[1]
C.-C. Su, J. J. Yao, P. Huang, and H. H. Chen, “H.264/AVC-based multiple description video coding using dynamic slice groups,” Signal Processing: Image Communication (accepted), 2008
[2]
C.-C. Su, J. J. Yao, and H. H. Chen, “H.264/AVC-based multiple description coding scheme,” IEEE Int. Conf. Image Process., pp. 265-268, San Antonio, TX, Sept. 2007
[3]
M.-L. Wong, Y.-L. Lin, and H. H. Chen, “A hardware-oriented intra prediction scheme for high definition AVS encoder,” Picture Coding Symp., Lisbon, Portugal, Nov. 2007

Perceptual-Based Video Coding



The rate-distortion optimization (RDO) framework for video coding achieves a tradeoff between bit rate and quality. However, the objective distortion metrics traditionally used in this framework, such as mean squared error, correlate poorly with perceptual quality. We address this issue by incorporating the structural similarity (SSIM) index into the framework as the quality metric. In particular, we develop a predictive Lagrange multiplier estimation method to resolve the chicken-and-egg dilemma of perceptual-based RDO and apply it to H.264 intra and inter mode decision. At the same perceptual quality level, the resulting video encoder achieves an average bit-rate reduction of 9% for intra-frame coding and 11% for inter-frame coding over the JM reference software. Subjective tests further confirm that, at the same bit rate, the proposed perceptual RDO preserves image details and prevents blocking artifacts better than traditional RDO.
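Mode decision in an SSIM-based RDO framework can be sketched as minimizing J = (1 - SSIM) + λR over candidate modes. This is a simplified illustration, not the published encoder: SSIM is computed over the whole block as a single window rather than with local windows, λ is passed in directly instead of being predicted, and the candidate rates and function names are hypothetical.

```python
import numpy as np

def block_ssim(x, y, data_range=255.0):
    """SSIM computed over a whole block as a single window (simplified)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def choose_mode(original, candidates, lam):
    """Pick the mode minimizing J = (1 - SSIM) + lam * R.

    candidates maps a mode name to (reconstructed_block, rate_in_bits).
    """
    costs = {
        mode: (1.0 - block_ssim(original, recon)) + lam * rate
        for mode, (recon, rate) in candidates.items()
    }
    return min(costs, key=costs.get)
```

The choice of λ decides how many bits a given SSIM gain is worth; at λ = 0 the most faithful reconstruction always wins, and as λ grows cheaper modes take over, which is why estimating λ well matters for perceptual RDO.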

Selected Publications:
[1]
Y.-H. Huang, T.-S. Ou, and H. H. Chen, “Perceptual-based coding mode decision,” IEEE Int. Symp. Circuits and Systems, pp. 393-396, May 2010
[2]
P.-Y. Su, Y.-H. Huang, T.-S. Ou, and H. H. Chen, “Predictive Lagrange multiplier selection for perceptual rate-distortion optimization,” Fifth International Workshop on Video Processing and Quality Metrics for Consumer Electronics, Scottsdale, Arizona, Jan. 2010
[3]
T.-S. Ou, Y.-H. Huang, and H. H. Chen, “A perceptual-based approach to bit allocation for H.264 encoder,” Proc. SPIE, vol. 7744, Visual Communications and Image Processing, 77441B, pp. 1-10, Huangshan, China, Jul. 2010
[4]
H. H. Chen, Y.-H. Huang, P.-Y. Su, and T.-S. Ou, “Improving video coding quality by perceptual rate-distortion optimization,” IEEE Int. Conf. Multimedia Expo., pp. 1287-1292, Jul. 2010
[5]
P.-Y. Su, Y.-H. Huang, T.-S. Ou, and H. H. Chen, “Recent progress on perceptual video coding,” Proc. Visual Comm. Image Process., Nov. 2011

Codec-Friendliness of Perceptual Video Quality Metrics


It is natural to expect that video coding can benefit considerably from recent advances in perceptual image and video quality assessment. In practice, however, there has been only limited progress in applying perceptual quality metrics as the optimality criteria for video encoder design. Indeed, such a design task is extremely challenging, if not impossible, largely because most perceptual image and video quality metrics have complicated mathematical representations that do not fit well into the computational framework of modern video encoders. We provide a mostly qualitative analysis of the fundamental causes of this mismatch for a number of popular perceptual quality metrics and, from the video coding perspective, suggest a number of “codec-friendly” guidelines for the future development of perceptual quality metrics.

Selected Publications:
[1]
P.-Y. Su, T.-Y. Huang, C.-K. Kao, and H. H. Chen, “Adopting perceptual quality metrics in video encoders: Progress and critiques,” IEEE Int. Workshop Emerging Multimedia Systems Applications, Melbourne, Jul. 2012

Rate-Distortion Optimized Quantization for Video Encoder



Rate-distortion optimized quantization (RDOQ) improves the coding performance of video compression. However, the search process involved in most existing methods is computationally expensive. We develop a method that accelerates rate-distortion optimized quantization by using a rate model of entropy coding to solve the rate-distortion optimization problem directly. Compared with the H.264/AVC reference encoder, our method achieves an average bit-rate reduction of 5% for the IBBP GOP structure and 1% for the IPPP GOP structure, with negligible computational overhead. Our method is also significantly more computationally efficient than existing RDOQ methods. We believe this efficiency gain justifies the performance tradeoff for many real-world video coding systems, particularly low-complexity applications.
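Per-coefficient rate-distortion optimized quantization can be sketched as choosing, for each transform coefficient, the quantization level that minimizes D + λR. This sketch uses a hypothetical rate model (not the CAVLC or CABAC bit cost of H.264/AVC) and evaluates three candidate levels rather than reproducing the direct solution described above; the function names are illustrative.

```python
import math

def rate_model(level):
    """Hypothetical bit-cost model for one quantized level (not CAVLC/CABAC)."""
    return 1.0 if level == 0 else 2.0 + 2.0 * math.log2(1 + abs(level))

def rdoq_level(coef, qstep, lam):
    """Choose the quantization level minimizing D + lam * R for one coefficient.

    Only three candidates are tried: 0, and the floor and ceil of |coef| / qstep.
    Distortion is the squared reconstruction error of the coefficient.
    """
    sign = -1 if coef < 0 else 1
    scaled = abs(coef) / qstep
    best_level, best_cost = 0, None
    for level in sorted({0, math.floor(scaled), math.ceil(scaled)}):
        distortion = (abs(coef) - level * qstep) ** 2
        cost = distortion + lam * rate_model(level)
        if best_cost is None or cost < best_cost:
            best_level, best_cost = level, cost
    return sign * best_level
```

For example, under this assumed rate model rdoq_level(10.4, 4.0, 0.0) returns 3 (plain rounding), rdoq_level(10.4, 4.0, 5.0) returns 2 because the smaller level saves enough bits to offset its larger error, and a very large λ drives the coefficient to 0.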

Selected Publications:
[1]
T.-Y. Huang, P.-Y. Su, C.-K. Kao, and H. H. Chen, “Quality improvement of video codec by rate-distortion optimized quantization,” IEEE Workshop on Multimedia Quality of Experience, Dana Point, CA, Dec. 2011

Visual Attention Modeling


Visual attention is an important characteristic of the human visual system and is useful for image processing and compression. We develop a computational scheme that adopts both low-level and high-level features to predict visual attention from video signals. The adoption of low-level features (color, orientation, and motion) is based on studies of visual cells, whereas the adoption of the human face as a high-level feature is based on studies of media communications. The low-level and high-level features are then fused by machine learning. We show that this scheme is more robust than schemes that use only low-level or only high-level features. By learning the relationship between features and visual attention, it avoids perceptual mismatch.
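The fusion step can be sketched as learning a linear combination of feature maps from eye-tracking data. The sketch assumes four precomputed conspicuity maps (color, orientation, motion, face) and a fixation-density map of the same size, and uses ordinary least squares as a stand-in for the machine learning method; the function names are hypothetical.

```python
import numpy as np

def _design_matrix(feature_maps):
    """Stack (K, H, W) feature maps into an (H*W, K+1) matrix with a bias column."""
    k = feature_maps.shape[0]
    X = feature_maps.reshape(k, -1).T  # one row per pixel, one column per feature
    return np.column_stack([X, np.ones(len(X))])

def learn_fusion_weights(feature_maps, fixation_density):
    """Least-squares weights mapping feature maps (K, H, W) to a fixation density (H, W).

    Assumed feature order: color, orientation, motion, face; the last weight is a bias.
    """
    w, *_ = np.linalg.lstsq(
        _design_matrix(feature_maps), fixation_density.ravel(), rcond=None
    )
    return w

def predict_attention(feature_maps, w):
    """Fused attention map with the same spatial size as the input feature maps."""
    return (_design_matrix(feature_maps) @ w).reshape(feature_maps.shape[1:])
```

Because the weights are fitted to recorded fixations rather than set by hand, the relative importance of the face map versus the low-level maps comes from data, which is the point of learning the fusion.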