Web Video Codecs Comparison

With no universal standard, many video formats compete on the Web, each with its own compression ratio, decoding complexity, and license. Content is compressed to such formats and then decoded to raw video by means of codecs (encoders/decoders). While the format defines the models, tools and techniques, a codec is an implementation of these, and as such may be well or poorly made. Fortunately, there are many high quality open source codecs for Web formats, and these are among the best implementations. In general, the higher the compression ratio, the more complex the format, which requires more decoding power and limits device support. Format selection is therefore not as simple as picking the highest compression, and it is important to understand the tradeoffs.

Image and video compression also take advantage of the fact that the human visual system (HVS) does not perceive loss of detail uniformly. Several visual features such as luminance, contrast, structure and context affect the perceived distortion from lossy compression, which is typical of Web streaming and general broadcast. For this reason, image and video quality assessment (IQA/VQA) is not a trivial endeavor, either. While PSNR is a simple and physically intuitive VQA metric, it completely disregards psychophysical features. SSIM is a popular, effective, and efficient perceptual IQA metric; Fast SSIM is a more efficient, though less effective, version. GMSD is an even more effective and efficient alternative to SSIM. Finally, SG-Sim is a new VQA metric based on Fast SSIM that attempts to improve upon the effectiveness and efficiency of GMSD.
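To make the limitation of PSNR concrete, here is a minimal sketch of the standard PSNR formula in Python. It assumes 8-bit samples passed as flat sequences; the function name `psnr` and the toy sample values are illustrative and are not taken from the jVQA tool or any codec. Because PSNR depends only on the mean squared error, two distortions with the same error energy get the same score no matter where in the frame they fall or how visible they are.

```python
import math


def psnr(reference, distorted, peak=255):
    """Return PSNR in dB between two equally sized sequences of samples.

    `peak` is the maximum sample value (255 for 8-bit video).
    """
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of samples")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# Toy 4-sample "frames": the same error energy placed on different samples.
ref = [100, 100, 200, 200]
error_on_first = [104, 100, 200, 200]
error_on_last = [100, 100, 200, 204]

# Both distortions have MSE = 4, so PSNR cannot tell them apart.
print(psnr(ref, error_on_first) == psnr(ref, error_on_last))  # True
```

A perceptual metric such as SSIM instead compares local luminance, contrast and structure, which is why it can rank these two cases differently on real frames.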

Codec comparisons are a popular exercise, typically using a handful of VQA metrics. In the present exercise, we evaluate not only new and old Web video codecs, but also the new VQA metrics GMSD and SG-Sim, which are not yet widely used for this task.

Experimental Setup

In this comparison, the best open source encoders for Web formats are compared at their top quality settings, with any Web-appropriate restrictions for general devices. Old encoders are typically tuned to maximize PSNR, recent encoders maximize SSIM, and the "x26X family" offers specialized psychovisual optimizations. The reference video has 640×360 resolution and is encoded in 2 passes at 400, 800 and 1200 kbit/s (0.072, 0.145 and 0.217 bits per pixel, bpp), with the maximum rate set to double the average and a buffer of 1.5 seconds. The keyframe interval is set at 4 seconds, common for adaptive streaming over HTTP. Evaluation is performed by the jVQA tool.

Theora HQ: --optimize. V. 1.2.0alpha.
XviD HQ: -quality 6 -vhqmode 4 -bvhq -qpel -nopacked -masking -max_bframes 2. V. 1.3.4.
VP8 HQ: --good --cpu-used 0 --profile 0 --auto-alt-ref 1 --tune ssim. V. 1.5.0.
x264 Baseline HQ Psy-RDO: --profile baseline --min-keyint 1 --preset veryslow --ref 4 --tune grain. V. 2665.
x264 Baseline HQ SSIM-RDO: --profile baseline --min-keyint 1 --preset veryslow --ref 4 --tune ssim. V. 2665.
x264 High HQ Psy-RDO: --profile high --min-keyint 1 --preset veryslow --ref 4 --bframes 2 --b-pyramid none --tune grain. V. 2665.
x264 High HQ SSIM-RDO: --profile high --min-keyint 1 --preset veryslow --ref 4 --bframes 2 --b-pyramid none --tune ssim. V. 2665.
VP9 HQ: --good --cpu-used 1 --profile 0 --auto-alt-ref 1 --speed 1. (--tune ssim provoked encoder crash.) V. 1.5.0.
x265 SSIM-RDO: --preset medium --tune ssim --min-keyint 1. V. 1.9+54.
x265 SSIM-RDO HQ: --preset veryslow --tune ssim --min-keyint 1. V. 1.9+54.
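
The bits-per-pixel figures quoted in the setup can be checked with a short calculation. The sketch below assumes a frame rate of 24 frames per second, which the text does not state; it is simply the rate at which the quoted bpp values come out. The helper name `bits_per_pixel` is hypothetical.

```python
def bits_per_pixel(bitrate_kbps, width, height, fps):
    """Average number of bits spent per pixel per frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


WIDTH, HEIGHT = 640, 360
FPS = 24  # assumption: the frame rate is not given in the text
KEYFRAME_SECONDS = 4

for rate in (400, 800, 1200):
    bpp = bits_per_pixel(rate, WIDTH, HEIGHT, FPS)
    max_rate = 2 * rate  # maximum rate is double the average
    print(f"{rate} kbit/s: {bpp:.3f} bpp, max rate {max_rate} kbit/s")

# Keyframe interval expressed in frames under the assumed frame rate.
print(KEYFRAME_SECONDS * FPS)  # 96
```

Under this assumption the three targets give 0.072, 0.145 and 0.217 bpp, matching the values above.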

Results

Sample: