Tuesday, October 3, 2017

VP9 ELI5 Part 1

Welcome to my VP9 Explain Like I'm 5 series. I'm going to explain the VP9 codec so anyone can understand it and perhaps even create their own encoder.

What is VP9 and why should I care?

First things first, let's establish what VP9 is. It's a video codec created by Google and it's what almost all YouTube videos are saved as on the server. Previously, YouTube videos were all MP4 (specifically, H.264). There were two problems with that. The first is that it's patented and copyrighted, so you have to pay the MPEG group. Second, it takes a lot of data for any given quality level. Mobile consumers aren't the only ones who pay by the gigabyte for their data usage; businesses have to pay for that as well to serve people. Normally video has a tradeoff: you can have higher quality for more data usage, or less data usage with bad quality. VP9, amazingly enough, allows Google to pack higher quality into less data, achieving both benefits at once. Unfortunately, only Google and Netflix use VP9; everyone else still uses MP4.

Video definition

You probably already know it's a series of still pictures called frames. If you took pictures really fast with your still picture camera, you could play them quickly to see a crude video. Video is merely a set of still pictures played one after the other really fast.

What are these pictures composed of?

You may be thinking, "A picture is simply dots (pixels). Each one has an RGB (red, green, blue) color value. I know because I've done pixel art in MS Paint." Well, I've got to tell you, that's not how images work in a video.

To explain this, I'll need to give a brief history lesson. You see, video frames could easily use RGB colors like you might have assumed. However, the US TV system was defined near the beginning of World War II and only carried black-and-white video until the 1950's. So instead of red, green, and blue, the old TV system only carried brightness values (shades of gray). You can approximate that in MS Paint by using the same number for red, green, and blue.

In the 50's, they wanted to add color to TV without breaking compatibility with existing TV's. They weren't willing to go the 2009 route and make everyone buy a new TV. What they did was mix a small amount of color data onto the existing black-and-white signal. That way, old TV's would still decode it while color TV's would add the color info on top of the black-and-white picture.

The FCC had allocated 6 MHz bandwidth for TV channels. The engineers couldn't add too much extra color without the signal bleeding onto other channels, so they devised a system of saving only a low-res copy of the image in color format while saving the full-res image in B/W format. The low-quality color image was enlarged and super-imposed over the B/W image, making good color because the human eye sees more brightness than color.

Even though NTSC is nearly dead, this system lives on in computers as YUV 4:2:0. If you want to read more, check out the YUV Wikipedia page.

Let's get started

For this series, we'll be working with the following 960x540 color photo. In case you didn't know, that's 1/4 the size of 1920x1080.

Credit: Lt. Zachary West, 100th MPAD, Flickr, CC BY 2.0

This wasn't actually 960x540 but I cropped and resized so it would be a nice easy size.

If you want to follow along, you can download the image.

No comments:

Post a Comment