Example: Coding based on clustering of pixel values
This topic takes us off the direct path to MPEG, but it illustrates some of the principles of video compression.
If we transmit the pixels of a color picture directly, we may allow (for example) 8 bits per primary color per pixel to do so. A system with this capacity of 24 bits per pixel can transmit any possible sequence of pictures including pictures of completely random noise. However, we usually have no need to transmit frames of random noise, and are instead interested in pictures that have many types of internal correlations.
For this example, if we look at the distribution of pixel values in an image, we often find that the pixel values are clustered in several peaks, each cluster representing the color range of one object in the image. (This is especially true for cartoons, where large flat areas are painted with a single color.) These statistics are not often used for general compression systems for transmission, because they vary greatly from picture to picture, but they do have utility in techniques for reducing the number of bits per primary color used to represent the image. For example, some sort of statistical clustering algorithm typically is used to translate 24-bit-per-pixel (16 Million colors) to 8 bit per pixel (256 color) displays for computers. In this case, the original pixel colors are replaced with approximations based on the 256 most commonly occurring colors, obviously a lossy procedure.
For general compression, where we want to maintain the general capability of 16M colors, we could modify the procedure. Still looking at the clustering of the pixel values, we might do the following:
1)
2)
3)
a) The number of the average cluster color that it is close to,
and
b) Its difference from that average cluster color.
Step 3 is an instance of prediction, in which we predict the pixel to be the same color as one of the average cluster colors, and then transmit the error prediction signal, that is, the difference of the actual value from the predicted value.
So far, we have not reduced the data, and have in fact increased it by the use of side data. Lossless data reduction can be achieved in this case, however, with the use of entropy coding for the difference values generated in step 3 above. The basic idea is to use short data words to represent common difference values, and longer words for less common values. Depending on the distribution of difference values, statistical coding may give about a 2:1 gain over uncoded transmission. Note that this technique is lossless - the exact original pixels can be recovered.