Since its conception in 1989, Python has steadily gained popularity as a general-purpose programming language. It is a high-level, object-oriented language with a comprehensive standard library. Language features such as automatic memory management and easy readability have attracted a wide range of developer communities. Typically, one can develop complex applications in Python very quickly compared to some other languages. It is used in several open source as well as commercial scientific modeling and visualization software packages. It has also gained popularity in industries such as animation and game development, where the focus is on multimedia application development. This book is all about multimedia processing using Python.
In this introductory chapter, we shall:
Learn about multimedia and multimedia processing
Discuss a few popular multimedia frameworks for multimedia processing using Python
Develop a simple interactive application using PyGame
So let's get on with it.
We use multimedia applications in our everyday lives. It is multimedia that we deal with while watching a movie or listening to a song or playing a video game. Multimedia applications are used in a broad spectrum of fields. Multimedia has a crucial role to play in the advertising and entertainment industry. One of the most common usages is to add audio and video effects to a movie. Educational software packages such as a flight or a drive simulator use multimedia to teach various topics in an interactive way.
So what really is multimedia? In general, any application that combines different sources of digital media is termed a digital multimedia application. A video, for instance, is a combination of different sources or contents, such as an audio track, a video track, and a subtitle track. When such a video is played, all these media sources are presented together to accomplish the desired effect.
A multichannel audio file can have a background music track and a lyrics track. It may even include various audio effects. An animation can be created by displaying a series of digital images quickly one after the other. These are different examples of multimedia.
In the case of computer or video games, another dimension is added to the application: user interaction. This is often termed interactive multimedia. Here, the users determine the way the multimedia contents are presented. With the help of devices such as a keyboard, mouse, trackball, joystick, and so on, the users can interactively control the game.
We discussed some of the application domains where multimedia is extensively used. The focus of this book will be on multimedia processing, using which various multimedia applications will be developed.
After taking a snap with a digital camera, we often tweak the original digital image for various reasons. One of the most common reasons is to correct flaws in the image, such as removing 'red-eye' or increasing the brightness level if the picture was taken in insufficient light. Another reason is to add special effects that give a pleasing appearance to the image. For example, making a family picture black and white and digitally adding a frame around it gives it a nostalgic effect. The next illustration shows an image before and after the enhancement. Sometimes, the original image is modified to bring out important information that it contains. Suppose the picture represents a complicated assembly of components. One can add special effects to the image so that only the edges in the picture are highlighted. This information can then be used to detect, for instance, interference between the components. Thus, we digitally process the image further until we get the desired output image.
An example where a border is added around an image to change its appearance is as follows:

Digital image processing can be viewed as an application of various algorithms/filters on the image data. One of the examples is an image smoothing filter. Image smoothing means reducing the noise from the image. The random changes in brightness and color levels within the image data are typically referred to as image noise. The smoothing algorithms modify the input image data so that this noise is reduced in the resultant image.
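To make the idea of a smoothing filter concrete, here is a minimal sketch of a moving-average (box) filter applied to one row of brightness values. It uses plain Python lists instead of a real image library, and the function name smooth_row and the sample data are assumptions made for illustration only.

```python
def smooth_row(pixels, radius=1):
    """Return a moving-average (box filter) version of a row of brightness values."""
    smoothed = []
    for i in range(len(pixels)):
        # Clamp the averaging window to the row boundaries.
        lo = max(0, i - radius)
        hi = min(len(pixels), i + radius + 1)
        window = pixels[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A flat grey row (brightness 100) with one noisy spike at index 2.
noisy = [100, 100, 160, 100, 100]
print(smooth_row(noisy))  # [100.0, 120.0, 120.0, 120.0, 100.0]
```

The spike of 160 is pulled down to 120 because it is averaged with its neighbors. A real smoothing filter works the same way in two dimensions, usually with a weighted window.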
Another commonly performed image processing operation is blending. As the name suggests, blending means mixing two compatible images to create a new image. Typically, the data of the two input images is interpolated using a constant value of alpha to produce the final image. The next illustration shows the two input images and the resultant image after blending. In the coming chapters, we will learn several such digital image processing techniques.
The pictures of the bridge and the flying birds are taken at different locations. Using image processing techniques these two images can be blended together so that they appear as a single picture:

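The interpolation used in blending can be sketched in a few lines. The example below is only an illustration on tiny hand-written "images" (lists of RGB tuples); the helper names blend and blend_images and the pixel values are assumptions, not part of any library.

```python
def blend(pixel_a, pixel_b, alpha):
    """Linearly interpolate two RGB pixels.

    alpha = 0.0 returns pixel_a unchanged, alpha = 1.0 returns pixel_b.
    """
    return tuple(
        round(a * (1.0 - alpha) + b * alpha)
        for a, b in zip(pixel_a, pixel_b)
    )


def blend_images(image_a, image_b, alpha):
    """Blend two images given as equally long lists of RGB tuples."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return [blend(pa, pb, alpha) for pa, pb in zip(image_a, image_b)]


bridge = [(200, 100, 0), (0, 0, 0)]
birds = [(0, 100, 200), (255, 255, 255)]
print(blend_images(bridge, birds, 0.25))  # [(150, 100, 50), (64, 64, 64)]
```

With alpha set to 0.25, each output pixel takes three quarters of its value from the first image and one quarter from the second. The two images must have the same size, which is what "compatible" means here.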
When you are listening to music on your computer, your music player is doing several things in the background. It processes the digital media data so that it can be transformed into a playable format that an output media device, such as an audio speaker, requires. The media data flows through a number of interconnected media handling components, before it reaches a media output device or a media file to which it is written. This is shown in the next illustration.
The following image shows a media data processing pipeline:

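The idea of media data flowing through interconnected components can be sketched with Python generators. This is a conceptual toy, not how any particular framework is implemented; the stage names source, volume, clip, and sink are assumptions chosen for the example.

```python
def source(samples):
    """Source element: emits raw samples one at a time."""
    for sample in samples:
        yield sample


def volume(stream, gain):
    """Filter element: scales every sample by a gain factor."""
    for sample in stream:
        yield sample * gain


def clip(stream, limit):
    """Filter element: keeps every sample within [-limit, limit]."""
    for sample in stream:
        yield max(-limit, min(limit, sample))


def sink(stream):
    """Sink element: collects the samples (stands in for a speaker or a file)."""
    return list(stream)


# Link the elements: source -> volume -> clip -> sink.
output = sink(clip(volume(source([1, -2, 3, -4]), gain=2), limit=6))
print(output)  # [2, -4, 6, -6]
```

Each stage handles one sample at a time and passes it on, so no stage needs to hold the whole stream in memory.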
Audio and video processing encompasses a number of things. Some of them are briefly discussed in this section. In this book, we will learn various audio-video processing techniques using Python bindings of the GStreamer multimedia framework.
If you record footage on your camcorder and then transfer it to your computer, it will take up a lot of space. To save those moments on a VCD or a DVD, you almost always have to compress the audio-video data so that it occupies less space. There are two types of audio and video compression: lossy and lossless. Lossy compression is very common. Here, some data is assumed to be unnecessary and is not retained in the compressed media. For example, in lossy video compression, some of the original data is lost, but the loss has little impact on the overall quality of the video. In lossless compression, on the other hand, the data recovered from a compressed audio or video file perfectly matches the original data. The compression ratio, however, is typically much lower than with lossy compression. As we go along, we will write audio-video data conversion utilities to compress the media data.
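The difference between the two kinds of compression can be illustrated with the standard library. In the sketch below, zlib stands in for a lossless codec, and simply discarding the low bits of each byte stands in for a lossy one. Real audio and video codecs are far more sophisticated; the function names and the sample data are assumptions made for this example.

```python
import zlib


def lossless_roundtrip(data):
    """Compress and decompress with zlib; the result equals the input exactly."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(data, keep_bits=4):
    """Toy lossy step: keep only the top `keep_bits` bits of every byte."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in data)


original = bytes([10, 10, 10, 11, 11, 12, 200, 201, 202, 203] * 50)

# Lossless: every byte survives the round trip.
print(lossless_roundtrip(original) == original)  # True

# Lossy: fine detail is thrown away and cannot be recovered.
print(lossy_quantize(original) == original)  # False
print(lossy_quantize(bytes([203])))  # b'\xc0', that is 192 instead of 203
```

After quantization, the values 200 to 203 all become 192. That makes the data more repetitive and therefore easier to compress, at the cost of losing the small differences between them.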
Mixing is a way to create composite media using more than one media source. In the case of audio mixing, the audio data from different sources is combined into one or more audio channels. For example, it can be used to add an audio effect or to combine separate music and lyrics tracks so that they play in sync. In the coming chapters, we will learn more about the media mixing techniques used with Python.
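As a rough illustration of audio mixing, the following sketch adds two mono tracks sample by sample and clamps the result to the signed 16-bit range. The function name mix and the sample values are assumptions for the example; the frameworks covered later do this work for us.

```python
def mix(track_a, track_b, limit=32767):
    """Mix two mono tracks of signed 16-bit samples into one channel."""
    length = max(len(track_a), len(track_b))
    mixed = []
    for i in range(length):
        # Treat the shorter track as silence once it has ended.
        a = track_a[i] if i < len(track_a) else 0
        b = track_b[i] if i < len(track_b) else 0
        # Clamp the sum so that it still fits in a 16-bit sample.
        mixed.append(max(-limit - 1, min(limit, a + b)))
    return mixed


music = [1000, 2000, 30000]
lyrics = [500, -2000, 10000, 42]
print(mix(music, lyrics))  # [1500, 0, 32767, 42]
```

The third sample would be 40000 without clamping, which does not fit in 16 bits, so it is limited to 32767.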
Media mixing can be viewed as a type of media editing. Media editing can be broadly divided into linear editing and non-linear editing. In linear editing, the media is edited sequentially, in the order in which it will be presented. In non-linear editing, any portion of the media can be accessed and edited interactively, in any order. This book will cover the basics of media editing. For example, we will learn how to create a new audio track by combining portions of different audio files.
An animation can be viewed as an optical illusion of motion created by displaying a sequence of image frames one after the other. Each of these image frames is slightly different from the previously displayed one. The next illustration shows animation frames of a 'grandfather's clock':

As you can see, there are four image frames in the clock animation. These frames are quickly displayed one after the other to achieve the desired animation effect. Each image is shown for 0.25 seconds, so the four frames together simulate one second of pendulum oscillation.
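The timing of the clock animation can be expressed as a small calculation. This sketch only works out which frame is visible at a given moment; the constant and function names are assumptions for illustration.

```python
FRAME_DURATION = 0.25  # seconds each frame stays on screen
FRAME_COUNT = 4        # number of frames in the clock animation


def frame_at(t):
    """Return the index of the frame visible at time t (in seconds).

    The animation repeats once all frames have been shown.
    """
    return int(t / FRAME_DURATION) % FRAME_COUNT


cycle_length = FRAME_DURATION * FRAME_COUNT
print(cycle_length)    # 1.0
print(frame_at(0.0))   # 0
print(frame_at(0.3))   # 1
print(frame_at(0.99))  # 3
print(frame_at(1.0))   # 0, the cycle starts again
```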
Cartoon animation is a classic example of animation. Since its debut in the early twentieth century, animation has become a prominent entertainment industry. Our focus in this book will be on 2D cartoon animations built using Python. In Chapter 4, we will learn some techniques to build such animations. Creating a cartoon character and bringing it to 'life' is a laborious job. Until the late 70s, most of the animations and effects were created without the use of computers. In today's age, much of the image creation work is produced digitally. The state-of-the-art technology makes this process much faster. For example, one can apply image transformations to display or move a portion of an image, thereby avoiding the need to create the whole cartoon image for the next frame.
Python has a few built-in multimedia modules for application development. We will skim through some of these modules.
The winsound module is available on the Windows platform. It provides an interface that can be used to implement fundamental audio-playing elements in the application. A sound can be played by calling PlaySound(sound, flags). Here, the argument sound is used to specify the path of an audio file. If this parameter is specified as None, the presently streaming audio (if any) is stopped. The second argument specifies whether the file to be played is a sound file or a system sound. The following code snippet shows how to play a wave formatted audio file using the winsound module.
    from winsound import PlaySound, SND_FILENAME
    PlaySound("C:/AudioFiles/my_music.wav", SND_FILENAME)
This plays the sound file specified by the first argument to the function PlaySound. The second argument, SND_FILENAME, says that the first argument is an audio file. If the flag is set as SND_ALIAS, it means the value for the first argument is a system sound from the registry.
The audioop module is used for manipulating raw audio data. One can perform several useful operations on sound fragments. For example, it can find the minimum and maximum values of all the samples within a sound fragment.
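To show what finding the minimum and maximum of a sound fragment involves, here is a minimal sketch that decodes raw 16-bit samples with the struct module. It deliberately avoids audioop, which was removed from the standard library in Python 3.13. The function name minmax_16bit and the sample values are assumptions for this example.

```python
import struct


def minmax_16bit(fragment):
    """Return (minimum, maximum) over signed 16-bit little-endian samples."""
    count = len(fragment) // 2
    # Decode the raw bytes into a tuple of integers, ignoring a trailing odd byte.
    samples = struct.unpack("<%dh" % count, fragment[:count * 2])
    return min(samples), max(samples)


# Build a raw fragment of four samples, as it would appear in a WAV data chunk.
fragment = struct.pack("<4h", 120, -300, 45, 2000)
print(minmax_16bit(fragment))  # (-300, 2000)
```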
The wave module provides an interface to read and write audio files in the WAV file format. The following lines of code open a wav file.
    import wave
    fil = wave.open('horn.wav', 'r')
The first argument of the open method is the path to the wave file. The second argument is the mode in which the audio file is opened: 'r' or 'rb' for read-only mode and 'w' or 'wb' for write-only mode. When the file is opened in read-only mode, as it is here, open returns a Wave_read object.
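The following sketch shows both modes of the wave module in one place. To stay self-contained it writes to an in-memory buffer instead of a file on disk; the sample values, the 8000 Hz frame rate, and the variable names are arbitrary choices for the example.

```python
import io
import struct
import wave

samples = [0, 1000, -1000, 32767]

# Write-only mode: describe the audio format, then write the frames.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)      # mono
    writer.setsampwidth(2)      # 2 bytes per sample, that is 16-bit audio
    writer.setframerate(8000)   # frames per second
    writer.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read-only mode: wave.open returns a Wave_read object.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    channels = reader.getnchannels()
    rate = reader.getframerate()
    frame_count = reader.getnframes()
    frames = reader.readframes(frame_count)

decoded = list(struct.unpack("<%dh" % frame_count, frames))
print(channels, rate, frame_count)  # 1 8000 4
print(decoded)                      # [0, 1000, -1000, 32767]
```

Passing a path such as 'horn.wav' instead of the buffer works the same way for files on disk.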
There are several open source multimedia frameworks available for multimedia application development. The Python bindings for most of these are readily available. We will discuss a few of the most popular multimedia frameworks here. In the chapters that follow, we will make use of many of these libraries to create some useful multimedia applications.
Python Imaging Library provides image processing functionality in Python. It supports several image formats. Later in this book, a number of image processing techniques using PIL will be discussed thoroughly. We will learn things such as image format conversion and various image manipulation and enhancement techniques using the Python Imaging Library.
PyMedia is a popular open source media library that supports audio/video manipulation of a wide range of multimedia formats.
GStreamer is a framework that enables multimedia manipulation, and on top of which one can develop multimedia applications. The rich set of libraries it provides makes it easier to develop applications with complex audio/video processing capabilities. GStreamer is written in the C programming language and provides bindings for some other programming languages, including Python. Several open source projects use the GStreamer framework to develop their own multimedia applications. Comprehensive documentation is available on the GStreamer project website. The GStreamer Application Development Manual is a very good starting point. This framework will be used extensively later in this book to develop audio and video applications.
Interested in animations and gaming applications? Pyglet is here to help. Pyglet provides an API for developing multimedia applications using Python. It is an OpenGL-based library that works on multiple platforms. It is one of the popular multimedia frameworks for development of games and other graphically intense applications. It supports multiple monitor configuration typically needed for gaming application development. Later in this book, we will be extensively using this Pyglet framework for creating animations.
PyGame (www.pygame.org) is another very popular open source framework that provides an API for gaming application development needs. It provides a rich set of graphics and sound libraries. We won't be using PyGame in this book. But since it is a prominent multimedia framework, we will briefly discuss some of its most important modules and work out a simple example. The PyGame website provides ample resources on use of this framework for animation and game programming.
The Sprite module contains several classes; out of these, Sprite and Group are the most important. Sprite is the super class of all the visible game objects. A Group object is a container for several instances of Sprite.
As the name suggests, the Display module has functionality dealing with the display. It is used to create a Surface instance for displaying the PyGame window. Some of the important methods of this module include flip and update. The former is called to make sure that everything drawn is properly displayed on the screen, whereas the latter is used if you just want to update a portion of the screen.
The Surface module is used to display an image. An instance of Surface represents an image. The following line of code creates such an instance.
surf = pygame.display.set_mode((800,600))
The API method display.set_mode is used to create this instance. The width and height of the window are passed to this method as a tuple.
With the Draw module, one can render several basic shapes within the Surface. Examples include circles, rectangles, lines, and so on.
The Event module is another important module of PyGame. An event is said to occur when, for instance, the user clicks a mouse button or presses a key. The event information is used to instruct the program to execute in a certain way.
The Image module is used to process images with different file formats. The loaded image is represented by a surface.
pygame.mixer.music provides convenient methods for controlling playback, such as play, rewind, stop, and so on.
The following is a simple program that highlights some of the fundamental concepts of animation and game programming. It shows how to display objects in an application window and then interactively modify their positions. We will use PyGame to accomplish this task. Later in this book, we will use a different multimedia framework, Pyglet, for creating animations.
This example will make use of the modules we just discussed. For this application to work, you will need to install PyGame. Binary and source distributions of PyGame are available on the PyGame website.
Create a new Python source file and write the following code in it.
     1 import pygame
     2 import sys
     3
     4 pygame.init()
     5 bgcolor = (200, 200, 100)
     6 surf = pygame.display.set_mode((400,400))
     7
     8 circle_color = (0, 255, 255)
     9 x, y = 200, 300
    10 circle_rad = 50
    11
    12 pygame.display.set_caption("My Pygame Window")
    13
    14 while True:
    15     for event in pygame.event.get():
    16         if event.type == pygame.QUIT:
    17             sys.exit()
    18         elif event.type == pygame.KEYDOWN:
    19             if event.key == pygame.K_UP:
    20                 y -= 10
    21             elif event.key == pygame.K_DOWN:
    22                 y += 10
    23             elif event.key == pygame.K_RIGHT:
    24                 x += 10
    25             elif event.key == pygame.K_LEFT:
    26                 x -= 10
    27
    28     circle_pos = (x, y)
    29
    30     surf.fill(bgcolor)
    31     pygame.draw.circle(surf, circle_color,
    32                        circle_pos, circle_rad)
    33     pygame.display.flip()
The first line imports the pygame package. On line 4, the modules within this pygame package are initialized. An instance of class Surface is created on line 6 using the display.set_mode method. This is the main PyGame window inside which the images will be drawn. To ensure that this window is constantly displayed on the screen, we need to add a while loop that will run forever, until the window is closed by the user. In this simple application, everything we need is placed inside the while loop. The background color of the PyGame window represented by the object surf is set on line 30.
A circular shape is drawn on the PyGame surface by the code on lines 31 and 32. The arguments to draw.circle are (Surface, color, position, radius). This creates a circle at the position specified by the argument circle_pos. The instance of class Surface is sent as the first argument to this method.
The code block on lines 15-26 captures certain events. An event occurs when, for instance, a mouse button or a key is pressed. In this example, we instruct the program to do certain things when the arrow keys are pressed. When the RIGHT arrow key is pressed, the circle is drawn with the x coordinate offset by 10 pixels from the previous position. As a result, the circle appears to move towards the right whenever you press the RIGHT arrow key. When the PyGame window is closed, the pygame.QUIT event occurs. Here, we simply exit the application by calling sys.exit(), as done on line 17.
Finally, we need to ensure that everything drawn on the Surface is visible. This is accomplished by the code on line 33. If you disable this line, incompletely drawn images may appear on the screen.
Execute the program from a terminal window. It will show a new graphics window containing a circular shape. If you press the arrow keys on the keyboard, the circle will move in the direction indicated by the arrow key. The next illustration shows the screenshot of the original circle position (left) and when it is moved using the UP and RIGHT arrow keys.
A simple PyGame application with a circle drawn within the Surface (window). The image on the right side is a screenshot taken after maneuvering the position of the circle with the help of arrow keys:

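The arrow-key handling in the program above can also be written as a small lookup table, which keeps the movement rules separate from the event loop. The sketch below runs without PyGame: the strings "UP", "DOWN", "LEFT", and "RIGHT" are stand-ins for pygame.K_UP and the other key constants, and the names MOVES and move are assumptions for this example.

```python
STEP = 10  # pixels moved per key press, as in the program above

# Offset (dx, dy) applied for each arrow key. The y coordinate grows
# downwards in window coordinates, so UP means a negative dy.
MOVES = {
    "UP": (0, -STEP),
    "DOWN": (0, STEP),
    "LEFT": (-STEP, 0),
    "RIGHT": (STEP, 0),
}


def move(position, key):
    """Return the new (x, y) position after a key press; unknown keys do nothing."""
    dx, dy = MOVES.get(key, (0, 0))
    x, y = position
    return (x + dx, y + dy)


pos = (200, 300)
for key in ["UP", "RIGHT", "RIGHT"]:
    pos = move(pos, key)
print(pos)  # (220, 290)
```

Inside the event loop, the chain of if and elif statements would then reduce to a single call that updates the circle position.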
We used PyGame to create a simple user interactive application. The purpose of this example was to introduce some of the basic concepts behind animation and game programming. It was just a preview of what is coming next! Later in this book we will use Pyglet framework to create some interesting 2D animations.
When one thinks of a media player, it is almost always associated with a graphical user interface. Of course, one can work with command-line multimedia players. But a media player with a GUI is a clear winner, as it provides an easy-to-use, intuitive interface to play media and control its playback. The next screenshot shows the user interface of an audio player developed using QT Phonon.
An Audio Player application developed with QT Phonon:

QT is an open source GUI framework. 'Phonon' is a multimedia package within QT that supports audio and video playback. Note that Phonon is meant for simple media player functionality. For complex audio/video player functionality, you should use multimedia frameworks like GStreamer. Phonon depends on a platform-specific backend for media processing. For example, on the Windows platform the backend framework is DirectShow. The supported functionality may vary depending on the platform.
To develop a media processing application, a media graph is created in Phonon. This media graph contains various interlinked media nodes. Each media node does a portion of the media processing. For example, an effects node will add an audio effect, such as echo, to the media. Another node will be responsible for outputting the media to an audio or video device, and so on. In Chapter 8, we will develop audio and video player applications using the Phonon framework. The next illustration shows one of them, a video player developed using QT Phonon, streaming a video.
Using various built-in modules of QT Phonon, it is very easy to create GUI-based audio and video players. This example shows a video player in action:

Python bindings for several other multimedia libraries are available on various platforms. Some of the popular libraries are mentioned below.
Snack is an audio toolkit that is used to create cross-platform audio applications. It includes audio analysis and input-output functionality and it has support for audio visualization as well. The official website for Snack Sound Toolkit is http://www.speech.kth.se/snack/.
PyAudiere (http://pyaudiere.org/) is an open source audio library. It provides an API to easily implement the audio functionality in various applications. It is based on Audiere Sound Library.
This chapter served as an introduction to multimedia processing using Python.
Specifically, in this chapter we covered:
An overview of multimedia processing. It introduced us to digital image, audio, and video processing.
We learned about a number of freely available multimedia frameworks that can be used for multimedia processing.
Now that we know what multimedia libraries and frameworks are out there, we're ready to explore these to develop exciting multimedia applications!