Python Multimedia

By Ninad Sathaye

About this book

Multimedia applications are used by a range of industries to enhance the visual appeal of a product. This book will teach the reader how to perform multimedia processing using Python.

This step-by-step guide gives you hands-on experience for developing exciting multimedia applications using Python. This book will help you to build applications for processing images, creating 2D animations and processing audio and video.

Writing applications that work with images, videos, and other sensory effects is great. Not every application gets to make full use of audio/visual effects, but a certain amount of multimedia makes any application a lot more appealing. There are numerous multimedia libraries for which Python bindings are available. These libraries enable working with different kinds of media, such as images, audio, video, games, and so on. This book introduces the reader to the most widely used open source libraries through several exciting, real-world projects. Popular multimedia frameworks and libraries such as GStreamer, Pyglet, QT Phonon, and the Python Imaging Library are used to develop various multimedia applications.

Publication date:
August 2010


Chapter 1. Python and Multimedia

Since its conception in 1989, Python has gained increasing popularity as a general purpose programming language. It is a high-level, object-oriented language with a comprehensive standard library. Language features such as automatic memory management and easy readability have attracted the attention of a wide range of developer communities. Typically, one can develop complex applications in Python very quickly compared to some other languages. It is used in several open source as well as commercial scientific modeling and visualization software packages. It has already gained popularity in industries such as animation and game development, where the focus is on multimedia application development. This book is all about multimedia processing using Python.

In this introductory chapter, we shall:

  • Learn about multimedia and multimedia processing

  • Discuss a few popular multimedia frameworks for multimedia processing using Python

  • Develop a simple interactive application using PyGame

So let's get on with it.



We use multimedia applications in our everyday lives. It is multimedia that we deal with while watching a movie or listening to a song or playing a video game. Multimedia applications are used in a broad spectrum of fields. Multimedia has a crucial role to play in the advertising and entertainment industry. One of the most common usages is to add audio and video effects to a movie. Educational software packages such as a flight or a drive simulator use multimedia to teach various topics in an interactive way.

So what really is multimedia? In general, any application that makes use of different sources of digital media is termed digital multimedia. A video, for instance, is a combination of different sources or contents. The contents can be an audio track, a video track, and a subtitle track. When such a video is played, all these media sources are presented together to accomplish the desired effect.

A multichannel audio can have a background music track and a lyrics track. It may even include various audio effects. An animation can be created by using a bunch of digital images that are displayed quickly one after the other. These are different examples of multimedia.

In the case of computer or video games, another dimension is added to the application, the user interaction. It is often termed as an interactive type of multimedia. Here, the users determine the way the multimedia contents are presented. With the help of devices such as keyboard, mouse, trackball, joystick, and so on, the users can interactively control the game.


Multimedia processing

We discussed some of the application domains where multimedia is extensively used. The focus of this book will be on multimedia processing, using which various multimedia applications will be developed.

Image processing

After taking a snap with a digital camera, we often tweak the original digital image for various reasons. One of the most common reasons is to correct flaws in the image, such as removing 'red-eye' or increasing the brightness level if the picture was taken in insufficient light. Another reason is to add special effects that give a pleasing appearance to the image. For example, making a family picture black and white and digitally adding a frame around the picture gives it a nostalgic effect. The next illustration shows an image before and after the enhancement. Sometimes, the original image is modified to make important information in the image easier to understand. Suppose the picture represents a complicated assembly of components. One can add special effects to the image so that only the edges in the picture are highlighted. This information can then be used to detect, for instance, interference between the components. Thus, we digitally process the image further until we get the desired output image.

An example where a border is added around an image to change its appearance is as follows:

Digital image processing can be viewed as an application of various algorithms/filters on the image data. One of the examples is an image smoothing filter. Image smoothing means reducing the noise from the image. The random changes in brightness and color levels within the image data are typically referred to as image noise. The smoothing algorithms modify the input image data so that this noise is reduced in the resultant image.
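As a rough illustration of the smoothing idea described above, the following sketch applies a 3x3 mean (box) filter to a tiny grayscale image stored as a list of rows. The function name box_smooth and the sample data are made up for this example, and a mean filter is only one of many possible smoothing algorithms.

```python
def box_smooth(image):
    """Return a copy of a grayscale image smoothed with a 3x3 mean filter.

    `image` is a list of rows, each a list of brightness values (0-255).
    Neighbours outside the image are ignored, so border pixels are
    averaged over fewer values.
    """
    height, width = len(image), len(image[0])
    result = []
    for row in range(height):
        new_row = []
        for col in range(width):
            # Collect the pixel and the neighbours that lie inside the image.
            window = [
                image[r][c]
                for r in range(max(0, row - 1), min(height, row + 2))
                for c in range(max(0, col - 1), min(width, col + 2))
            ]
            new_row.append(sum(window) // len(window))
        result.append(new_row)
    return result


# A flat gray patch with one noisy (bright) pixel in the middle.
noisy = [
    [100, 100, 100],
    [100, 190, 100],
    [100, 100, 100],
]
smoothed = box_smooth(noisy)
```

The bright spike at the center is pulled toward its neighbours, which is the noise reduction the paragraph refers to. The price is that genuine detail is blurred as well.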

Another commonly performed image processing operation is blending. As the name suggests, blending means mixing two compatible images to create a new image. Typically, the data of the two input images is interpolated using a constant value of alpha to produce a final image. The next illustration shows the two input images and the resultant image after blending. In the coming chapters we will learn several of such digital image processing techniques.
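The interpolation with a constant alpha can be sketched in a few lines of plain Python. This is only an illustration on small grayscale images stored as lists of rows; the function name blend and the sample values are assumptions made for the example. The formula used, out = image1 * (1 - alpha) + image2 * alpha, is the same convention the Python Imaging Library uses for its blend operation.

```python
def blend(image1, image2, alpha):
    """Blend two same-sized grayscale images with a constant alpha.

    Each output pixel is image1 * (1 - alpha) + image2 * alpha, so
    alpha = 0 returns image1 and alpha = 1 returns image2.
    """
    if len(image1) != len(image2) or any(
        len(row1) != len(row2) for row1, row2 in zip(image1, image2)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [round(p1 * (1.0 - alpha) + p2 * alpha) for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(image1, image2)
    ]


dark = [[0, 0], [0, 0]]
bright = [[200, 200], [200, 100]]
half = blend(dark, bright, 0.5)
```

The size check reflects the requirement that the two images be compatible: pixels are paired one to one, so the dimensions have to match.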

The pictures of the bridge and the flying birds are taken at different locations. Using image processing techniques these two images can be blended together so that they appear as a single picture:

Audio and video processing

When you are listening to music on your computer, your music player is doing several things in the background. It processes the digital media data so that it can be transformed into a playable format that an output media device, such as an audio speaker, requires. The media data flows through a number of interconnected media handling components, before it reaches a media output device or a media file to which it is written. This is shown in the next illustration.

The following image shows a media data processing pipeline:

Audio and video processing encompasses a number of things. Some of them are briefly discussed in this section. In this book, we will learn various audio-video processing techniques using Python bindings of the GStreamer multimedia framework.
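The idea of media data flowing through interconnected components can be mimicked with Python generators. The sketch below is not GStreamer code; the element names (source, volume, clip, sink) and the sample values are invented purely to show how data passes from one stage to the next.

```python
def source(samples):
    """Source element: hands out raw samples one at a time."""
    for sample in samples:
        yield sample


def volume(stream, gain):
    """Filter element: scales every sample by a gain factor."""
    for sample in stream:
        yield sample * gain


def clip(stream, low, high):
    """Filter element: keeps every sample within [low, high]."""
    for sample in stream:
        yield max(low, min(high, sample))


def sink(stream):
    """Sink element: collects the samples (a real sink would play or save them)."""
    return list(stream)


# Link the elements: source -> volume -> clip -> sink.
output = sink(clip(volume(source([10, -20, 30, 80]), gain=2), -100, 100))
```

Each stage only knows about the stream it receives, so stages can be added, removed, or reordered without touching the others.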


If you record footage on your camcorder and then transfer it to your computer, it will take up a lot of space. In order to save those moments on a VCD or a DVD, you almost always have to compress the audio-video data so that it occupies less space. There are two types of audio and video compression: lossy and lossless. Lossy compression is very common. Here, some data is assumed unnecessary and is not retained in the compressed media. For example, in lossy video compression, some of the original data is lost, but the loss has little impact on the overall quality of the video. In lossless compression, on the other hand, the data recovered from a compressed audio or video file perfectly matches the original data. The compression ratio, however, is much lower than with lossy compression. As we go along, we will write audio-video data conversion utilities to compress the media data.
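The difference between the two kinds of compression can be illustrated with the standard library alone. In this sketch, zlib stands in for a lossless codec and a crude reduction from 16 bits to 8 bits per sample stands in for a lossy one; neither is what a real audio or video codec does. The sample values are made up, and the repeating pattern compresses far better than a real recording would.

```python
import struct
import zlib

# Hypothetical 16-bit audio samples: a short pattern repeated many times.
samples = [0, 1000, 2000, 1000, 0, -1000, -2000, -1000] * 100
raw = struct.pack("<%dh" % len(samples), *samples)

# Lossless: the decompressed bytes match the original exactly.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
lossless_ok = restored == raw
ratio = len(raw) / len(packed)

# Lossy: keep only the top 8 bits of each sample, then expand again.
# Half the storage is saved, but the discarded low bits cannot be recovered.
reduced = [s >> 8 for s in samples]
approx = [s << 8 for s in reduced]
max_error = max(abs(a - b) for a, b in zip(samples, approx))
```

Here lossless_ok is True, while max_error is nonzero: the lossy round trip returns values that are close to the originals but not identical.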


Mixing is a way to create composite media using more than one media source. In the case of audio mixing, the audio data from different sources is combined into one or more audio channels. For example, it can be used to add an audio effect or to synchronize separate music and lyrics tracks. In the coming chapters, we will learn more about the media mixing techniques used with Python.
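At the level of raw samples, combining two audio sources into one channel can be as simple as adding the samples together. The following sketch assumes both tracks use the same sample rate and signed 16-bit samples; the function name mix and the sample values are invented for illustration.

```python
def mix(track_a, track_b, limit=32767):
    """Mix two tracks into one channel by adding their samples.

    The shorter track is padded with silence, and each sum is clipped
    to the signed 16-bit range so that it cannot overflow.
    """
    length = max(len(track_a), len(track_b))
    a = track_a + [0] * (length - len(track_a))
    b = track_b + [0] * (length - len(track_b))
    return [max(-limit - 1, min(limit, x + y)) for x, y in zip(a, b)]


music = [1000, 2000, 30000, -30000]
vocals = [500, -500, 10000]
mixed = mix(music, vocals)
```

Clipping is the simplest way to stay within range. Scaling both tracks down before adding them is a common alternative that avoids the distortion clipping can cause.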


Media mixing can be viewed as a type of media editing. Media editing can be broadly divided into linear editing and non-linear editing. In linear editing, the programmer doesn't control the way media is presented, whereas in non-linear editing, editing is done interactively. This book will cover the basics of media editing. For example, we will learn how to create a new audio track by combining portions of different audio files.


An animation can be viewed as an optical illusion of motion created by displaying a sequence of image frames one after the other. Each of these image frames is slightly different from the previously displayed one. The next illustration shows animation frames of a 'grandfather's clock':

As you can see, there are four image frames in the clock animation. These frames are displayed quickly one after the other to achieve the desired animation effect. Each image is shown for 0.25 seconds, so the four frames together simulate a pendulum oscillation lasting one second.
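The timing arithmetic for the clock animation can be written down directly. This is a minimal sketch; the names FRAME_DURATION, FRAME_COUNT, and frame_at are chosen for the example and do not come from any animation library.

```python
FRAME_DURATION = 0.25  # seconds each frame stays on screen
FRAME_COUNT = 4        # frames in the clock animation


def frame_at(elapsed):
    """Return the index of the frame visible `elapsed` seconds after the start."""
    return int(elapsed / FRAME_DURATION) % FRAME_COUNT


# Four frames of 0.25 seconds each make up one full oscillation.
cycle_length = FRAME_DURATION * FRAME_COUNT
```

The modulo operation makes the animation loop: once the last frame has been shown, the sequence starts again from the first frame.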

Cartoon animation is a classic example of animation. Since its debut in the early twentieth century, animation has become a prominent entertainment industry. Our focus in this book will be on 2D cartoon animations built using Python. In Chapter 4, we will learn some techniques to build such animations. Creating a cartoon character and bringing it to 'life' is a laborious job. Until the late 70s, most of the animations and effects were created without the use of computers. In today's age, much of the image creation work is produced digitally. The state-of-the-art technology makes this process much faster. For example, one can apply image transformations to display or move a portion of an image, thereby avoiding the need to create the whole cartoon image for the next frame.


Built-in multimedia support

Python has a few built-in multimedia modules for application development. We will skim through some of these modules.


The winsound module is available on the Windows platform. It provides an interface that can be used to implement basic audio playback in an application. A sound can be played by calling PlaySound(sound, flags). Here, the argument sound specifies the path of an audio file. If this parameter is None, the presently playing audio (if any) is stopped. The second argument specifies whether the sound to be played is a sound file or a system sound. The following code snippet shows how to play a WAV-formatted audio file using the winsound module.

from winsound import PlaySound, SND_FILENAME

PlaySound("C:/AudioFiles/my_music.wav", SND_FILENAME)

This plays the sound file specified by the first argument to the function PlaySound. The second argument, SND_FILENAME, says that the first argument is an audio file. If the flag is set as SND_ALIAS, it means the value for the first argument is a system sound from the registry.


The audioop module is used for manipulating raw audio data. One can perform several useful operations on sound fragments. For example, it can find the minimum and maximum values of all the samples within a sound fragment.
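To make the notion of a sound fragment concrete, the following sketch finds the minimum and maximum sample values in a fragment of raw bytes. It deliberately uses the struct module rather than audioop, because audioop was removed from the standard library in Python 3.13. The helper name minmax mirrors the audioop function of the same name, but the implementation and the little-endian byte order are assumptions made for this example.

```python
import struct


def minmax(fragment, width=2):
    """Return (min, max) of the signed samples in a raw audio fragment.

    `fragment` is a bytes object and `width` is the sample size in bytes.
    Little-endian byte order is assumed.
    """
    formats = {1: "b", 2: "h", 4: "i"}
    if len(fragment) % width:
        raise ValueError("fragment is not a whole number of samples")
    count = len(fragment) // width
    samples = struct.unpack("<%d%s" % (count, formats[width]), fragment)
    return min(samples), max(samples)


# Four hypothetical 16-bit samples packed into a raw fragment.
fragment = struct.pack("<4h", 120, -300, 45, 2500)
lowest, highest = minmax(fragment)
```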


The wave module provides an interface to read and write audio files with WAV file format. The following line of code opens a wav file.

import wave
fil = wave.open('horn.wav', 'r')

The first argument of the open function is the path to the wave file. The second argument is the mode in which the audio file is opened: 'r' or 'rb' for read-only mode and 'w' or 'wb' for write-only mode. Opening a file in read mode, as done here, returns a Wave_read object.
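Both modes can be tried without an existing audio file. The sketch below writes a tiny WAV file to a temporary directory and reads it back. The file name tone.wav, the four sample values, and the 8000 Hz mono format are arbitrary choices made for the example.

```python
import os
import struct
import tempfile
import wave

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
payload = struct.pack("<4h", 0, 1000, 0, -1000)  # four 16-bit samples

# Write mode returns a Wave_write object; set the format before writing.
writer = wave.open(path, "wb")
writer.setnchannels(1)     # mono
writer.setsampwidth(2)     # 2 bytes (16 bits) per sample
writer.setframerate(8000)  # samples per second
writer.writeframes(payload)
writer.close()

# Read mode returns a Wave_read object that exposes the same parameters.
reader = wave.open(path, "rb")
channels = reader.getnchannels()
rate = reader.getframerate()
frame_count = reader.getnframes()
frames = reader.readframes(frame_count)
reader.close()
```

The bytes returned by readframes are the raw samples, identical to what was passed to writeframes.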


External multimedia libraries and frameworks

There are several open source multimedia frameworks available for multimedia application development. The Python bindings for most of these are readily available. We will discuss a few of the most popular multimedia frameworks here. In the chapters that follow, we will make use of many of these libraries to create some useful multimedia applications.

Python Imaging Library

Python Imaging Library provides image processing functionality in Python. It supports several image formats. Later in this book, a number of image processing techniques using PIL will be discussed thoroughly. We will learn things such as image format conversion and various image manipulation and enhancement techniques using the Python Imaging Library.


PyMedia is a popular open source media library that supports audio/video manipulation of a wide range of multimedia formats.


GStreamer is a framework that enables multimedia manipulation and on top of which one can develop multimedia applications. The rich set of libraries it provides makes it easier to develop applications with complex audio/video processing capabilities. GStreamer is written in the C programming language and provides bindings for some other programming languages, including Python. Several open source projects use the GStreamer framework to develop their own multimedia applications. Comprehensive documentation is available on the GStreamer project website. The GStreamer Application Development Manual is a very good starting point. This framework will be used extensively later in this book to develop audio and video applications.


Interested in animations and gaming applications? Pyglet is here to help. Pyglet provides an API for developing multimedia applications using Python. It is an OpenGL-based library that works on multiple platforms. It is one of the popular multimedia frameworks for development of games and other graphically intense applications. It supports multiple monitor configuration typically needed for gaming application development. Later in this book, we will be extensively using this Pyglet framework for creating animations.


PyGame is another very popular open source framework that provides an API for gaming application development. It provides a rich set of graphics and sound libraries. We won't be using PyGame in the rest of this book, but since it is a prominent multimedia framework, we will briefly discuss some of its most important modules and work out a simple example. The PyGame website provides ample resources on the use of this framework for animation and game programming.


The Sprite module contains several classes; out of these, Sprite and Group are the most important. Sprite is the super class of all the visible game objects. A Group object is a container for several instances of Sprite.


As the name suggests, the Display module has functionality dealing with the display. It is used to create a Surface instance for displaying the PyGame window. Some of the important methods of this module include flip and update. The former is called to make sure that everything drawn is properly displayed on the screen, whereas the latter is used if you just want to update a portion of the screen.


The Surface module is used to display an image. An instance of Surface represents an image. The following line of code creates such an instance.

surf = pygame.display.set_mode((800,600))

The API method, display.set_mode, is used to create this instance. The width and height of the window are specified as arguments to this method.


With the Draw module, one can render several basic shapes within the Surface. Examples include circles, rectangles, lines, and so on.


The Event module is another important module of PyGame. An event is said to occur when, for instance, the user clicks a mouse button or presses a key. The event information is used to instruct the program to execute in a certain way.


The Image module is used to process images with different file formats. The loaded image is represented by a surface.

The Music module (pygame.mixer.music) provides convenient methods for controlling playback, such as play, rewind, stop, and so on.

The following is a simple program that highlights some of the fundamental concepts of animation and game programming. It shows how to display objects in an application window and then interactively modify their positions. We will use PyGame to accomplish this task. Later in this book, we will use a different multimedia framework, Pyglet, for creating animations.


Time for action – a simple application using PyGame

This example will make use of the modules we just discussed. For this application to work, you will need to install PyGame. Binary and source distributions of PyGame are available on the PyGame website.

  1. Create a new Python source file and write the following code in it.

    1  import pygame
    2  import sys
    4  pygame.init()
    5  bgcolor = (200, 200, 100)
    6  surf = pygame.display.set_mode((400,400))
    8  circle_color = (0, 255, 255)
    9  x, y = 200, 300
    10 circle_rad = 50
    12 pygame.display.set_caption("My Pygame Window")
    14 while True:
    15     for event in pygame.event.get():
    16         if event.type == pygame.QUIT:
    17             sys.exit()        
    18         elif event.type == pygame.KEYDOWN:            
    19             if event.key == pygame.K_UP:
    20                 y -= 10
    21             elif event.key == pygame.K_DOWN:
    22                 y += 10
    23             elif event.key == pygame.K_RIGHT:
    24                 x += 10
    25             elif event.key == pygame.K_LEFT:
    26                 x -= 10
    28     circle_pos = (x, y)            
    30     surf.fill(bgcolor)
    31     pygame.draw.circle(surf, circle_color,
    32                        circle_pos, circle_rad)
    33     pygame.display.flip()
  2. The first line imports the pygame package. On line 4, the modules within this pygame package are initialized. An instance of class Surface is created using display.set_mode method. This is the main PyGame window inside which the images will be drawn. To ensure that this window is constantly displayed on the screen, we need to add a while loop that will run forever, until the window is closed by the user. In this simple application everything we need is placed inside the while loop. The background color of the PyGame window represented by object surf is set on line 30.

  3. A circular shape is drawn on the PyGame surface by the code on line 31. The arguments to pygame.draw.circle are (Surface, color, position, radius). This creates a circle at the position specified by the argument circle_pos. The instance of class Surface is passed as the first argument to this method.

  4. The code block on lines 16-26 handles certain events. An event occurs when, for instance, a mouse button or a key is pressed. In this example, we instruct the program to do certain things when the arrow keys are pressed. When the RIGHT arrow key is pressed, the circle is drawn with its x coordinate offset by 10 pixels from the previous position. As a result, the circle appears to move to the right whenever you press the RIGHT arrow key. When the PyGame window is closed, the pygame.QUIT event occurs. Here, we simply exit the application by calling sys.exit(), as done on line 17.

  5. Finally, we need to ensure that everything drawn on the Surface is visible. This is accomplished by the call to pygame.display.flip() on line 33. If you disable this line, incompletely drawn images may appear on the screen.

  6. Execute the program from a terminal window. It will show a new graphics window containing a circular shape. If you press the arrow keys on the keyboard, the circle will move in the direction indicated by the arrow key. The next illustration shows the screenshot of the original circle position (left) and when it is moved using the UP and RIGHT arrow keys.

    A simple PyGame application with a circle drawn within the Surface (window). The image on the right side is a screenshot taken after maneuvering the position of the circle with the help of arrow keys:

What just happened?

We used PyGame to create a simple user interactive application. The purpose of this example was to introduce some of the basic concepts behind animation and game programming. It was just a preview of what is coming next! Later in this book we will use Pyglet framework to create some interesting 2D animations.
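The position update at the heart of the example can also be expressed as a small function that runs without a PyGame window. This is a sketch of one possible refactoring, not part of the program above. The key names are plain strings standing in for pygame.K_UP, pygame.K_DOWN, and so on, and keeping the circle inside the window is an addition the original program does not make.

```python
STEP = 10          # pixels moved per key press, as in the example
WINDOW_SIZE = 400  # the window is 400 x 400 pixels

# Hypothetical key names standing in for the pygame.K_* constants.
OFFSETS = {
    "up": (0, -STEP),
    "down": (0, STEP),
    "left": (-STEP, 0),
    "right": (STEP, 0),
}


def move(position, key, radius=50):
    """Return the new circle center after a key press.

    Unknown keys leave the position unchanged. The center is kept at
    least `radius` pixels from every edge so the circle stays visible.
    """
    dx, dy = OFFSETS.get(key, (0, 0))
    x = min(max(position[0] + dx, radius), WINDOW_SIZE - radius)
    y = min(max(position[1] + dy, radius), WINDOW_SIZE - radius)
    return (x, y)


start = (200, 300)
after_right = move(start, "right")
after_up = move(after_right, "up")
```

A dictionary lookup replaces the chain of if and elif statements. Note that the y coordinate decreases when moving up, because the origin of the window is at its top-left corner.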

QT Phonon

When one thinks of a media player, it is almost always associated with a graphical user interface. Of course one can work with command-line multimedia players. But a media player with a GUI is a clear winner as it provides an easy to use, intuitive user interface to stream a media and control its playback. The next screenshot shows the user interface of an audio player developed using QT Phonon.

An Audio Player application developed with QT Phonon:

QT is an open source GUI framework. 'Phonon' is a multimedia package within QT that supports audio and video playback. Note that Phonon is meant for simple media player functionality. For complex audio/video player functionality, you should use multimedia frameworks like GStreamer. Phonon depends on a platform-specific backend for media processing. For example, on the Windows platform the backend framework is DirectShow. The supported functionality may vary depending on the platform.

To develop a media processing application, a media graph is created in Phonon. This media graph contains various interlinked media nodes. Each media node does a portion of the media processing. For example, an effects node will add an audio effect, such as echo, to the media. Another node will be responsible for outputting the media to an audio or video device, and so on. The next illustration shows a video player streaming a video. It is developed using QT Phonon. In Chapter 8, we will develop this application along with an audio player using the Phonon framework.

Using various built-in modules of QT Phonon, it is very easy to create GUI-based audio and video players. This example shows a video player in action:

Other multimedia libraries

Python bindings for several other multimedia libraries are available on various platforms. Some of the popular libraries are mentioned below.

Snack Sound Toolkit

Snack is an audio toolkit that is used to create cross-platform audio applications. It includes audio analysis and input-output functionality and it has support for audio visualization as well. The official website for Snack Sound Toolkit is


PyAudiere is an open source audio library. It provides an API to easily implement audio functionality in various applications. It is based on the Audiere Sound Library.



This chapter served as an introduction to multimedia processing using Python.

Specifically, in this chapter we covered:

  • An overview of multimedia processing, which introduced us to digital image, audio, and video processing.

  • A number of freely available multimedia frameworks that can be used for multimedia processing.

Now that we know what multimedia libraries and frameworks are out there, we're ready to explore these to develop exciting multimedia applications!

About the Author

  • Ninad Sathaye

    Ninad Sathaye has spent several years of his professional career designing and developing performance-critical engineering applications written in a variety of languages, including Python and C++. He has worked as a software architect in the semiconductor industry, and more recently in the domain of Internet of Things. He holds a master's degree in mechanical engineering.
