Python Multimedia: Working with Audios


So let's get on with it!

Installation prerequisites

Since we are going to use an external multimedia framework, it is necessary to install the packages mentioned in this section.


GStreamer is a popular open source multimedia framework that supports audio/video manipulation in a wide range of multimedia formats. It is written in the C programming language and provides bindings for other programming languages, including Python. Several open source projects use the GStreamer framework to develop their own multimedia applications. Throughout this article, we will make use of the GStreamer framework for audio handling. In order to get this working with Python, we need to install both GStreamer and the Python bindings for GStreamer.

Windows platform

The binary distribution of GStreamer is not provided on the project website. Installing it from source may require considerable effort on the part of Windows users. Fortunately, the GStreamer WinBuilds project provides pre-compiled binary distributions. Here is the URL to the project website:

The binary distributions of GStreamer and its Python bindings (for Python 2.6) are available in the Download area of the website:

You need to install two packages: first GStreamer, and then the Python bindings to GStreamer. Download and install the GPL distribution of GStreamer available on the GStreamer WinBuilds project website. The name of the GStreamer executable is GStreamerWinBuild-; the version should be 0.10.5 or higher. By default, this installation will create a folder C:\gstreamer on your machine. The bin directory within this folder contains the runtime libraries needed while using GStreamer.

Next, install the Python bindings for GStreamer. The binary distribution is available on the same website. Use the executable Pygst- pertaining to Python 2.6. The version should be 0.10.15 or higher.

GStreamer WinBuilds appears to be an independent project, based on the OSSBuild development suite. Visit the OSSBuild website for more information. It could happen that the GStreamer binary built for Python 2.6 is no longer available on the mentioned website at the time you are reading this book. Therefore, it is advised that you contact the developer community of OSSBuild; perhaps they can help you out!

Alternatively, you can build GStreamer from source on the Windows platform, using a Linux-like environment for Windows such as Cygwin. Under this environment, you can first install dependent software packages such as Python 2.6, the gcc compiler, and others. Download the gst-python- package from the GStreamer website. Then extract this package and install it from source using the Cygwin environment. The INSTALL file within this package contains installation instructions.

Other platforms

Many Linux distributions provide a GStreamer package. You can search for the appropriate gst-python distribution (for Python 2.6) in the package repository. If such a package is not available, install gst-python from source as discussed in the earlier Windows platform section.

If you are a Mac OS X user, visit the darwinports website. It has detailed instructions on how to download and install the package Py26-gst-python version 0.10.17 (or higher).

Mac OS X 10.5.x (Leopard) comes with the Python 2.5 distribution. If you are using this default version of Python, GStreamer Python bindings built against Python 2.5 are available on the darwinports website:


There is a free multiplatform software utility library called 'GLib'. It provides data structures such as hash maps, linked lists, and so on. It also supports the creation of threads. The 'object system' of GLib is called GObject. Here, we need to install the Python bindings for GObject. The Python bindings are available on the PyGTK website at:

Windows platform

The binary installer is available on the PyGTK website. Download and install version 2.20 for Python 2.6.

Other platforms

For Linux, the source tarball is available on the PyGTK website. There may also be a binary distribution in the package repository of your Linux operating system. The direct link to version 2.21 of PyGObject (source tarball) is:

If you are a Mac user and you have Python 2.6 installed, a distribution of PyGObject is available on darwinports. Install version 2.14 or later.

Summary of installation prerequisites

The following summarizes the packages needed for this article.

Package: GStreamer
Version: 0.10.5 or later
Windows: Install using the binary distribution available on the GStreamer WinBuilds website. Use GStreamerWinBuild- (or a later version, if available).
Linux: Use the GStreamer distribution in the package repository.
Mac OS X: Download and install by following the instructions on the website.

Package: Python bindings for GStreamer
Version: 0.10.15 or later, for Python 2.6
Windows: Use the binary provided by the GStreamer WinBuilds project, pertaining to Python 2.6.
Linux: Use the gst-python distribution in the package repository, or build and install from the source tarball.
Mac OS X: Use the Py26-gst-python package (if you are using Python 2.6), or build and install from the source tarball.

Package: Python bindings for GObject ("PyGObject")
Version: 2.14 or later, for Python 2.6
Windows: Use the binary package pygobject-2.20.0.win32-py2.6.exe.
Linux: Install from source if pygobject is not available in the package repository.
Mac OS X: Use the darwinports package (if you are using Python 2.6).

Testing the installation

Ensure that GStreamer and its Python bindings are properly installed. This is simple to test: just start Python from the command line and type the following:

>>> import pygst

If there is no error, it means the Python bindings are installed properly.

Next, type the following:

>>> import gst

If this import is successful, we are all set to use GStreamer for processing audios and videos!

If import gst fails, it will probably complain that it is unable to load some required DLL/shared object. In this case, check your environment variables and make sure that the PATH variable includes the correct path to the gstreamer/bin directory. The following lines of code in a Python interpreter show the typical location of the pygst and gst modules on the Windows platform.

>>> import pygst
>>> pygst
<module 'pygst' from 'C:\Python26\lib\site-packages\pygst.pyc'>
>>> pygst.require('0.10')
>>> import gst
>>> gst
<module 'gst' from 'C:\Python26\lib\site-packages\gst-0.10\gst\__init__.pyc'>

Next, test if PyGObject is successfully installed. Start the Python interpreter and try importing the gobject module.

>>> import gobject

If this works, we are all set to proceed!
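The three imports above can also be wrapped in a small helper that reports every missing binding at once, instead of letting the interpreter stop at the first ImportError. This is just a convenience sketch; the module names are the ones used in this section.

```python
def check_bindings(modules=("pygst", "gst", "gobject")):
    """Return the names of modules that fail to import."""
    missing = []
    for name in modules:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing

# An empty list means all the bindings imported cleanly.
```

If check_bindings() returns a non-empty list, revisit the installation steps for the packages it names.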

A primer on GStreamer

In this article, we will be using the GStreamer multimedia framework extensively. Before we move on to the topics that teach us various audio processing techniques, a primer on GStreamer is necessary.

So what is GStreamer? It is a framework on top of which one can develop multimedia applications. The rich set of libraries it provides makes it easier to develop applications with complex audio/video processing capabilities. Fundamental components of GStreamer are briefly explained in the coming sub-sections.

Comprehensive documentation is available on the GStreamer project website. The GStreamer Application Development Manual is a very good starting point. In this section, we will briefly cover some of the important aspects of GStreamer. For further reading, you are recommended to visit the GStreamer project website:

gst-inspect and gst-launch

We will start by learning two important GStreamer commands. GStreamer can be run from the command line by calling gst-launch-0.10.exe (on Windows) or gst-launch-0.10 (on other platforms). The following command shows a typical execution of GStreamer on Linux. We will see what a pipeline means in the next sub-section.

$ gst-launch-0.10 pipeline_description

GStreamer has a plugin architecture. It supports a huge number of plugins. To see more details about any plugin in your GStreamer installation, use the command gst-inspect-0.10 (gst-inspect-0.10.exe on Windows). We will use this command quite often. Use of this command is illustrated here.

$ gst-inspect-0.10 decodebin

Here, decodebin is a plugin. Upon execution of the preceding command, it prints detailed information about the plugin decodebin.

Elements and pipeline

In GStreamer, the data flows in a pipeline. Various elements are connected together forming a pipeline, such that the output of the previous element is the input to the next one.

A pipeline can be logically represented as follows:

Element1 ! Element2 ! Element3 ! Element4 ! Element5

Here, Element1 through Element5 are element objects chained together by the symbol !. Each of the elements performs a specific task. One of the element objects performs the task of reading input data such as an audio or a video. Another element decodes the file read by the first element, whereas another element performs the job of converting this data into some other format and saving the output. As stated earlier, linking these element objects in a proper manner creates a pipeline.

The concept of a pipeline is similar to the one used in Unix. Following is a Unix example of a pipeline. Here, the vertical separator | defines the pipe.

$ ls -la | more

Here, the ls -la lists all the files in a directory. However, sometimes, this list is too long to be displayed in the shell window. So, adding | more allows a user to navigate the data.

Now let's see a realistic example of running GStreamer from the command prompt.

$ gst-launch-0.10 -v filesrc location=path/to/file.ogg ! decodebin ! audioconvert ! fakesink

For a Windows user, the gst command name would be gst-launch-0.10.exe. The pipeline is constructed by specifying different elements. The ! symbol links the adjacent elements, thereby forming the whole pipeline for the data to flow. In the Python bindings of GStreamer, the abstract base class for pipeline elements is gst.Element, whereas the gst.Pipeline class can be used to create a pipeline instance. In a pipeline, the data is sent to a separate thread where it is processed until it reaches the end or a termination signal is sent.
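Since a pipeline description is just a string of stages joined by !, the element names can be picked out of it with a plain string helper. The following sketch needs no GStreamer installation; it simply shows the structure of the description used above.

```python
def pipeline_elements(description):
    # The first token of each '!'-separated stage is the element name;
    # any remaining tokens are property assignments such as location=...
    return [stage.split()[0] for stage in description.split("!")]

desc = "filesrc location=path/to/file.ogg ! decodebin ! audioconvert ! fakesink"
print(pipeline_elements(desc))
```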


GStreamer is a plugin-based framework, and there are several plugins available. A plugin encapsulates the functionality of one or more GStreamer elements. Thus we can have a plugin where multiple elements work together to create the desired output. The plugin itself can then be used as an abstract element in the GStreamer pipeline. An example is decodebin; we will learn about it in the upcoming sections. A comprehensive list of available plugins can be found on the GStreamer website. In almost all of the applications to be developed, the decodebin plugin will be used. For audio processing, the functionality provided by plugins such as gnonlin, audioecho, monoscope, interleave, and so on will be used.


In GStreamer, a bin is a container that manages the element objects added to it. A bin instance can be created using the gst.Bin class. It is inherited from gst.Element and can act as an abstract element representing the bunch of elements within it. The GStreamer plugin decodebin is a good example of a bin: it contains decoder elements and auto-plugs the decoder to create the decoding pipeline.


Each element has some sort of connection points to handle data input and output. GStreamer refers to them as pads. Thus an element object can have one or more "receiver pads" termed as sink pads that accept data from the previous element in the pipeline. Similarly, there are 'source pads' that take the data out of the element as an input to the next element (if any) in the pipeline. The following is a very simple example that shows how source and sink pads are specified.

>gst-launch-0.10.exe fakesrc num-buffers=1 ! fakesink

The fakesrc is the first element in the pipeline. Therefore, it only has a source pad. It transmits the data to the next linked element, that is, fakesink, which only has a sink pad to accept the data. Note that, in this case, since these are fakesrc and fakesink, just empty buffers are exchanged. A pad is defined by the class gst.Pad. A pad can be attached to an element object using the gst.Element.add_pad() method.

The following is a diagrammatic representation of GStreamer elements with pads: it illustrates two GStreamer elements within a pipeline, each having a single source and a single sink pad.

Now that we know how the pads operate, let's discuss some special types of pads. In the example, we assumed that the pads for the element are always 'out there'. However, there are some situations where the element doesn't have its pads available all the time. Such elements request the pads they need at runtime; such a pad is called a dynamic pad. Another type of pad is called a ghost pad. Both types are discussed in this section.


Dynamic pads

Some objects, such as decodebin, do not have pads defined when they are created. Such elements determine the type of pad to be used at runtime. For example, depending on the media file input being processed, decodebin will create a pad. This is often referred to as a dynamic pad, or sometimes an available pad, as it is not always available in elements such as decodebin.

Ghost pads

As stated in the Bins section, a bin object can act as an abstract element. How is this achieved? For that, the bin uses 'ghost pads' or 'pseudo link pads'. The ghost pads of a bin are used to connect the appropriate elements inside it. A ghost pad can be created using the gst.GhostPad class.


The element objects send and receive the data by using the pads. The type of media data that the element objects will handle is determined by the caps (a short form for capabilities). It is a structure that describes the media formats supported by the element. The caps are defined by the class gst.Caps.


A bus refers to the object that delivers the messages generated by GStreamer. A message is a gst.Message object that informs the application about an event within the pipeline. A message is put on the bus using the gst.Bus.post() method. The following code shows an example usage of the bus.

1 bus = pipeline.get_bus()
2 bus.add_signal_watch()
3 bus.connect("message", message_handler)

The first line in the code obtains the gst.Bus instance from the pipeline. Here, pipeline is an instance of gst.Pipeline. On the next line, we add a signal watch so that the bus gives out all the messages posted on it. Line 3 connects the signal with a Python method. In this example, message is the signal string and the method it calls is message_handler.
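The bus-and-signal pattern itself is not specific to GStreamer. The toy class below (no GStreamer dependency; all names are hypothetical stand-ins) mimics the three lines above: handlers registered with connect are invoked for every message later posted on the bus.

```python
class ToyBus(object):
    """A minimal stand-in for gst.Bus: post() fans messages out to handlers."""

    def __init__(self):
        self._handlers = []

    def connect(self, signal, handler):
        # The real bus dispatches per signal name; this sketch keeps one list.
        self._handlers.append(handler)

    def post(self, message):
        for handler in self._handlers:
            handler(self, message)

seen = []
bus = ToyBus()
bus.connect("message", lambda bus, msg: seen.append(msg))
bus.post("eos")
```

After the post() call, the registered handler has recorded the "eos" message in seen, which is exactly the role message_handler plays later in this article.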


Playbin is a GStreamer plugin that provides a high-level audio/video player. It can handle a number of things such as automatic detection of the input media file format, auto-determination of decoders, audio visualization and volume control, and so on. The following line of code creates a playbin element.

playbin = gst.element_factory_make("playbin")

It defines a property called uri. The URI (Uniform Resource Identifier) should be an absolute path to a file on your computer or on the Web. According to the GStreamer documentation, Playbin2 is just the latest unstable version but once stable, it will replace the Playbin.

A Playbin2 instance can be created in the same way as a Playbin instance, and you can view its details with the following command:

gst-inspect-0.10 playbin2

With this basic understanding, let us learn about various audio processing techniques using GStreamer and Python.


Playing music

Given an audio file, one of the first things you will want to do is play it, isn't it? In GStreamer, what basic elements do we need to play an audio file? The essential elements are listed as follows.

  • The first thing we need is to open an audio file for reading
  • Next, we need a decoder to transform the encoded information
  • Then, there needs to be an element to convert the audio format so that it is in a 'playable' format required by an audio device such as speakers
  • Finally, an element that will enable the actual playback of the audio file

How will you play an audio file using the command-line version of GStreamer? One way to execute it using command line is as follows:

$ gst-launch-0.10 filesrc location=/path/to/audio.mp3 ! decodebin ! audioconvert ! autoaudiosink

The autoaudiosink automatically detects the correct audio device on your computer to play the audio. This was tested on a machine with Windows XP and it worked fine. If there is any error playing an audio file, check whether the audio device on your computer is working properly. You can also try using the element sdlaudiosink, which outputs to the sound card via SDLAUDIO. If this doesn't work and you want to install a plugin for the audio sink, here is a partial list of GStreamer plugins:
Mac OS X users can try installing osxaudiosink if the default autoaudiosink doesn't work.

The audio file should start playing with this command unless there are any missing plugins.

Time for action – playing an audio: method 1

There are a number of ways to play an audio using Python and GStreamer. Let's start with a simple one. In this section, we will use a command string, similar to what you would specify using the command-line version of GStreamer. This string will be used to construct a gst.Pipeline instance in a Python program.

So, here we go!

  1. Start by creating an AudioPlayer class in a Python source file. Just define the empty methods illustrated in the following code snippet. We will expand those in the later steps.

    1 import time, thread
    2 import gobject
    3 import pygst
    4 pygst.require("0.10")
    5 import gst
    6
    7 class AudioPlayer:
    8     def __init__(self):
    9         pass
    10     def constructPipeline(self):
    11         pass
    12     def connectSignals(self):
    13         pass
    14     def play(self):
    15         pass
    16     def message_handler(self, bus, message):
    17         pass
    18
    19 # Now run the program
    20 player = AudioPlayer()
    21 thread.start_new_thread(player.play, ())
    22 gobject.threads_init()
    23 evt_loop = gobject.MainLoop()
    24 evt_loop.run()

    Lines 1 to 5 in the code import the necessary modules. As discussed in the Installation prerequisites section, the package pygst is imported first. Then we call pygst.require to enable the import of the gst module.

  2. Now focus on the code block between lines 19 and 24. It is the main execution code, and it keeps the program running until the music has finished playing. We will use this or similar code throughout to run our audio applications.

    On line 21, the thread module is used to create a new thread for playing the audio. The method play is run on this thread. The second argument of thread.start_new_thread is the tuple of arguments to be passed to the method play. In this example, we do not support any command-line arguments; therefore, an empty tuple is passed. Python adds its own thread management functionality on top of the operating system threads. When such a thread makes calls to external functions (such as C functions), it holds the 'Global Interpreter Lock', blocking other threads until, for instance, the C function returns a value.

    The gobject.threads_init() is an initialization function for facilitating the use of Python threading within the gobject modules. It can enable or disable threading while calling the C functions. We call this before running the main event loop. The main event loop for executing this program is created using gobject on line 23, and this loop is started by the call evt_loop.run().

  3. Next, fill the AudioPlayer class methods with the code. First, write the constructor of the class.

    1 def __init__(self):
    2     self.constructPipeline()
    3     self.is_playing = False
    4     self.connectSignals()

    The pipeline is constructed by the method call on line 2. The flag self.is_playing is initialized to False. It will be used to determine whether the audio being played has reached the end of the stream. On line 4, a method self.connectSignals is called, to capture the messages posted on a bus. We will discuss both these methods next.

  4. The main driver for playing the sound is the following GStreamer pipeline string:

    "filesrc location=C:/AudioFiles/my_music.mp3 "\
    "! decodebin ! audioconvert ! autoaudiosink"

    The preceding string has four elements separated by the symbol !. These elements represent the components we briefly discussed earlier.

  5. The first element filesrc location=C:/AudioFiles/my_music.mp3 defines the source element that loads the audio file from a given location. In this string, just replace the audio file path represented by location with an appropriate file path on your computer. You can also specify a file on a disk drive.

    If the filename contains spaces, make sure you specify the path within quotes. For example, if the filename is my sound.mp3, specify it as follows: filesrc location=\"C:/AudioFiles/my sound.mp3\"
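A small helper can add the escaped quotes automatically when a path contains spaces. This is a sketch; the quoting convention is the one shown above, and the helper name is hypothetical.

```python
def filesrc_fragment(path):
    # Quote the path so the gst-launch style parser sees it as one token.
    if " " in path:
        return 'filesrc location="%s"' % path
    return "filesrc location=%s" % path
```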

  6. The filesrc element is connected to a decodebin. As discussed earlier, decodebin is a plugin to GStreamer and it inherits gst.Bin. Based on the input audio format, it determines the right type of decoder element to use.

    The third element is audioconvert. It translates the decoded audio data into a format playable by the audio device.

    The final element, autoaudiosink, is a plugin; it automatically detects the audio sink for the audio output.

    We have sufficient information now to create an instance of gst.Pipeline. Write the following method.

    1 def constructPipeline(self):
    2     myPipelineString = \
    3         "filesrc location=C:/AudioFiles/my_music.mp3 "\
    4         "! decodebin ! audioconvert ! autoaudiosink"
    5     self.player = gst.parse_launch(myPipelineString)

    An instance of gst.Pipeline is created on line 5, using the gst.parse_launch method.

  7. Now write the following method of class AudioPlayer.

    1 def connectSignals(self):
    2     # In this case, we only capture the messages
    3     # put on the bus.
    4     bus = self.player.get_bus()
    5     bus.add_signal_watch()
    6     bus.connect("message", self.message_handler)

    On line 4, an instance of gst.Bus is created. In the introductory section on GStreamer, we already learned what the code between lines 4 to 6 does. This bus has the job of delivering the messages posted on it from the streaming threads. The add_signal_watch call makes the bus emit the message signal for each message posted. This signal is used by the method message_handler to take appropriate action.

    Write the following method:

    1 def play(self):
    2     self.is_playing = True
    3     self.player.set_state(gst.STATE_PLAYING)
    4     while self.is_playing:
    5         time.sleep(1)
    6     evt_loop.quit()

    On line 2, we set the state of the gst pipeline to gst.STATE_PLAYING to start the audio streaming. The flag self.is_playing controls the while loop on line 4. This loop ensures that the main event loop is not terminated before the end of the audio stream is reached. Within the loop the call to time.sleep just buys some time for the audio streaming to finish. The value of flag is changed in the method message_handler that watches for the messages from the bus. On line 6, the main event loop is terminated. This gets called when the end of stream message is emitted or when some error occurs while playing the audio.
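The flag-controlled loop in play can be seen in isolation using only the standard library. In the sketch below (all names are hypothetical stand-ins), a short sleep replaces the audio streaming and a direct flag update replaces the end-of-stream message from the bus.

```python
import threading
import time

class FakePlayer(object):
    def __init__(self):
        self.is_playing = True

    def play(self):
        time.sleep(0.05)         # stands in for the audio streaming
        self.is_playing = False  # stands in for handling the EOS message

player = FakePlayer()
worker = threading.Thread(target=player.play)
worker.start()
while player.is_playing:         # same loop shape as AudioPlayer.play
    time.sleep(0.01)
worker.join()
```

Once the worker flips the flag, the loop exits, which is exactly the point at which AudioPlayer.play calls evt_loop.quit().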

  8. Next, develop method AudioPlayer.message_handler. This method sets the appropriate flag to terminate the main loop and is also responsible for changing the playing state of the pipeline.

    1 def message_handler(self, bus, message):
    2     # Capture the messages on the bus and
    3     # set the appropriate flag.
    4     msgType = message.type
    5     if msgType == gst.MESSAGE_ERROR:
    6         self.player.set_state(gst.STATE_NULL)
    7         self.is_playing = False
    8         print "\n Unable to play audio. Error: ", \
    9             message.parse_error()
    10     elif msgType == gst.MESSAGE_EOS:
    11         self.player.set_state(gst.STATE_NULL)
    12         self.is_playing = False

    In this method, we only check two things: whether the message on the bus says the streaming audio has reached its end (gst.MESSAGE_EOS) or if any error occurred while playing the audio stream (gst.MESSAGE_ERROR). For both these messages, the state of the gst pipeline is changed from gst.STATE_PLAYING to gst.STATE_NULL. The self.is_playing flag is updated to instruct the program to terminate the main event loop.

    We have defined all the necessary code to play the audio. Save the file and run the application from the command line.


    This will begin playback of the input audio file. Once it is done playing, the program will be terminated. You can press Ctrl + C on Windows or Linux to interrupt the playing of the audio file. It will terminate the program.

What just happened?

We developed a very simple audio player, which can play an input audio file. The code we wrote covered some of the most important components of GStreamer. These components will be useful throughout this article. The core component of the program was a GStreamer pipeline that had instructions to play the given audio file. Additionally, we learned how to create a thread and then start a gobject event loop to ensure that the audio file is played until the end.

Have a go hero – play audios from a playlist

The simple audio player we developed can only play a single audio file, whose path is hardcoded in the constructed GStreamer pipeline. Modify this program so it can play the audio files in a playlist. Here, the playlist should define the full paths of the audio files you would like to play, one after another. For example, you can specify the file paths as arguments to this application, load the paths defined in a text file, or load all audio files from a directory.
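One possible starting point for the directory variant is a helper like the following. This is a sketch: the helper name and the extension list are assumptions, and the resulting paths would be fed to the pipeline one at a time.

```python
import os

def build_playlist(directory, extensions=(".mp3", ".ogg", ".wav")):
    """Return sorted full paths of audio files found in a directory."""
    playlist = []
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(extensions):
            playlist.append(os.path.join(directory, name))
    return playlist
```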

Building a pipeline from elements

In the last section, a gst.Pipeline was automatically constructed for us by the gst.parse_launch method. All it required was an appropriate command string, similar to the one specified while running the command-line version of GStreamer. The creation and linking of elements was handled internally by this method. In this section, we will see how to construct a pipeline by adding and linking individual element objects. 'GStreamer Pipeline' construction is a fundamental technique that we will use throughout this article.

Time for action – playing an audio: method 2

We have already developed code for playing an audio. Let's now tweak the method AudioPlayer.constructPipeline to build the gst.Pipeline using different element objects.

  1. Rewrite the constructPipeline method as follows. You can also download the file from the Packt website for reference.

    1 def constructPipeline(self):
    2     self.player = gst.Pipeline()
    3     self.filesrc = gst.element_factory_make("filesrc")
    4     self.filesrc.set_property("location",
    5         "C:/AudioFiles/my_music.mp3")
    6
    7     self.decodebin = gst.element_factory_make("decodebin",
    8         "decodebin")
    9     # Connect decodebin signal with a method.
    10     # You can move this call to self.connectSignals()
    11     self.decodebin.connect("pad_added",
    12         self.decodebin_pad_added)
    13
    14     self.audioconvert = \
    15         gst.element_factory_make("audioconvert",
    16         "audioconvert")
    17
    18     self.audiosink = \
    19         gst.element_factory_make("autoaudiosink",
    20         "a_a_sink")
    21
    22     # Construct the pipeline
    23     self.player.add(self.filesrc, self.decodebin,
    24         self.audioconvert, self.audiosink)
    25     # Link elements in the pipeline.
    26     gst.element_link_many(self.filesrc, self.decodebin)
    27     gst.element_link_many(self.audioconvert, self.audiosink)

  2. We begin by creating an instance of class gst.Pipeline.
  3. Next, on line 3, we create the element for loading the audio file. Any new gst element can be created using the API method gst.element_factory_make, which takes the element name (a string) as an argument. Here, this argument is specified as "filesrc" in order to create an instance of the element GstFileSrc. Each element has a set of properties. The path of the input audio file is stored in the property location of the self.filesrc element; this property is set on line 4. Replace the file path string with an appropriate audio file path on your computer.

    You can get a list of all properties of an element by running the gst-inspect-0.10 command from a console window. See the introductory section on GStreamer for more details.

  4. The second optional argument serves as a custom name for the created object. For example, on line 20, the name for the autoaudiosink object is specified as a_a_sink. Like this, we create all the essential elements necessary to build the pipeline.
  5. On line 23 in the code, all the elements are put in the pipeline by calling the gst.Pipeline.add method.
  6. The method gst.element_link_many establishes connection between two or more elements for the audio data to flow between them. The elements are linked together by the code on lines 26 and 27. However, notice that we haven't linked together the elements self.decodebin and self.audioconvert. Why? That's up next.
  7. We cannot link the decodebin element with the audioconvert element at the time the pipeline is created. This is because decodebin uses dynamic pads: these pads are not available for connection with the audioconvert element when the pipeline is created. Depending upon the input data, decodebin will create a pad at runtime. Thus, we need to watch out for a signal that is emitted when the decodebin adds a pad! How do we do that? It is done by the code on line 11 in the code snippet above. The "pad-added" signal is connected with the method decodebin_pad_added. Whenever decodebin adds a dynamic pad, this method will get called.
  8. Thus, all we need to do is to manually establish a connection between decodebin and audioconvert elements in the method decodebin_pad_added. Write the following method.

    1 def decodebin_pad_added(self, decodebin, pad):
    2     caps = pad.get_caps()
    3     compatible_pad = \
    4         self.audioconvert.get_compatible_pad(pad, caps)
    5
    6     pad.link(compatible_pad)

    The method takes the element (in this case, self.decodebin) and the pad as arguments. The pad is the new pad created for the decodebin element. We need to link this pad with the appropriate one on self.audioconvert.

  9. On line 2 in this code snippet, we find out what type of media data the pad handles. Once the capabilities (caps) are known, we pass this information to the method get_compatible_pad of object self.audioconvert. This method returns a compatible pad which is then linked with pad on line 6.
  10. The rest of the code is identical with the one illustrated in the earlier section. You can run this program the same way described earlier.

What just happened?

We learned some very crucial components of GStreamer framework. With the simple audio player as an example, we created a GStreamer pipeline 'from scratch' by creating various element objects and linking them together. We also learned how to connect two elements by 'manually' linking their pads and why that was required for the element self.decodebin.

Playing an audio from a website

If there is an audio file somewhere on a website that you would like to play, we can use much the same AudioPlayer class developed earlier. In this section, we will illustrate the use of gst.Playbin2 to play an audio file by specifying a URL. The code snippet below shows the revised AudioPlayer.constructPipeline method. The name of this method could be changed, as it now creates a playbin object rather than a pipeline.

1 def constructPipeline(self):
2     file_url = "http://path/to/audiofile.wav"
3     buf_size = 1024000
4     self.player = gst.element_factory_make("playbin2")
5     self.player.set_property("uri", file_url)
6     self.player.set_property("buffer-size", buf_size)
7     self.is_playing = False
8     self.connectSignals()

On line 4, the gst.Playbin2 element is created using the gst.element_factory_make method. The argument to this method is a string that describes the element to be created; in this case it is playbin2. You can also define a custom name for this object by supplying an optional second argument to this method. Next, on lines 5 and 6, we assign values to the properties uri and buffer-size. Set the uri property to an appropriate URL, the full path to the audio file you would like to play.

Note: When you execute this program, the Python application tries to access the Internet. The anti-virus software installed on your computer may block the program execution; in that case, you will need to allow this program to access the Internet. Be careful about the source of the URL as well: if you get the file_url from an untrusted source, perform a safety check such as assert not re.match("file://", file_url) to ensure it does not point at a local file.

Have a go hero – use 'playbin' to play local audios

In the last few sections, we learned different ways to play an audio file using Python and GStreamer. In the previous section, you must have noticed another simple way to achieve this: using a playbin or playbin2 object. There, we played an audio file from a URL. Modify this code so that the program can play audio files located on a drive on your computer. Hint: You will need to use the correct uri path. Convert the file path using Python's urllib.pathname2url function and then append it to the string "file://".
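As a starting point for this exercise, the uri construction from the hint could be sketched like this. Note that path_to_uri is a hypothetical helper name, and the try/except import keeps the sketch working on both Python 2 and Python 3:

```python
try:
    from urllib import pathname2url           # Python 2
except ImportError:
    from urllib.request import pathname2url   # Python 3

def path_to_uri(path):
    # Build the uri string expected by playbin's "uri" property
    # by appending the converted path to "file://".
    return "file://" + pathname2url(path)
```

On a Linux machine, path_to_uri("/home/user/song.mp3") yields "file:///home/user/song.mp3", which can be assigned to the playbin's uri property.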

Converting audio file format

Suppose you have a big collection of songs in wav file format that you would like to load on a cell phone. But you find out that the cell phone memory card doesn't have enough space to hold all these. What will you do? You will probably try to reduce the size of the song files right? Converting the files into mp3 format will reduce the size. Of course you can do it using some media player. Let's learn how to perform this conversion operation using Python and GStreamer. Later we will develop a simple command-line utility that can be used to perform a batch conversion for all the files you need.

  1. Like in the earlier examples, let's first list the important building blocks we need to accomplish file conversion. The first three elements remain the same.
  2. As before, the first thing we need is to load an audio file for reading.
  3. Next, we need a decoder to transform the encoded information.
  4. Then, there needs to be an element to convert the raw audio buffers into an appropriate format.
  5. An encoder is needed that takes the raw audio data and encodes it to an appropriate file format to be written.
  6. An element where the encoded data will be streamed to is needed. In this case it is our output audio file.

Okay, what's next? Before jumping into the code, first check if you can achieve what you want using the command-line version of GStreamer.

$gst-launch-0.10.exe filesrc location=/path/to/input.wav ! decodebin ! audioconvert ! lame ! filesink location=/path/to/output.mp3

Specify the correct input and output file paths and run this command to convert a wave file to an mp3. If it works, we are all set to proceed. Otherwise check for missing plugins.

You should refer to the GStreamer API documentation to know more about the properties of various elements illustrated above. Trust me, the gst-inspect-0.10 (or gst-inspect-0.10.exe for Windows users) command is a very handy tool that will help you understand the components of a GStreamer plugin. The instructions on running this tool are already discussed earlier in this article.

(For more resources on Python, see here.)

Time for action – audio file format converter

Let's write a simple audio file converter. This utility will batch process input audio files and save them in a user-specified file format. To get started, download the file from the Packt website. This file can be run from the command line as:

python [options]

Where, the [options] are as follows:

  • --input_dir: The directory from which to read the input audio file(s) to be converted.
  • --input_format: The audio format of the input files. The format must be one of the supported formats: "mp3", "ogg", and "wav". If no format is specified, the default "wav" is used.
  • --output_dir: The output directory where the converted files will be saved. If no output directory is specified, it will create a folder OUTPUT_AUDIOS within the input directory.
  • --output_format: The audio format of the output file. Supported output formats are "wav" and "mp3".

Let's write this code now.

  1. Start by importing necessary modules.
    import os, sys, time
    import thread
    import getopt, glob
    import gobject
    import pygst
    import gst
  2. Now declare the following class and the utility function. As you will notice, several of the methods have the same names as before. The underlying functionality of these methods will be similar to what we already discussed. In this section we will review only the most important methods in this class. You can refer to file for other methods or develop those on your own.

    def audioFileExists(fil):
        return os.path.isfile(fil)

    class AudioConverter:
        def __init__(self):
        def constructPipeline(self):
        def connectSignals(self):
        def decodebin_pad_added(self, decodebin, pad):
        def processArgs(self):
        def convert(self):
        def convert_single_audio(self, inPath, outPath):
        def message_handler(self, bus, message):
        def printUsage(self):
        def printFinalStatus(self, inputFileList,
                             starttime, endtime):

    # Run the converter
    converter = AudioConverter()
    thread.start_new_thread(converter.convert, ())
    gobject.threads_init()
    evt_loop = gobject.MainLoop()
    evt_loop.run()
  3. Look at the last few lines of code above. This is exactly the same code we used in the Playing Music section. The only difference is the name of the class and its method that is put on the thread in the call thread.start_new_thread. At the beginning, the function audioFileExists() is declared. It will be used to check if the specified path is a valid file path.
  4. Now write the constructor of the class. Here we do initialization of various variables.

    def __init__(self):
        # Initialize various attrs
        self.inputDir = os.getcwd()
        self.inputFormat = "wav"
        self.outputDir = ""
        self.outputFormat = ""
        self.error_message = ""

        self.encoders = {"mp3": "lame",
                         "wav": "wavenc"}
        self.supportedOutputFormats = self.encoders.keys()
        self.supportedInputFormats = ("ogg", "mp3", "wav")
        self.pipeline = None
        self.is_playing = False

        self.processArgs()
        self.constructPipeline()
        self.connectSignals()


  5. The self.supportedOutputFormats is a list obtained from the keys of self.encoders and stores the supported output formats. The self.supportedInputFormats is a tuple that stores the supported input formats. These objects are used in self.processArgs to do the necessary checks. The dictionary self.encoders provides the correct encoder string to be used when creating an encoder element object for the GStreamer pipeline. As the name suggests, the call to self.constructPipeline() builds a gst.Pipeline instance, and various signals are connected using self.connectSignals().
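The encoder lookup described above can be sketched in isolation. Here encoder_for is a hypothetical helper added purely for illustration; the class simply indexes the dictionary directly:

```python
# The encoder map from the constructor above; "lame" and "wavenc"
# are the GStreamer encoder element names for mp3 and wav.
encoders = {"mp3": "lame", "wav": "wavenc"}

def encoder_for(output_format):
    # Look up the encoder element string, failing clearly
    # for unsupported output formats.
    if output_format not in encoders:
        raise ValueError("Unsupported output format: %s" % output_format)
    return encoders[output_format]
```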
  6. Next, prepare a GStreamer pipeline.

    def constructPipeline(self):
        self.pipeline = gst.Pipeline("pipeline")

        self.filesrc = gst.element_factory_make("filesrc")
        self.decodebin = gst.element_factory_make("decodebin")
        self.audioconvert = gst.element_factory_make("audioconvert")
        self.filesink = gst.element_factory_make("filesink")

        encoder_str = self.encoders[self.outputFormat]
        self.encoder = gst.element_factory_make(encoder_str)

        self.pipeline.add(self.filesrc, self.decodebin,
                          self.audioconvert, self.encoder,
                          self.filesink)

        gst.element_link_many(self.filesrc, self.decodebin)
        gst.element_link_many(self.audioconvert, self.encoder,
                              self.filesink)

  7. This code is similar to the one we developed in the Playing Music sub-section. However there are some noticeable differences. In the Audio Player example, we used the autoaudiosink plugin as the last element. In the Audio Converter, we have replaced it with elements self.encoder and self.filesink. The former encodes the audio data coming out of the self.audioconvert. The encoder will be linked to the sink element. In this case, it is a filesink. The self.filesink is where the audio data is written to a file given by the location property.
  8. The encoder string, encoder_str determines the type of encoder element to create. For example, if the output format is specified as "mp3" the corresponding encoder to use is "lame" mp3 encoder. You can run the gst-inspect-0.10 command to know more about the lame mp3 encoder. The following command can be run from shell on Linux.

    $gst-inspect-0.10 lame

  9. The elements are added to the pipeline and then linked together. As before, the self.decodebin and self.audioconvert are not linked in this method as the decodebin plugin uses dynamic pads. The pad_added signal from the self.decodebin is connected in the self.connectSignals() method.
  10. Another noticeable change is that we have not set the location property for both, self.filesrc and self.filesink. These properties will be set at the runtime. The input and output file locations keep on changing as the tool is a batch processing utility.
  11. Let's write the main method that controls the conversion process.

    1 def convert(self):
    2     pattern = "*." + self.inputFormat
    3     filetype = os.path.join(self.inputDir, pattern)
    4     fileList = glob.glob(filetype)
    5     inputFileList = filter(audioFileExists, fileList)
    7     if not inputFileList:
    8         print "\n No audio files with extension %s "\
    9               "located in dir %s"%(
    10              self.inputFormat, self.inputDir)
    11        return
    12    else:
    13        # Record time before beginning audio conversion
    14        starttime = time.clock()
    15        print "\n Converting Audio files.."
    17        # Save the audio into specified file format.
    18        # Do it in a for loop. If the audio by that name
    19        # already exists, do not overwrite it.
    20        for inPath in inputFileList:
    21            dir, fil = os.path.split(inPath)
    22            fil, ext = os.path.splitext(fil)
    23            outPath = os.path.join(
    24                self.outputDir,
    25                fil + "." + self.outputFormat)
    28            print "\n Input File: %s%s, Conversion STARTED..."\
    29                  % (fil, ext)
    30            self.convert_single_audio(inPath, outPath)
    31            if self.error_message:
    32                print "\n Input File: %s%s, ERROR OCCURRED" \
    33                      % (fil, ext)
    34                print self.error_message
    35            else:
    36                print "\nInput File: %s%s, Conversion COMPLETE"\
    37                      % (fil, ext)
    39        endtime = time.clock()
    41        self.printFinalStatus(inputFileList, starttime,
    42                              endtime)
    43        evt_loop.quit()

  12. All the input audio files are collected in the list inputFileList by the code between lines 2 to 5. Then, we loop over each of these files. For each file, the output file path is derived from the input file path and the user-supplied output options.
  13. The highlighted line of code is the workhorse method, AudioConverter.convert_single_audio, that actually does the job of converting the input audio. We will discuss that method next. On line 43, the main event loop is terminated. The rest of the code in method convert is self-explanatory.
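The output-path derivation from step 12 can be sketched as a standalone helper. Here derive_output_path is a hypothetical name; the loop in convert does the same thing inline:

```python
import os

def derive_output_path(in_path, output_dir, output_format):
    # Keep the base file name, swap in the new extension,
    # and place the result in the output directory.
    _directory, fil = os.path.split(in_path)
    fil, _ext = os.path.splitext(fil)
    return os.path.join(output_dir, fil + "." + output_format)
```

For example, on Linux, derive_output_path("/tmp/in/song.wav", "/tmp/out", "mp3") returns "/tmp/out/song.mp3".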
  14. The code in method convert_single_audio is illustrated below.

    1 def convert_single_audio(self, inPath, outPath):
    2     inPth = repr(inPath)
    3     outPth = repr(outPath)
    5     # Set the location property for file source and sink
    6     self.filesrc.set_property("location", inPth[1:-1])
    7     self.filesink.set_property("location", outPth[1:-1])
    9     self.is_playing = True
    10    self.pipeline.set_state(gst.STATE_PLAYING)
    11    while self.is_playing:
    12        time.sleep(1)

  15. As mentioned in the last step, the convert_single_audio method is called within a for loop in self.convert(). The for loop iterates over a list containing input audio file paths, and the input and output file paths are given as arguments to this method. The code between lines 9-12 looks more or less similar to the method illustrated in the Play audio section; the only difference is that the main event loop is not terminated in this method. Earlier we did not set the location property for the file source and sink. These properties are set on lines 6 and 7 respectively.
  16. Now what's up with the code on lines 2 and 3? The call repr(inPath) returns a printable representation of the string inPath, which is obtained from the for loop. On Windows, if you use inPath directly, GStreamer may throw an error while processing such a path string, and os.path.normpath doesn't help here. One way to handle this is to use repr(string), which returns the whole string including the surrounding quotes. For example, if inPath is "C:/AudioFiles/my_music.mp3", then repr(inPath) will return "'C:/AudioFiles/my_music.mp3'". Notice that it is wrapped in an extra pair of single quotes. We get rid of those quotes at the beginning and end by slicing the string as inPth[1:-1]. There could be better ways of handling this; if you come up with one, just use that code to produce the path string!
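The repr()-and-slice trick can be isolated into a small helper for experimentation. Here gst_safe_path is a hypothetical name, not part of the original class:

```python
def gst_safe_path(path):
    # repr() returns a quoted, escaped form of the string;
    # slicing [1:-1] drops the surrounding quotes, leaving the
    # escaped body (backslashes come out doubled).
    return repr(path)[1:-1]
```

A forward-slash path passes through unchanged, while each backslash in a Windows-style path is doubled in the result.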
  17. Let's quickly skim through a few more methods. Write these down:

    def connectSignals(self):
        # Connect the signals.
        # Catch the messages on the bus
        bus = self.pipeline.get_bus()
        bus.add_signal_watch()
        bus.connect("message", self.message_handler)
        # Connect the decodebin "pad_added" signal.
        self.decodebin.connect("pad_added",
                               self.decodebin_pad_added)

    def decodebin_pad_added(self, decodebin, pad):
        caps = pad.get_caps()
        compatible_pad = \
            self.audioconvert.get_compatible_pad(pad, caps)
        pad.link(compatible_pad)

  18. The connectSignals method is identical to the one discussed in the Playing music section, except that we are also connecting the decodebin pad_added signal with the method decodebin_pad_added. Add a print statement to decodebin_pad_added to check when it gets called; it will help you understand how the dynamic pad works! The program starts by processing the first audio file: the method convert_single_audio gets called, where we set the necessary file paths, and then the pipeline starts playing the audio file. At this time, the pad_added signal is generated. Thus, based on the input file data, decodebin creates the pad.
  19. The rest of the methods such as processArgs, printUsage, and message_handler are self-explanatory. You can review these methods from the file
  20. The audio converter should be ready for action now! Make sure that all methods are properly defined and then run the code by specifying appropriate input arguments. The following screenshot shows a sample run of the audio conversion utility on Windows XP. Here, it will batch process all audio files in the directory C:\AudioFiles with extension .ogg and convert them into the mp3 file format. The resultant mp3 files will be created in the directory C:\AudioFiles\OUTPUT_AUDIOS.


What just happened?

A basic audio conversion utility was developed in the previous section. This utility can batch-convert audio files with ogg or mp3 or wav format into user-specified output format (where supported formats are wav and mp3). We learned how to specify encoder and filesink elements and link them in the GStreamer pipeline. To accomplish this task, we also applied knowledge gained in earlier sections such as creation of GStreamer pipeline, capturing bus messages, running the main event loop, and so on.

Have a go hero – do more with audio converter

The audio converter we wrote is fairly simple. It deserves an upgrade.

Extend this application to support more audio output formats such as ogg, flac, and so on. The following pipeline illustrates one way of converting an input audio file into the ogg file format.

filesrc location=input.mp3 ! decodebin ! audioconvert ! vorbisenc ! oggmux ! filesink location=output.ogg

Notice that we have an audio muxer, oggmux, that needs to be linked with the encoder vorbisenc. Similarly, to create an MP4 audio file, you would need {faac ! mp4mux} as the encoder and audio muxer. One of the simplest approaches is to define the proper elements (such as encoder and muxer) and, instead of constructing a pipeline from individual elements, use the gst.parse_launch method we studied earlier and let it automatically create and link elements from the command string. You could create a pipeline instance each time an audio conversion is called for, but then you would also need to connect signals each time the pipeline is created. A better and simpler way is to link the audio muxer in the AudioConverter.constructPipeline method; you just need to check whether it is needed, based on the type of plugin you are using for encoding. In this case the code will be:

gst.element_link_many(self.audioconvert, self.encoder,
self.audiomuxer, self.filesink)
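If you take the gst.parse_launch route suggested above, the description string could be assembled with a small helper like this. Note that build_pipeline_string is a hypothetical name; the element names come from the pipelines shown in this section:

```python
def build_pipeline_string(in_path, out_path, encoder, muxer=None):
    # Assemble a description string suitable for gst.parse_launch,
    # inserting an optional muxer between the encoder and the sink.
    parts = ["filesrc location=%s" % in_path,
             "decodebin", "audioconvert", encoder]
    if muxer:
        parts.append(muxer)
    parts.append("filesink location=%s" % out_path)
    return " ! ".join(parts)
```

For example, build_pipeline_string("input.mp3", "output.ogg", "vorbisenc", "oggmux") reproduces the ogg conversion pipeline shown above.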

The audio converter illustrated in this example takes input files of only a single audio file format. This can easily be extended to accept input audio files in all supported file formats (except for the type specified by the --output_format option); the decodebin should take care of decoding the given input data. Extend the audio converter to support this feature. You will need to modify the code in the AudioConverter.convert() method, where the input file list is determined.
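One way to build the extended input file list is to filter on all supported extensions except the chosen output format. Here filter_input_files is a hypothetical helper operating on file names rather than on a directory, so its behaviour is easy to check:

```python
import os

def filter_input_files(filenames, supported_formats, output_format):
    # Keep files whose extension is a supported input format,
    # skipping the output format as suggested above.
    keep = []
    for name in filenames:
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if ext in supported_formats and ext != output_format:
            keep.append(name)
    return keep
```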

Extracting part of an audio

Suppose you have recorded a live concert of your favorite musician or a singer. You have saved all this into a single file with MP3 format but you would like to break this file into small pieces. There is more than one way to achieve this using Python and GStreamer. We will use the simplest and perhaps the most efficient way of cutting a small piece from an audio track. It makes use of an excellent GStreamer plugin, called Gnonlin.

The Gnonlin plugin

Multimedia editing can be classified as linear or non-linear. Non-linear editing enables interactive control over the media progress: for example, it allows you to control the order in which the sources are executed, and to modify the position within a media track. While doing all this, the original source (such as an audio file) remains unchanged, so the editing is non-destructive. The Gnonlin (G-Non-Linear) plugin provides essential elements for non-linear editing of multimedia. It has five major elements, namely gnlfilesource, gnlurisource, gnlcomposition, gnloperation, and gnlsource. To know more about their properties, run the gst-inspect-0.10 command on each of these elements.

Here, we will only focus on the element gnlfilesource and a few of its properties. This is really a GStreamer bin element. Like decodebin, it determines which pads to use at the runtime. As the name suggests, it deals with the input media file. All you need to specify is the input media source it needs to handle. The media file format can be any of the supported media formats. The gnlfilesource defines a number of properties. To extract a chunk of an audio, we just need to consider three of them:

  • media-start: The position in the input media file, which will become the start position of the extracted media. This is specified in nanoseconds.
  • media-duration: Total duration of the extracted media file (beginning from media-start). This is specified in nanoseconds as well.
  • uri: The full path of the input media file. For example, if it is a file on your local hard drive, the uri will be something like file:///C:/AudioFiles/my_music.mp3. If the file is located on a website, the uri will be something of this sort: http://path/to/file.mp3.

The gnlfilesource internally does operations like loading and decoding the file, seeking the track to the specified position, and so on. This makes our job easier. We just need to create basic elements that will process the information furnished by gnlfilesource, to create an output audio file. Now that we know the basics of gnlfilesource, let's try to come up with a GStreamer pipeline that will cut a portion of an input audio file.

  • First the gnlfilesource element that does the crucial job of loading, decoding the file, seeking the correct start position, and finally presenting us with an audio data that represents the portion of track to be extracted.
  • An audioconvert element that will convert this data into an appropriate audio format.
  • An encoder that encodes this data further into the final audio format we want.
  • A sink where the output data is dumped. This specifies the output audio file.

Try running the following from the command prompt by replacing the uri and location paths with appropriate file paths on your computer.

$gst-launch-0.10.exe gnlfilesource uri=file:///C:/my_music.mp3
media-start=0 media-duration=15000000000 !
audioconvert !
lame !
filesink location=C:/my_chunk.mp3

This should create an extracted audio file of duration 15 seconds, starting at the initial position on the original file. Note that the media-start and media-duration properties take the input in nanoseconds. This is really the essence of what we will do next.
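The unit conversion is simple enough to sketch. Here GST_SECOND mirrors the gst.SECOND constant, which is one second expressed in nanoseconds:

```python
GST_SECOND = 10 ** 9  # gst.SECOND: one second in nanoseconds

def to_nanoseconds(seconds):
    # Convert seconds to the nanosecond units expected by the
    # media-start and media-duration properties.
    return int(seconds * GST_SECOND)
```

For example, to_nanoseconds(15) gives the media-duration value 15000000000 used in the command above.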

Time for action – MP3 cutter!

In this section we will develop a utility that will cut out a portion of an MP3 formatted audio and save it as a separate file.

  1. Keep the file handy. You can download it from the Packt website. Here we will only discuss important methods. The methods not discussed here are similar to the ones from earlier examples. Review the file which has all the necessary source code to run this application.
  2. Start the usual way. Do the necessary imports and write the following skeleton code.

    import os, sys, time
    import thread
    import gobject
    import pygst
    import gst

    class AudioCutter:
        def __init__(self):
        def constructPipeline(self):
        def gnonlin_pad_added(self, gnonlin_elem, pad):
        def connectSignals(self):
        def run(self):
        def printFinalStatus(self):
        def message_handler(self, bus, message):

    #Run the program
    audioCutter = AudioCutter()
    thread.start_new_thread(, ())
    gobject.threads_init()
    evt_loop = gobject.MainLoop()
    evt_loop.run()

    The overall code layout looks familiar, doesn't it? The code is very similar to the code we developed earlier in this article. The key here is the appropriate choice of the file source element and linking it with the rest of the pipeline! The last few lines of code create a thread with the method and run the main event loop, as seen before.

  3. Now fill in the constructor of the class. We will keep it simple this time: the things we need are hardcoded within the constructor of the class AudioCutter. It is very easy to add a processArgs() method, as done on many occasions before. Replace the input and output file locations in the code snippet with proper audio file paths on your computer.

    def __init__(self):
        self.is_playing = False
        # Flag used for printing purpose only.
        self.error_msg = ''

        self.media_start_time = 100
        self.media_duration = 30
        self.inFileLocation = "C:\AudioFiles\my_music.mp3"
        self.outFileLocation = "C:\AudioFiles\my_music_chunk.mp3"

        self.constructPipeline()
        self.connectSignals()


  4. The self.media_start_time is the position, in seconds, on the original mp3 track that becomes the start of the extracted output audio. The self.media_duration variable stores the total duration of the extracted track. Thus, if you have an audio file with a total duration of 5 minutes, the extracted audio will start at the position corresponding to 1 minute, 40 seconds on the original track. The total duration of this output file will be 30 seconds; that is, the end time will correspond to 2 minutes, 10 seconds on the original track. The last two lines of this method build a pipeline and connect signals with class methods.
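The arithmetic from this step can be sketched as follows. Both extraction_window and mmss are hypothetical helpers added here for illustration only:

```python
def extraction_window(start_sec, duration_sec):
    # Start and end positions (in seconds) that the extracted
    # chunk corresponds to on the original track.
    return start_sec, start_sec + duration_sec

def mmss(seconds):
    # Render a position in seconds as a minutes:seconds string.
    return "%d:%02d" % divmod(seconds, 60)
```

Here extraction_window(100, 30) gives the window (100, 130), that is, 1:40 to 2:10 on the original track.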
  5. Next, build the GStreamer pipeline.

    1 def constructPipeline(self):
    2     self.pipeline = gst.Pipeline()
    3     self.filesrc = gst.element_factory_make(
    4         "gnlfilesource")
    6     # Set properties of filesrc element
    7     # Note: the gnlfilesource signal will be connected
    8     # in self.connectSignals()
    9     self.filesrc.set_property("uri",
    10        "file:///" + self.inFileLocation)
    11    self.filesrc.set_property("media-start",
    12        self.media_start_time*gst.SECOND)
    13    self.filesrc.set_property("media-duration",
    14        self.media_duration*gst.SECOND)
    16    self.audioconvert = \
    17        gst.element_factory_make("audioconvert")
    19    self.encoder = \
    20        gst.element_factory_make("lame", "mp3_encoder")
    22    self.filesink = \
    23        gst.element_factory_make("filesink")
    25    self.filesink.set_property("location",
    26        self.outFileLocation)
    28    #Add elements to the pipeline
    29    self.pipeline.add(self.filesrc, self.audioconvert,
    30        self.encoder, self.filesink)
    31    # Link elements
    32    gst.element_link_many(self.audioconvert, self.encoder,
    33        self.filesink)

    The highlighted line of code (line 3) creates the gnlfilesource; we call it self.filesrc. As discussed earlier, it is responsible for loading and decoding audio data and presenting only the required portion of the audio. It enables a higher level of abstraction in the main pipeline.

  6. The code between lines 9 to 13 sets three properties of gnlfilesource, uri, media-start and media-duration. The media-start and media-duration are specified in nanoseconds. Therefore, we multiply the parameter value (which is in seconds) by gst.SECOND which takes care of the units.
  7. The rest of the code looks very much similar to the Audio Converter example. In this case, we only support saving the file in mp3 audio format. The encoder element is defined on line 19. self.filesink determines where the output file will be saved. Elements are added to the pipeline by self.pipeline.add call and are linked together on line 32. Note that the gnlfilesource element, self.filesrc, is not linked with self.audioconvert while constructing the pipeline. Like the decodebin, the gnlfilesource implements dynamic pads. Thus, the pad is not available when the pipeline is constructed. It is created at the runtime depending on the specified input audio format. The "pad_added" signal of gnlfilesource is connected with a method self.gnonlin_pad_added.
  8. Now write the connectSignals and gnonlin_pad_added methods.

    def connectSignals(self):
        # Capture the messages put on the bus.
        bus = self.pipeline.get_bus()
        bus.add_signal_watch()
        bus.connect("message", self.message_handler)

        # gnlfilesource uses dynamic pads.
        # Capture the pad_added signal.
        self.filesrc.connect("pad_added",
                             self.gnonlin_pad_added)

    def gnonlin_pad_added(self, gnonlin_elem, pad):
        caps = pad.get_caps()
        compatible_pad = \
            self.audioconvert.get_compatible_pad(pad, caps)
        pad.link(compatible_pad)

    The highlighted line of code in method connectSignals connects the pad_added signal of gnlfilesource with a method gnonlin_pad_added. The gnonlin_pad_added method is identical to the decodebin_pad_added method of class AudioConverter developed earlier. Whenever gnlfilesource creates a pad at the runtime, this method gets called and here, we manually link the pads of gnlfilesource with the compatible pad on self.audioconvert.

  9. The rest of the code is very much similar to the code developed in the Playing an audio section; some methods here are direct equivalents of the ones discussed there. You can review the code for the remaining methods from the file.
  10. Once everything is in place, run the program from the command line as:


  11. This should create a new MP3 file which is just a specific portion of the original audio file.

What just happened?

We accomplished the creation of a utility that can cut a piece out of an MP3 audio file (while keeping the original file unchanged). This audio piece was saved as a separate MP3 file. We learned about a very useful plugin, called Gnonlin, intended for non-linear multimedia editing, and used a few fundamental properties of its gnlfilesource element to extract part of an audio file.

Have a go hero – extend MP3 cutter

  • Modify this program so that the parameters such as media_start_time can be passed as an argument to the program. You will need a method like processArguments(). You can use either getopt or OptionParser module to parse the arguments.
  • Add support for other file formats. For example, extend this code so that it can extract a piece from a wav formatted audio and save it as an MP3 audio file. The input part will be handled by gnlfilesource. Depending upon the type of output file format, you will need a specific encoder and possibly an audio muxer element. Then add and link these elements in the main GStreamer pipeline.


Recording audio

After learning how to cut out a piece from our favorite music tracks, the next exciting thing we will build is a 'home grown' audio recorder. You can then use it the way you like, to record music, mimicry, or just a simple speech!

Remember what pipeline we used to play an audio? The elements in the pipeline to play an audio were filesrc ! decodebin ! audioconvert ! autoaudiosink. The autoaudiosink did the job of automatically detecting the output audio device on your computer.

For recording purposes, the audio source is going to be the microphone connected to your computer. Thus, there won't be any filesrc element; we will instead use a GStreamer plugin that automatically detects the input audio device. On similar lines, you probably want to save the recording to a file, so the autoaudiosink element gets replaced with a filesink element.

The autoaudiosrc is an element we could use for detecting the input audio source. However, while testing this program on Windows XP, autoaudiosrc was unable to detect the audio source for unknown reasons. So we will use the Directshow audio capture source plugin, dshowaudiosrc, to accomplish the recording task. Run the gst-inspect-0.10 dshowaudiosrc command to make sure it is installed and to learn about the various properties of this element. Putting this plugin in the pipeline worked fine on Windows XP. The dshowaudiosrc is linked to the audioconvert element.

With this information, let's give it a try using the command-line version of GStreamer. Make sure you have a microphone connected or built into your computer. For a change, we will save the output file in ogg format.

gst-launch-0.10.exe dshowaudiosrc num-buffers=1000 !
audioconvert ! audioresample !
vorbisenc ! oggmux !
filesink location=C:/my_voice.ogg

The audioresample re-samples the raw audio data with different sample rates. Then the encoder element encodes it. The multiplexer or mux, if present, takes the encoded data and puts it into a single channel. The recorded audio file is written to the location specified by the filesink element.

Time for action – recording

Okay, time to write some code that does audio recording for us.

  1. Download the file and review the code. You will notice that the only important task is to set up a proper pipeline for audio recording. Content-wise, the other code is very much similar to what we learned earlier in the article. It will have some minor differences such as method names and print statements. In this section we will discuss only the important methods in the class AudioRecorder.
  2. Write the constructor.

    def __init__(self):
        self.is_playing = False
        self.num_buffers = -1
        self.error_message = ""
        self.processArgs()
        self.constructPipeline()
        self.connectSignals()

  3. This is similar to AudioPlayer.__init__(), except that we have added a call to processArgs() and initialized the error reporting variable self.error_message as well as self.num_buffers, the variable that indicates the total duration of the recording.
  4. Build the GStreamer pipeline by writing constructPipeline method.

    1 def constructPipeline(self):
    2     # Create the pipeline instance
    3     self.recorder = gst.Pipeline()
    5     # Define pipeline elements
    6     self.audiosrc = \
    7         gst.element_factory_make("dshowaudiosrc")
    9     self.audiosrc.set_property("num-buffers",
    10        self.num_buffers)
    12    self.audioconvert = \
    13        gst.element_factory_make("audioconvert")
    15    self.audioresample = \
    16        gst.element_factory_make("audioresample")
    18    self.encoder = \
    19        gst.element_factory_make("lame")
    21    self.filesink = \
    22        gst.element_factory_make("filesink")
    24    self.filesink.set_property("location",
    25        self.outFileLocation)
    27    # Add elements to the pipeline
    28    self.recorder.add(self.audiosrc, self.audioconvert,
    29        self.audioresample,
    30        self.encoder, self.filesink)
    32    # Link elements in the pipeline.
    33    gst.element_link_many(self.audiosrc, self.audioconvert,
    34        self.audioresample,
    35        self.encoder, self.filesink)

  5. We use the dshowaudiosrc (DirectShow audio source) plugin as the audio source element. It captures audio from the input device, which will typically be a microphone.
  6. On line 9, we set the num-buffers property to the value of self.num_buffers. Its default value is -1, indicating that there is no limit on the number of buffers. If you specify a value of 500, for instance, the source will output 500 buffers (about 5 seconds of audio) before sending an End of Stream (EOS) message that ends the run of the program.
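The 500-buffers-to-5-seconds figure implies roughly 10 ms of audio per buffer. The helper below makes that arithmetic explicit; the per-buffer duration is an assumption inferred from the example above, as the true value depends on the source element's buffer size.

```python
def expected_duration(num_buffers, ms_per_buffer=10.0):
    # Estimate recording length from the num-buffers setting.
    # 500 buffers * 10 ms/buffer = 5 seconds, matching the example above.
    # Returns None for -1 (record until stopped).
    if num_buffers < 0:
        return None
    return num_buffers * ms_per_buffer / 1000.0

print(expected_duration(500))  # 5.0 seconds
print(expected_duration(-1))   # None: record until stopped
```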
  7. On line 15, an instance of the audioresample element is created. This element takes the raw audio buffers from self.audioconvert and re-samples them to a different sample rate. The encoder element then encodes the audio data into a suitable format (MP3, via the lame element), and the recorded file is written to the location specified by self.filesink.
  8. The code on lines 28 to 35 adds the various elements to the pipeline and links them together.
  9. Review the code in the file to add the rest of the code. Then run the program to record your voice, or anything else you want to record that makes an audible sound! The following is a sample command line. This program will record audio for about 5 seconds.

    $python --num_buffers=500

What just happened?

We learned how to record audio using Python and GStreamer. We developed a simple audio recording utility to accomplish this task. The GStreamer plugin dshowaudiosrc captured the audio input for us. We created the main GStreamer pipeline by adding this and other elements, and used it in the Audio Recorder program.


Summary
This article gave us deeper insight into the fundamentals of audio processing using Python and the GStreamer multimedia framework. We used several important components of GStreamer to develop some frequently needed audio processing utilities. The main learning points of the article can be summarized as follows:

  • GStreamer installation: We learned how to install GStreamer and its dependent packages on various platforms. This set the stage for learning audio processing techniques and will also be useful for the next chapters on audio/video processing.
  • A primer on GStreamer: A quick primer on GStreamer helped us understand important elements required for media processing.
  • Use of the GStreamer API to develop audio tools: We learned how to use the GStreamer API for general audio processing. This helped us develop tools such as an audio player, a file format converter, an MP3 cutter, and an audio recorder.
