Objects in Python

Exclusive offer: get 50% off this eBook here
Python 3 Object Oriented Programming

Python 3 Object Oriented Programming — Save 50%

Harness the power of Python 3 objects

$29.99    $15.00
by Dusty Phillips | July 2010 | Open Source

Let's have a look at the Python syntax that allows us to create object-oriented software. In this article by Dusty Phillips, Author of Python 3 Object Oriented Programming we will understand:

  • How to create classes and instantiate objects in Python
  • How to add attributes and behaviors to Python objects
  • How to organize classes into packages and modules

(For more resources on Python 3, see here.)

Creating Python classes

We don't have to write much Python code to realize that Python is a very "clean" language. When we want to do something, we just do it, without having to go through a lot of setup. The ubiquitous, "hello world" in Python, as you've likely seen, is only one line.

Similarly, the simplest class in Python 3 looks like this:

class MyFirstClass:
pass

There's our first object-oriented program! The class definition starts with the class keyword. This is followed by a name (of our choice) identifying the class, and is terminated with a colon.

The class name must follow standard Python variable naming rules (must start with a letter or underscore, can only be comprised of letters, underscores, or numbers). In addition, the Python style guide (search the web for "PEP 8"), recommends that classes should be named using CamelCase notation (start with a capital letter, any subsequent words should also start with a capital).

The class definition line is followed by the class contents, indented. As with other Python constructs, indentation is used to delimit the classes, rather than braces or brackets as many other languages use. Use four spaces for indentation unless you have a compelling reason not to (such as fitting in with somebody else's code that uses tabs for indents). Any decent programming editor can be configured to insert four spaces whenever the Tab key is pressed.

Since our first class doesn't actually do anything, we simply use the pass keyword on the second line to indicate that no further action needs to be taken.

We might think there isn't much we can do with this most basic class, but it does allow us to instantiate objects of that class. We can load the class into the Python 3 interpreter so we can play with it interactively. To do this, save the class definition mentioned earlier into a file named first_class.py and then run the command python -i first_class.py. The -i argument tells Python to "run the code and then drop to the interactive interpreter". The following interpreter session demonstrates basic interaction with this class:

>>> a = MyFirstClass()

>>> b = MyFirstClass()

>>> print(a)

<__main__.MyFirstClass object at 0xb7b7faec>

>>> print(b)

<__main__.MyFirstClass object at 0xb7b7fbac>

>>>

This code instantiates two objects from the new class, named a and b. Creating an instance of a class is a simple matter of typing the class name followed by a pair of parentheses. It looks much like a normal function call, but Python knows we're "calling" a class and not a function, so it understands that its job is to create a new object. When printed, the two objects tell us what class they are and what memory address they live at. Memory addresses aren't used much in Python code, but here,it demonstrates that there are two distinctly different objects involved.

Adding attributes

Now, we have a basic class, but it's fairly useless. It doesn't contain any data, and it doesn't do anything. What do we have to do to assign an attribute to a given object?

It turns out that we don't have to do anything special in the class definition. We can set arbitrary attributes on an instantiated object using the dot notation:

class Point:
pass

p1 = Point()
p2 = Point()

p1.x = 5
p1.y = 4

p2.x = 3
p2.y = 6

print(p1.x, p1.y)
print(p2.x, p2.y)

If we run this code, the two print statements at the end tell us the new attribute values on the two objects:

5 4
3 6

This code creates an empty Point class with no data or behaviors. Then it creates two instances of that class and assigns each of those instances x and y coordinates to identify a point in two dimensions. All we need to do to assign a value to an attribute on an object is use the syntax <object>.<attribute>=<value>. This is sometimes referred to as dot notation. The value can be anything: a Python primitive, a built-in data type, another object. It can even be a function or another class!

Making it do something

Now, having objects with attributes is great, but object-oriented programming is really about the interaction between objects. We're interested in invoking actions that cause things to happen to those attributes. It is time to add behaviors to our classes.

Let's model a couple of actions on our Point class. We can start with a method called reset that moves the point to the origin (the origin is the point where x and y are both zero). This is a good introductory action because it doesn't require any parameters:

class Point:
def reset(self):
self.x = 0
self.y = 0

p = Point()
p.reset()
print(p.x, p.y)

That print statement shows us the two zeros on the attributes:

0 0

A method in Python is identical to defining a function. It starts with the keyword def followed by a space and the name of the method. This is followed by a set of parentheses containing the parameter list (we'll discuss the self parameter in just a moment), and terminated with a colon. The next line is indented to contain the statements inside the method. These statements can be arbitrary Python code operating on the object itself and any parameters passed in as the method sees fit.

The one difference between methods and normal functions is that all methods have one required argument. This argument is conventionally named self; I've never seen a programmer use any other name for this variable (convention is a very powerful thing). There's nothing stopping you, however, from calling it this or even Martha.

The self argument to a method is simply a reference to the object that the method is being invoked on. We can access attributes and methods of that object as if it were any other object. This is exactly what we do inside the reset method when we set the x and y attributes of the self object.

Notice that when we call the p.reset() method, we do not have to pass the self argument into it. Python automatically takes care of this for us. It knows we're calling a method on the p object, so it automatically passes that object to the method.

However, the method really is just a function that happens to be on a class. Instead of calling the method on the object, we could invoke the function on the class, explicitly passing our object as the self argument:

p = Point()
Point.reset(p)
print(p.x, p.y)

The output is the same as the previous example because, internally, the exact same process has occurred.

What happens if we forget to include the self argument in our class definition? Python will bail with an error message:

>>> class Point:
... def reset():
... pass
...
>>> p = Point()
>>> p.reset()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: reset() takes no arguments (1 given)

The error message is not as clear as it could be ("You silly fool, you forgot the self argument" would be more informative). Just remember that when you see an error message that indicates missing arguments, the first thing to check is whether you forgot self in the method definition.

So how do we pass multiple arguments to a method? Let's add a new method that allows us to move a point to an arbitrary position, not just the origin. We can also include one that accepts another Point object as input and returns the distance between them:

import math

class Point:
def move(self, x, y):
self.x = x
self.y = y

def reset(self):
self.move(0, 0)

def calculate_distance(self, other_point):
return math.sqrt((self.x - other_point.x)**2 +(self.y - other_point.y)

**2)

# how to use it:
point1 = Point()
point2 = Point()

point1.reset()
point2.move(5,0)
print(point2.calculate_distance(point1))
assert (point2.calculate_distance(point1) ==
point1.calculate_distance(point2))
point1.move(3,4)
print(point1.calculate_distance(point2))
print(point1.calculate_distance(point1))

The print statements at the end give us the following output:

5.0
4.472135955
0.0

A lot has happened here. The class now has three methods. The move method accepts two arguments, x and y, and sets the values on the self object, much like the old reset method from the previous example. The old reset method now calls move, since a reset is just a move to a specific known location.

The calculate_distance method uses the not-too-complex Pythagorean Theorem to calculate the distance between two points. I hope you understand the math (** means squared, and math.sqrt calculates a square root), but it's not a requirement for our current focus: learning how to write methods.

The example code at the end shows how to call a method with arguments; simply include the arguments inside the parentheses, and use the same dot notation to access the method. I just picked some random positions to test the methods. The test code calls each method and prints the results on the console. The assert function is a simple test tool; the program will bail if the statement after assert is False (or zero, empty, or None). In this case, we use it to ensure that the distance is the same regardless of which point called the other point's calculate_distance method.

Initializing the object

If we don't explicitly set the x and y positions on our Point object, either using move or by accessing them directly, we have a broken point with no real position. What will happen when we try to access it?

Well, let's just try it and see. "Try it and see" is an extremely useful tool for Python study. Open up your interactive interpreter and type away. The following interactive session shows what happens if we try to access a missing attribute. If you saved the previous example as a file or are using the examples distributed in this article, you can load it into the Python interpreter with the command python -i filename.py.

>>> point = Point()
>>> point.x = 5
>>> print(point.x)
5
>>> print(point.y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Point' object has no attribute 'y'

Well, at least it threw a useful exception. You've probably seen them before (especially the ubiquitous SyntaxError, which means you typed something incorrectly!). At this point, simply be aware that it means something went wrong.

The output is useful for debugging. In the interactive interpreter it tells us the error occurred at line 1, which is only partially true (in an interactive session, only one line is executed at a time). If we were running a script in a file, it would tell us the exact line number, making it easy to find the offending code. In addition, it tells us the error is an AttributeError, and gives a helpful message telling us what that error means.

We can catch and recover from this error, but in this case, it feels like we should have specified some sort of default value. Perhaps every new object should be reset() by default or maybe it would be nice if we could force the user to tell us what those positions should be when they create the object.

Most object-oriented programming languages have the concept of a constructor, a special method that creates and initializes the object when it is created. Python is a little different; it has a constructor and an initializer. Normally, the constructor function is rarely ever used unless you're doing something exotic. So we'll start our discussion with the initialization method.

The Python initialization method is the same as any other method, except it has a special name: __init__. The leading and trailing double underscores mean, "this is a special method that the Python interpreter will treat as a special case". Never name a function of your own with leading and trailing double underscores. It may mean nothing to Python, but there's always the possibility that the designers of Python will add a function that has a special purpose with that name in the future, and when they do, your code will break.

Let's start with an initialization function on our Point class that requires the user to supply x and y coordinates when the Point object is instantiated:

class Point:
def __init__(self, x, y):
self.move(x, y)

def move(self, x, y):
self.x = x
self.y = y

def reset(self):
self.move(0, 0)

# Constructing a Point
point = Point(3, 5)
print(point.x, point.y)

Now, our point can never go without a y coordinate! If we try to construct a point without including the proper initialization parameters, it will fail with a not enough arguments error similar to the one we received earlier when we forgot the self argument.

What if we don't want to make those two arguments required? Well then we can use the same syntax Python functions use to provide default arguments. The keyword argument syntax appends an equals sign after each variable name. If the calling object does not provide that argument, then the default argument is used instead; the variables will still be available to the function, but they will have the values specified in the argument list. Here's an example:

class Point:
def __init__(self, x=0, y=0):
self.move(x, y)

Most of the time, we put our initialization statements in an __init__ function. But as mentioned earlier, Python has a constructor in addition to its initialization function. You may never need to use the other Python constructor, but it helps to know it exists, so we'll cover it briefly.

The constructor function is called __new__ as opposed to __init__, and accepts exactly one argument, the class that is being constructed (it is called before the object is constructed, so there is no self argument). It also has to return the newly created object. This has interesting possibilities when it comes to the complicated art of meta-programming, but is not very useful in day-to-day programming. In practice, you will rarely, if ever, need to use __new__, and __init__ will be sufficient.

Explaining yourself

Python is an extremely easy-to-read programming language; some might say it is self-documenting. However, when doing object-oriented programming, it is important to write API documentation that clearly summarizes what each object and method does. Keeping documentation up-to-date is difficult; the best way to do it is to write it right into our code.

Python supports this through the use of docstrings. Each class, function, or method header can have a standard Python string as the first line following the definition (the line that ends in a colon). This line should be indented the same as the following code.

Docstrings are simply Python strings enclosed with apostrophe (') or quote (") characters. Often, docstrings are quite long and span multiple lines (the style guide suggests that line-length should not exceed 80 characters), which can be formatted as multi-line strings, enclosed in matching triple apostrophe (''') or triple quote (""") characters.

A docstring should clearly and concisely summarize the purpose of the class or method it is describing. It should explain any parameters whose usage is not immediately obvious, and is also a good place to include short examples of how to use the API. Any caveats or problems an unsuspecting user of the API should be aware of should also be noted.

To illustrate the use of docstrings, we will end this part with our completely documented Point class:

import math

class Point:
'Represents a point in two-dimensional geometric coordinates'

def __init__(self, x=0, y=0):
'''Initialize the position of a new point. The x and y
coordinates can be specified. If they are not, the point
defaults to the origin.'''
self.move(x, y)

def move(self, x, y):
"Move the point to a new location in two-dimensional space."
self.x = x
self.y = y

def reset(self):
'Reset the point back to the geometric origin: 0, 0'
self.move(0, 0)

def calculate_distance(self, other_point):
"""Calculate the distance from this point to a second point
passed as a parameter.

This function uses the Pythagorean Theorem to calculate
the distance between the two points. The distance is returned
as a float."""

return math.sqrt(

(self.x - other_point.x)**2 +

(self.y – other_point.y)**2)

Try typing or loading (remember, it's python -i filename.py) this file into the interactive interpreter. Then enter help(Point)<enter> at the Python prompt. You should see nicely formatted documentation for the class, as shown in the following screenshot:

Objects in Python

Python 3 Object Oriented Programming Harness the power of Python 3 objects
Published: July 2010
eBook Price: $29.99
Book Price: $49.99
See more
Select your format and quantity:

(For more resources on Python 3, see here.)

Modules and packages

Now that we know how to create classes and instantiate objects, it is time to think about organizing them. For small programs, we can just put all our classes into one file and put some code at the end of the file to start them interacting. However, as our projects grow, it can become difficult to find one class that needs to be edited among the many classes we've defined. This is where modules come in. Modules are simply Python files, nothing more. The single file in our small program is a module. Two Python files are two modules. If we have two files in the same folder, we can load a class from one module for use in the other module.

For example, if we are building an e-commerce system, we will likely be storing a lot of data in a database. We can put all the classes and functions related to database access into a separate file (we'll call it something sensible: database.py). Then our other modules (for example: customer models, product information, and inventory) can import classes from that module in order to access the database.

The import statement is used for importing modules or specific classes or functions from modules. We've already seen an example of this in our Point class in the previous part. We used the import statement to get Python's built-in math module so we could use its sqrt function in our distance calculation.

Here's a concrete example. Assume we have a module called database.py that contains a class called Database, and a second module called products.py that is responsible for product-related queries. At this point, we don't need to think too much about the contents of these files. What we know is that products.py needs to instantiate the Database class from database.py so it can execute queries on the product table in the database.

There are several variations on the import statement syntax that can be used to access the class.

import database
db = database.Database()
# Do queries on db

This version imports the database module into the products namespace (the list of names currently accessible in a module or function), so any class or function in the database module can be accessed using database.<something> notation. Alternatively, we can import just the one class we need using the from...import syntax:

from database import Database
db = Database()
# Do queries on db

If, for some reason, products already has a class called Database, and we don't want the two names to be confused, we can rename the class when used inside the products module:

from database import Database as DB
db = DB()
# Do queries on db

We can also import multiple items in one statement. If our database module also contains a Query class, we can import both classes using:

from database import Database, Query

Some sources say that we can even import all classes and functions from the database module using this syntax:

from database import *

Don't do this. Every experienced Python programmer will tell you that you should never use this syntax. They'll use obscure justifications like, "it clutters up the namespace", which doesn't make much sense to beginners. One way to learn why to avoid this syntax is to use it and try to understand your code two years later. But we can save some time and two years of poorly written code with a quick explanation now!

When we explicitly import the database class at the top of our file using from database import Database, we can easily see where the Database class comes from. We might use db = Database() 400 lines later in the file, and we can quickly look at the imports to see where that Database class came from. Then if we need clarification as to how to use the Database class, we can visit the original file (or import the module in the interactive interpreter and use the help(database.Database) command). However, if we use from database import * syntax, it takes a lot longer to find where that class is located. Code maintenance becomes a nightmare.

In addition, many editors are able to provide extra functionality, such as reliable code completion or the ability to jump to the definition of a class if normal imports are used. The import * syntax usually completely destroys their ability to do this reliably.

Finally, using the import * syntax can bring unexpected objects into our local namespace. Sure, it will import all the classes and functions defined in the module being imported from, but it will also import any classes or modules that were themselves imported into that file!

In spite of all these warnings, you may think, "if I only use from X import * syntax for one module, I can assume any unknown imports come from that module". This is technically true, but it breaks down in practice. I promise that if you use this syntax, you (or someone else trying to understand your code) will have extremely frustrating moments of, "Where on earth can this class be coming from?" Every name used in a module should come from a well-specified place, whether it is defined in that module, or explicitly imported from another module. There should be no magic variables that seem to come out of thin air. We should always be able to immediately identify where the names in our current namespace originated.

Organizing the modules

As a project grows into a collection of more and more modules, we may find that we want to add another level of abstraction, some kind of nested hierarchy on our modules' levels. But we can't put modules inside modules; one file can only hold one file, after all, and modules are nothing more than Python files.

Files, however, can go in folders and so can modules. A package is a collection of modules in a folder. The name of the package is the name of the folder. All we need to do to tell Python that a folder is a package and place a (normally empty) file in the folder named __init__.py. If we forget this file, we won't be able to import modules from that folder.

Let's put our modules inside an ecommerce package in our working folder, which will also contain a main.py to start the program. Let's additionally add another package in the ecommerce package for various payment options. The folder hierarchy will look like this:

parent_directory/
main.py
ecommerce/
__init__.py
database.py
products.py
payments/
__init__.py
paypal.py
authorizenet.py

When importing modules or classes between packages, we have to be cautious about the syntax. In Python 3, there are two ways of importing modules: absolute imports and relative imports.

Absolute imports

Absolute imports specify the complete path to the module, function, or path we want to import. If we need access to the Product class inside the products module, we could use any of these syntaxes to do an absolute import:

import ecommerce.products
product = ecommerce.products.Product()

or

from ecommerce.products import Product
product = Product()

or

from ecommerce import products
product = products.Product()

The import statements separate packages or modules using the period as a separator.

These statements will work from any module. We could instantiate a Product using this syntax in main.py, in the database module, or in either of the two payment modules. Indeed, so long as the packages are available to Python, it will be able to import them. For example, the packages can also be installed to the Python site packages folder, or the PYTHONPATH environment variable could be customized to dynamically tell Python what folders to search for packages and modules it is going to import.

So with these choices, which syntax do we choose? It depends on your personal taste and the application at hand. If there are dozens of classes and functions inside the products module that I want to use, I generally import the module name using the from ecommerce import products syntax and then access the individual classes using products.Product. If I only need one or two classes from the products module, I import them directly using the from ecommerce.proucts import Product syntax. I don't personally use the first syntax very often unless I have some kind of name conflict (for example, I need to access two completely different modules called products and I need to separate them). Do whatever you think makes your code look more elegant.

Relative imports

When working with related modules in a package, it seems kind of silly to specify the full path; we know what our parent module is named. This is where relative imports come in. Relative imports are basically a way of saying "find a class, function, or module as it is positioned relative to the current module". For example, if we are working in the products module and we want to import the Database class from the database module "next" to it, we could use a relative import:

from .database import Database

The period in front of database says, "Use the database module inside the current package". In this case, the current package is the package containing the products.py file we are currently editing, that is, the ecommerce package.

If we were editing the paypal module inside the ecommerce.payments package, we would want to say, "Use the database package inside the parent package", instead. That is easily done with two periods:

from ..database import Database

We can use more periods to go further up the hierarchy. Of course, we can also go down one side and back up the other. We don't have a deep enough example hierarchy to illustrate this properly, but the following would be a valid import if we had a ecommerce.contact package containing an email module and wanted to import the send_mail function into our paypal module:

from ..contact.email import send_mail

This import uses two periods to say, "the parent of the payments package", then uses normal package.module syntax to go back "up" into the contact package.

Inside any one module, we can specify variables, classes, or functions. They can be a handy way of storing global state without namespace conflicts. For example, we have been importing the Database class into various modules and then instantiating it, but it might make more sense to have only one database object globally available from the database module. The database module might look like this:

class Database:
# the database implementation
pass
database = Database()

Then we can use any of the import methods we've discussed to access the database object, for example:

from ecommerce.database import database

A problem with the above class is that the database object is created immediately when the module is first imported, which is usually when the program starts up. This isn't always ideal, since connecting to a database can take a while, slowing down startup, or the database connection information may not yet be available. We could delay creating the database until it is actually needed by calling an initialize_database function to create the module-level variable:

class Database:
# the database implementation
pass

database = None

def initialize_database():
global database
database = Database()

The global keyword tells Python that the database variable inside initialize_database is the module-level one we just defined. If we had not specified the variable as global, Python would have created a new local variable that would be discarded when the method exits, leaving the module-level value unchanged.

As these two examples illustrate, all code in a module is executed immediately at the time it is imported. However, if it is inside a method or function, the function will be created, but its internal code will not be executed until the function is called. This can be a tricky thing for scripts (like the main script in our e-commerce example) that perform execution. Often, we will write a program that does something useful, and then later find that we want to import a function or class from that module in a different program. But as soon as we import it, any code at the module level is immediately executed. If we are not careful, we can end up running the first program when we really only meant to access a couple functions inside that module.

To solve this, we should always put our startup code in a function (conventionally called main) and only execute that function when we know we are executing as a script, but not when our code is being imported from a different script. But how do we know that?:

class UsefulClass:
'''This class might be useful to other modules.'''
pass

def main():
'''creates a useful class and does something with it for our
module.'''
useful = UsefulClass()
print(useful)

if __name__ == "__main__":
main()

Every module has a __name__ special variable (remember, Python uses double underscores for special variables, like a class's __init__ method) that specifies the name of the module when it was imported. But when the module is executed directly with python module.py, it is never imported, so the __name__ is set to the string "__main__". Make it a policy to wrap all your scripts in an if __name__ == "__main__": test, just in case you write a function you will find useful to be imported by other code someday.

So methods go in classes, which go in modules, which go in packages. Is that all there is to it?

Actually, no. That is the typical order of things in a Python program, but it's not the only possible layout. Classes can be defined anywhere. They are typically defined at the module level, but they can also be defined inside a function or method, like so:

def format_string(string, formatter=None):
'''Format a string using the formatter object, which
is expected to have a format() method that accepts
a string.'''
class DefaultFormatter:
'''Format a string in title case.'''
def format(self, string):
return str(string).title()

if not formatter:
formatter = DefaultFormatter()

return formatter.format(string)

hello_string = "hello world, how are you today?"
print(" input: " + hello_string)
print("output: " + format_string(hello_string))

Output:

input: hello world, how are you today?
output: Hello World, How Are You Today?

The format_string function accepts a string and optional formatter object, and then applies the formatter to that string. If no formatter is supplied, it creates a formatter of its own as a local class and instantiates it. Since it is created inside the scope of the function, this class cannot be accessed from anywhere outside of that function. Similarly, functions can be defined inside other functions as well; in general, any Python statement can be executed at any time. These "inner" classes and functions are useful for "one-off" items that don't require or deserve their own scope at the module level, or only make sense inside a single method.

Summary

In this article we have covered:

  • How to create classes and instantiate objects in Python
  • How to add attributes and behaviors to Python objects
  • How to organize classes into packages and modules

Further resources on this subject:


Python 3 Object Oriented Programming Harness the power of Python 3 objects
Published: July 2010
eBook Price: $29.99
Book Price: $49.99
See more
Select your format and quantity:

About the Author :


Dusty Phillips

Dusty Phillips is a Canadian freelance software developer, teacher, martial artist, and open source aficionado. He is closely affiliated with the Arch Linux community and other open source projects. He maintains the Arch Linux storefronts and has compiled the Arch Linux Handbook. Dusty holds a master's degree in computer science, with specialization in Human Computer Interaction. He currently has six different Python interpreters installed on his computer.

Books From Packt


Plone 3.3 Site Administration
Plone 3.3 Site Administration

Nginx HTTP Server
Nginx HTTP Server

Plone 3 Products Development Cookbook
Plone 3 Products Development Cookbook

Plone 3 Multimedia
Plone 3 Multimedia

Django 1.2 E-commerce
Django 1.2 E-commerce

Amazon SimpleDB Developer Guide
Amazon SimpleDB Developer Guide

Plone 3 Intranets
Plone 3 Intranets

Tcl 8.5 Network Programming
Tcl 8.5 Network Programming


No votes yet

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
u
G
i
H
1
9
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software