Django 1.2 E-commerce: Data Integration


We will be using a variety of tools, many built into Django. These are all relatively stable and mature, but as with all open source technology, new versions could change their usage at any time.

Exposing data and APIs

One of the biggest elements of the web applications developed in the last decade has been the adoption of so-called Web 2.0 features. These come in a variety of flavors, but one thing that has been persistent amongst them all is a data-centric view of the world. Modern web applications work with data, usually stored in a database, in ways that are more modular and flexible than ever before. As a result, many web-based companies are choosing to share parts of their data with the world in hopes of generating "buzz", or so that interested developers might create a clever "mash-up" (a combination of third-party application software with data exposed via an API or other source).

These mash-ups take a variety of forms. Some simply allow external data to be integrated or imported into a desktop or web-based application. For example, loading Amazon's vast product catalog into a niche website on movie reviews. Others actually deploy software written in web-based languages into their own application. This software is usually provided by the service that is exposing their data in the form of a code library or web-accessible API.

Larger web services that want to provide users with programmatic access to their data will produce code libraries written in one or more of the popular web-development languages. Increasingly, this includes Python, though not always, and typically also includes PHP, Java, or Perl. Often when an official data library exists in another language, an enterprising developer has ported the code to Python.

Increasingly, however, full-on code libraries are eschewed in favor of open, standards-based, web-accessible APIs. These first came into existence on the Web as remote procedure call tools such as XML-RPC, which mapped functions in a local application to functions on a server that exposed a specific, well-documented interface. XML and network transport protocols were used "under the hood" to make the connection and "call" the function.

Other similar technologies also achieved wide use. For example, many web services provide a Simple Object Access Protocol (SOAP) interface, which is the successor to XML-RPC and is built on a very similar foundation. Other standards, sometimes with proprietary implementations, also exist, but many new web services are now building APIs using a REST-style architecture.

REST stands for Representational State Transfer and is a lightweight and open technique for transmitting data across the Web in both server-to-server and client-to-server situations. It has become extremely popular in the Web 2.0 and open source world due to its ease of use and its reliance on standard web protocols such as HTTP, though it is not limited to any one particular protocol.

A full discussion of REST web services is beyond the scope of this article. Despite their simplicity, many complicated technical details can arise. Our implementation in this article will focus on a very straightforward, yet powerful design.

REST focuses on defining our data as a resource that when used with HTTP can map to a URL. Access to data in this scheme is simply a matter of specifying a URL and, if supported, any look-up, filter, or other operational parameters. A fully featured REST web service that uses the HTTP protocol will attempt to define as many operations as possible using the basic HTTP access methods. These include the usual GET and POST methods, but also PUT and DELETE, which can be used for replacement, updating, or deletion of resources.
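As a framework-agnostic illustration of this method-to-operation mapping, consider the following sketch against a simple in-memory store. Everything here is hypothetical and invented for illustration; it is not Django code, but it shows how each HTTP method corresponds to a CRUD operation on a resource:

```python
# Hypothetical in-memory resource store keyed by ID.
RESOURCES = {1: {'name': 'Cranberry Sauce'}}

def handle(method, resource_id=None, data=None):
    """Dispatch an HTTP method name to the matching CRUD operation."""
    if method == 'GET':
        # Retrieve one resource, or list them all when no ID is given
        return RESOURCES.get(resource_id) if resource_id else list(RESOURCES.values())
    if method == 'POST':
        # Create a new resource and return its new ID
        new_id = max(RESOURCES, default=0) + 1
        RESOURCES[new_id] = data
        return new_id
    if method == 'PUT':
        # Replace or update an existing resource
        RESOURCES[resource_id] = data
        return RESOURCES[resource_id]
    if method == 'DELETE':
        # Remove the resource, returning it if it existed
        return RESOURCES.pop(resource_id, None)
    raise ValueError('Unsupported method: %s' % method)
```

A real REST service layers URL routing, serialization, and authentication on top of exactly this kind of dispatch.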

There is no standard implementation of a REST-based web service, and as such the design and use can vary widely from application to application. Still, REST is lightweight enough, and relies on a well-known set of basic architectural principles, that a developer can learn a new REST-based web service in a very short period of time. This gives it a degree of advantage over competing SOAP or XML-RPC web services. Of course, there are many people who would dispute this claim. For our purposes, however, REST will work very well and we will begin by implementing a REST-based view of our data using Django.

Writing our own REST service in Django would be very straightforward, partly because URL mapping schemes are very easy to design in the urls.py file. A very quick and dirty data API could be created using the following super-simple URL patterns:

(r'^api/(?P<obj_model>[\w.]+)/$', 'project.views.api'),
(r'^api/(?P<obj_model>[\w.]+)/(?P<obj_id>\d+)/$', 'project.views.api'),

And this view:

from django.core import serializers
from django.db.models import get_model
from django.http import Http404, HttpResponse

def api(request, obj_model, obj_id=None):
    # obj_model arrives in Django's "app.model" form, e.g. "products.Product"
    model = get_model(*obj_model.split("."))
    if model is None:
        raise Http404
    if obj_id is not None:
        # serialize expects an iterable, so filter rather than get
        results = model.objects.filter(id=obj_id)
    else:
        results = model.objects.all()
    json_data = serializers.serialize('json', results)
    return HttpResponse(json_data, mimetype='application/json')

This approach, as it is written above, is not recommended, but it shows one of the simplest possible data APIs. The API view returns the full set of model objects requested, in JSON form. JSON is a simple, lightweight data format that resembles JavaScript syntax. It is quickly becoming the preferred method of data transfer for web applications.
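For a sense of what travels over the wire, here is a small sketch using Python's standard json module. The sample records are invented for illustration, but they mimic the list-of-objects shape that Django's JSON serializer produces (each entry carrying the model label, primary key, and a dictionary of field values):

```python
import json

# Illustrative sample mimicking Django's JSON serializer output
# (invented records, not from a real queryset).
sample = [
    {"pk": 1, "model": "products.product",
     "fields": {"name": "Cranberry Sauce", "price": "1.50"}},
    {"pk": 2, "model": "products.product",
     "fields": {"name": "Stuffing", "price": "2.25"}},
]

# What the view would return as the HTTP response body:
json_data = json.dumps(sample)

# A consuming client decodes it back into plain Python structures:
products = json.loads(json_data)
names = [p["fields"]["name"] for p in products]
```

Because the format is plain JSON, any language with a JSON parser can consume the API, which is a large part of its appeal.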

To request a list of all products, for example, we only need to access the following URL path on our site: /api/products.Product/. This uses Django's app.model syntax to refer to the model we want to retrieve. The view uses get_model to obtain a reference to the Product model and then we can work with it as needed. A specific object can be retrieved by including an object ID in the URL path: /api/products.Product/123/ would retrieve the Product whose ID is 123.

After obtaining the results data, it must be encoded to JSON format. Django provides serializers for several data formats, including JSON. These are all located in the django.core.serializers module. In our case, we simply pass the results QuerySet to the serialize function, which returns our JSON data. We can limit the fields to be serialized by including a fields keyword argument in the call to serialize:

json_data = serializers.serialize('json', results,
    fields=('name', 'description'))

We can also use the built-in serializers to generate XML. We could modify the above view to include a format flag to allow the generation of JSON or XML:

def api(request, obj_model, obj_id=None, format='json'):
    model = get_model(*obj_model.split("."))
    if model is None:
        raise Http404
    if obj_id is not None:
        results = model.objects.filter(id=obj_id)
    else:
        results = model.objects.all()
    serialized_data = serializers.serialize(format, results)
    return HttpResponse(serialized_data,
        mimetype='application/' + format)

The format could be passed directly on the URL or, better yet, we could define several distinct URL patterns and use Django's extra-options dictionary to pass the format as a keyword argument:

(r'^api/(?P<obj_model>[\w.]+)/$', 'project.views.api'),
(r'^api/(?P<obj_model>[\w.]+)/xml/$', 'project.views.api',
    {'format': 'xml'}),
(r'^api/(?P<obj_model>[\w.]+)/yaml/$', 'project.views.api',
    {'format': 'yaml'}),
(r'^api/(?P<obj_model>[\w.]+)/python/$', 'project.views.api',
    {'format': 'python'}),

By default our serializer will generate JSON data, but we have also provided alternative API URLs that support the XML, YAML, and Python formats. These are the four built-in formats supported by Django's serializers module. Note that Django's support for YAML as a serialization format requires installation of the third-party PyYAML module.

Building our own API is in some ways both easy and difficult. Clearly we have a good start with the above code, but there are many problems. For example, this is exposing all of our Django model information to the world, including our User objects. This is why we do not recommend this approach. The views could be password protected or require a login (which would make programmatic access from code more difficult) or we could look for another solution.

Django-piston: A mini-framework for data APIs

One excellent Django community project that has emerged recently is called django-piston. Piston allows Django developers to quickly and easily build data APIs for their web applications using a REST-style interface. It supports all the serialization formats mentioned above and includes sophisticated authentication tools such as OAuth as well as HTTP Basic.

The official repository for django-piston is hosted on Bitbucket.

Complete documentation on the installation and usage of Piston is available on the Bitbucket site and in the README file.

Piston supports the full set of HTTP methods: GET, POST, PUT, and DELETE. GET is used for the retrieval of objects, POST is used for creation, PUT is used for updating, and DELETE is used for deletion. Any subset of these operations can be defined on a model-by-model basis. Piston does this by using class-based "handlers" that behave somewhat like class-based generic views.

To define a handler on our Product model, we would write something like this:

from piston.handler import BaseHandler
from coleman.products.models import Product

class ProductHandler(BaseHandler):
    allowed_methods = ('GET',)
    model = Product

    def read(self, request, slug):
        # called for GET requests; look up the Product by its slug
        return Product.objects.get(slug=slug)

The ProductHandler defines one operation, GET, on our Product model. To define the behavior when a GET request is made for a Product object, we write a read method. Method names for the other HTTP operations are: create for POST, update for PUT, and delete for DELETE. Each of these methods can be defined on our ProductHandler and added to the allowed_methods class variable, and Piston will instantly enable them in our web-based API.

To utilize our ProductHandler, we must create the appropriate URL scheme in our urls.py file:

from piston.resource import Resource
from coleman.api.handlers import ProductHandler

product_resource = Resource(ProductHandler)

# inside urlpatterns:
(r'^product/(?P<slug>[^/]+)/$', product_resource),

Our Product objects and their data are now accessible using the URL above and the Product slug field, as in: /api/product/cranberry-sauce/.

Piston allows us to restrict the returned data by including fields and exclude attributes on our handler class:

class ProductHandler(BaseHandler):
    fields = ('name', 'slug', 'description')
    exclude = ('id', 'photo')

Piston also makes it very easy to request our data in a different format. Simply pass the format as a GET parameter to any Piston-enabled URL and set the value to any of the formats Piston supports. For example, to get our Cranberry Sauce product information in YAML format use: /api/product/cranberry-sauce/?format=yaml.

Adding authentication to our handlers is also very simple. Django-piston includes three kinds of authentication handlers in the current release: HTTP BASIC, OAuth, and Django. The Django authentication handler is a simple wrapper around the usual Django auth module. This means users will need cookies enabled and will be required to log in to the site using their Django account before this auth handler will grant API access.

The other two handlers are more suitable for programmatic access from a script or off-site. HTTP BASIC uses the standard, web-server based authentication. In a typical Apache configuration, this involves defining user and password combinations in an htpasswd file using the htpasswd command line utility. See the web server's documentation for more details. It's also possible to configure Apache authentication against Django's auth module to support HTTP BASIC auth against the Django database. This involves adding the django.contrib.auth.handlers.modpython handler to the Apache configuration. See the Django manual for additional details.

To attach BASIC authentication to the handler for our Product model, we will include it in our urls.py file as part of the Resource object definition:

from piston.authentication import HttpBasicAuthentication

basic_auth = HttpBasicAuthentication(realm='Products API')
product_resource = Resource(handler=ProductHandler, auth=basic_auth)

Our Product URLs will now be available only to clients who have passed HTTP BASIC authentication with a user name and password.
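On the client side, a script consuming a BASIC-protected endpoint sends an Authorization header containing the base64-encoded credentials. A minimal sketch of constructing that header, using placeholder credentials:

```python
import base64

def basic_auth_header(username, password):
    """Build the value for an HTTP BASIC Authorization header:
    'Basic ' followed by base64("username:password")."""
    credentials = '%s:%s' % (username, password)
    encoded = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
    return 'Basic ' + encoded
```

In practice a client library such as urllib can manage this header for us, but it is useful to see how little machinery BASIC authentication actually involves.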

As we've seen, Piston makes building a REST-based API for our Django projects extremely easy. It also uses some Django design principles we've seen earlier. For example, the authentication tools are designed to be pluggable. We can examine the HttpBasicAuthentication class in piston.authentication as a template to write our own. A custom authentication class can be plugged in to the Resource definition with just a small change to the code. Despite being easily customizable, Piston's default setup includes enough built-in functionality for the majority of data API needs.

Django's syndication framework

In cases where a full-fledged data API would be overkill, but exporting some data in a standard format is required, Django's syndication tools may be the perfect fit. Having been originally designed for newspaper websites, Django includes robust support for exporting information in syndication formats. This is usually done using the popular RSS or Atom feed formats.

Syndication feeds allow us to render our data in a standard format that can be consumed by human-controlled reader software, such as Google Reader or NetNewsWire, and also by machines running software tools.

Feeds began as a way of consuming content from multiple sources in a single location using reader software. Today, however, lots of variations on this theme exist. For example, Twitter is itself one big feed generating application (one can even consume Twitter content in RSS/Atom format).

We do not even need a specific reason to syndicate our data; it is increasingly considered a courtesy that enables others to consume our information in their own way. In an e-commerce situation, we could export a collection of weekly or daily promotional sales as RSS or Atom feeds, to which our customers could subscribe and then read at their convenience in their preferred feed reader.

Feeds could also be parsed by "affiliate" sites or even by physical devices. Imagine a retail store that posted their weekly specials in RSS format then connected this feed to an LED sign or LCD TV that can consume RSS information. The sign could be posted in the store window and would be constantly refreshing that store's advertised sales.

Django's syndication framework includes lots of flexibility in the way it generates feeds. In the simplest form, we just need to write a class specific to the model whose information we want to export. This class defines an items method, which is used to retrieve the contents of the feed in an appropriate order. A simple Django template is used to define the content of the feed.

Let's start by building a feed class for our Product model:

from django.contrib.syndication.feeds import Feed
from coleman.products.models import Product

class AllProducts(Feed):
    title = "Our Store's Product Catalog"
    link = '/feed/'
    description = 'An updating feed of Products available on our site'

    def items(self):
        return Product.objects.all()

This is the simplest example of generating a feed in Django. We can enhance this in many ways, all of which are described in the Django documentation. The syndication framework is very similar to django-piston in that after defining a feed, we must update our urls.py file to include it on our site:

from coleman.feeds import AllProducts

feeds = {'products': AllProducts}

urlpatterns = patterns('',
    (r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed',
        {'feed_dict': feeds}),
)

Our Product models are now exporting to an RSS feed at the URL /feeds/products/. Notice how the framework uses the feed dictionary's keys as the final portion of the URL. We can create as many feeds as we wish and only need to update the feed dictionary. The URL pattern will be reused by the syndication framework.

In Django, the template system can be used in various ways. When dealing with syndicated feeds, templates are used to control the output of the feed. The same templates can be used for any feed format the framework produces (the various versions of RSS and Atom); the framework translates them appropriately regardless of format.

Django does not require us to create a feed template. In cases where no template exists, it will default to the string representation of the model (the __unicode__ or __str__ methods). In most cases, though, we will want to create appropriate templates.

The feed templates live in a feeds directory beneath our site's templates location. Recall the feed dictionary we used earlier; the keys to this dictionary not only affect the URL where the feed lives, but also what templates are rendered. In our previous example, the feed dictionary contained a key called products for our AllProducts feed.

Django will attempt to load two templates for this feed, both based off the dictionary key: feeds/products_title.html and feeds/products_description.html. Note that despite ending in .html extensions, these templates are not required to contain HTML and are never complete HTML documents with head and body tags, only fragments. The title template will be rendered for the feed's title element and the description will be rendered in the body.

These templates will have access to two variables, obj and site, which correspond to the object we're rendering the feed template against and the site where it lives. We access these values using normal Django template double-brace syntax.

Our feed templates can contain HTML, but the results may vary depending on what application is used to consume our feeds. Some have a limited amount of HTML knowledge and most will ignore any attempt to format our output using CSS or other design tools. Using basic HTML tags is recommended for best results.

An example of a feed title template for our AllProducts feed would be:

{{ obj.name }}

And the corresponding description template could look like this:

<h1>{{ obj.name }} - {{ obj.get_price }}</h1>
<p>{{ obj.description }}</p>

In our case obj will be an instance of our Product model, so we can use it just as we would in any other Django template.

Django sitemaps

Traditionally, very large sites have provided a directory of the information they've made available using a navigational device known as a sitemap. The intent was originally to help users find information they were looking for with the least amount of effort. Turns out, however, there were better solutions to this problem, namely search engines.

Why dig through a long list of irrelevant information when you can simply type in some keywords and get a much more accurate list of potential matches? Sitemaps made a lot of sense before search engine technology was widely available, but as we saw in Searching the Product Catalog, adding search to our applications is now very easy. Even sites that don't offer a search engine can be searched using Google.

So why does Django include a module for automatic sitemap generation? These sitemaps are a little different from the traditional sitemaps, which were usually designed in HTML for human consumption. The sitemaps produced by Django are XML files that are designed to inform machines about the layout and content of your site.

When we say machines, we're really talking about search engines and, more precisely, Google. Google uses sitemaps to build its index of your site. This is very important for achieving a high position in Google's search results.

Django's sitemap module is in django.contrib.sitemaps. To get started, we need to add this to our INSTALLED_APPS setting and create a URL in our root URL configuration. The URL requires us to pass in a dictionary that resembles the feed dictionary we discussed in the previous section. It maps a string to a Sitemap class for a specific section of our site. For example, we may have a Products sitemap that lists all of our product pages as well as other sitemaps that list manufacturers, special deals, blog posts from our corporate blog, or any other piece of content we've put on the Web.

A sitemap class for our Products would look something like this:

from django.contrib.sitemaps import Sitemap
from coleman.products.models import Product

class ProductSitemap(Sitemap):
    def items(self):
        return Product.objects.all()

We could filter our Product model instead of calling all and limit it by some useful metric. Say, for example, we had a discontinued field but the Product object remained in our database. We may not want Google to index discontinued products, so we could return Product.objects.filter(discontinued=False) in the sitemap items method.

The items returned for the sitemap will be assumed to have a get_absolute_url method. This method will be used by the Django sitemap framework to construct the URLs for all of our objects in the sitemap. It is important to make sure your get_absolute_url methods are correct and functional for any object you want indexed in a sitemap.
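To make the convention concrete, here is a plain-Python sketch of get_absolute_url, shown as an ordinary class rather than a real Django model so the idea stands alone (the class and URL scheme are hypothetical):

```python
class Product(object):
    """Hypothetical stand-in for a Django model with a slug field."""

    def __init__(self, slug):
        self.slug = slug

    def get_absolute_url(self):
        # The sitemap framework calls this method on every item
        # returned from items() to build each entry's URL.
        return '/products/%s/' % self.slug
```

In a real Django model, this method would typically mirror the URL pattern that serves the product detail page, so the sitemap and the live site always agree.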

Once we've generated a sitemap for our site, we want to make sure to tell Google about it. The way to do this is to register for a Google Webmaster account. In addition to submitting our sitemap, Google's Webmaster Tools let us see all kinds of interesting metrics about our site and how Google evaluates it. This includes what keywords we score highly on and where any incoming links originate. It also lets us track how changes to our site affect our position in the Google index.

If you're building an e-commerce site with lots of content, a very large product catalog, for example, it is highly recommended that you generate a sitemap and submit it to Google. Several books have been written about "Search Engine Optimization" that include lots of search engine magic. Building a sitemap is among the best and most realistic tactics for improving your search result position.

Salesforce integration

We will end this article with a quick overview of Salesforce integration. Salesforce is a cloud-based data management tool that allows organizations to store and manage their data in a collaborative way. Information about important contacts, related organizations, and any other custom piece of data that a team may need to share can be stored in Salesforce.

One advantage of using Salesforce to manage your data is that it includes many built-in reporting mechanisms. These are especially useful in medium-sized organizations that have enough people to make sharing data difficult, but do not have the time or resources to build a custom internal solution, like those we've seen earlier.

The difficulty with Salesforce, however, is that data must be entered. If you're collecting information via a Django application, it is stored in a local database. You could expose this data as we did in the beginning of this article using an API, but once exposed you still need tools to manipulate it.

Taking the Salesforce approach, we can push our Django database information into a Salesforce account, which can include custom objects that represent a subset of data we've collected in Django. These objects are defined via the Salesforce website, but typically resemble a table definition you would make in any SQL database.

Salesforce is accessible via a SOAP API, and several Python wrapper libraries exist. One such library is salesforce-beatbox.

The beatbox library lets you send information to and receive information from your Salesforce instance. It requires very little in the way of Python code to get up and running. You will need to generate a security token from your Salesforce preferences page, but after you obtain this you can connect to the Salesforce API like so:

import beatbox

USER = ''  # your Salesforce account's user name
PASSWD = 'xxxxx'
SECURITY_TOKEN = 'gds#mklz!'

svc = beatbox.PythonClient()
tokenpass = '%s%s' % (PASSWD, SECURITY_TOKEN)
svc.login(USER, tokenpass)

The svc object should now successfully be connected to Salesforce and we can begin querying our objects using the Salesforce query language, which they call SOQL:

result = svc.query("SELECT id,name FROM Merchandise__c")

The result variable contains a list of dictionary objects. Each dictionary has the keys specified in the SELECT statement and the corresponding values from the Salesforce table. Any custom object created in Salesforce will include a __c suffix when referenced in a query. We've created a Merchandise object in Salesforce, which we query above using the Merchandise__c table.

Creating objects in Salesforce is equally as simple. To do so we need to construct a dictionary that includes the values for the object we want to create. A particularly important key that is required in all cases is type. The type key specifies what kind of Salesforce object we are creating. For example, the following dictionary sets up the new Merchandise object that we will create in Salesforce:

merch_data = {'name': 'Cranberry Sauce',
'price_in_dollars__c': '1.50',
'type': 'Merchandise__c'}

Once we've created our data, we can send it to Salesforce using the create function on our svc object:

svc.create(merch_data)
Updating Salesforce information is equally straightforward. The only difference is that our data dictionary must include an id key whose value is the Salesforce unique ID number. This is in addition to the type key, which is also required for updates, as well as the fields we want to update and their new values. Not every field needs a key in the update dictionary, only those we want to change. For example:

updated_data = {'id': '02111AC2DB987', 'type': 'Merchandise__c',
'price_in_dollars__c': '1.75'}

Create and update calls to the Salesforce API can also perform bulk operations using the bulk API. This happens automatically with beatbox when you construct your new or updated data as a list instead of a single dictionary. Bulk updates are limited to 200 items per call, so some logic is required to slice your data lists to conform to this limit. It is also a requirement that all of the data you send to the Salesforce bulk API is of the same type.
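The slicing logic is plain Python. One possible helper for it, with the 200-item limit from above as the default chunk size (the usage shown in comments assumes a beatbox-style client and is hypothetical):

```python
def chunked(items, size=200):
    """Yield successive slices of at most `size` items, matching
    Salesforce's 200-records-per-call bulk limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical usage with a connected beatbox-style client:
# for batch in chunked(merch_records):
#     svc.create(batch)
```

Keeping the chunking in a small generator like this also makes it easy to add per-batch error handling or logging later.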

Salesforce Object Query Language

When querying our Salesforce instance using the query API function above, we used a subset of SQL that Salesforce calls SOQL. SOQL is much simpler than most SQL implementations, but allows us to filter and slice our Salesforce data. Once we've issued a query to the Salesforce API, we can use Python to manipulate the returned data if we need further filtering.
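For example, results shaped like beatbox's list of dictionaries (the records here are invented samples) can be narrowed with an ordinary list comprehension once SOQL has done the coarse filtering:

```python
# Sample data mimicking the list-of-dictionaries shape a SOQL
# query returns via beatbox (invented records for illustration).
results = [
    {'Id': 'a01', 'Name': 'Cranberry Sauce', 'Price__c': 1.50},
    {'Id': 'a02', 'Name': 'Cranberry Jam', 'Price__c': 3.00},
    {'Id': 'a03', 'Name': 'Stuffing', 'Price__c': 2.25},
]

# Further filtering that is awkward to express in SOQL can be
# done client-side in plain Python:
cheap_cranberry = [r['Name'] for r in results
                   if 'Cranberry' in r['Name'] and r['Price__c'] < 2.00]
```

Pushing as much filtering as possible into the SOQL WHERE clause keeps the payloads small; the Python pass is best reserved for logic SOQL cannot express.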

SOQL exists exclusively for querying Salesforce. As a result, almost all SOQL commands resemble the SELECT statement in standard SQL implementations. But unlike most SQL systems, SOQL has limited the features of the SELECT statement. You cannot perform arbitrary joins, for example. Here is an example SOQL query:

SELECT Name, Description
FROM Merchandise__c
WHERE Name LIKE 'Cranberry%'

In cases where we have relationships defined on our Salesforce objects, SOQL allows us to perform queries on these relationships using a special dot-syntax. Imagine that the Merchandise object queried in the previous SOQL statement included a relationship to a Manufacturer object. A SELECT based on the related Manufacturer's field information would look like this:

SELECT Name, Description, Manufacturer__r.Address
FROM Merchandise__c
WHERE Manufacturer__r.Name = 'CranCo'

This SOQL query would return the name and description of all Merchandise in Salesforce that was manufactured by "CranCo". It would also include the manufacturer's address field in the resulting data. When using the Python beatbox module, these results are again returned as a list of dictionaries.

The Salesforce API is relatively small. We've seen two of the most important functions in these examples: query and create. There are additional functions for deleting, updating, and even defining new object types. The beatbox module supports almost all of the functions in the API, but not every one. For example, it does not currently support the undelete function. This compatibility information is documented in the beatbox README.txt.

With data in Salesforce, it suddenly becomes accessible to entire teams without any additional development work. This can be very useful in small organizations whose access to developers is limited. Non-technical staff can use Salesforce to run reports and pull information in an intuitive, web-based application. Often this data would be locked-up inside our web application, accessible just to those who can perform a Django ORM call or write a SQL query.

Practical use-cases

The techniques and examples in this article have been varied, but all center on the primary theme of data integration. As e-commerce applications become more sophisticated and web-based businesses grow more competitive, the advanced functionality we've covered here becomes increasingly important. Django's rapid development nature and excellent community makes implementation of these tools faster and easier than ever before.

Building a RESTful data API is important for transmitting data between different systems and machines. This data could be shared across the Web or across an internal corporate network. It could support applications built in another department or in an affiliate-marketing style. A data API unlocks our information for whomever we wish to share it with.

Feeds allow a similar kind of data transmission, though on a somewhat higher level than a data API. It can let human users, in addition to machines, parse our data and is an increasingly popular delivery mechanism for consuming content.

Generating visual reports is a still higher-level form of communicating our data, specifically geared for human use. The tools we've discussed could allow an e-commerce platform to automatically send sales reports to interested members of an organization or to produce other metrics suitable for printing and hard-copy distribution.

Finally, integration with a web-based service like Salesforce demonstrates how our e-commerce applications can transmit, receive, and interact with other related applications. Increasingly, web-based third-party services are a cost-effective method of analyzing, managing, and working with large volumes of information. Integrating with such a service demonstrates Django's flexibility in this area.


Summary

In this article we've covered several different facets of manipulating data from our Django application. This included:

  • Machine-accessible API functions
  • Human and machine-readable feed exports
  • Transmission to and from a web-based service

We've seen that it is very easy to work with data from our Django application, convert it to new formats, transmit it across the wire, and display it in powerful ways. Because Django is written in Python, the broader Python ecosystem of tools is at our disposal here. In addition to ReportLab, there are a dozen other graphical report and charting tools available in Python; we could have written an entire article on any one of them. This continues to demonstrate the advantages of Django as a web framework.
