Python 3: Building a Wiki Application


Python 3 Web Development Beginner's Guide

Use Python to create, theme, and deploy unique web applications


Nowadays, a wiki is a well-known tool that enables people to maintain a body of knowledge cooperatively. Wikipedia might be the most famous example of a wiki today, but countless forums use some sort of wiki, and many tools and libraries exist to implement a wiki application.

In this article, we will develop a wiki of our own, and in doing so, we will focus on two important concepts in building web applications. The first one is the design of the data layer. The second one is input validation. A wiki is normally a very public application that might not even employ a basic authentication scheme to identify users. This makes contributing to a wiki very simple, yet also makes a wiki vulnerable in the sense that anyone can put anything on a wiki page. It's therefore a good idea to verify the content of any submitted change. You may, for example, strip out any HTML markup or disallow external links.

Enhancing user interactions in a meaningful way is often closely related to input validation. Client-side input validation helps prevent the user from entering unwanted input and is therefore a valuable addition to any application. It is not a substitute for server-side input validation, however, because we cannot trust the outside world not to try to access our server in unintended ways.
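As a rough illustration of server-side validation, the following sketch escapes HTML special characters and rejects text that contains external links. Note that it escapes markup rather than stripping it, and that the function names and the policy itself are assumptions made for this example; they are not part of the wiki module discussed in this article. A wiki that uses its own angle bracket notation would have to process that notation before applying such escaping.

```python
import html
import re

# Hypothetical policy for this sketch: no raw HTML, no external links.
EXTERNAL_LINK = re.compile(r"https?://", re.IGNORECASE)


def sanitize(text):
    """Escape HTML special characters so markup is displayed, not interpreted."""
    return html.escape(text, quote=True)


def has_external_link(text):
    """Report whether the text contains an http:// or https:// URL."""
    return EXTERNAL_LINK.search(text) is not None


def validate(text):
    """Return sanitized text, or raise ValueError if the policy is violated."""
    if has_external_link(text):
        raise ValueError("external links are not allowed")
    return sanitize(text)
```

Calling validate() on every submitted change, regardless of any checks done in the browser, keeps the server in control of what ends up in the database.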

The data layer

A wiki consists of quite a number of distinct entities we can identify. We will implement these entities and the relations that exist between them by reusing the Entity/Relation framework developed earlier.


Time for action – designing the wiki data model

As with any application, when we start developing our wiki application we must first take a few steps to create a data model that can act as a starting point for the development:

  1. Identify each entity that plays a role in the application. This might depend on the requirements. For example, because we want the user to be able to change the title of a topic and we want to archive revisions of the content, we define separate Topic and Page entities.
  2. Identify direct relations between entities. Our decision to define separate Topic and Page entities implies a relation between them, but there are more relations that can be identified, for example, between Topic and Tag. Do not specify indirect relations: All topics marked with the same tag are in a sense related, but in general, it is not necessary to record these indirect relations as they can easily be inferred from the recorded relation between topics and tags.
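To see why indirect relations need not be stored, consider this small sketch. It infers the topics related to a given topic purely from recorded (topic, tag) pairs. The data and the related_topics() helper are made up for this illustration and do not appear in the wiki code.

```python
from collections import defaultdict

# Hypothetical recorded relation between topics and tags.
topic_tags = [
    ("Python", "language"),
    ("Ruby", "language"),
    ("CherryPy", "framework"),
    ("Python", "tutorial"),
]


def related_topics(pairs, topic):
    """Return the topics that share at least one tag with `topic`, sorted by name."""
    by_tag = defaultdict(set)
    for t, tag in pairs:
        by_tag[tag].add(t)
    related = set()
    for topics in by_tag.values():
        if topic in topics:
            related |= topics
    related.discard(topic)  # a topic is not related to itself
    return sorted(related)
```

Because the answer can be computed on demand from the direct relation, there is nothing extra to keep in sync when tags are added or removed.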

The image shows the different entities and relations we can identify in our wiki application.

In the diagram, we have illustrated, in a rather informal way, the fact that a Topic may have more than one Page while a Page refers to a single User: Page is drawn as a stack of rectangles and User as a single rectangle. In this manner, we can grasp the most relevant aspects of the relations at a glance. When we want to show more relations or relations with different characteristics, it might be a good idea to use more formal methods and tools. A good starting point is the Wikipedia entry on UML.

[Figure: the entities and relations in the wiki application]

What just happened?

With the entities and relations in our data model identified, we can have a look at their specific qualities.

The basic entity in a wiki is a Topic. A topic, in this context, is basically a title that describes what this topic is about. A topic has any number of associated Pages. Each instance of a Page represents a revision; the most recent revision is the current version of a topic. Each time a topic is edited, a new revision is stored in the database. This way, we can simply revert to an earlier version if we made a mistake or compare the contents of two revisions. To simplify identifying revisions, each revision has a modification date. We also maintain a relation between the Page and the User that modified that Page.

In the wiki application that we will develop, it is also possible to associate any number of tags with a topic. A Tag entity consists simply of a tag attribute. The important part is the relation that exists between the Topic entity and the Tag entity.

Like a Tag, a Word entity consists of a single attribute. Again, the important bit is the relation, this time, between a Topic and any number of Words. We will maintain this relation to reflect the words used in the current versions (that is, the last revision of a Page) of a Topic. This will allow for fairly responsive full text search facilities.
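The idea behind the Topic/Word relation can be sketched with an in-memory index. The WordIndex class below is a hypothetical stand-in for the database relation, and the way it splits text into words is an assumption; it only illustrates why replacing the recorded words on every new revision makes searching cheap.

```python
import re
from collections import defaultdict


def words_in(text):
    """Split text into a set of lowercase words (letters and digits only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class WordIndex:
    """In-memory stand-in for the relation between topics and words."""

    def __init__(self):
        self.topics_by_word = defaultdict(set)
        self.words_by_topic = {}

    def update(self, topic, content):
        """Replace the words recorded for a topic with those of its latest revision."""
        for word in self.words_by_topic.get(topic, ()):
            self.topics_by_word[word].discard(topic)
        words = words_in(content)
        self.words_by_topic[topic] = words
        for word in words:
            self.topics_by_word[word].add(topic)

    def search(self, query):
        """Return the topics whose current revision contains every word in the query."""
        words = words_in(query)
        if not words:
            return []
        hits = set.intersection(
            *(self.topics_by_word.get(word, set()) for word in words))
        return sorted(hits)
```

A search then never has to scan the text of any page; it only intersects the sets of topics recorded for each word.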

The final entity we encounter is the Image entity. We will use this to store images alongside the pages with text. We do not define any relation between topics and images. Images might be referred to in the text of the topic, but besides this textual reference, we do not maintain a formal relation. If we would like to maintain such a relation, we would be forced to scan for image references each time a new revision of a page was stored, and probably we would need to signal something if a reference attempt was made to a non-existing image. In this case, we choose to ignore this: references to images that do not exist in the database will simply show nothing:


from entity import Entity
from relation import Relation

class User(Entity): pass
class Topic(Entity): pass
class Page(Entity): pass
class Tag(Entity): pass
class Word(Entity): pass
class Image(Entity): pass

class UserPage(Relation): pass
class TopicPage(Relation): pass
class TopicTag(Relation): pass
class ImagePage(Relation): pass
class TopicWord(Relation): pass

def threadinit(db):
    ...  # body not shown in this excerpt

def inittable():
    User.inittable(userid="unique not null")
    Topic.inittable(title="unique not null")
    Page.inittable(
        # other Page columns are not shown in this excerpt
        modified="not null default CURRENT_TIMESTAMP")
    Tag.inittable(tag="unique not null")
    Word.inittable(word="unique not null")
    Image.inittable(
        data="blob",
        modified="not null default CURRENT_TIMESTAMP",
        # other Image columns are not shown in this excerpt
    )

Because we can reuse the entity and relation modules we developed earlier, the actual implementation of the database layer is straightforward (the full code is available in the accompanying source file). After importing both modules, we first define a subclass of Entity for each entity we identified in our data model. All these classes are used as is, so they have only a pass statement as their body.

Likewise, we define a subclass of Relation for each relation we need to implement in our wiki application.

All these Entity and Relation subclasses still need their initialization code to be called once each time the application starts, and that is where the convenience function initdb() comes in. It bundles the initialization code for each entity and relation.

Many of the entities we define here are simple, but a few warrant closer inspection. The Page entity contains a modified column that has a not null constraint. It also has a default: CURRENT_TIMESTAMP. The way this default is specified here is SQLite specific (other database engines may have other ways of specifying such a default). It initializes the modified column to the current date and time if we create a new Page record without explicitly setting a value.
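The effect of such a default can be tried out directly with Python's sqlite3 module. The table layout below is a simplified assumption for this sketch; in the wiki application the actual table is created by the inittable() method of the Entity class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real one is generated by Entity.inittable().
conn.execute(
    "create table page ("
    " page_id integer primary key autoincrement,"
    " content text,"
    " modified not null default CURRENT_TIMESTAMP)"
)
# We do not supply a value for modified, so SQLite fills in the default.
conn.execute("insert into page (content) values (?)", ("first revision",))
content, modified = conn.execute(
    "select content, modified from page").fetchone()
print(content, modified)
```

SQLite stores the timestamp as text in the form YYYY-MM-DD HH:MM:SS, expressed in UTC, which is convenient for sorting revisions by date.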

The Image entity also has a slightly different definition: its data column is explicitly defined to have blob affinity. This enables us to store binary data in this table without any problem, which we need in order to store and retrieve the binary data contained in an image. Of course, SQLite will happily store anything we pass it in this column, but if we pass it an array of bytes (that is, not a string), that array is stored as is.
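The difference between passing bytes and passing a string can be checked with a short experiment. The table below is a hypothetical, stripped-down version of the Image table that keeps only a title and a data column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the data column with blob affinity matters here.
conn.execute("create table image (title text, data blob)")

raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])  # arbitrary binary data
conn.execute("insert into image values (?, ?)", ("bytes", raw))
conn.execute("insert into image values (?, ?)", ("string", "not binary"))

# typeof() reports the storage class SQLite actually used for each value.
rows = dict(conn.execute("select title, typeof(data) from image"))
stored = conn.execute(
    "select data from image where title = 'bytes'").fetchone()[0]
print(rows)
```

The bytes object comes back unchanged as a bytes object, while the string is stored as text even though the column was declared as a blob.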


The delivery layer

With the foundation (the data layer) in place, we can build the delivery layer on top of it. Between the delivery layer and the database layer sits an additional layer that encapsulates the domain-specific knowledge. For example, it knows how to verify that the title of a new Topic entity conforms to the requirements we set for it before it stores it in the database:

[Figure: the layers of the wiki application]

Each different layer in our application is implemented in its own file or files. It is easy to get confused, so before we delve further into these files, have a look at the following table. It lists the different files that together make up the wiki application and refers to the names of the layers.

[Table: the files that make up the wiki application and the layer each one belongs to]

We'll focus on the main CherryPy application first to get a feel for the behavior of the application.


Time for action – implementing the opening screen

The opening screen of the wiki application shows a list of all defined topics on the right and several ways to locate topics on the left. Note that it still looks quite rough because, at this point, we haven't applied any style sheets:

[Screenshot: the unstyled opening screen of the wiki application]

Let us first take a few steps to identify the underlying structure. This structure is what we would like to represent in the HTML markup:

  • Identify related pieces of information that are grouped together. These form the backbone of a structured web page. In this case, the search features on the left form a group of elements distinct from the list of topics on the right.
  • Identify distinct pieces of functionality within these larger groups. For example, the elements (input field and search button) that together make up the word search are such a piece of functionality, as are the tag search and the tag cloud.
  • Try to identify any hidden functionality, that is, necessary pieces of information that will have to be part of the HTML markup, but are not directly visible on a page. In our case, we have links to the jQuery and jQuery UI JavaScript libraries and links to CSS style sheets.

Identifying these distinct pieces will not only help to put together HTML markup that reflects the structure of a page, but also help to identify necessary functionality in the delivery layer because each of these functional pieces is concerned with specific information processed and produced by the server.

What just happened?

Let us look in somewhat more detail at the structure of the opening page that we identified.

Most notable are three search input fields to locate topics based on words occurring in their bodies, on their actual title, or on tags associated with a topic. These search fields feature autocomplete functionality that allows for comma-separated lists. In the same column, there is also room for a tag cloud, an alphabetical list of tags with font sizes dependent on the number of topics marked with that tag.

The structural components

The HTML markup for this opening page is shown next. It is available as the file basepage.html and the contents of this file are served by several methods in the Wiki class implementing the delivery layer, each with a suitable content segment. Also, some of the content will be filled in by AJAX calls, as we will see in a moment:


<link rel="stylesheet"
type="text/css" media="all" />
<link rel="stylesheet" href="/wiki.css"
type="text/css" media="all" />

<div id="navigation">
    <div class="navitem">
        <a href="./">Wiki Home</a>
    </div>
    <div class="navitem">
        <span class="label">Search topic</span>
        <form id="topicsearch">
            <input type="text" >
            <button type="submit" >Search</button>
        </form>
    </div>
    <div class="navitem">
        <span class="label">Search word</span>
        <form id="wordsearch">
            <input type="text" >
            <button type="submit" >Search</button>
        </form>
    </div>
    <div class="navitem">
        <span class="label">Search tag</span>
        <form id="tagsearch">
            <input type="text" >
            <button type="submit" >Search</button>
        </form>
    </div>
    <div class="navitem">
        <p id="tagcloud">Tag cloud</p>
    </div>
</div>
<div id="content">%s</div>
<script src="/wikiweb.js" type="text/javascript"></script>

The <head> element contains both links to CSS style sheets and <script> elements that refer to the jQuery libraries. This time, we choose again to retrieve these libraries from a public content delivery network.

The top-level <div> elements define the structure of the page. In this case, we have identified a navigation part and a content part, and this is reflected in the HTML markup.

Enclosed in the navigation part are the search functions, each in their own <div> element. The content part contains just an interpolation placeholder %s for now, that will be filled in by the method that serves this markup. Just before the end of the body of the markup is a final <script> element that refers to a JavaScript file that will perform actions specific to our application and we will examine those later.

The application methods

The markup from the previous section is served by methods of the Wiki class, an instance of which can be mounted as a CherryPy application. The index() method, for example, is where we produce the markup for the opening screen (the complete file contains several other methods that we will examine in the following sections):


def index(self):
    item = '<li><a href="show?topic=%s">%s</a></li>'
    topiclist = "\n".join(
        [item % (t, t) for t in wiki.gettopiclist()])
    content = '<div id="wikihome"><ul>%s</ul></div>' % (
        topiclist,)
    return basepage % content

First, we define the markup for every topic we will display in the main area of the opening page. The markup consists of a list item that contains an anchor element that refers to a URL relative to the page showing the opening screen. Using relative URLs allows us to mount the class that implements this part of the application anywhere in the tree that serves the CherryPy application. The show() method that will serve this URL takes a topic parameter whose value is interpolated in the next line for each topic that is present in the database.

The result is joined to a single string that is interpolated into yet another string that encapsulates all the list items we just generated in an unordered list (a <ul> element in the markup) and this is finally returned as the interpolated content of the basepage variable.

In the definition of the index() method, we see a pattern that will be repeated often in the wiki application: methods in the delivery layer, like index(), concern themselves with constructing and serving markup to the client and delegate the actual retrieval of information to a module that knows all about the wiki itself. Here the list of topics is produced by the wiki.gettopiclist() function, while index() converts this information to markup. Separation of these activities helps to keep the code readable and therefore maintainable.
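The markup-building half of this pattern can be isolated in a plain function that takes a list of titles, which makes it easy to test without a database. The sketch below is a variation on index(), not the code from the wiki application: the function name is made up, and the calls to quote() and escape() are an addition that protects against titles containing characters that are special in URLs or HTML.

```python
from html import escape
from urllib.parse import quote


def topic_list_markup(topics):
    """Build the list of topic links for the opening screen from topic titles."""
    item = '<li><a href="show?topic=%s">%s</a></li>'
    # quote() makes a title safe inside a URL, escape() makes it safe as HTML text.
    items = "\n".join(item % (quote(t), escape(t)) for t in topics)
    return '<div id="wikihome"><ul>%s</ul></div>' % items


print(topic_list_markup(["Python", "C & C++"]))
```

In the application, the list passed in would be whatever wiki.gettopiclist() returns, so the retrieval of information stays out of the delivery layer.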


Time for action – implementing a wiki topic screen

When we request a URL of the form show?topic=value, this will result in calling the show() method. If value equals an existing topic, the following (as yet unstyled) screen is the result:

[Screenshot: the unstyled topic screen]

Just as for the opening screen, we take steps to:

  • Identify the main areas on screen
  • Identify specific functionality
  • Identify any hidden functionality

The page structure is very similar to the opening screen, with the same navigational items, but instead of a list of topics, we see the content of the requested topic together with some additional information like the tags associated with this subject and a button that may be clicked to edit the contents of this topic. After all, collaboratively editing content is what a Wiki is all about.

We deliberately made the choice not to refresh the contents of just a part of the opening screen with an AJAX call, but opted instead for a simple link that replaces the whole page. This way, there will be an unambiguous URL in the address bar of the browser that will point at the topic. This allows for easy bookmarking. An AJAX call would have left the URL of the opening screen that is visible in the address bar of the browser unaltered and although there are ways to alleviate this problem, we settle for this simple solution here.

What just happened?

As the main structure we identified is almost identical to the one for the opening page, the show() method will reuse the markup in basepage.html.


def show(self, topic):
    topic = topic.capitalize()
    currentcontent, tags = wiki.gettopic(topic)
    currentcontent = "".join(wiki.render(currentcontent))
    tags = ['<li><a href="searchtags?tags=%s">%s</a></li>' % (
        t, t) for t in tags]
    content = '''
    <h1>%s</h1><a href="edit?topic=%s">Edit</a>
    <div id="wikitopic">%s</div>
    <div id="wikitags"><ul>%s</ul></div>
    <div id="revisions">revisions</div>
    ''' % (topic, topic, currentcontent, "\n".join(tags))
    return basepage % content

The show() method delegates most of the work to the wiki.gettopic() function, which we will examine in the next section, and concentrates on creating the markup it will deliver to the client. wiki.gettopic() returns a tuple that consists of both the current content of the topic and a list of tags.

Those tags are converted to <li> elements with anchors that point to the searchtags URL. This list of tags provides a simple way for the reader to find related topics with a single click. The searchtags URL takes a tags argument so a single <li> element constructed this way may look like this: <li><a href="searchtags?tags=Python">Python</a></li>.

The content and the clickable list of tags are embedded in the markup of the basepage together with an anchor that points to the edit URL. Later, we will style this anchor to look like a button and when the user clicks it, it will present a page where the content may be edited.




Time for action – editing wiki topics

In the previous section, we showed how to present the user with the contents of a topic. A wiki is not just about finding topics, however; it must present the user with a way to edit the content as well. This edit screen is presented in the following screenshot:

[Screenshot: the edit screen]

Besides the navigation column on the left, within the edit area, we can point out the following functionality:

  • Elements to alter the title of the subject.
  • Elements to modify the tags (if any) associated with the topic.
  • A large text area to edit the contents of the topic. On the top of the text area, we see a number of buttons that can be used to insert references to other topics, external links, and images.
  • A Save button that will submit the changes to the server.

What just happened?

The edit() method is responsible for showing the edit screen as well as processing the information entered by the user once the Save button is clicked:


def edit(self, topic,
         content=None, tags=None, originaltopic=None):
    user = self.logon.checkauth(
        logonurl=self.logon.path, returntopage=True)
    if content is None:
        currentcontent, tags = wiki.gettopic(topic)
        # Attributes missing from this excerpt (such as the ids of the
        # buttons) are omitted from the markup below.
        html = '''
        <div id="editarea">
          <form id="edittopic" action="edit">
            <label for="topic"></label>
            <input name="originaltopic"
                   type="hidden" value="%s">
            <input name="topic" type="text"
                   value="%s">
            <div id="buttonbar">
              <button type="button">
                External link
              </button>
              <button type="button">
                Wiki page
              </button>
              <button type="button">
                Image
              </button>
            </div>
            <label for="content"></label>
            <textarea name="content"
                      cols="72" rows="24" >%s</textarea>
            <label for="tags"></label>
            <input name="tags" type="text"
                   value="%s">
            <button type="submit">Save</button>
            <button type="button">Cancel</button>
            <button type="button">Preview</button>
          </form>
        </div>
        <div id="previewarea">preview</div>
        <div id="imagedialog">%s</div>
        ''' % (topic, topic, currentcontent,
               ", ".join(tags),
               "".join(self.images()))
        return basepage % html
    else:
        # The argument order of updatetopic() is assumed here; the call
        # itself is not shown in this excerpt.
        wiki.updatetopic(originaltopic, topic, content, tags)
        raise cherrypy.HTTPRedirect('show?topic=' + topic)

The first priority of the edit() method is to verify that the user is logged in, as we want only known users to edit the topics. By setting the returntopage parameter to True, the checkauth() method will return to this page once the user is authenticated.

The edit() method is designed to present the edit screen for a topic as well as to process the result of this editing when the user clicks the Save button and therefore takes quite a number of parameters.

The distinction is made based on the content parameter. If this parameter is not present (that is, if it is None), the method will produce the markup to show the various elements in the edit screen. If the content parameter is not equal to None, the edit() method was called as a result of submitting the content of the form presented in the edit screen, in which case we delegate the actual update of the content to the wiki.updatetopic() function. Finally, we redirect the client to a URL that will show the edited content again in its final form without the editing tools.

At this point, you may wonder what all this business is about with both a topic and an originaltopic parameter. In order to allow the user to change the title of the topic while that title is also used to find the topic entity that we are editing, we pass the title of the topic as a hidden variable in the edit form, and use this value to retrieve the original topic entity, a ploy necessary because, at this point, we may have a new title and yet have to find the associated topic that still resides in the database with the old title.
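The role of the originaltopic parameter is easier to see in a stripped-down model. The TopicStore class below is a hypothetical in-memory stand-in for the wiki module and does not reflect how wiki.updatetopic() is actually implemented. It merely shows that the old title is needed to find the existing revisions before they can be filed under the new title.

```python
class TopicStore:
    """In-memory stand-in for the wiki module: maps a title to its revisions."""

    def __init__(self):
        self.revisions = {}

    def update(self, originaltopic, topic, content):
        """Store a new revision, renaming the topic if its title was changed."""
        if topic != originaltopic and topic in self.revisions:
            raise ValueError("a topic with this title already exists")
        # Find the history under the title the topic had when the form was served.
        history = self.revisions.pop(originaltopic, [])
        history.append(content)
        self.revisions[topic] = history


store = TopicStore()
store.update("Pyhton", "Pyhton", "first revision")
# The user corrects the misspelled title while editing the content.
store.update("Pyhton", "Python", "second revision")
print(store.revisions)
```

Without the hidden originaltopic value, the second call would only know the new title and would have no way to locate the revisions stored under the old one.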

Cross Site Request Forgery
When we process the data sent to the edit() method, we make sure that only authenticated users submit anything. Unfortunately, this might not be enough if the user is tricked into sending an authenticated request on behalf of someone else. This is called Cross Site Request Forgery (CSRF), and although there are ways to prevent this, these methods are out of scope for this example. Security-conscious readers should read up on these exploits, however.


Additional functionality

In the opening screen as well as in the pages showing the content of topics and in the editing page, there is a lot of hidden functionality. We already encountered several functions of the wiki module and we will examine them in detail in this section together with some JavaScript functionality to enhance the user interface.

Time for action – selecting an image

On the page that allows us to edit a topic, we have half hidden an important element: the dialog to insert an image. If the insert image button is clicked, a dialog is presented, as shown in the following image:

[Screenshot: the insert image dialog]

Because a dialog is, in a way, a page of its own, we take the same steps to identify the functional components:

  • Identify the main structure
  • Identify specific functional components
  • Identify hidden functionality

The dialog consists of two forms. The top one consists of an input field that can be used to look for images with a given title. It will be augmented with jQuery UI's autocomplete functionality.

The second form gives the user the possibility to upload a new file while the rest of the dialog is filled with any number of images. Clicking on one of the images will close the dialog and insert a reference to that image in the text area of the edit page. It is also possible to close the dialog again without selecting an image by either clicking the small close button on the top-right or by pressing the Escape key.

What just happened?

The whole dialog consists of markup that is served by the images() method.


@cherrypy.expose
def images(self, title=None, description=None, file=None):
    if file is not None:
        data = file.file.read()
        # A new Image instance is created here from the title, the
        # content type, and the data (call not shown in this excerpt).
    yield '''
    <form>
      <label for="title">select a title</label>
      <input name="title" type="text">
      <button type="submit">Search</button>
    </form>
    <form method="post" action="./images"
          enctype="multipart/form-data">
      <label for="file">New image</label>
      <input type="file" name="file">
      <label for="title">Title</label>
      <input type="text" name="title">
      <label for="description">Description</label>
      <textarea name="description"
                cols="48" rows="3"></textarea>
      <button type="submit">Upload</button>
    </form>
    '''
    yield '<div id="imagelist">\n'
    for img in self.getimages():
        yield img
    yield '</div>'

There is a subtlety here that is worth understanding well: from the edit() method, we call this images() method to provide the markup that we insert in the page that is delivered to the client requesting the edit URL, but because we have decorated the images() method with a @cherrypy.expose decorator, the images() method is visible from the outside and may be requested with the images URL. If accessed that way, CherryPy will take care of adding the correct response headers.

Being able to call this method this way is useful in two ways: because the dialog is quite a complex page with many elements, we may check how it looks without being bothered by it being part of a dialog, and we can use it as the target of the form that is part of the images dialog and that allows us to upload new images. As with the edit() method, the distinction is again made based on whether a certain parameter is present. The parameter that serves this purpose is file, and it will contain a file object if this method is called in response to an image being submitted.

The file object is a cherrypy.file object, not a Python built-in file object, and has several attributes, including an attribute called file that is a regular Python stream object. This Python stream object serves as an interface to a temporary file that CherryPy has created to store the uploaded file. We can use the stream's read() method to get at its content.

Sorry about all the references to file; I agree it is possibly a bit confusing. Read it twice if needed and relax. This summary may be convenient:

This item                 has a             which is a
The images() method       file parameter    cherrypy.file object
A cherrypy.file object    file attribute    Python stream object
A Python stream object    name attribute    name of a file on disk

The Python stream can belong to a number of classes that all implement the same API. Refer to the Python documentation for details on streams.

The cherrypy.file also has a content_type attribute whose string representation we use together with the title and the binary data to create a new Image instance.
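The chain of attributes can be exercised without a running server by substituting a simple stand-in for the uploaded file. In the sketch below, the read_upload() helper and the fake_upload object are assumptions made for illustration; only the file and content_type attributes mentioned in the text are modelled.

```python
import io
from types import SimpleNamespace


def read_upload(upload):
    """Return the content type (as a string) and the raw bytes of an upload."""
    data = upload.file.read()  # the nested stream holds the uploaded bytes
    return str(upload.content_type), data


# Stand-in for the object CherryPy passes as the file parameter.
fake_upload = SimpleNamespace(
    file=io.BytesIO(b"\x89PNG\r\n"),
    content_type="image/png",
)

mimetype, data = read_upload(fake_upload)
print(mimetype, len(data))
```

The two returned values, together with the title entered in the form, are what is needed to create a new Image instance.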

The next step is to present the HTML markup that will produce the dialog, possibly including the uploaded image. This markup contains two forms.

The first one consists of an input field and a submit button. The input field will be augmented with autocomplete functionality, as we will see when we examine wikiweb.js. The submit button will replace the selection of images when clicked. This is also implemented in wikiweb.js by adding a click handler that will perform an AJAX call to the getimages URL.

The next form is the file upload form. What makes it a file upload form is the <input> element of the type file. Behind the scenes, CherryPy will store the contents of a file type <input> element in a temporary file and, when the form is submitted, pass it to the method that services the requested URL.

There is a final bit of magic to pay attention to: we insert the markup for the dialog as part of the markup that is served by the edit() method, yet the dialog only shows if the user clicks the insert image button. This magic is performed by jQuery UI's dialog widget: a snippet of script in the markup served by the edit() method converts the <div> element containing the dialog's markup by calling its dialog method.


By setting the autoOpen option to false, we ensure that the dialog remains hidden when the page is loaded. After all, the dialog should only be opened if the user clicks the insert image button.

Opening the dialog is accomplished by several pieces of JavaScript (full code available as wikiweb.js). The first piece associates a click handler with the insert image button that will pass the open option to the dialog, causing it to display itself.



Note that the default action of a dialog is to close itself when the Escape key is pressed, so we don't have to do anything about that.

Within the dialog, we have to configure the images displayed there to insert a reference in the text area when clicked and then close the dialog. We do this by configuring a live handler for the click event. A live handler will apply to elements that match the selector (in this case, images with the selectable-image class) even if they are not present yet. This is crucial, as we may upload new images that are not yet present in the list of images shown when the dialog is first loaded:


$(".selectable-image").live("click", function () {
    $("#imagedialog").dialog("close");
    var insert = "<" + $(this).attr("id").substring(3) + "," +
        $(this).attr("alt") + ">";
    var Area = $("#edittopic textarea");
    var area = Area[0];
    var oldposition = Area.getCursorPosition();
    var pre = area.value.substring(0, oldposition);
    var post = area.value.substring(oldposition);
    area.value = pre + insert + post;
    Area.focus().setCursorPosition(oldposition + insert.length);
});

The first activity of this handler is to close the dialog. The next step is to determine what text we would like to insert into the text area. In this case, we have decided to represent a reference to an image within the database as a number followed by a description within angled brackets. For example, image number 42 in the database might be represented as <42,"Picture of a shovel">. When we examine the render() method, we will see how we will convert this angled bracket notation to HTML markup.
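To give an idea of what such a conversion could look like, here is a minimal sketch based on a regular expression. It is not the render() method of the wiki module, and the showimage?id= URL pattern is a made-up placeholder rather than a route defined by the application.

```python
import re
from html import escape

# Matches <42,"Picture of a shovel"> as well as the unquoted <42,Picture of a shovel>.
IMAGE_REF = re.compile(r'<(\d+),\s*"?([^">]*)"?>')


def render_image_refs(text, urlpattern="showimage?id=%s"):
    """Replace image references in angled bracket notation with <img> elements.

    The default URL pattern is a hypothetical placeholder.
    """
    def replace(match):
        number, description = match.group(1), match.group(2)
        return '<img src="%s" alt="%s">' % (
            urlpattern % number, escape(description, quote=True))

    return IMAGE_REF.sub(replace, text)


print(render_image_refs('See <42,"Picture of a shovel"> here'))
```

Because the pattern requires a number directly after the opening bracket, ordinary uses of the less-than sign in the text are left alone.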

The remaining part of the function is concerned with inserting this reference into the <textarea> element. We therefore retrieve the jQuery object that matches our text area first, and because such a selection is always an array and we need access to the underlying JavaScript functionality of the <textarea> element, we fetch the first element.

The value attribute of a <textarea> element holds the text that is being edited and we split this text into a part before the cursor position and a part after it and then combine it again with our image reference inserted. We then make sure the text area has the focus again (which might have shifted when the user was using the dialog) and position the cursor at a position that is just after the newly inserted text.


Time for action – implementing a tag cloud

One of the distinct pieces of functionality we identified earlier was a so-called tag cloud.

[Screenshot: the tag cloud]

The tag cloud that is present in the navigation section of all pages shows an alphabetically sorted list of tags. The styling of the individual tags represents the relative number of topics that are marked with this tag. Clicking on the tags will show the list of associated topics. In this implementation, we vary just the font size but we could have opted for additional impact by varying the color as well.

Before we implement a tag cloud, we should take a step back and take a good look at what we need to implement:

  • We need to retrieve a list of tags
  • We need to sort them
  • We need to present markup. This markup should contain links that will refer to a suitable URL that will represent a list of topics that are marked with this tag. Also, this markup must in some way indicate what the relative number of topics is that have this tag so it can be styled appropriately.

The last requirement is again a matter of separating structure from representation. It is easier to adapt a specific style by changing a style sheet than to alter structural markup.

What just happened?

If we look at the HTML that represents an example tag cloud, we notice that the tags are represented by <span> elements with a class attribute that indicates its weight. In this case, we divide the range of weights in five parts, giving us classes from weight0 for the least important tag to weight4 for the most important one:

<span class="weight1"><a href="searchtags?tags=Intro">Intro</a></span>
<span class="weight1"><a href="searchtags?tags=Main">Main</a></span>
<span class="weight4"><a href="searchtags?tags=Python">Python</a></span>
<span class="weight2"><a href="searchtags?tags=Tutorial">Tutorial</a></span>

The actual font size we use to represent these weights is determined by the styles in wiki.css:

.weight0 { font-size:60%; }
.weight1 { font-size:70%; }
.weight2 { font-size:80%; }
.weight3 { font-size:90%; }
.weight4 { font-size:100%; }

The tag cloud itself is delivered by the tagcloud() method in the delivery layer:


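def tagcloud(self,_=None):
    for tag,weight in wiki.tagcloud():
        # closing tag and format arguments restored to match the example markup
        yield '''
        <span class="weight%s">
        <a href="searchtags?tags=%s">%s</a>
        </span>'''%(weight,tag,tag)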

This method iterates over all tuples retrieved from wiki.tagcloud(). These tuples consist of a tag name and a weight, and these are transformed to links and encapsulated in a <span> element with a fitting class attribute:


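def tagcloud():
    # key function restored: the tags are sorted on their tag attribute
    tags = sorted([wikidb.Tag(id=t) for t in wikidb.Tag.list()],
                  key=lambda t: t.tag)
    totaltopics = 0
    tagrank = []
    for t in tags:
        topics = wikidb.TopicTag.list(t)
        if len(topics):
            totaltopics += len(topics)
            # restored: record the tag name and its number of topics
            tagrank.append((t.tag, len(topics)))
    maxtopics = max(topics for tag,topics in tagrank)
    for tag,topics in tagrank:
        yield tag, int(5.0*topics/(maxtopics+1)) # map to 0 - 4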

The tagcloud() function in the wiki module starts off by retrieving a list of all Tag objects and sorts them based on their tag attribute. Next, it iterates over all these tags and retrieves their associated topics. It then checks if there really are topics by checking the length of the list of topics. Some tags may not have any associated topics and are not counted in this ranking operation.

When a tag is removed from a topic, we do not actually delete the tag itself if it no longer has any associated topics. This might lead to a buildup of unused tags and, if necessary, you might want to implement some clean-up scheme.

If a tag does have associated topics, the number of topics is added to the total and a tuple consisting of the tag name and the number of topics is appended to the tagrank list. Because our list of Tag objects was sorted, tagrank will be sorted as well when we have finished counting the topics.

In order to determine the relative weight of the tags, we iterate again, this time over the tagrank list to find the maximum number of topics associated with any tag. Then, in a final iteration, we yield a tuple consisting of the tag name and its relative weight, where the relative weight is computed by dividing the number of topics by the maximum number we encountered (plus one, to prevent divide by zero errors). This weight will then be between zero and one (exclusive) and by multiplying this by 5 and rounding down to an integer, a whole number between 0 and 4 (inclusive) is obtained.
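To make the arithmetic concrete, the following minimal sketch applies the same formula to a plain list of (tag, count) pairs. The function name relative_weights, its classes parameter, and the sample counts are our own assumptions for illustration, chosen so that the result matches the weight classes in the example markup shown earlier. It is not part of the wiki code:

```python
def relative_weights(tagrank, classes=5):
    """Map (tag, topic_count) pairs to (tag, weight) pairs with weight in 0..classes-1."""
    if not tagrank:
        # No tag has any topics, so there is nothing to rank.
        return []
    maxtopics = max(count for _, count in tagrank)
    # Dividing by maxtopics + 1 keeps the ratio strictly below 1,
    # so int() can never produce the out-of-range value `classes`.
    return [(tag, int(classes * count / (maxtopics + 1)))
            for tag, count in tagrank]


# Hypothetical topic counts per tag, already sorted by tag name.
sample = [("Intro", 1), ("Main", 1), ("Python", 4), ("Tutorial", 2)]
print(relative_weights(sample))
```

Unlike the function above, the sketch returns an empty list when no tag has any topics, a case in which calling max() on an empty sequence would raise a ValueError.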




Time for action – searching for words

To be able to find a list of all topics which contain one or more specific words, we present the user with a search form in the navigation area. These are some of the considerations when designing such a form:

  • The user must be able to enter more than one word to find topics with all those words in their content
  • Searching should be case insensitive
  • Locating those topics should be fast even if we have a large number of topics with lots of text
  • Auto completion would be helpful to aid the user in specifying words that are actually part of the content of some topic

All these considerations will determine how we will implement the functionality in the delivery layer and on the presentation side.

What just happened?

The search options in the navigation area and the tag entry field in the edit screen all feature autocomplete functionality.

With the word and tag search fields in the wiki application, we would like to go one step further. Here we would like to have auto completion on the list of items separated by commas. The illustrations show what happens if we type a single word and what happens when a second word is typed in:

[Illustration: autocompletion suggestions after typing a first word and after typing a second word]

We cannot simply send the list of items complete with commas to the server because in that case we could not impose a minimum character limit. It would work for the first word of course, but once the first word is present in the input field, each subsequent character entry would result in a request to the server whereas we would like this to happen when the minimum character count for the second word is reached.

Fortunately, the jQuery UI website already shows an example of how to use the autocomplete widget in exactly this situation. As this online example is fairly well explained in its comments, we will not list it here, but note that the trick lies in the fact that instead of supplying the autocomplete widget with just a source URL, it is also given a callback function that will be invoked instead of retrieving information directly. This callback has access to the string of comma-separated items in the input field and can call the remote source with just the last item in the list.

On the delivery side, the word search functionality is represented by two methods. The first one is the getwords() method:


def getwords(self,term,_=None):
    term = term.lower()
    return json.dumps(
        [t for t in wikidb.Word.getcolumnvalues('word')
         if t.startswith(term)])

getwords() returns a list of words that start with the characters in the term argument, serialized as a JSON string for use by the auto completion function that we will add to the input field of the word search form. Words are stored all lowercase in the database, so the term argument is lowercased as well before matching any words. Note that the argument to json.dumps() is a list comprehension, written in square brackets, rather than a bare generator expression. This is necessary because json.dumps() does not accept generators.
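The difference between a list comprehension and a generator expression is easy to check in isolation. The sketch below uses a small in-memory word list as a stand-in for the values in the Word table; the names words and complete are assumptions made for this illustration only:

```python
import json

# Stand-in for the lowercase words stored in the database.
words = ["python", "pylons", "wiki", "web"]


def complete(term, vocabulary=words):
    """Return a JSON array of all words that start with term (case insensitive)."""
    term = term.lower()
    return json.dumps([w for w in vocabulary if w.startswith(term)])


print(complete("Py"))

# Passing a generator expression instead of a list is rejected by json.dumps().
try:
    json.dumps(w for w in words if w.startswith("py"))
except TypeError as exc:
    print("generator rejected:", exc)
```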

The second method is called searchwords(), which will return a list of clickable items consisting of those topics that contain all words passed to it as a string of comma-separated words. The list will be alphabetically sorted on the name of the topic:


def searchwords(self,words):
    yield '<ul>\n'
    for topic in sorted(wiki.searchwords(words)):
        # format arguments restored: wiki.searchwords() yields topic titles
        yield '<li><a href="show?topic=%s">%s</a></li>'%(
            topic,topic)
    yield '</ul>\n'

Note that the markup returned by searchwords() is not a complete HTML page, as it will be called asynchronously when the user clicks the search button and the result will replace the content part.

Again, the hard work of actually finding the topics that contain the words is not done in the delivery layer, but delegated to the function wiki.searchwords():


def searchwords(words):
    topics = None
    for word in words.split(','):
        # a list with a final comma will yield an empty last term
        word = word.strip('.,:;!? ').lower()
        if word.isalnum():
            w = list(wikidb.Word.list(word=word))
            if len(w):
                ww = wikidb.Word(id=w[0])
                wtopic = set( w.a_id for w in wikidb.TopicWord.list(ww) )
                if topics is None :
                    topics = wtopic
                else:  # restored: intersect with the topics found so far
                    topics &= wtopic
                if len(topics) == 0 :
                    break  # restored: no topic can match anymore
    if topics is not None:
        for t in topics:
            yield wikidb.Topic(id=t).title

This searchwords() function starts by splitting the comma-separated items in its words argument and sanitizing each item by stripping leading and trailing punctuation and whitespace and converting it to lowercase.

The next step is to consider only items that consist solely of alphanumeric characters because these are the only ones stored as word entities to prevent pollution by meaningless abbreviations or markup.

We then check whether the item is present in the database by calling the list() method of the Word class. This will return either an empty list or a list containing just a single ID. In the latter case, this ID is used to construct a Word instance and we use that to retrieve a list of Topic IDs associated with this word by calling the list() method of the TopicWord class, converting the result to a set for easy manipulation.

If this is the first word we are checking, the topics variable will contain None and we simply assign the set to it. If the topics variable already contains a set, we replace the set by the intersection of the stored set and the set of topic IDs associated with the word we are now examining. The intersection of two sets is calculated by the & operator (in this case, replacing the left-hand side directly, hence the &= variant). The result of the intersection will be that we have a set of topic IDs of topics that contain all words examined so far.

If the resulting set contains any IDs at all, these are converted to Topic instances to yield their title attribute.
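The narrowing by repeated intersection can be tried out without a database. In the sketch below, the dictionary index is a hypothetical stand-in for the Word and TopicWord tables, mapping each word to a set of topic IDs, and matching_topics is a name we made up for this example. Note one deliberate difference from the function above: a word that is not in the index yields an empty result instead of being skipped:

```python
# Hypothetical inverted index: word -> IDs of the topics that contain it.
index = {
    "python": {1, 2, 3},
    "wiki": {2, 3},
    "tutorial": {3, 4},
}


def matching_topics(words, index=index):
    """Return the IDs of topics that contain every word in a comma-separated string."""
    topics = None
    for word in words.split(','):
        word = word.strip('.,:;!? ').lower()
        if not word.isalnum():
            continue  # skips empty terms, for example after a trailing comma
        found = index.get(word, set())
        # The first word seeds the result, each later word narrows it down.
        topics = set(found) if topics is None else topics & found
        if not topics:
            break  # nothing can match anymore
    return topics or set()


print(matching_topics("Python, wiki,"))
```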


The importance of input validation

Anything that is passed as an argument to the methods that service the wiki application can potentially damage the application. This may sound a bit pessimistic, but remember that when designing an application, you cannot rely on the goodwill of the public, especially when the application is accessible over the Internet and your public may consist of dimwitted search bots or worse.

We may limit the risks by granting the right to edit a page only to people we know by implementing some sort of authentication scheme, but we don't want even these people to mess up the appearance of a topic by inserting all sorts of HTML markup, references to images that do not exist or even malicious snippets of JavaScript. We therefore want to get rid of any unwanted HTML elements present in the content before we store it in the database, a process generally known as scrubbing.

Preventing Cross-Site Scripting (XSS) (as the inclusion of unwanted code in web pages is called) is covered in depth on this webpage:


Time for action – scrubbing your content

Many wikis do not allow any HTML markup at all, but use simpler markup methods to indicate bulleted lists, headers, and so on.

Examples of possible markup schemes are Markdown and reStructuredText (reST); the MediaWiki software is an example of markup that does allow some HTML.

Consider the following:

  • Will the user understand some HTML markup or opt for no HTML markup at all?
  • What will the wiki contain? Just text or also external references or references to binary objects (like images) stored in the wiki?

For this wiki, we will implement a mixed approach. We will allow some HTML markup like <b> and <ul> but not any links. References to topics in the wiki might be entered as [Topic], whereas links to external pages might be denoted by a URL enclosed in braces. Images stored in the wiki may be referred to as <143>. Each type of reference will take an optional description as well. Example markup, as entered by the user, is shown next:

This topic is tried with a mix of legal and illegal markup.
A <b>list</b> is fine:
A link using an html tag referring to a <a href="http://www.example.com" target="blank">nasty popup</a>.
A legal link uses braces {, "A link"}

When viewed, it will look like the following image:

[Image: the example topic as rendered by the wiki]

What just happened?

When we encountered the edit() method, we saw that the actual update of the content of a topic was delegated to the updatetopic() function, so let's have a look at how this function is organized:


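def updatetopic(originaltopic,topic,content,tags):
    ...  # retrieve the list t of topics whose title matches originaltopic
    if len(t) == 0 :
        ...  # create a new topic, otherwise update the title of the first match
    ...  # scrub the content and store it as a new Page associated with t
    # update word index
    newwords = set(splitwords(content))
    wordlist = wikidb.TopicWord.list(t)
    topicwords = { wikidb.Word(id=w.b_id).word:w
                   for w in wordlist }
    ...  # call updateitemrelation() for the words
    # update tags
    newtags = set(t.capitalize()
                  for t in [t.strip()
                            for t in tags.split(',')] if ...)  # filter condition not shown
    taglist = wikidb.TopicTag.list(t)
    topictags = { wikidb.Tag(id=t.b_id).tag:t
                  for t in taglist }
    ...  # call updateitemrelation() for the tags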

First it checks whether the topic already exists by retrieving a list of Topic objects that have a title attribute that matches the originaltopic parameter. If this list is empty, it creates a new topic; otherwise it updates the title attribute of the first matching topic found.

Then it calls the scrub() function to sanitize the content and then creates a new Page instance to store this content and associates it with the Topic instance t. So every time we update the content, we create a new revision and old revisions are still available for comparison.

The next step is to update the list of words used in the topic. We therefore create a set of unique words by passing the content to the splitwords() function (not shown here) and converting the resulting list of words to a set. Converting a list to a set will remove any duplicate items.
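Since splitwords() is not shown, the following is only a guess at what such a helper might look like, based on the facts that words are stored in lowercase and consist solely of alphanumeric characters. The name splitwords_sketch and the regular expression are our own assumptions, not the actual implementation:

```python
import re


def splitwords_sketch(content):
    """Return all runs of ASCII letters and digits in content, lowercased."""
    return [w.lower() for w in re.findall(r'[A-Za-z0-9]+', content)]


words = splitwords_sketch("The wiki, the Wiki!")
print(words)       # the list still contains duplicates
print(set(words))  # converting to a set removes them
```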

We then build a dictionary that maps each word currently associated with the topic to its TopicWord relation object and call the updateitemrelation() function to perform the update.

The same scenario is used with any tags associated with the topic. The updateitemrelation() function may look intimidating, but that is mainly due to the fact that it is made general enough to deal with any Relation, not just one between Topic and Word or Topic and Tag. By designing a general function, we have less code to maintain which is good although, in this case, readability may have suffered too much.


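def updateitemrelation(p,itemmap,newitems,Entity,attr,Relation):
    olditems = set()
    for item in itemmap:
        if item not in newitems:
            ...  # remove the relation between p and this item (call not shown)
        else:  # restored: remember the items that stay associated
            olditems.add(item)
    for item in newitems - olditems:
        if item not in itemmap:
            ilist = list(Entity.list(**{attr:item}))
            if (len(ilist)):
                i = Entity(id=ilist[0])
            else:
                i = Entity(**{attr:item})
            ...  # add a new relation between p and i (call not shown)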

First we determine if any items currently associated with the primary entity p are not in the new list of items. If so, they are deleted, that is, the recorded relation between the primary entity and the item is removed from the database, otherwise we store them in the olditems set.

The next step determines the difference between newitems and olditems. The result represents those items that have to be associated with the primary entity, but may not yet be stored in the database. We use the list() method to look for an existing entity and create one if none is found. Finally, we add a new relation between the primary entity and the item.
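Stripped of all database access, the bookkeeping comes down to two set differences. The sketch below works on plain sets of item names; plan_update is a hypothetical helper written for this illustration and does not appear in the wiki code:

```python
def plan_update(current, new):
    """Return (to_remove, to_add) for the current and the desired set of items."""
    current, new = set(current), set(new)
    kept = current & new        # items that stay associated
    to_remove = current - kept  # relations that must be deleted
    to_add = new - kept         # relations (and possibly entities) to create
    return to_remove, to_add


to_remove, to_add = plan_update({"intro", "python"}, {"python", "tutorial"})
print(to_remove, to_add)
```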

The scrub() function is used to remove any HTML tags from the content that are not explicitly listed as being allowed:


def scrub(content):
    # the tuple of allowed tags is cut off after 'pre' in this listing
    parser = Scrubber(('ul','ol','li','b','i','u','em','code','pre', ...))
    parser.feed(content)
    return "".join(parser.result)

For this purpose, it instantiates a Scrubber object with a very limited list of allowable tags and feeds the content to its feed() method. The result is then found in the result attribute of the Scrubber instance:


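from html.parser import HTMLParser

class Scrubber(HTMLParser):
    def __init__(self,allowed_tags=()):
        super().__init__()  # the base class must be initialized as well
        self.result = []
        self.allowed_tags = set(allowed_tags)
    def handle_starttag(self, tag, attrs):
        if tag in self.allowed_tags:
            self.result.append('<%s %s>'%(tag,
                " ".join('%s="%s"'%a for a in attrs)))
    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            # restored: end tags are appended as well
            self.result.append('</%s>'%tag)
    def handle_data(self,data):
        # restored: regular text is simply appended
        self.result.append(data)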

The Scrubber class is a subclass of the HTMLParser class provided in Python's html.parser module. We override suitable methods here to deal with start and end tags and data and ignore the rest (like processing instructions and the like). Both beginning and end tags are only appended to the result if they are present in the list of allowable tags. Regular data (text, that is) is simply appended to the result.
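As a self-contained variation on this idea, the sketch below takes a stricter line. It is our own assumption about what a more defensive scrubber could look like, not the class used in the wiki: it drops all attributes on allowed tags (so an onclick handler on a <b> element cannot slip through) and escapes text again, because current versions of HTMLParser convert character references such as &lt; back to plain characters before calling handle_data(). The names StrictScrubber and scrub_sketch are hypothetical:

```python
from html import escape
from html.parser import HTMLParser


class StrictScrubber(HTMLParser):
    """Keep allowed tags without attributes, drop other tags, escape all text."""

    def __init__(self, allowed_tags=()):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = frozenset(allowed_tags)
        self.result = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed_tags:
            self.result.append('<%s>' % tag)  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.result.append('</%s>' % tag)

    def handle_data(self, data):
        # Re-escape text so that decoded entities cannot turn into markup.
        self.result.append(escape(data, quote=False))


def scrub_sketch(content, allowed_tags=('b', 'i', 'ul', 'li')):
    parser = StrictScrubber(allowed_tags)
    parser.feed(content)
    parser.close()  # flush any text that is still buffered
    return "".join(parser.result)


print(scrub_sketch('<b onclick="x()">bold</b> <script>alert(1)</script> &lt;i&gt;'))
```

Whether dropping every attribute is acceptable depends on the markup you want to allow; it is the simplest safe default, not the only option.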


Time for action – rendering content

We added specific JavaScript functionality to the text area editor to insert references to external websites, other wiki topics, and wiki images in a format that we devised ourselves and that cannot be interpreted as HTML. Now we have to provide code that will convert this notation to something that will be understood by the client.

What just happened?

Recognizing those items that we have to convert to HTML is mostly done by using regular expressions. We therefore define three regular expressions first, each representing a distinct pattern. Note that we use raw strings here to prevent interpretation of backslashes. Backslashes are meaningful in regular expressions, and if we didn't use raw strings, we would have to escape each and every backslash with a backslash, resulting in an unreadable sea of backslashes:


topicref = re.compile(r'\[\s*([^,\]]+?)(\s*,\s*([^\]]+))?\s*\]')
linkref = re.compile(r'\{\s*([^,\}]+?)(\s*,\s*([^\}]+))?\s*\}')
imgref = re.compile(r'\<\s*(\d+?)(\s*,\s*([^\>]*))?\s*\>')

For more on Python regular expressions, have a look at the documentation of the re module.

Next we define three utility functions, one for each pattern. Each function takes a match object that represents a matching pattern and returns a string that can be used in HTML to show or link to that reference:


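def topicrefreplace(matchobj):
    # ref/txt lines restored: group 1 is the reference, group 3 the optional description
    ref = matchobj.group(1)
    txt = matchobj.group(3) if (not matchobj.group(3) is None) else matchobj.group(1)
    nonexist = ""
    # existence test restored by analogy with the other wikidb lookups
    if len(list(wikidb.Topic.list(title=ref))) == 0:
        nonexist = " nonexisting"
    return '<a href="show?topic=%s" class="topicref%s">%s</a>'%(
        ref,nonexist,txt)

def linkrefreplace(matchobj):
    ref = matchobj.group(1)
    txt = matchobj.group(3) if (not matchobj.group(3) is None) else matchobj.group(1)
    return '<a href="%s" class="externalref">%s</a>'%(ref,txt)

def imgrefreplace(matchobj):
    ref = matchobj.group(1)
    txt = matchobj.group(3) if (not matchobj.group(3) is None) else matchobj.group(1)
    # any attributes after alt are cut off in this listing
    return '''<img src="showimage?id=%s" alt="%s">'''%(ref,txt)

def render(content):
    yield '<p>\n'
    for line in content.splitlines(True):
        line = re.sub(imgref ,imgrefreplace ,line)
        line = re.sub(topicref,topicrefreplace,line)
        line = re.sub(linkref ,linkrefreplace ,line)
        if len(line.strip())==0 : line = '</p>\n<p>'
        yield line
    yield '</p>\n'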

The render() function is passed a string with content to convert to HTML. For each line in the content, it tries to find the predefined patterns and converts them by passing the appropriate function to re.sub(). If a line consists of whitespace only, suitable HTML is produced to end a paragraph.
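The mechanism of passing a function as the replacement can be tried on its own. The sketch below reuses the topicref pattern, but the replacement function topic_link is a simplified stand-in of our own: it performs no database lookup, and it quotes the reference and escapes the description, which is an extra precaution we chose to add here:

```python
import re
from html import escape
from urllib.parse import quote

topicref = re.compile(r'\[\s*([^,\]]+?)(\s*,\s*([^\]]+))?\s*\]')


def topic_link(matchobj):
    """Turn a [Topic] or [Topic, description] match into an HTML link."""
    ref = matchobj.group(1)
    # Group 3 is None when no description was given; fall back to the reference.
    txt = matchobj.group(3) if matchobj.group(3) is not None else ref
    return '<a href="show?topic=%s">%s</a>' % (quote(ref), escape(txt))


line = 'See [Python, the language] and [Intro].'
print(topicref.sub(topic_link, line))
```

For every match, re.sub() calls the function with the match object and inserts whatever string it returns, so each kind of reference can have its own conversion logic.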


We learned a lot in this article about implementing a web application that consists of more than a few entities and their relations.

Specifically, we covered:

  • How to create a data model that describes entities and relations accurately
  • How to create a delivery layer that is security conscious and treats incoming data with care
  • How to use jQuery UI's dialog widget and extend the functionality of the autocomplete widget

You've been reading an excerpt of:

Python 3 Web Development Beginner's Guide
