Python Data Persistence using MySQL Part II: Moving Data Processing to the Data

by Yuli Vasiliev | December 2008 | Open Source

Continuing with the sample discussed in the Python Data Persistence using MySQL article, this second part explains how you can implement some data processing inside your database, resulting in better application design and better performance in the long run.


To move data processing to the data, you can use stored procedures, stored functions, and triggers. All these components are implemented inside the underlying database and can significantly improve the performance of your application by reducing the network overhead associated with multiple calls to the database. It is important to realize, though, that the decision to move any piece of processing logic into the database should be taken with care. In some situations, doing so may actually be inefficient. For example, if you move logic that deals with data stored in a custom Python list into the database while still keeping that list in your Python code, you only increase the number of calls to the underlying database, causing more network overhead rather than less. To fix this situation, you could move the list into the database as well, implementing it as a table.

Starting with version 5.0, MySQL supports stored procedures, stored functions, and triggers, making it possible for you to enjoy programming on the underlying database side. In this article, you will look at triggers in action. Stored procedures and functions can be used similarly.

Planning Changes for the Sample Application

Assuming you have followed the instructions in Python Data Persistence using MySQL, you should already have the application structure to be reorganized here. To recap, what you should already have is:

  • tags: a nested list of the tags used to describe the posts obtained from the Packt Book Feed page.
  • obtainPost: a function that obtains information about the most recent post on the Packt Book Feed page.
  • determineTags: a function that determines the tags appropriate to the latest post obtained from the Packt Book Feed page.
  • insertPost: a function that inserts information about the obtained post into the underlying database tables: posts and posttags.
  • execPr: a function that brings together the functionality of the functions described above.

That’s what you should already have on the Python side. And on the database side, you should have the following components:

  • posts: a table containing records that represent posts obtained from the Packt Book Feed page.
  • posttags: a table whose records each represent a tag associated with a certain post stored in the posts table.

Let’s figure out how we can refactor the above structure, moving some data processing inside the database.

The first thing you might want to do is to move the tags list from Python into the database, creating a new table tags for that. Then, you can move the logic implemented with the determineTags function inside the database, defining the AFTER INSERT trigger on the posts table. From within this trigger, you will also insert rows into the posttags table, thus eliminating the need to do it from within the insertPost function.

Once you’ve done all that, you can refactor the Python code implemented in the appsample module. To summarize, here are the steps you need to perform in order to refactor the sample application discussed in the earlier article:

  • Create the tags table and populate it with the data currently stored in the tags list implemented in Python.
  • Define the AFTER INSERT trigger on the posts table.
  • Refactor the insertPost function in the appsample.py module.
  • Remove the tags list from the appsample.py module.
  • Remove the determineTags function from the appsample.py module.
  • Refactor the execPr function in the appsample.py module.

Refactoring the Underlying Database

To keep things simple, the tags table might contain a single column, tag, with a primary key constraint defined on it. So, you can create the tags table as follows:

CREATE TABLE tags (
  tag VARCHAR(20) PRIMARY KEY
)
ENGINE = InnoDB;

Then, you might want to modify the posttags table, adding a foreign key constraint to its tag column. Before you can do that, though, you will need to delete all the rows from this table. This can be done with the following statement:

DELETE FROM posttags;

Now you can move on and alter posttags as follows:

ALTER TABLE posttags ADD FOREIGN KEY (tag) REFERENCES tags(tag);
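The effect of this constraint is that posttags will now reject any tag value not present in tags. As a quick illustration of foreign key behavior that does not require a MySQL server, here is a sketch using Python's built-in sqlite3 module (the table names match the article; the sqlite3 choice and the sample tag values are only for demonstration):

```python
import sqlite3

# In-memory database; sqlite3 requires foreign keys to be switched on explicitly
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE tags (tag VARCHAR(20) PRIMARY KEY)")
db.execute("""CREATE TABLE posttags (
                  title VARCHAR(100),
                  tag VARCHAR(20) REFERENCES tags(tag))""")
db.execute("INSERT INTO tags VALUES ('Python')")

# A known tag is accepted
db.execute("INSERT INTO posttags VALUES ('Some Python title', 'Python')")

# An unknown tag violates the foreign key and is rejected
try:
    db.execute("INSERT INTO posttags VALUES ('Another title', 'Perl')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

After running this sketch, posttags holds exactly one row and the second insert has been refused, which is the behavior the ALTER TABLE statement above gives you in MySQL.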

The next step is to populate the tags table. You can automate this process with the help of the following Python script:

>>> import MySQLdb
>>> import appsample
>>> db=MySQLdb.connect(host="localhost",user="usrsample",passwd="pswd",db="dbsample")
>>> c=db.cursor()
>>> c.executemany("""INSERT INTO tags VALUES(%s)""", appsample.tags)
>>> db.commit()
>>> db.close()
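One detail worth noting: executemany expects each parameter set to be a sequence, so the call above assumes that appsample.tags holds one-element sequences rather than bare strings. If your tags list is flat, you can wrap the items first; a minimal sketch (the tag values here are only examples):

```python
# A flat list of tag strings, as it might look in appsample.py
tags = ["Python", "MySQL", "Open Source"]

# executemany needs a sequence of parameter tuples, one per row
params = [(t,) for t in tags]
# c.executemany("""INSERT INTO tags VALUES(%s)""", params)
```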

As a result, you should have the tags table populated with the data taken from the tags list discussed in Python Data Persistence using MySQL. To verify this, you can turn back to the mysql prompt and issue the following query against the tags table:

SELECT * FROM tags;

The above should output the list of tags you have in the tags list. Of course, you can always extend this list, adding new tags with the INSERT statement. For example, you could issue the following statement to add the Visual Studio tag:

INSERT INTO tags VALUES('Visual Studio');

Now you can move on and define the AFTER INSERT trigger on the posts table:

delimiter //
CREATE TRIGGER insertPost AFTER INSERT ON posts
FOR EACH ROW BEGIN
INSERT INTO posttags(title, tag)
SELECT NEW.title AS title, tag FROM tags WHERE LOCATE(tag, NEW.title) > 0;
END
//
delimiter ;

As you can see, the posttags table will be automatically populated with the appropriate tags just after a new row is inserted into the posts table. Notice the use of the INSERT ... SELECT statement in the body of the trigger. This syntax lets you insert several rows into the posttags table at once, without an explicit loop. In the WHERE clause of the SELECT, you use the standard MySQL string function LOCATE, which returns the position of the first occurrence of the substring, passed in as the first argument, within the string, passed in as the second argument. In this particular example, though, you are not really interested in the exact position; all you need to find out is whether the substring appears in the string at all. If it does, the tag should appear in the posttags table as a separate row associated with the row just inserted into the posts table.
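The matching done by the trigger mirrors what the Python-side determineTags function used to do. A rough pure-Python equivalent of the LOCATE-based check, as a sketch (note that str.find is 0-based and returns -1 when the substring is absent, whereas LOCATE is 1-based and returns 0):

```python
def determine_tags(title, tags):
    """Return the tags that occur as substrings of the post title,
    mimicking the trigger's WHERE LOCATE(tag, NEW.title) > 0 test."""
    return [tag for tag in tags if title.find(tag) >= 0]

tags = ["Python", "MySQL", "Django"]
print(determine_tags("Python Data Persistence using MySQL", tags))
# → ['Python', 'MySQL']
```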

Refactoring the Sample’s Python Code

Now that you have moved some data and data processing from Python into the underlying database, it’s time to reorganize the appsample custom Python module created as discussed in Python Data Persistence using MySQL. As mentioned earlier, you need to rewrite the insertPost and execPr functions and remove the determineTags function and the tags list.

This is what the appsample module should look like after the revision:

import MySQLdb
import urllib2
import xml.dom.minidom

def obtainPost():
    addr = "http://feeds.feedburner.com/packtpub/sDsa?format=xml"
    xmldoc = xml.dom.minidom.parseString(urllib2.urlopen(addr).read())
    item = xmldoc.getElementsByTagName("item")[0]
    title = item.getElementsByTagName("title")[0].firstChild.data
    guid = item.getElementsByTagName("guid")[0].firstChild.data
    pubDate = item.getElementsByTagName("pubDate")[0].firstChild.data
    post = {"title": title, "guid": guid, "pubDate": pubDate}
    return post

def insertPost(title, guid, pubDate):
    db = MySQLdb.connect(host="localhost", user="usrsample", passwd="pswd", db="dbsample")
    c = db.cursor()
    c.execute("""INSERT INTO posts (title, guid, pubDate) VALUES(%s,%s,%s)""", (title, guid, pubDate))
    db.commit()
    db.close()

def execPr():
    p = obtainPost()
    insertPost(p["title"], p["guid"], p["pubDate"])

If you compare it with the appsample module discussed in Part 1, you should notice that the revised version is much shorter. It's important to note, however, that nothing has changed from the user's standpoint. So, if you now run the execPr function in your Python session:

>>> import appsample
>>> appsample.execPr()

This should insert a new record into the posts table, automatically inserting the corresponding tag records into the posttags table, if any. The difference lies in what is going on behind the scenes. Now the Python code is responsible only for obtaining the latest post from the Packt Book Feed page and inserting a record into the posts table. Dealing with tags is now the responsibility of the logic implemented inside the database. In particular, the AFTER INSERT trigger defined on the posts table takes care of inserting the rows into the posttags table.

To make sure that everything has worked smoothly, you can now check the contents of the posts and posttags tables. To look at the latest post stored in the posts table, you could issue the following query:

SELECT title, str_to_date(pubDate,'%a, %e %b %Y') lastdate FROM posts ORDER BY lastdate DESC LIMIT 1;

Then, you might want to look at the related tags stored in the posttags table, by issuing the following query:

SELECT p.title, t.tag, str_to_date(p.pubDate,'%a, %e %b %Y') lastdate FROM posts p, posttags t
WHERE p.title=t.title ORDER BY lastdate DESC LIMIT 1;
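The str_to_date calls above turn the stored RFC 2822 pubDate strings into sortable dates on the database side. If you ever need the same conversion on the Python side, the standard library can parse such strings; a small sketch (the date value is only an example):

```python
from email.utils import parsedate

# A typical RSS pubDate string, as stored in the posts table
pub = "Mon, 05 Jan 2009 12:30:00 GMT"

# parsedate returns a time tuple: (year, month, day, hour, minute, second, ...)
parts = parsedate(pub)
print(parts[:3])  # → (2009, 1, 5)
```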

Conclusion

In this article, you looked at how some business logic of a Python/MySQL application can be moved from Python into MySQL. For that, you continued with the sample application originally discussed in Python Data Persistence using MySQL.

About the Author


Yuli Vasiliev

Yuli Vasiliev is a software developer, freelance author, and consultant currently specializing in open-source development, Oracle technologies, and service-oriented architecture (SOA). He has over 10 years of software development experience as well as several years of technical writing experience. He wrote a series of technical articles for Oracle Technology Network (OTN) and Oracle Magazine.
