Reader small image

You're reading from  Learning Cython Programming (Second Edition) - Second Edition

Product typeBook
Published inFeb 2016
Reading LevelBeginner
PublisherPackt
ISBN-139781783551675
Edition2nd Edition
Languages
Tools
Right arrow
Author (1)
Philip Herron
Philip Herron
author image
Philip Herron

Philip Herron is a developer who focuses his passion toward compilers and virtual machine implementations. When he was first accepted to Google Summer of Code 2010, he used inspiration from Paul Biggar's PhD on the optimization of dynamic languages to develop a proof of the concept GCC frontend to compile Python. This project sparked his deep interest in how Python works. After completing a consecutive year on the same project in 2011, Philip applied to Cython under the Python foundation to gain a deeper appreciation of the standard Python implementation. Through this he started leveraging the advantages of Python to control the logic in systems or even add more high-level interfaces, such as embedding Flask web servers in a REST API to a system-level piece of software, without writing any C code. Philip currently works as a software consultant for Instil Software based in Northern Ireland. He develops mobile applications with embedded native code for video streaming. Instil has given him a lot of support in becoming a better engineer. He has written several tutorials for the UK-based Linux Format magazine on Python and loves to share his passion for the Python programming language.
Read more about Philip Herron

Right arrow

Parsing large amounts of data


I want to try and prove how powerful and natively compiled C types are to programmers by showing the difference in parsing large amounts of XML. We can take the geographic data from the government as the test data for this experiment (http://www.epa.gov/enviro/geospatial-data-download-service).

Let's look at the size of this XML data:

 ls -liah
total 480184
7849156 drwxr-xr-x   5 redbrain  staff   170B 25 Jul 16:42 ./
5803438 drwxr-xr-x  11 redbrain  staff   374B 25 Jul 16:41 ../
7849208 -rw-r--r--@  1 redbrain  staff   222M  9 Mar 04:27 EPAXMLDownload.xml
7849030 -rw-r--r--@  1 redbrain  staff    12M 25 Jul 16:38 EPAXMLDownload.zip
7849174 -rw-r--r--   1 redbrain  staff    57B 25 Jul 16:42 README

It's huge! Before we write programs, we need to understand a little bit about the structure of this data to see what we want to do with it. It contains facility site locations with addresses. This seems to be the bulk of the data in here, so let's try and parse it all...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Learning Cython Programming (Second Edition) - Second Edition
Published in: Feb 2016Publisher: PacktISBN-13: 9781783551675

Author (1)

author image
Philip Herron

Philip Herron is a developer who focuses his passion toward compilers and virtual machine implementations. When he was first accepted to Google Summer of Code 2010, he used inspiration from Paul Biggar's PhD on the optimization of dynamic languages to develop a proof of the concept GCC frontend to compile Python. This project sparked his deep interest in how Python works. After completing a consecutive year on the same project in 2011, Philip applied to Cython under the Python foundation to gain a deeper appreciation of the standard Python implementation. Through this he started leveraging the advantages of Python to control the logic in systems or even add more high-level interfaces, such as embedding Flask web servers in a REST API to a system-level piece of software, without writing any C code. Philip currently works as a software consultant for Instil Software based in Northern Ireland. He develops mobile applications with embedded native code for video streaming. Instil has given him a lot of support in becoming a better engineer. He has written several tutorials for the UK-based Linux Format magazine on Python and loves to share his passion for the Python programming language.
Read more about Philip Herron