I want to demonstrate to programmers how powerful natively compiled C types are by showing the difference they make when parsing large amounts of XML. For the test data in this experiment, we can use the geographic data published by the US Environmental Protection Agency (http://www.epa.gov/enviro/geospatial-data-download-service).
Let's look at the size of this XML data:
```
$ ls -liah
total 480184
7849156 drwxr-xr-x   5 redbrain  staff   170B 25 Jul 16:42 ./
5803438 drwxr-xr-x  11 redbrain  staff   374B 25 Jul 16:41 ../
7849208 -rw-r--r--@  1 redbrain  staff   222M  9 Mar 04:27 EPAXMLDownload.xml
7849030 -rw-r--r--@  1 redbrain  staff    12M 25 Jul 16:38 EPAXMLDownload.zip
7849174 -rw-r--r--   1 redbrain  staff    57B 25 Jul 16:42 README
```
At 222 MB uncompressed (12 MB zipped), it's huge! Before writing any programs, we need to understand a little about the structure of this data so we can decide what to do with it. It contains facility site locations along with their addresses. These records seem to make up the bulk of the file, so let's try to parse all of them.