Chapter 5. Web Mining, Databases, and Big Data
On the menu for this chapter are the following recipes:
- Simulating web browsing
- Scraping the Web
- Dealing with non-ASCII text and HTML entities
- Implementing association tables
- Setting up database migration scripts
- Adding a table column to an existing table
- Adding indices after table creation
- Setting up a test web server
- Implementing a star schema with fact and dimension tables
- Using HDFS
- Setting up Spark
- Clustering data with Spark
 
Introduction
This chapter is light on math and focuses instead on technical topics. Technology has a lot to offer data analysts. Databases have been around for a while, but the relational databases that most people are familiar with can be traced back to the 1970s, when Edgar Codd came up with a number of ideas that led to the relational model and, later, to SQL. Relational databases have been the dominant technology ever since. In the 1980s, object-oriented programming languages caused a paradigm shift and...