Reader small image

You're reading from  Web Scraping with Python

Product typeBook
Published inOct 2015
Reading LevelIntermediate
PublisherPackt
ISBN-139781782164364
Edition1st Edition
Languages
Tools
Concepts
Right arrow
Author (1)
Richard Penman
Richard Penman
author image
Richard Penman

Richard Lawson is from Australia and studied Computer Science at the University of Melbourne. Since graduating, he built a business specializing in web scraping while travelling the world, working remotely from over 50 countries. He is a fluent Esperanto speaker, conversational in Mandarin and Korean, and active in contributing to and translating open source software. He is currently undertaking postgraduate studies at Oxford University and in his spare time enjoys developing autonomous drones.
Read more about Richard Penman

Right arrow

Database cache


To avoid the anticipated limitations to our disk-based cache, we will now build our cache on top of an existing database system. When crawling, we may need to cache massive amounts of data and will not need any complex joins, so we will use a NoSQL database, which is easier to scale than a traditional relational database. Specifically, our cache will use MongoDB, which is currently the most popular NoSQL database.

What is NoSQL?

NoSQL stands for Not Only SQL and is a relatively new approach to database design. The traditional relational model used a fixed schema and splits the data into tables. However, with large datasets, the data is too big for a single server and needs to be scaled across multiple servers. This does not fit well with the relational model because, when querying multiple tables, the data will not necessarily be available on the same server. NoSQL databases, on the other hand, are generally schemaless and designed from the start to shard seamlessly across...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Web Scraping with Python
Published in: Oct 2015Publisher: PacktISBN-13: 9781782164364

Author (1)

author image
Richard Penman

Richard Lawson is from Australia and studied Computer Science at the University of Melbourne. Since graduating, he built a business specializing in web scraping while travelling the world, working remotely from over 50 countries. He is a fluent Esperanto speaker, conversational in Mandarin and Korean, and active in contributing to and translating open source software. He is currently undertaking postgraduate studies at Oxford University and in his spare time enjoys developing autonomous drones.
Read more about Richard Penman