Getting Started with Beautiful Soup


  • Learn about the features of Beautiful Soup with Python
  • Understand how to use a simple method to extract information from websites using Beautiful Soup and the Python urllib2 module
  • Master searching, navigation, content modification, encoding, and output methods quickly and efficiently
  • Try out the example code and get to grips with Beautiful Soup easily
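As a quick illustration of the searching features listed above, here is a minimal sketch. It assumes Beautiful Soup 4 (imported from the bs4 package) on Python 3, Python's built-in "html.parser" parser, and a hypothetical HTML string instead of a live website. In Python 3 the urllib2 module mentioned above was folded into urllib.request; fetching a page is left out so that the sketch runs offline.

```python
from bs4 import BeautifulSoup  # Beautiful Soup 4 is imported from the bs4 package

# Hypothetical sample page; in a real scraper this string would come from a downloaded page.
html_doc = """
<html>
  <body>
    <h1 id="title">Chapters</h1>
    <ul class="chapters">
      <li><a href="/chapter-1">Installing Beautiful Soup</a></li>
      <li><a href="/chapter-2">Creating a BeautifulSoup Object</a></li>
    </ul>
  </body>
</html>
"""

# "html.parser" ships with Python, so no extra parser such as lxml is required.
soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first matching Tag, or None if nothing matches.
heading = soup.find("h1", id="title")
print(heading.string)

# find_all() returns a list of every matching Tag.
links = soup.find_all("a")
for link in links:
    # Attributes use dictionary-style access; get_text() returns the text inside the tag.
    print(link["href"], "->", link.get_text())
```

The same two calls cover most scraping tasks: find() when exactly one element is expected, find_all() when a page repeats a pattern such as a list of links.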

Book Details

Language : English
Paperback : 130 pages [ 235mm x 191mm ]
Release Date : January 2014
ISBN : 1783289554
ISBN 13 : 9781783289554
Author(s) : Vineeth G. Nair
Topics and Technologies : Web Development, Open Source


Table of Contents

Preface
Chapter 1: Installing Beautiful Soup
Chapter 2: Creating a BeautifulSoup Object
Chapter 3: Search Using Beautiful Soup
Chapter 4: Navigation Using Beautiful Soup
Chapter 5: Modifying Content Using Beautiful Soup
Chapter 6: Encoding Support in Beautiful Soup
Chapter 7: Output in Beautiful Soup
Chapter 8: Creating a Web Scraper
Index
  • Chapter 1: Installing Beautiful Soup
    • Installing Beautiful Soup
      • Installing Beautiful Soup in Linux
        • Installing Beautiful Soup using package manager
        • Installing Beautiful Soup using pip or easy_install
        • Installing Beautiful Soup using pip
        • Installing Beautiful Soup using easy_install
      • Installing Beautiful Soup in Windows
        • Verifying Python path in Windows
      • Installing Beautiful Soup using setup.py
    • Using Beautiful Soup without installation
    • Verifying the installation
    • Quick reference
    • Summary
  • Chapter 2: Creating a BeautifulSoup Object
    • Creating a BeautifulSoup object
      • Creating a BeautifulSoup object from a string
      • Creating a BeautifulSoup object from a file-like object
      • Creating a BeautifulSoup object for XML parsing
        • Understanding the features argument
    • Tag
      • Accessing the Tag object from BeautifulSoup
      • Name of the Tag object
      • Attributes of a Tag object
    • The NavigableString object
    • Quick reference
    • Summary
  • Chapter 3: Search Using Beautiful Soup
    • Searching in Beautiful Soup
      • Searching with find()
        • Finding the first producer
        • Explaining find()
      • Searching with find_all()
        • Finding all tertiary consumers
        • Understanding parameters used with find_all()
      • Searching for Tags in relation
        • Searching for the parent tags
        • Searching for siblings
        • Searching for next
        • Searching for previous
    • Using search methods to scrape information from a web page
    • Quick reference
    • Summary
  • Chapter 4: Navigation Using Beautiful Soup
    • Navigation using Beautiful Soup
      • Navigating down
        • Using the name of the child tag
        • Using predefined attributes
        • Special attributes for navigating down
      • Navigating up
        • The .parent attribute
        • The .parents attribute
      • Navigating sideways to the siblings
        • The .next_sibling attribute
        • The .previous_sibling attribute
      • Navigating to the previous and next objects parsed
    • Quick reference
    • Summary
  • Chapter 5: Modifying Content Using Beautiful Soup
    • Modifying Tag using Beautiful Soup
      • Modifying the name property of Tag
      • Modifying the attribute values of Tag
        • Updating the existing attribute value of Tag
        • Adding new attribute values to Tag
      • Deleting the tag attributes
      • Adding a new tag
    • Modifying string contents
      • Using .string to modify the string content
      • Adding strings using .append(), insert(), and new_string()
    • Deleting tags from the HTML document
      • Deleting the producer using decompose()
      • Deleting the producer using extract()
      • Deleting the contents of a tag using Beautiful Soup
    • Special functions to modify content
    • Quick reference
    • Summary
  • Chapter 6: Encoding Support in Beautiful Soup
  • Chapter 7: Output in Beautiful Soup
    • Formatted printing
    • Unformatted printing
    • Output formatters in Beautiful Soup
      • The minimal formatter
      • The html formatter
      • The None formatter
      • The function formatter
    • Using get_text()
    • Quick reference
    • Summary
  • Chapter 8: Creating a Web Scraper
    • Getting book details from PacktPub.com
      • Finding pages with a list of books
      • Finding book details
    • Getting selling prices from Amazon
    • Getting the selling price from Barnes and Noble
    • Summary

Vineeth G. Nair

Vineeth G. Nair completed his bachelor's degree in Computer Science and Engineering at Model Engineering College, Cochin, Kerala. He is currently working with Oracle India Pvt. Ltd. as a Senior Applications Engineer.

He developed an interest in Python during his college days and began working as a freelance programmer. This led him to work on several web scraping projects using Beautiful Soup, which helped him gain a fair level of mastery of the technology and a good reputation in the freelance arena. He can be reached at vineethgnair.mec@gmail.com. You can visit his website at www.kochi-coders.com.



Errata

- 1 erratum submitted; last submission: 12 Mar 2014

Errata Type: Code | Page number: 37

Under the heading "Finding all tertiary consumers", the code line all_tertiaryconsumers = soup.find_all(class_="tertiaryconsumerslist") should read all_tertiaryconsumers = soup.find_all(class_="tertiaryconsumerlist").
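Why the one-letter typo matters: the class_ argument of find_all() matches a CSS class name exactly, so a misspelled name silently returns an empty list instead of raising an error. The sketch below assumes Beautiful Soup 4 and uses hypothetical markup; only the class name "tertiaryconsumerlist" comes from the erratum, and the structure around it is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name "tertiaryconsumerlist" is taken from the erratum.
html_doc = """
<ul class="tertiaryconsumerlist">
  <li><div class="name">lion</div></li>
  <li><div class="name">tiger</div></li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# class_ has a trailing underscore because "class" is a reserved word in Python.
wrong = soup.find_all(class_="tertiaryconsumerslist")  # misspelled: matches nothing
right = soup.find_all(class_="tertiaryconsumerlist")   # corrected: matches the <ul>

print(len(wrong), len(right))  # 0 1
```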



What you will learn from this book

  • Learn how to scrape HTML pages from websites
  • Implement a simple method to scrape any website with the help of developer tools, the Python urllib2 module, and Beautiful Soup
  • Learn how to search for information within an HTML/XML page
  • Modify the contents of an HTML tree
  • Understand encoding support in Beautiful Soup
  • Learn about the different types of output formatting
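To make the "modify" and "output formatting" items above concrete, here is a minimal sketch assuming Beautiful Soup 4 and a hypothetical snippet of HTML. It renames a tag, updates and adds attributes, removes a tag with decompose(), and then renders the tree with the "minimal" and "html" output formatters and with get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one non-ASCII character, to show what the formatters change.
html_doc = '<div id="menu"><p class="intro">Café menu</p><p class="ad">Buy now!</p></div>'
soup = BeautifulSoup(html_doc, "html.parser")

intro = soup.find("p", class_="intro")
intro.name = "h2"            # modify the tag's name
intro["class"] = "title"     # update an existing attribute value
intro["data-lang"] = "fr"    # add a new attribute

# decompose() removes the tag and everything inside it from the tree.
soup.find("p", class_="ad").decompose()

minimal_output = soup.decode(formatter="minimal")  # the default: only &, < and > are escaped
html_output = soup.decode(formatter="html")        # also converts characters such as é to entities
text_only = soup.get_text()                        # drops all markup, keeps only the text

print(minimal_output)
print(html_output)
print(text_only)
```

The "minimal" formatter keeps non-ASCII text as Unicode, which is usually what you want when saving UTF-8 output; "html" is useful when the target only accepts ASCII.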

In Detail

Beautiful Soup is a Python library designed for quick-turnaround projects such as screen scraping. It provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need without writing much code.
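Navigating the parse tree is what makes those idioms short. The sketch below assumes Beautiful Soup 4 (version 4.4.0 or later for the string argument; earlier releases called it text) and a hypothetical food-chain list. It finds one tag and then moves up to its parent, sideways to its siblings, and down through a parent's children.

```python
from bs4 import BeautifulSoup

# Hypothetical markup written without whitespace between the tags,
# so every sibling of an <li> is another <li> Tag.
html_doc = '<ul id="food_chain"><li>grass</li><li>rabbit</li><li>fox</li></ul>'
soup = BeautifulSoup(html_doc, "html.parser")

# Search first: string= matches a tag by its text content.
rabbit = soup.find("li", string="rabbit")

# Navigating up: .parent is the enclosing Tag.
print(rabbit.parent["id"])             # food_chain

# Navigating sideways: nodes at the same level of the tree.
print(rabbit.previous_sibling.string)  # grass
print(rabbit.next_sibling.string)      # fox

# Navigating down: soup.ul is the first <ul>; .children yields its direct children.
print([child.string for child in soup.ul.children])  # ['grass', 'rabbit', 'fox']
```

On real, indented pages .next_sibling often returns the whitespace text between tags rather than the next tag; find_next_sibling("li") skips over that text and returns the next matching Tag.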

Getting Started with Beautiful Soup is a practical guide to Beautiful Soup using Python. The book starts by walking you through the installation of Beautiful Soup and then covers each of its features with simple examples, which include sample Python code as well as diagrams and screenshots wherever they aid understanding. It also addresses the problem of how exactly to get data out of a website, and presents an easy solution using a real website and sample code.

Getting Started with Beautiful Soup covers the different methods of installing Beautiful Soup on both Linux and Windows systems. You will then learn about searching, navigation, content modification, encoding support, and output formatting, with sample Python code for each example so that you can try it out and build a better understanding. This book is a practical guide to scraping information from websites. If you want to learn how to scrape pages from websites efficiently, then this book is for you.
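Encoding support is the least visible of these topics, so here is a minimal sketch of it, assuming Beautiful Soup 4 and hypothetical input bytes. When Beautiful Soup is given bytes, it decodes them to Unicode; the from_encoding argument tells it which encoding to use instead of guessing, original_encoding records the encoding it used, and encode() converts the tree back to bytes in the encoding you choose.

```python
from bs4 import BeautifulSoup

# Hypothetical input: bytes encoded as ISO-8859-1 (Latin-1), where "é" is the single byte 0xE9.
latin1_bytes = "<p>Café</p>".encode("iso-8859-1")

# Passing bytes makes Beautiful Soup decode them; from_encoding skips the detection step.
soup = BeautifulSoup(latin1_bytes, "html.parser", from_encoding="iso-8859-1")

print(soup.original_encoding)  # the encoding used to decode the input
print(soup.p.string)           # Café, now an ordinary Unicode str

# Output can be re-encoded to any encoding, here UTF-8 (é becomes the two bytes C3 A9).
print(soup.p.encode("utf-8"))  # b'<p>Caf\xc3\xa9</p>'
```

Without from_encoding, Beautiful Soup falls back to its own detection, which can guess wrong on short documents; supplying the encoding from the HTTP headers or a meta tag is the safer choice when it is known.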

Approach

This book is a practical, hands-on guide that takes you through the techniques of web scraping using Beautiful Soup.

Who this book is for

Getting Started with Beautiful Soup is great for anybody who is interested in website scraping and extracting information. However, a basic knowledge of Python, HTML tags, and CSS is required for better understanding.
