Working with Big Data in Python [Video]

Alexis Rutherford

Gain valuable insights from your data by streamlining unstructured data pipelines with Python, Spark, and MongoDB

Video Details

ISBN 13: 9781788839068
Course Length: 2 hours 41 minutes

Video Description

This course is a comprehensive, practical guide to using MongoDB and Spark in Python. You will learn how to store and make sense of huge datasets and how to perform basic machine learning tasks to make predictions.

MongoDB is one of the most powerful non-relational database systems available, offering robust scalability and expressive operations. Combined with Python data analysis libraries and distributed computing, it is a valuable tool for the modern data scientist. NoSQL databases require a new way of thinking about data and scalable queries. Once you have mastered Mongo queries, you need to understand how to use this API within Python's rich analysis and visualization ecosystem.

This course covers how to use MongoDB, particularly if you are used to SQL databases, with a focus on scaling to large datasets. It introduces pyMongo as the means of interacting with a MongoDB database from Python code and explores the data structures used to do so. MongoDB allows complex operations and aggregations to be run within the query itself, and the course covers how to use these operators.

While MongoDB is built to scale easily across many nodes as datasets grow, Python is not. The course therefore covers how to use Spark with MongoDB to apply more complex machine learning techniques to extremely large datasets. These techniques are applied to several real-world datasets and analyses that can form the basis of your own pipelines, so you can quickly get up and running with a powerful data science toolkit.
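As a rough illustration of running aggregations within the query itself, the sketch below builds an aggregation pipeline as a plain list of stage dictionaries and hands it to a collection's `aggregate` method. It is not taken from the course: the field names `city` and `temp_c`, the function names, and the `min_readings` parameter are all assumptions made for the example.

```python
from typing import Any, Dict, List


def build_city_temperature_pipeline(min_readings: int = 1) -> List[Dict[str, Any]]:
    """Return aggregation stages that average temperature per city.

    The field names ("city", "temp_c") are hypothetical, not from the course.
    """
    return [
        # Keep only documents that actually carry a temperature value.
        {"$match": {"temp_c": {"$exists": True}}},
        # Group by city; compute the mean temperature and count the readings.
        {
            "$group": {
                "_id": "$city",
                "avg_temp_c": {"$avg": "$temp_c"},
                "readings": {"$sum": 1},
            }
        },
        # Drop cities with too few readings to be meaningful.
        {"$match": {"readings": {"$gte": min_readings}}},
        # Warmest cities first.
        {"$sort": {"avg_temp_c": -1}},
    ]


def average_temperature_by_city(collection: Any, min_readings: int = 1) -> List[Dict[str, Any]]:
    """Run the pipeline on a PyMongo-style collection and collect the results.

    `collection` is expected to expose `aggregate(pipeline)`, as a
    pymongo Collection does; the server does the grouping and sorting.
    """
    return list(collection.aggregate(build_city_temperature_pipeline(min_readings)))
```

With PyMongo installed and a MongoDB server running, this might be called as `average_temperature_by_city(MongoClient()["weather"]["readings"], min_readings=3)`, where the database and collection names are again hypothetical. Keeping the pipeline in its own function makes the stages easy to inspect without a database connection.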

Style and Approach

An exhaustive course that carefully covers the fundamental concepts of unstructured data and distributed programming before applying them to examples of typical data science workflows.

This course is divided into clear chunks, so you can learn at your own pace and focus on your own area of interest.

Table of Contents

Working with MongoDB
  • The Course Overview
  • What Is MongoDB and Why Should I Use It?
  • From Tabular Data to JSON Documents
  • MongoDB Indices and Datatypes
  • Setting Up MongoDB and Running Our First MongoDB Query
Using the pyMongo Module
  • Setting Up pyMongo
  • Using pyMongo Cursors
  • Inserting and Finding Documents
  • Return Codes and Exceptions
  • Using Operators, Updates, and Aggregations
Example 1: Loading and Querying Weather Data
  • Grabbing Weather Data via OpenWeather API
  • Ingesting Weather Data into MongoDB
  • Querying Weather Data from MongoDB
Working with pySpark and MongoDB
  • What Is Spark and When Do We Need It?
  • Data Structures in Spark
  • Data Structures in Spark (Continued)
  • Connecting to MongoDB with pySpark
Example 2: Querying Reddit Comment Data with MongoDB and PySpark
  • Making Reddit Data Available to PySpark
  • Loading Data from MongoDB in Spark, Transform into Pandas DF
  • Preparing Data for Prediction Task Using spark.ml
  • Predicting Up Votes Using pyspark.ml

What You Will Learn

  • Understand MongoDB as a non-relational database based on JSON documents
  • Set up cursors in pyMongo as a connector to a MongoDB database
  • Run more complex chaining and aggregation queries
  • Connect to MongoDB in pySpark
  • Write MongoDB queries using operators and chain them together into aggregation pipelines
  • Work through real-world examples of using Python and MongoDB in a data pipeline
  • Use Mongo connectors in pySpark for high-performance processing
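
To make the idea of a database based on JSON documents more concrete, here is a minimal sketch of turning tabular rows into nested documents using only the Python standard library. The sample data, the field names, and the `rows_to_documents` helper are invented for illustration and do not come from the course.

```python
import csv
import io
import json

# Hypothetical tabular input: one flat row per weather reading.
TABULAR = """city,country,temp_c,humidity
Oslo,NO,4.5,81
Lima,PE,19.0,77
"""


def rows_to_documents(csv_text):
    """Convert flat CSV rows into nested, JSON-style documents."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        documents.append(
            {
                "city": row["city"],
                "country": row["country"],
                # Related columns are grouped into a sub-document, and the
                # CSV strings are converted to numeric types.
                "measurements": {
                    "temp_c": float(row["temp_c"]),
                    "humidity": int(row["humidity"]),
                },
            }
        )
    return documents


documents = rows_to_documents(TABULAR)
print(json.dumps(documents[0], indent=2))
```

A list of dictionaries like `documents` is the shape that PyMongo's `insert_many` accepts, so a step like this could sit just before ingestion into a collection.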
