Working with Big Data in Python [Video]

Alex Rutherford

Gain valuable insights from your data by streamlining unstructured data pipelines with Python, Spark, and MongoDB

This title is available to pre-order now and is expected to be published in

Video Details

ISBN 13: 9781788839068
Course Length: 2 hours

Video Description

This course is a comprehensive, practical guide to using MongoDB and Spark in Python. It shows how to store and make sense of huge data sets and how to perform basic machine learning tasks to make predictions.

MongoDB is one of the most powerful non-relational database systems available, offering robust scalability and expressive operations. Combined with Python data analysis libraries and distributed computing, it gives the modern data scientist a valuable set of tools.

NoSQL databases require a new way of thinking about data and scalable queries. Once you have mastered MongoDB queries, the next step is to understand how to use them from within Python's rich analysis and visualization ecosystem.

This course covers how to use MongoDB, particularly if you are used to SQL databases, with a focus on scaling to large datasets. PyMongo is introduced as the means of interacting with a MongoDB database from within Python code, and the data structures used to do so are explored. MongoDB allows complex operations and aggregations to be run within the query itself, and we cover how to use these operators.

While MongoDB itself is built for easy scalability across many nodes as datasets grow, Python is not. We therefore cover how to use Spark with MongoDB to apply more complex machine learning techniques to extremely large datasets.

This learning is applied through several examples of real-world datasets and analyses that can form the basis of your own pipelines, allowing you to get up and running quickly with a powerful data science toolkit. This course supplies in-depth content that puts the theory into practice.
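As a rough illustration of what running operations "within the query itself" looks like from Python, the sketch below builds a filter document and a three-stage aggregation pipeline as plain Python dictionaries, then hands them to PyMongo. It is not taken from the course material. The database name `shop`, the collection `reviews`, and the fields `product` and `rating` are hypothetical, and the connection helper assumes a MongoDB server on the default local port.

```python
from typing import Any, Dict, List

# Hypothetical document shape, one per review:
#   {"product": "kettle", "rating": 4, "country": "UK"}


def rating_filter(min_rating: int) -> Dict[str, Any]:
    """Filter document, roughly `WHERE rating >= min_rating` in SQL."""
    return {"rating": {"$gte": min_rating}}


def average_rating_pipeline(min_rating: int) -> List[Dict[str, Any]]:
    """Aggregation pipeline, roughly equivalent to:

    SELECT product, AVG(rating), COUNT(*) FROM reviews
    WHERE rating >= ? GROUP BY product ORDER BY AVG(rating) DESC
    """
    return [
        {"$match": rating_filter(min_rating)},
        {"$group": {
            "_id": "$product",               # group key
            "avg_rating": {"$avg": "$rating"},
            "n": {"$sum": 1},                # count documents per group
        }},
        {"$sort": {"avg_rating": -1}},       # highest average first
    ]


def top_products(collection, min_rating: int = 3) -> List[Dict[str, Any]]:
    """Run the pipeline on the server; `collection` is a PyMongo Collection."""
    return list(collection.aggregate(average_rating_pipeline(min_rating)))


def connect_reviews(uri: str = "mongodb://localhost:27017"):
    """Return the hypothetical `shop.reviews` collection (needs pymongo)."""
    from pymongo import MongoClient  # imported lazily; requires a live server
    return MongoClient(uri)["shop"]["reviews"]
```

With a server available, `connect_reviews().find(rating_filter(4)).limit(5)` returns a lazy cursor over matching documents, and `top_products(connect_reviews())` returns the grouped results computed by MongoDB rather than in Python.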

Style and Approach

An exhaustive course that carefully covers the fundamental concepts of unstructured data and distributed programming before applying them to examples of typical data science workflows.

This course is divided into clear chunks, so you can learn at your own pace and focus on your own area of interest.

What You Will Learn

  • Understand MongoDB as a non-relational database based on JSON-like documents
  • Set up cursors in PyMongo as a connector to a MongoDB database
  • Run more complex chaining and aggregation queries
  • Connect to MongoDB in PySpark
  • Write MongoDB queries using operators and chain them together into aggregation pipelines
  • Work through real-world examples of using Python and MongoDB in a data pipeline
  • Use Mongo connectors in PySpark for high-performance processing
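To give a feel for the PySpark items above, here is a minimal sketch of reading a MongoDB collection into a Spark DataFrame. It assumes the MongoDB Spark Connector in its 10.x form (data source name `mongodb`, configuration key `spark.mongodb.read.connection.uri`); older connector releases use a different source name and configuration keys, and the course itself may use those. The connector package must also be available to Spark. The database and collection names are hypothetical.

```python
from typing import Dict


def mongo_read_options(database: str, collection: str) -> Dict[str, str]:
    """Reader options naming the MongoDB database and collection to load."""
    return {"database": database, "collection": collection}


def build_session(uri: str = "mongodb://localhost:27017"):
    """Create a SparkSession pointed at a MongoDB deployment (needs pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily
    return (
        SparkSession.builder
        .appName("mongo-example")
        # Assumption: connector 10.x configuration key.
        .config("spark.mongodb.read.connection.uri", uri)
        .getOrCreate()
    )


def load_collection(spark, database: str, collection: str):
    """Load a collection as a DataFrame via the `mongodb` data source."""
    reader = spark.read.format("mongodb")
    for key, value in mongo_read_options(database, collection).items():
        reader = reader.option(key, value)
    return reader.load()
```

With Spark and the connector installed, `load_collection(build_session(), "shop", "reviews")` yields a DataFrame, after which ordinary DataFrame operations such as `df.groupBy("product").avg("rating")` are distributed across the cluster.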
