Working with Big Data in Python [Video]

Alexis Rutherford

Gain valuable insights from your data by streamlining unstructured data pipelines with Python, Spark, and MongoDB

Video Description

This course is a comprehensive, practical guide to using MongoDB and Spark in Python. You will learn how to store and make sense of huge datasets and how to perform basic machine learning tasks to make predictions.

MongoDB is one of the most powerful non-relational database systems available, offering robust scalability and expressive operations. Combined with Python data analysis libraries and distributed computing, it gives the modern data scientist a valuable set of tools. NoSQL databases, however, require a new way of thinking about data and about writing scalable queries. Once you have mastered Mongo queries, you need to understand how to use this API within Python's rich analysis and visualization ecosystem.

This course covers how to use MongoDB, particularly if you are used to SQL databases, with a focus on scaling to large datasets. PyMongo is introduced as the means to interact with a MongoDB database from Python code, and the data structures it uses to do so are explored. MongoDB allows complex operations and aggregations to be run within the query itself, and we cover how to use these operators.

While MongoDB is built to scale easily across many nodes as datasets grow, Python is not. We therefore cover how to use Spark with MongoDB to apply more complex machine learning techniques to extremely large datasets. These techniques are applied to several real-world datasets and analyses that can form the basis of your own pipelines, allowing you to get up and running quickly with a powerful data science toolkit.
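In PyMongo, query filters, projections, and operators such as `$gte` and `$in` are written as ordinary Python dictionaries, and `find()` returns a cursor whose `sort()` and `limit()` methods can be chained. The following is a minimal sketch, not code from the course: the `movies` collection and its `year`, `genre`, `rating`, and `title` fields are hypothetical, and the helper accepts any PyMongo `Collection` so it never opens a connection itself.

```python
from typing import Any

# Projection document: return title and year, and suppress the default _id field.
PROJECTION: dict[str, int] = {"_id": 0, "title": 1, "year": 1}


def build_filter(min_year: int, genres: list[str]) -> dict[str, Any]:
    """Build a filter using the $gte (greater-or-equal) and $in (membership) operators."""
    return {"year": {"$gte": min_year}, "genre": {"$in": genres}}


def top_rated(collection: Any, min_year: int, genres: list[str], n: int = 10) -> list[dict[str, Any]]:
    """Return up to n matching documents, highest rating first.

    `collection` is assumed to be a pymongo Collection, for example
    MongoClient("mongodb://localhost:27017")["film_db"]["movies"]
    (the URI, database, and collection names are hypothetical).
    """
    cursor = (
        collection.find(build_filter(min_year, genres), PROJECTION)
        .sort("rating", -1)  # -1 is the value of pymongo.DESCENDING
        .limit(n)
    )
    # Iterating the cursor pulls results from the server in batches.
    return list(cursor)
```

Because filters are plain dictionaries, they can be built and inspected in Python before any query is sent: `build_filter(2000, ["Drama"])` yields `{"year": {"$gte": 2000}, "genre": {"$in": ["Drama"]}}`.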

Style and Approach

An exhaustive course that carefully covers the fundamental concepts of unstructured data and distributed programming before applying them to examples of typical data science workflows.

This course is divided into clear chunks, so you can learn at your own pace and focus on your own area of interest.

What You Will Learn

  • Understand MongoDB as a non-relational database built on JSON-like (BSON) documents
  • Connect to a MongoDB database with PyMongo and iterate over the cursors its queries return
  • Run more complex chaining and aggregation queries
  • Connect to MongoDB from PySpark
  • Write MongoDB queries using operators and chain them together into aggregation pipelines
  • Work through real-world examples of using Python and MongoDB in a data pipeline
  • Use the MongoDB connector in PySpark for high-performance processing
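An aggregation pipeline chains operator stages into a list that MongoDB executes in order on the server. Below is a minimal sketch under the same assumptions as before (a hypothetical `movies` collection with `year`, `genre`, and `rating` fields), not an example taken from the course: it filters with `$match`, summarizes each genre with `$group`, and orders the result with `$sort`. In PyMongo the list is passed to `Collection.aggregate()`, which returns a cursor.

```python
from typing import Any


def genre_stats_pipeline(min_year: int) -> list[dict[str, Any]]:
    """Build a three-stage aggregation pipeline: filter -> group -> sort."""
    return [
        # Stage 1: keep only documents from min_year onwards.
        {"$match": {"year": {"$gte": min_year}}},
        # Stage 2: one output document per genre, with a count and a mean rating.
        {"$group": {
            "_id": "$genre",
            "n_movies": {"$sum": 1},
            "avg_rating": {"$avg": "$rating"},
        }},
        # Stage 3: highest average rating first.
        {"$sort": {"avg_rating": -1}},
    ]


def genre_stats(collection: Any, min_year: int) -> list[dict[str, Any]]:
    """Run the pipeline server-side; `collection` is assumed to be a pymongo Collection."""
    return list(collection.aggregate(genre_stats_pipeline(min_year)))
```

Placing `$match` first lets MongoDB discard irrelevant documents before the more expensive `$group` stage, and allows it to use an index on `year` if one exists.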

Video Details

ISBN-13: 9781788839068
Course Length: 2 hours 41 minutes

Recommended for You

  • Apache Spark with Python - Big Data with PySpark and Spark [Video]
  • Hands-On Data Analytics for Beginners with Google Colaboratory [Video]
  • Hands-On Beginner’s Guide on Big Data and Hadoop 3 [Video]
  • Mastering Unsupervised Learning with Python [Video]
  • Data Visualization with Python: The Complete Guide [Video]
  • Big Data Analytics Projects with Apache Spark [Video]