Scalable Data Analysis in Python with Dask [Video]

4 (1 review total)
By Mohammed Kashif

About this video

Data analysts, machine learning professionals, and data scientists often use tools such as pandas, scikit-learn, and NumPy for data analysis on their personal computers. However, when they want to apply their analyses to larger datasets, these tools fail to scale beyond a single machine, and so analysts are forced to rewrite their computations.

If you work on big data and you’re using pandas, you know you can end up waiting up to a whole minute for a simple average of a series. And that’s just for a couple of million rows!

In this course, you’ll learn to scale your data analysis. First, you will execute distributed data science projects with Dask, from data ingestion through data manipulation to visualization. Then, you will explore the Dask framework. After that, you will see how Dask can be used with other common Python tools such as NumPy, pandas, Matplotlib, scikit-learn, and more.

You’ll work with large datasets, performing exploratory data analysis to investigate each dataset and draw findings from it. You’ll learn by applying data analysis principles and a range of statistical techniques to the same massive datasets across different systems.

Throughout the course, we’ll go over the various techniques, modules, and features that Dask has to offer. Finally, you’ll learn to use Dask for machine learning through the Dask-ML package. You’ll also start using parallel processing in your data tasks on your own system, without moving to a distributed environment.
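
Parallel processing on a single machine can be sketched with `dask.delayed` and the local threaded scheduler. This is a minimal example under stated assumptions, not the course's own code: the helper `chunk_sum`, the data, and the chunk size are hypothetical.

```python
from dask import delayed


def chunk_sum(chunk):
    """Sum one chunk of numbers (hypothetical helper for this sketch)."""
    return sum(chunk)


# Illustrative data, split into four chunks of 250 items each.
data = list(range(1_000))
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

# Each delayed call records a task instead of running it immediately.
partials = [delayed(chunk_sum)(c) for c in chunks]

# Combine the partial sums; this is still lazy.
total = delayed(sum)(partials)

# Run the task graph on local threads, with no cluster required.
result = total.compute(scheduler="threads")
```

The same task graph could later be run on a distributed cluster without changing how the tasks are defined, which is the main appeal of building it lazily.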

All the code files and related files are uploaded on GitHub at this link:

Publication date: May 2019
Duration: 4 hours

About the Author

  • Mohammed Kashif

    Mohammed Kashif works as a data scientist at Nineleaps, India, dealing mostly with graph data analysis. Prior to this, he worked as a Python developer at Qualcomm. He completed his Master's degree in computer science at IIIT Delhi, with a specialization in data engineering. His areas of interest include recommender systems, NLP, and graph analytics. In his spare time, he likes to answer questions on Stack Overflow and help other people debug their way out of misery. He is also an experienced teaching assistant with a demonstrated history of working in the higher-education industry.


Latest Reviews

Good content. Some parts could be more specific to what makes Dask special e.g. in comparison to Pandas. I felt that sometimes there are too much details about the hands on examples.
