About this book
Pachyderm is an open-source project that enables data scientists to run reproducible data pipelines and scale them to an enterprise level.
This book will teach you how to implement Pachyderm to create collaborative data science workflows and reproduce your ML experiments at scale. You’ll begin by learning why data reproducibility matters and comparing different data science platforms. Next, you’ll see where Pachyderm fits into this picture and why it is significant, then follow instructions for installing Pachyderm locally on your computer or on a cloud platform of your choice. You’ll then discover Pachyderm’s architectural components and its main pipeline principles and concepts. The book will demonstrate how to create your first data pipeline using Pachyderm and its components. You’ll further explore common data operations in Pachyderm, such as moving data into and out of Pachyderm, to create complex pipelines. Based on what you have learned, you will then develop an end-to-end ML workflow, before learning about hyperparameter tuning and the supported Pachyderm language clients, with examples. Finally, you’ll integrate Pachyderm with JupyterHub and Kubeflow.
By the end of this book, you’ll be able to create repositories and pipelines, and implement Pachyderm in your infrastructure.
- Publication date: January 2022