Reproducible Data Science with Pachyderm

By Svetlana Karslioglu

About this book

Pachyderm is an open-source project that enables data scientists to run reproducible data pipelines and scale them to an enterprise level.

This book teaches you how to implement Pachyderm to create collaborative data science workflows and reproduce your ML experiments at scale. You’ll begin by learning why data reproducibility matters and comparing different data science platforms. Next, you’ll explore where Pachyderm fits into the picture and why it is significant, then follow instructions for installing Pachyderm locally on your computer or on a cloud platform of your choice. You’ll then discover Pachyderm’s architectural components and its main pipeline principles and concepts. The book demonstrates how to create your first data pipeline using Pachyderm and its components. You’ll further explore common data operations in Pachyderm, such as moving data into and out of Pachyderm, to create complex pipelines. Based on what you have learned, you will then develop an end-to-end ML workflow, before learning about hyperparameter tuning and the supported Pachyderm language clients, with examples. Finally, you’ll integrate Pachyderm with JupyterHub and Kubeflow.

By the end of this book, you’ll be able to create repositories and pipelines, and implement Pachyderm in your infrastructure.

Publication date:
January 2022

About the Author

  • Svetlana Karslioglu

    Svetlana Karslioglu is a Documentation Engineer at Facebook. Before joining Facebook, she was a Senior Technical Writer at Pachyderm, where she wrote a large part of the Pachyderm documentation. She also gave a talk on Pachyderm titled "Smashing Bias with Dynamic Data Versioning".
