You're reading from Hands-On Software Engineering with Golang

Product typeBook

Published inJan 2020

Reading LevelIntermediate

PublisherPackt

ISBN-139781838554491

Edition1st Edition

Languages

Concepts

Programming Language

Author (1)

Achilleas Anagnostopoulos

Building Distributed Graph-Processing Systems

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."

- Leslie Lamport

The master/worker pattern is a popular approach for building fault-tolerant, distributed systems. The first part of this chapter explores this pattern in depth with a focus on some of the more challenging aspects of distributed systems, such as node discovery and error handling.

In the second part of this chapter, we will apply the master/worker pattern to build, from scratch, a distributed graph-processing system that can handle massive graphs whose size exceeds the memory capacity of most modern compute nodes. Finally, in the last part of this chapter, we will apply everything learned so far to create a distributed version of the PageRank calculator service for the Links...

Technical requirements

The full code for all topics discussed within this chapter has been published to this book's GitHub repository in the Chapter12 folder.

You can access the GitHub repository that contains the code and all required resources for each one of this book's chapters by pointing your web browser at the following URL: https://github.com/PacktPublishing/Hands-On-Software-Engineering-with-Golang.

Each example project for this chapter includes a common Makefile that defines the following set of targets:

Makefile target	Description
`deps`	Install any required dependencies.
`test`	Run all tests and report coverage.
`lint`	Check for lint errors.

As with all other chapters from this book, you will need a fairly recent version of Go, which you can download at https://golang.org/dl/.

To run some of the code in this chapter, you will need to have a working...

Introducing the master/worker model

The master/worker model is a commonly used pattern for building distributed systems that have been around for practically forever. When building a cluster using this model, nodes can be classified into two distinct groups, namely, masters and workers.

The key responsibility of worker nodes is to perform compute-intensive tasks such as the following:

Video transcoding
Training large-scale neural networks with millions of parameters
Calculating Online Analytical Processing (OLAP) queries
Running a Continuous Integration (CI) pipeline
Executing map-reduce operations on massive datasets

On the other hand, master nodes are typically assigned the role of the coordinator. To this end, they are responsible for the following:

Discovering and keeping track of available worker nodes
Breaking down jobs into smaller tasks and distributing them to each...

Out-of-core distributed graph processing

Back in Chapter 8, Graph-Based Data Processing, we designed and built our very own system for implementing graph-based algorithms based on the Bulk Synchronous Parallel (BSP) model. Admittedly, our final implementation was heavily influenced by the ideas from the Google paper describing Pregel ^[4], a system that was originally built by Google engineers to tackle graph-based computation at scale.

While the bspgraph package from Chapter 8, Graph-Based Data Processing, can automatically distribute the graph computation load among a pool of workers, it is still limited to running on a single compute node. As our Links 'R' Us crawler augments our link index with more and more links, we will eventually reach a point where the PageRank computation will simply take too long. Updating the PageRank scores for the entire graphs might take...

Deploying a distributed version of the Links 'R' Us PageRank calculator

The PageRank calculator is the only component of the Links 'R' Us project that we haven't yet been able to horizontally scale on Kubernetes. Back in Chapter 8, Graph-Based Data Processing, where we used the bspgraph package to implement the PageRank algorithm, I promised you that a few chapters down the road, we would take the PageRank calculator code, and without any code modifications, enable it to run in distributed mode.

After completing this chapter, I strongly recommend, as a fun learning exercise, taking a look at using the dbspgraph package to build a distributed version of either the graph coloring or the shortest path algorithms from Chapter 8, Graph-Based Data Processing.

In this section, we will leverage all of the work we have done so far in this chapter to achieve this...

Summary

In this rather long chapter, we performed a deep dive into all of the aspects involved in the creation of a distributed graph-processing system that allows us to take any graph-based algorithm created with the bspgraph package from Chapter 8, Graph-Based Data Processing, and automatically distribute it to a cluster of worker nodes.

What's more, as a practical application of what we learned in this chapter, we modified the Links 'R' Us PageRank calculator service from the previous chapter so that it can now run in distributed mode. By doing so, we achieved the primary goal for this book—to build and deploy a complex Go project where every component can be independently scaled horizontally.

The next and final chapter focuses on the reliability aspects of the system we just built. We will be exploring approaches for collecting, aggregating, and visualizing...

Questions

Describe the differences between a leader-follower and a multi-master cluster configuration.
Explain how the checkpoint strategy can be used to recover from errors.
What is the purpose of the distributed barrier in the out-of-core graph processing system that we built in this chapter?
Assume that we are provided with a graph-based algorithm that we want to run in a distributed fashion. Would you consider a computation job as completed once the algorithm terminates?

Consul: Secure service networking. https://consul.io
Docker: Enterprise container platform. https://www.docker.com
Lamport, Leslie: Paxos Made Simple. In ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001) (2001), S. 51–58
Malewicz, Grzegorz; Austern, Matthew H.; Bik, Aart J. C; Dehnert, James C.; Horn, Ilan; Leiser, Naty; Czajkowski, Grzegorz: Pregel: A System for Large-scale Graph Processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10. New York, NY, USA : ACM, 2010 — ISBN 978-1-4503-0032-2, S. 135–146
Ongaro, Diego; Ousterhout, John: In Search of an Understandable Consensus Algorithm. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, USENIX ATC'14. Berkeley, CA, USA : USENIX Association, 2014 — ISBN...

The rest of the chapter is locked

You have been reading a chapter from

Hands-On Software Engineering with Golang

Published in: Jan 2020Publisher: PacktISBN-13: 9781838554491

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Achilleas Anagnostopoulos

Achilleas Anagnostopoulos has been writing code in a multitude of programming languages since the mid 90s. His main interest lies in building scalable, microservice-based distributed systems where components are interconnected via gRPC or message queues. Achilleas has over 4 years of experience building production-grade systems using Go and occasionally enjoys pushing the language to its limits through his experimental gopher-os project: a 64-bit kernel written entirely in Go. He is currently a member of the Juju team at Canonical, contributing to one of the largest open source Go code bases in existence.
Read more about Achilleas Anagnostopoulos

Personalised recommendations for you

Based on your interests and search pattern

C++ Programming for Linux Systems

This book covers the essential system programming tools and helps you explore the features of C++20. It emphasizes important details to maintain code quality and tackle everyday challenges of developing software for high performance, optimization, and more.

BookSep 2023288 pages

Expert C++

Discover advanced programming techniques, the latest features of C++17 and C++20, and best practices for memory management, debugging, testing, and large-scale application design with Expert C++. Ideal for experienced developers advancing to proficient programmers and building professional-grade C++ applications.

BookAug 2023604 pages

iOS 17 Programming for Beginners

iOS 17 Programming for Beginners, Eighth Edition is your comprehensive guide to learning the art of iOS app development. Whether you dream of creating the next chart-topping app or simply want to enhance your programming skills, this book is your trusted companion on this exciting journey.

BookOct 2023604 pages4

Developer Career Masterplan

Written by industry experts that have spent the last 20+ years helping developers grow their career path towards senior developer positions and beyond. This book provides a comprehensive guide, sharing examples and stories from their global careers. By the end, you’ll have the knowledge to create a clear career progression plan as a technical professional.

BookSep 2023310 pages

Refactoring with C#

In Refactoring with C#, you’ll explore the process of safely refactoring modern .NET code using Visual Studio features, advanced unit tests, AI assistance, and custom Roslyn analyzers.

BookNov 2023434 pages

Python Real-World Projects

Amplify your developer journey by curating a dynamic project portfolio that outshines traditional resumes. Delve into the Python realm through immersive projects, mastering core concepts while constructing comprehensive modules and applications. From data acquisition prowess to impactful data visualization, Python Real-World Projects arms you with essential skills to beat the competition.

BookSep 2023478 pages5

The MVVM Pattern in .NET MAUI

The MVVM Pattern in .NET MAUI enables developers to master MVVM principles and effectively apply them to .NET MAUI. This book uses real-life examples and covers complex problems to help you successfully apply MVVM with .NET MAUI to confidently develop robust and high-performing cross-platform apps.

BookNov 2023386 pages

Extending Microsoft Business Central with Power Platform

Extending Business Central with the Power Platform is a step-by-step guide for Business Central professionals to create solutions that automate business processes, explain complex workflow approvals, and integrate with hundreds of other systems, without traditional development. It’ll guide you in customizing Business Central with Power Platform.

BookAug 2023458 pages5

Extending Microsoft Business Central with Power Platform

Extending Business Central with the Power Platform is a step-by-step guide for Business Central professionals to create solutions that automate business processes, explain complex workflow approvals, and integrate with hundreds of other systems, without traditional development. It’ll guide you in customizing Business Central with Power Platform.

BookAug 2023458 pages5

Quantum Computing Algorithms

The book emphasizes intuitive ideas behind quantum algorithms in ways that other books don’t cover, striking a careful balance between no math and too much math. To get the most from this book, you should be comfortable with basic algebra and writing simple computer code. No prior understanding of quantum physics is needed to get started.

BookSep 2023342 pages

Python – Complete Python, Django, Data Science and ML Guide

Unlock Python's full potential with this 50+ hour course! From programming to web and game development, data manipulation, and machine learning, gain the skills required to succeed in various Python-related careers. With practical tasks, hands-on experience, and a strong foundation in Python, you'll be ready to tackle real-world challenges and take advantage of the many opportunities this versatile language offers.

VideoNov 202350 hours 30 minutes5

Python – Complete Python, Django, Data Science and ML Guide

Unlock Python's full potential with this 50+ hour course! From programming to web and game development, data manipulation, and machine learning, gain the skills required to succeed in various Python-related careers. With practical tasks, hands-on experience, and a strong foundation in Python, you'll be ready to tackle real-world challenges and take advantage of the many opportunities this versatile language offers.

VideoNov 202350 hours 30 minutes5