Hands-On Software Engineering with Golang

Product type: Book
Published: Jan 2020
Publisher: Packt
ISBN-13: 9781838554491
Pages: 640
Edition: 1st
Author: Achilleas Anagnostopoulos

Table of Contents (21 chapters)

Preface
Section 1: Software Engineering and the Software Development Life Cycle
  1. A Bird's-Eye View of Software Engineering
Section 2: Best Practices for Maintainable and Testable Go Code
  2. Best Practices for Writing Clean and Maintainable Go Code
  3. Dependency Management
  4. The Art of Testing
Section 3: Designing and Building a Multi-Tier System from Scratch
  5. The Links 'R' Us Project
  6. Building a Persistence Layer
  7. Data-Processing Pipelines
  8. Graph-Based Data Processing
  9. Communicating with the Outside World
  10. Building, Packaging, and Deploying Software
Section 4: Scaling Out to Handle a Growing Number of Users
  11. Splitting Monoliths into Microservices
  12. Building Distributed Graph-Processing Systems
  13. Metrics Collection and Visualization
Epilogue
Assessments
Other Books You May Enjoy

Data-Processing Pipelines

"Inside every well-written large program is a well-written small program."
- Tony Hoare

Pipelines are a standard and widely used way to break the processing of data into multiple stages. In this chapter, we will explore the basic principles behind data-processing pipelines and present a blueprint for implementing generic, concurrent-safe, and reusable pipelines using Go primitives such as channels, contexts, and goroutines.

In this chapter, you will learn about the following:

  • Designing a generic processing pipeline from scratch using Go primitives
  • Approaches to modeling pipeline payloads in a generic way
  • Strategies for dealing with errors that can occur while a pipeline is executing
  • Pros and cons of synchronous and asynchronous pipeline design
  • Applying pipeline design concepts to building the Links 'R' Us crawler component...

Technical requirements

The full code for the topics discussed in this chapter has been published to this book's GitHub repository under the Chapter07 folder.

You can access the GitHub repository that contains the code and all required resources for each of this book's chapters by going to https://github.com/PacktPublishing/Hands-On-Software-Engineering-with-Golang.

To get you up and running as quickly as possible, each example project includes a Makefile that defines the following set of targets:

  Makefile target   Description
  deps              Install any required dependencies
  test              Run all tests and report coverage
  lint              Check for lint errors
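A Makefile with these targets could look something like the following sketch. The recipe commands (for example, the choice of `golangci-lint` as the linter) are assumptions for illustration and may differ from the actual Makefiles in the book's repository:

```makefile
.PHONY: deps test lint

deps: # Install any required dependencies
	go mod download

test: # Run all tests and report coverage
	go test -race -cover ./...

lint: # Check for lint errors (assumes golangci-lint is installed)
	golangci-lint run ./...
```

Running `make deps test` from a chapter folder would then fetch dependencies and execute the test suite in one step.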

As with all other book chapters, you will need a fairly recent version of Go, which you can download at https://golang.org/dl.

Building a generic data-processing pipeline in Go

The following figure illustrates the high-level design of the pipeline that we will be building throughout the first half of this chapter:

Figure 1: A generic, multistage pipeline

Keep in mind that this is definitely not the only, or necessarily the best, way to go about implementing a data-processing pipeline. Pipelines are inherently application specific, so there is not really a one-size-fits-all guide for constructing efficient pipelines.

Having said that, the proposed design is applicable to a wide variety of use cases, including, but not limited to, the crawler component for the Links 'R' Us project. Let's examine the preceding figure in a bit more detail and identify the basic components that the pipeline comprises:

  • The input source: Inputs essentially function as data sources that pump data into the pipeline...

Building a crawler pipeline for the Links 'R' Us project

In the following sections, we will be putting the generic pipeline package that we built to the test by using it to construct the crawler pipeline for the Links 'R' Us project!

Following the single-responsibility principle, we will break down the crawl task into a sequence of smaller subtasks and assemble the pipeline illustrated in the following figure. The decomposition into smaller subtasks also comes with the benefit that each stage processor can be tested in total isolation without the need to create a pipeline instance:

Figure 2: The stages of the crawler pipeline that we will be constructing

The full code for the crawler and its tests can be found in the Chapter07/crawler package, which you can find at the book's GitHub repository.

...

Summary

In this chapter, we built our own generic, extensible pipeline package from scratch using nothing more than basic Go primitives. We analyzed and implemented different strategies (FIFO, fixed/dynamic worker pools, and broadcasting) for processing data throughout the various stages of our pipeline. In the last part of the chapter, we applied everything that we had learned so far to implement a multistage crawler pipeline for the Links 'R' Us project.

In summary, pipelines provide an elegant solution for breaking down complex data-processing tasks into smaller, easier-to-test steps that can be executed in parallel to make better use of the compute resources at your disposal. In the next chapter, we are going to take a look at a different paradigm for processing data that is organized as a graph.

...

Questions

  1. Why is it considered an antipattern to use interface{} values as arguments to functions and methods?
  2. You are trying to design and build a complex data-processing pipeline that requires copious amounts of computing power (for example, face recognition, audio transcription, or similar). However, when you try to run it on your local machine, you realize that the resource requirements for some of the stages exceed the ones that are currently available locally. Describe how you could modify your current pipeline setup so that you could still run the pipeline on your machine, but arrange for some parts of the pipeline to execute on a remote server that you control.
  3. Describe how you would apply the decorator pattern to log errors returned by the processor functions that you have attached to a pipeline.
  4. What are the key differences between a synchronous and an asynchronous...

Further reading

  1. Berners-Lee, T.; Fielding, R.; Masinter, L.: RFC 3986, Uniform Resource Identifier (URI): Generic Syntax.
  2. bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user-generated content of XSS: https://github.com/microcosm-cc/bluemonday
  3. Documentation for the Go pprof package: https://golang.org/pkg/runtime/pprof
  4. Documentation for the Pool type in the sync package: https://golang.org/pkg/sync/#Pool
  5. gomock: a mocking framework for the Go programming language: https://github.com/golang/mock
  6. go-multierror: a Go (golang) package for representing a list of errors as a single error: https://github.com/hashicorp/go-multierror
  7. Moskowitz, R.; Karrenberg, D.; Rekhter, Y.; Lear, E.; de Groot, G. J.: Address Allocation for Private Internets.
  8. The Go blog: profiling Go programs: https://blog.golang.org/profiling...