Summary
In this chapter we’ve investigated an interesting business case that let us model a solution using a versatile and powerful data structure: the graph. We implemented asynchronous data pipelines to extract, transform and load a dataset, preparing the data for the analytic business case we actually wanted to address. The Pipes and Filters architectural pattern let us run several steps of that pipeline as asynchronous workers. Separating data preparation from business logic is a feature of several architectures. The asynchronous data pipeline library we used to implement our solution, however, has one clear and explicit limitation: it runs data pipelines on a single machine.
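As a reminder of the shape of the pattern, here is a minimal Pipes and Filters sketch built only on Python's standard asyncio module. It is not the pipeline library used in this chapter. The names run_pipeline, filter_stage, parse_edge and normalize are hypothetical and exist only for illustration. Each filter is a coroutine, each pipe is a bounded asyncio.Queue, and everything runs inside one event loop on one machine.

```python
import asyncio

_DONE = object()  # sentinel that marks the end of the stream


async def source(items, out_q):
    """Feed raw items into the first pipe, then signal completion."""
    for item in items:
        await out_q.put(item)
    await out_q.put(_DONE)


async def filter_stage(transform, in_q, out_q):
    """A filter: read from one pipe, transform, write to the next pipe."""
    while True:
        item = await in_q.get()
        if item is _DONE:
            await out_q.put(_DONE)  # propagate shutdown downstream
            return
        await out_q.put(transform(item))


async def sink(in_q, results):
    """Collect whatever comes out of the last pipe."""
    while True:
        item = await in_q.get()
        if item is _DONE:
            return
        results.append(item)


async def run_pipeline(items, transforms):
    """Wire source -> filters -> sink with bounded queues and run them concurrently."""
    queues = [asyncio.Queue(maxsize=8) for _ in range(len(transforms) + 1)]
    results = []
    stages = [source(items, queues[0])]
    for i, transform in enumerate(transforms):
        stages.append(filter_stage(transform, queues[i], queues[i + 1]))
    stages.append(sink(queues[-1], results))
    await asyncio.gather(*stages)
    return results


# Hypothetical filters: turn "A, B" text lines into lower-cased graph edges.
def parse_edge(line):
    src, dst = line.split(",")
    return src.strip(), dst.strip()


def normalize(edge):
    return tuple(node.lower() for node in edge)


if __name__ == "__main__":
    raw = ["A, B", "B, C"]
    print(asyncio.run(run_pipeline(raw, [parse_edge, normalize])))
```

The bounded queues provide back-pressure: a fast stage blocks on put until the slower stage after it catches up. Because every stage shares a single event loop, the sketch also exhibits the same single-machine limitation described above.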
To scale up to big-data use cases, similar techniques apply, but instead of a local orchestrator library we might want systems that can distribute pipeline steps across several machines or apply multiprocessing to some of the steps in a pipeline...