Reader small image

You're reading from  Data Engineering with Python

Product typeBook
Published inOct 2020
Reading LevelBeginner
PublisherPackt
ISBN-139781839214189
Edition1st Edition
Languages
Right arrow
Author (1)
Paul Crickard
Paul Crickard
author image
Paul Crickard

Paul Crickard authored a book on the Leaflet JavaScript module. He has been programming for over 15 years and has focused on GIS and geospatial programming for 7 years. He spent 3 years working as a planner at an architecture firm, where he combined GIS with Building Information Modeling (BIM) and CAD. Currently, he is the CIO at the 2nd Judicial District Attorney's Office in New Mexico.
Read more about Paul Crickard

Right arrow

Differentiating stream processing from batch processing

While the processing tools don't change whether you are processing streams or batches, there are two things you should keep in mind while processing streams – unbounded and time.

Data can be bounded or unbounded. Bounded data has an end, whereas unbounded data is constantly created and is possibly infinite. Bounded data is last year's sales of widgets. Unbounded data is a traffic sensor counting cars and recording their speeds on the highway.

Why is this important in building data pipelines? Because with bounded data, you will know everything about the data. You can see it all at once. You can query it, put it in a staging environment, and then run Great Expectations on it to get a sense of the ranges, values, or other metrics to use in validation as you process your data.

With unbounded data, it is streaming in and you don't know what the next piece of data will look like. This doesn't mean...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Data Engineering with Python
Published in: Oct 2020Publisher: PacktISBN-13: 9781839214189

Author (1)

author image
Paul Crickard

Paul Crickard authored a book on the Leaflet JavaScript module. He has been programming for over 15 years and has focused on GIS and geospatial programming for 7 years. He spent 3 years working as a planner at an architecture firm, where he combined GIS with Building Information Modeling (BIM) and CAD. Currently, he is the CIO at the 2nd Judicial District Attorney's Office in New Mexico.
Read more about Paul Crickard