Learning Apache Apex

Product type: Book
Published: Nov 2017
ISBN-13: 9781788296403
Pages: 290
Edition: 1st Edition
Authors (5): Thomas Weise, Ananth Gundabattula, Munagala V. Ramanath, David Yan, Kenneth Knowles

Table of Contents (17 chapters)

  • Title Page
  • Credits
  • About the Authors
  • About the Reviewer
  • www.PacktPub.com
  • Customer Feedback
  • Preface
  • Introduction to Apex
  • Getting Started with Application Development
  • The Apex Library
  • Scalability, Low Latency, and Performance
  • Fault Tolerance and Reliability
  • Example Project – Real-Time Aggregation and Visualization
  • Example Project – Real-Time Ride Service Data Processing
  • Example Project – ETL Using SQL
  • Introduction to Apache Beam
  • The Future of Stream Processing

Chapter 3. The Apex Library

The previous chapter introduced the application development process, which resulted in a simple Hello World application. Now, we will introduce the Apex library and look at more meaningful functional building blocks that are used to assemble real applications.

This chapter will cover:

  • An overview of the library
  • Integration with existing infrastructure
  • Messaging systems, files, and databases as frequently used sources and sinks
  • Transformations to build the pipeline functionality

An overview of the library


The Apex Malhar library (referred to as the Apex library throughout this book) contains operators as well as APIs and other components that are useful for assembling applications and building customized operators (for example, stream codecs, partitioners, state management, and windowing support). The aim of the Apex library is to provide many common building blocks in a readily usable form, so that they only need to be configured rather than written as code.
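Partitioners and stream codecs help determine how tuples are distributed across the partitions of a downstream operator. As a minimal sketch of the underlying idea only, assuming key-based hash partitioning, the plain Java below routes each key to a partition by hashing it. `KeyHashPartitioner`, `partitionFor`, and `route` are hypothetical names chosen for illustration; they are not Apex library APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of key-based hash partitioning; not an Apex library class. */
class KeyHashPartitioner {
    private final int numPartitions;

    KeyHashPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** Maps a key to a partition index in [0, numPartitions); equal keys always map to the same index. */
    int partitionFor(String key) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Groups keys by target partition, preserving arrival order within each partition. */
    Map<Integer, List<String>> route(List<String> keys) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key), p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        KeyHashPartitioner partitioner = new KeyHashPartitioner(3);
        // Both "user-1" tuples end up in the same partition's list
        System.out.println(partitioner.route(List.of("user-1", "user-2", "user-1", "user-3")));
    }
}
```

Because equal keys always land in the same partition, a partition that aggregates by key sees every tuple for the keys it owns; the trade-off is that a skewed key distribution can overload a single partition.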

Note

The Apex library is maintained as part of the Apache Apex project in its own repository: https://github.com/apache/apex-malhar (whereas the Apex core engine is under https://github.com/apache/apex-core). Releases of the Apex library and Apex core engine are made at different frequencies, mostly because operators, which are the functional building blocks for applications, receive more contributions and evolve at a faster pace than the core engine.

The API of the engine is designed so that development can be separated and new functionality added...

Integrations


In the following subsections, we will cover important external system integrations for Apex applications and the corresponding connectors provided by the Apex library. The section starts with the streaming data connectors, which are used for continuous processing and low-latency use cases. Next, we will look at the file connectors, which are frequently used, especially for batch use cases where massive amounts of data need to be processed and the ability to read or write with high throughput in a scalable manner is important. Finally, we will look at a few database connectors.

Apache Kafka

In its own words, Apache Kafka (http://kafka.apache.org/) "is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies".

Apache Kafka is a distributed, horizontally scalable, fault-tolerant, and high-throughput publish-subscribe (pub-sub) messaging system. In contrast to similar messaging systems, Kafka was not...
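A Kafka topic is split into partitions, each an ordered, append-only log, and consumers track their own read positions (offsets). The toy model below is a hypothetical, in-memory, single-machine illustration of that idea and is not the Kafka client API: `ToyTopic`, `publish`, and `read` are names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory toy model of a partitioned, append-only topic; not the Kafka client API. */
class ToyTopic {
    // One append-only list per partition; a message's index in its list is its offset.
    private final List<List<String>> partitions = new ArrayList<>();

    ToyTopic(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Returns the partition a key maps to; equal keys always map to the same partition. */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Appends a message to its key's partition and returns the offset it was written at. */
    long publish(String key, String message) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(message);
        return log.size() - 1;
    }

    /** Reads up to maxMessages from a partition starting at fromOffset; reading never removes messages. */
    List<String> read(int partition, long fromOffset, int maxMessages) {
        if (fromOffset < 0 || maxMessages < 0) {
            throw new IllegalArgumentException("offset and maxMessages must be non-negative");
        }
        List<String> log = partitions.get(partition);
        int start = (int) Math.min(fromOffset, log.size());
        int end = (int) Math.min((long) start + maxMessages, log.size());
        return new ArrayList<>(log.subList(start, end));
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic(1);
        topic.publish("k", "a");
        topic.publish("k", "b");
        topic.publish("k", "c");
        // Two consumers, each tracking its own offset, read the same partition independently.
        System.out.println(topic.read(0, 0, 2));  // prints [a, b]
        System.out.println(topic.read(0, 1, 10)); // prints [b, c]
    }
}
```

Because reading does not remove messages, a consumer that restarts can resume from, or replay from, an earlier offset. Real Kafka brokers retain messages for a configurable period rather than forever, which bounds how far back such a replay can go.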

Transformations


So far, we have looked at the operators that connect Apex pipelines to the outside world, reading data from messaging systems, files, and other sources and writing results to various destinations. We have seen that the Apex library has comprehensive support for integrating various external systems with feature-rich connectors.

Now it is time to look at the support available for the actual functionality of the pipeline. These building blocks are transformations: their purpose is to modify or accumulate the tuples that flow through the processing pipeline. Examples of typical transformations are parsing, filtering, aggregation by key, and join:

The preceding diagram categorizes transformations into those that are applied to individual tuples and those that aggregate tuples based on keys and windows. Often, per-tuple transforms are stateless, whereas windowed transforms require state for the accumulation. Most pipelines are composed of several of these transforms. It is common to see...
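The distinction between per-tuple and windowed transforms can be sketched in plain Java. The example below is hypothetical and is not Apex operator code: it assumes each input line has the form `timestamp,key,value` and uses fixed-size (tumbling) windows based on the tuple's timestamp. `parse` and `keep` are stateless because each tuple is handled independently, while `WindowedSum` keeps a running sum per window and key, which is the state that the accumulation requires.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical plain-Java illustration of per-tuple vs windowed transforms; not Apex operator code. */
class TransformSketch {

    /** A parsed tuple: event time in milliseconds, a key, and a numeric value. */
    static final class Event {
        final long timestampMillis;
        final String key;
        final long value;

        Event(long timestampMillis, String key, long value) {
            this.timestampMillis = timestampMillis;
            this.key = key;
            this.value = value;
        }
    }

    /** Stateless per-tuple transform: parses "timestamp,key,value"; malformed lines yield empty. */
    static Optional<Event> parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Event(Long.parseLong(fields[0].trim()), fields[1].trim(),
                    Long.parseLong(fields[2].trim())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Stateless per-tuple transform: keeps only events with a positive value. */
    static boolean keep(Event e) {
        return e.value > 0;
    }

    /** Stateful windowed transform: sums values per key within fixed (tumbling) windows. */
    static final class WindowedSum {
        private final long windowSizeMillis;
        // window start -> (key -> running sum); this accumulated state makes the transform stateful
        private final Map<Long, Map<String, Long>> state = new TreeMap<>();

        WindowedSum(long windowSizeMillis) {
            this.windowSizeMillis = windowSizeMillis;
        }

        void accumulate(Event e) {
            long windowStart = e.timestampMillis - Math.floorMod(e.timestampMillis, windowSizeMillis);
            state.computeIfAbsent(windowStart, w -> new TreeMap<>()).merge(e.key, e.value, Long::sum);
        }

        /** Returns the live per-window, per-key sums accumulated so far. */
        Map<Long, Map<String, Long>> results() {
            return state;
        }
    }

    /** Runs the pipeline: parse, then filter, then windowed aggregation by key. */
    static Map<Long, Map<String, Long>> run(List<String> lines, long windowSizeMillis) {
        WindowedSum agg = new WindowedSum(windowSizeMillis);
        for (String line : lines) {
            parse(line).filter(TransformSketch::keep).ifPresent(agg::accumulate);
        }
        return agg.results();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1000,a,2", "1500,b,1", "bad line", "2500,a,4", "2600,a,-3", "2900,a,1");
        System.out.println(run(lines, 1000)); // prints {1000={a=2, b=1}, 2000={a=5}}
    }
}
```

The malformed line and the negative value are dropped by the stateless steps before the windowed sum sees them. A real streaming engine also has to decide when a window is complete so that it can emit the result and discard the window's state; this sketch simply keeps every window in memory.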

Summary


In previous chapters, we explored what Apex is and how applications are built with it. The examples were simple, in the Hello World style. In this chapter, we introduced the Apex library, which is of central importance for developing real-world applications. The library contains the functional building blocks required to integrate with existing data infrastructure, as well as operators that implement the transformations frequently needed for stream processing.

The chapter also provided links to documentation and example applications for the operators that were covered, which will be helpful as a starting point when building your own application (after all, it is much easier to start from something that works and build on top of it than to start from scratch). Beyond functionality, we also saw how various operators support aspects such as low latency, performance, scalability, fault tolerance, and processing guarantees that are required for production-quality applications...
