Service Oriented Architecture: An Integration Blueprint

By Guido Schmutz , Peter Welkenbach , Daniel Liebhart

About this book

Service Oriented Architecture (SOA) refers to building systems that offer applications as a set of independent services that communicate and inter-operate with each other effectively. Such applications may originate from different vendor, platform, and programming language backgrounds, making successful integration a challenging task. This book enables you to integrate application systems effectively, using the Trivadis Integration Architecture Blueprint, which is supported by real-world scenarios in which this Integration Blueprint has proved a success.

This book will enable you to grasp all of the intricacies of the Trivadis Integration Architecture Blueprint, with detailed descriptions of each layer and component, and shows you how to implement your own integration architectures in practice. It also covers the more basic features of integration concepts for less experienced specialists, and sheds light on the future of integration technologies, such as XTP and grid computing. You will learn about EII and EAI, OSGi, as well as base technologies related to the implementation of solutions based on the Blueprint, such as JCA, JBI, SCA, and SDO.

The book begins by covering fundamental integration for those less familiar with the concepts and terminology, and then dives deep into explaining the different architecture variants and the future of integration technologies. Base technologies like JCA and SCA will be explored along the way, and the structure of the Trivadis Integration Architecture Blueprint will be described in detail, as will the intricacies of each component and layer. Other content includes discovering and comparing traditional and modern SOA driven integration solutions, implementing transaction strategies and process modeling, and getting to grips with EDA developments in SOA. Finally, the book considers how to map software from vendors like Oracle and IBM to the blueprint in order to compare the solutions, and ultimately integrate your own projects successfully.

Publication date: June 2010
Publisher: Packt
Pages: 240
ISBN: 9781849681049

 

Chapter 1. Basic Principles

This chapter describes the fundamental concepts of integration, and is intended as an introduction to integration technology and terminology. You will:

  • Learn the basic concepts, which are often used in the context of integration architecture

  • Gain an overview of the different architecture variants, such as point-to-point, hub-and-spoke, pipeline, and service-oriented architecture (SOA)

  • Learn about service-oriented integration with an explanation of both the process and the workflow integration patterns

  • Understand the different types of data integration and the accompanying patterns

  • Gain an understanding of Enterprise Application Integration (EAI) and Enterprise Information Integration (EII), and an indication of how direct connection, broker, and router patterns should be used

  • Understand developments in SOA resulting from the introduction of enterprise-wide events

  • Understand the integration technologies of the future: grid computing and extreme transaction processing (XTP)

 

Integration


The term integration has a number of different meanings. A fundamental knowledge of the terms and concepts of integration is an essential part of an integration architect's toolkit. There are many ways of classifying the different types of integration. From an enterprise-wide perspective, a distinction is made between application-to-application (A2A), business-to-business (B2B), and business-to-consumer (B2C) integration. Portal, function, and data integration can be classified on the basis of tiers. Another possible grouping consists of integration based on semantics.

Fundamental integration concepts include Enterprise Application Integration (EAI), Enterprise Service Bus (ESB), middleware, and messaging. These were used to define the subject before the introduction of SOA, and still form the basis of many integration projects today. EAI is often used as a synonym for integration. In David Linthicum's original definition of EAI, it means the unrestricted sharing of data and business processes among any connected applications. The technological implementation of EAI systems is, in most cases, based on middleware. The main base technology of EAI is messaging, which gives the option of implementing an integration architecture through asynchronous communication, using messages that are exchanged across a distributed infrastructure and a central message broker.

The fundamental integration architecture variants are:

  • point-to-point

  • hub-and-spoke

  • pipeline

  • service-oriented architecture

A point-to-point architecture is a collection of independent systems, which are connected through a network.

Hub-and-spoke architecture represents a further stage in the evolution of application and system integration, in which a central hub takes over responsibility for communications.

In pipeline architecture, independent systems along the value-added chain are integrated using a message bus. The bus capability is the result of interfaces to the central bus being installed in a distributed manner through the communication network, which gives applications local access to a bus interface.

In a service-oriented architecture, different applications are integrated to form a functioning whole by means of distributed and independent service calls that are orchestrated through an ESB and, if necessary, a process engine.

A fundamental technique for integration is the use of design patterns. These include process and workflow patterns in service-oriented integration; federation, population, and synchronization patterns in data integration; and the direct connection, broker, and router patterns, which form part of EAI and EII. It is important to be familiar with all of these patterns, in order to be able to use them correctly.

The most recent integration architectures are based on concepts such as event-driven architecture, grid computing, or extreme transaction processing (XTP). These technologies have yet to be tested in practice, but they are highly promising and of great interest for a number of applications, in particular for large companies and organizations.

Concepts

The Trivadis Integration Architecture Blueprint applies a clear and simple naming to each of the individual layers. However, in the context of integration, a wide range of different definitions and terms are used, which we will explain in this chapter.

  • Application to Application (A2A): A2A refers to the integration of applications and systems with one another.

  • Business to Business (B2B): B2B means the external integration of business partners', customers', and suppliers' processes and applications.

  • Business to Consumer (B2C): B2C describes the direct integration of end customers into internal corporate processes, for example, by means of Internet technologies.

  • Integration types: Integration projects are generally broken down into information portals, shared data integration, and shared function integration. Portals integrate applications at a user interface level. Shared data integration involves implementing integration architectures at a data level, and shared function integration at a function level.

  • Semantic integration: One example of a semantic integration approach is the use of model-based semantic repositories for integrating data, using different types of contextual information.

  • Enterprise Application Integration (EAI): EAI allows for the unrestricted sharing of data and business processes among any connected applications.

  • Messaging, publish/subscribe, message brokers, and messaging infrastructures: These are integration mechanisms involving asynchronous communication using messages, which are exchanged across a distributed infrastructure and a central message broker.

  • Enterprise Service Bus (ESB): An ESB is an integration infrastructure used to implement an EAI. The role of the ESB is to decouple client applications from services.

  • Middleware: The technological implementation of EAI systems is, in most cases, based on middleware. Middleware is also described as communication infrastructure.

  • Routing schemes: Information can be routed in different ways within a network. Depending on the type of routing used, routing schemes can be broken down into unicast (1:1 relationship), broadcast (to all destinations), multicast (1:N), and anycast (1:N, to the most accessible destination).

A2A, B2B, and B2C

Nowadays, business information systems in the majority of organizations consist of an application and system landscape, which has grown gradually over time. The increasing use of standard software (packaged applications) means that information silos will continue to exist. IT, however, should provide end-to-end support for business processes. This support cannot, and must not, stop at the boundaries of new or existing applications. For this reason, integration mechanisms are needed, which bring together individual island solutions to form a functioning whole. This happens not only at the level of an individual enterprise or organization, but also across different enterprises, and between enterprises and their customers. At an organizational level, a distinction is made between A2A, B2B, and B2C integration (Pape 2006). This distinction is shown in the image below. Each type of integration places specific requirements on the methods, technologies, products, and tools used to carry out the integration tasks. For example, the security requirements of B2B and B2C integrations are different from those of an A2A integration.

Modern concepts such as the Extended Enterprise, integration across organizational boundaries (Konsynski 1993), and the Virtual Enterprise (Hardwick, Bolton 1997) can be described using a combination of the different integration terms.

Integration types

Integration projects are generally broken down into information portals, shared data integration, and shared function integration. Portals integrate applications at a user interface level. Shared data integration involves implementing integration architectures at a data level, and shared function integration at a function level.

Information portals

The majority of business users need access to a range of systems in order to be able to run their business processes. They may need to be able to answer specific questions (for example, a call center taking incoming customer calls must be able to access the latest customer data) or to initiate or implement certain business functions (for example, updating customer data). In these circumstances, employees often have to use several business systems at the same time. An employee may need to access an order system (on a host) in order to verify the status of a customer order and, at the same time, may also have to open a web-based order system to see the data entered by the customer. Information portals bring together information from multiple sources. They display it in one place so that users do not have to access several different systems (which might also require separate authentication) and can work as efficiently as possible (Kirchhof et al. 2003). Simple information portals divide the user's screen into individual areas, each of which displays the data from one backend system independently, without interacting with the others. More sophisticated systems allow for limited interaction between the individual areas, which makes it possible to synchronize the different areas. For example, if the user selects a record in one area, the other areas are updated. Other portals use such advanced integration technology that the boundaries between the portal application and the integrated application become blurred (Nussdorfer, Martin 2006).

Shared data

Shared databases, file replication, and data transfers fall in the category of integration using shared data (Gorton 2006).

  • Shared databases: Many different business systems need to access the same data. For example, customer addresses may be required in an order system, a CRM system, and a sales system. This kind of data can be stored in a shared database in order to reduce redundancy and synchronization problems.

  • File replication: Systems often have their own local data storage. This means that any centrally managed data (in a top-level system) has to be replicated in the relevant target databases, and updated and synchronized regularly.

  • Data transfers: Data transfers are a special form of data replication in which the data is transferred in files.

Shared business functions

In the same way that different business systems store redundant data, they also have a tendency to implement redundant business logic. This makes maintenance and adapting to new situations both difficult and costly. For example, different systems must be able to validate data using predefined, centrally managed business rules. It makes sense to manage such logic in a central place.

  • EAI: The term EAI is generally used to describe all the methods which attempt to simplify the process of making a connection between different systems, in order to avoid the type of "spaghetti architecture" which results from the uncontrolled use of proprietary point-to-point connections. The systems are linked together with EAI solutions, instead of through individual proprietary application programming interfaces (APIs).

  • SOA: Service-oriented architecture is a term used to describe one way of implementing an enterprise architecture. SOA begins with an analysis of the business, in order to identify and structure the individual business areas and processes. This allows for the definition of services, which implement individual areas of business functionality. In an SOA, technical services are the equivalent of the specialist business areas, or functionality, in the business processes. This represents a major conceptual difference when compared with classic EAI solutions, which have a quite different focus. Their approach involves the simple exchange of data between systems, regardless of the technical semantics, and independently of any technical analysis of the processes.

Differences between EAI and SOA

In many cases, EAI solutions have only been able to fulfill the expectations placed on them to either a limited extent, or in an unsatisfactory way. This is, among other things, due to the following factors (Rotem-Gal-Oz 2007):

  • EAI solutions are generally data oriented and not process oriented.

  • EAI solutions do not address business processes. Instead, they are defined independently.

  • EAI solutions are highly complex, and because of their use of proprietary technologies, do not allow for long-term protection of investments, which is possible when using open standards.

  • EAI solutions need product-specific knowledge, which is only relevant in an EAI context, and cannot be reused in other projects.

  • In the long term, EAI solutions are almost as costly to operate as the previously mentioned "home-made" spaghetti architectures.

If EAI solutions are used in combination with web services to link systems together, this is still not the equivalent of an SOA. Although the number of proprietary connection components between the systems being linked is reduced by the use of open WS-* standards, a "real" SOA involves a more extensive architectural approach, based on a (business) process-oriented perspective on integration problems.

While EAI is data driven and puts the emphasis on application interface integration, SOA is a (business) process-driven concept, which focuses on integrating service interfaces in compliance with open standards, encapsulating the differences in individual integration approaches. As a result, it removes the barrier between the data integration and application integration approaches. However, SOA has one significant problem, which is that of semantic integration. Existing web services do not provide a satisfactory answer to this problem, but they do allow us to formulate the right questions in order to identify future solutions.

Semantic integration and the role of data

The challenge represented by semantic integration is based on the following problem:

  • The representation of the data and the information contained in the data are often closely interlinked, and not separated into user data and metadata.

  • The information suffers from the missing data context; there is no meta information defining how the data needs to be interpreted.

This means that the data structure and data information (its meaning) are often not the same thing and, therefore, have to be interpreted (Inmon, Nesavich 2008).

The following example will help to make this clearer:

A date, such as "7 August 1973," forms part of the data. It is not clear whether this information is a text string or in a date format. It may even be in another format and will have to be calculated on the basis of reference data before runtime. This information is of no relevance to the user.

However, it might be important to know what this date refers to, in other words, its semantic meaning in its reference context. Is it a customer's birthday, or the date on which a record was created? The example can be even more complex.

Another example that can be interpreted differently in different contexts is the term Caesar. Depending on the context, it could be the name of a pet, a brand of pet food, a well-known salad, a gambling casino, or the name of a Roman emperor.

It is clear that data without a frame of reference is lacking any semantic information, causing the data to be ambiguous and possibly useless. Ontologically-oriented interfaces, as well as adaptive interfaces, can help to create such semantic reference and will become increasingly important in the field of autonomous B2B or B2C marketplaces in the future.
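The date example can be sketched in a few lines of Python. The field names are illustrative, not from the book: the same string is mere data until a format rule makes it a date, and only an extra piece of semantic metadata records what the date actually means.

```python
from datetime import datetime

raw = "7 August 1973"  # pure data: just a text string, format unknown to the reader

# A format rule (metadata about representation) turns the string into a date...
value = datetime.strptime(raw, "%d %B %Y").date()

# ...but only semantic metadata records what the date *means* in its context:
record = {
    "value": value,
    "meaning": "customer_birthday",  # could equally be "record_created"
}

print(record["value"].year)  # 1973
```

Without the "meaning" entry, two systems could exchange this record and still interpret the same value differently.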

One semantic integration approach is, for example, the use of model-based semantic repositories (Casanave 2007). These repositories store and maintain implementation and integration designs for applications and processes (Yuan et al. 2006). They access existing vocabularies and reference models, which enable a standardized modeling process to be used. Vocabularies create a semantic coupling between data and specific business processes, and it is through these vocabularies that the services and applications involved are supplied with semantic information in the surrounding technical context.

The primary objective of future architectures must be to maintain the glossary and the vocabularies, in order to create a common language and, therefore, a common understanding among all the systems and partners involved. Semantic gaps must be avoided or bridged wherever possible, for example, by transforming input and output data using canonical models and standardized formats for business documents. These models and formats can be predefined for different industries as reference models [EDI (FIPS 1993), RosettaNet (Damodaran 2004), and so on]. Transformation rules can be generated and stored on the basis of reference models, in the form of data cards and transformation cards.

In the future, there will be a greater focus on the declarative description (what?) and less emphasis on describing the concrete software logic (how?) when defining integration architectures. In other words, the work involved in integration projects will move away from implementation, and towards a conceptual description in the form of a generative approach, where the necessary runtime logic is generated automatically.

Enterprise Application Integration (EAI)

The term Enterprise Application Integration (EAI) has become popular with the increased importance of integration, and with more extensive integration projects. EAI is not a product or a specific integration framework, but can be defined as a combination of processes, software, standards, and hardware that allow for the end-to-end integration of several enterprise systems, and enable them to appear as a single system (Lam, Shankararaman 2007).

Note

Definition of EAI

The use of EAI means the unrestricted sharing of data and business processes among any connected applications (Linthicum 2000).

From a business perspective, EAI can be seen as the competitive advantage that a company acquires when all its applications are integrated into one consistent information system. From a technical perspective, EAI is a process in which heterogeneous applications, functions, and data are integrated, in order to allow the shared use of data and the integration of business processes across all applications. The aim is to achieve this level of integration without major changes to the existing applications and databases, by using efficient methods that are cost and time effective.

In EAI, the focus is primarily on the technical integration of an application and system landscape. Middleware products are used as the integration tools but, wherever possible, the design and implementation of the applications are left unchanged. Adapters enable information and data to be moved across technologically heterogeneous structures and boundaries. What is lacking is the service concept, as well as the reduction of complexity and avoidance of redundancy offered by open standards. The service concept and standardization came only later, with the emergence of service-oriented architectures (SOA), which highlighted the importance of focusing on the functional levels within a company, and on its business processes.

Nowadays, software products which support EAI are often capable of providing the technical basis for infrastructure components within an SOA. As they also support the relevant interfaces of an SOA, they can be used as the controlling instance for the orchestration, and help to bring heterogeneous subsystems together to form a whole. Depending on its strategic definition, EAI can be seen as a preliminary stage of SOA, or as a concept that competes with SOA.

SOA is now moving the concept of integration into a new dimension. Alongside the classic "horizontal" integration of applications and systems in the context of an EAI, which is also of importance in an SOA, SOA focuses more closely on a "vertical" integration of the representation of business processes at an IT level (Fröschle, Reinheimer 2007).

SOAs are already a characteristic feature of the application landscape. It is advisable when implementing new solutions to ensure that they are SOA-compliant, even if there are no immediate plans to introduce an integration architecture, or an orchestration layer. This allows the transition to an SOA to be made in small, controllable steps, in parallel with the existing architecture and on the basis of the existing integration infrastructure.

Levels of integration

Integration architectures are based on at least three integration levels (after Puschmann, Alt 2004 and Ring, Ward-Dutton 1999):

  • Integration on data level: Data is exchanged between different systems. The technology most frequently used for integration at data level is File Transfer Protocol (FTP). Another widespread form of data exchange is the direct connection of two databases. Oracle databases, for example, exchange data via database links or external tables.

  • Integration on object level: Integration on object level is based on data-level integration. It allows systems to communicate by calling objects from outside the applications involved.

  • Integration on process level: Integration on process level uses workflow management systems. At this level, communication between the different applications takes place through the workflows, which make up a business process.

Messaging

Message queues were introduced in the 1970s as a mechanism for synchronizing processes (Brinch Hansen 1970). Message queues allow for persistent messages and, therefore, for asynchronous communication and the guaranteed delivery of messages. Messaging decouples the producer and the consumer with the only common denominator being the queue.
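The decoupling described above can be illustrated with a toy in-memory queue, a stand-in for a real persistent message queue (all names are illustrative): the producer and the consumer share nothing but the queue itself.

```python
from queue import Queue

def producer(q, messages):
    """The producer knows only the queue, not who will consume."""
    for m in messages:
        q.put(m)  # asynchronous send: no waiting for a consumer

def consumer(q):
    """The consumer knows only the queue, not who produced."""
    received = []
    while not q.empty():
        received.append(q.get())
    return received

channel = Queue()  # the only common denominator between the two sides
producer(channel, ["order-created", "order-paid"])
print(consumer(channel))  # ['order-created', 'order-paid']
```

A real messaging product adds what this sketch lacks: persistence of the queue contents, so that messages survive a server failure, and guaranteed delivery.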

The most important properties of messaging, its quality attributes, are:

  • Availability: Physical queues with the same logical name can be replicated across several server instances. In the case of a failure of one server, the clients can send the message to another.

  • Failure handling: If communication between a client and a server fails, the client can send the message via failover mechanisms to another server instance.

  • Modifiability: Clients and servers are loosely coupled by the messaging concept, which means that they do not know each other. This makes it possible for both clients and servers to be modified without influencing the system as a whole. Another dependency between producer and consumer is the message format. This dependency can be reduced or removed altogether by introducing a self-descriptive, general message format (canonical message format).

  • Performance: Messaging can handle several thousand messages per second, depending on the size of the messages and the complexity of the necessary transformations. The quality of service also has a major influence on the overall performance. Non-reliable messaging, which involves no buffering, provides better performance than reliable messaging, where the messages are stored (persisted) in the filesystem or in databases (local or remote) to ensure that they are not lost if a server fails.

  • Scalability: Replication and clustering make messaging a highly scalable solution.

Publish/subscribe

Publish/subscribe represents an evolution of messaging (Quema et al. 2002). A subscriber indicates, in a suitable form, its interest in a specific message or message type. The persistent queue guarantees secure delivery. The publisher simply puts its message in the message queue, and the queue distributes the message itself. This allows for many-to-many messaging.
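The many-to-many delivery can be sketched as follows; this toy hub (not a real broker API, all names are illustrative) distributes each published message to every subscriber registered on the topic.

```python
from collections import defaultdict

class PubSubHub:
    """Toy publish/subscribe hub: publishers and subscribers know only topic names."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # the hub distributes the message itself: one publish, many deliveries
        for callback in self.subscribers[topic]:
            callback(message)

hub = PubSubHub()
inbox_a, inbox_b = [], []
hub.subscribe("orders", inbox_a.append)
hub.subscribe("orders", inbox_b.append)
hub.publish("orders", "order-42")   # both subscribers receive the message
print(inbox_a, inbox_b)             # ['order-42'] ['order-42']
```

The publisher never addresses a subscriber directly; it only knows the topic name, which is exactly the decoupling the pattern is meant to provide.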

The most important properties of publish/subscribe, its quality attributes, are:

  • Availability: Physical topics with the same logical name can be replicated across several server instances. In the case of the failure of one server, the clients can send the message to another.

  • Failure handling: In the case of the failure of one server, the clients can send the message to another replicated server.

  • Modifiability: The publisher and the subscriber are loosely coupled by the messaging concept, which means that they do not know each other. This makes it possible for both publisher and subscriber to be modified without influencing the system as a whole. Another dependency is the message format. This can be reduced or removed altogether by introducing a self-descriptive, general message format (canonical message format).

  • Performance: Publish/subscribe can process thousands of messages per second. Non-reliable messaging is faster than reliable messaging, because reliable messages have to be stored locally. If a publish/subscribe broker supports multicast/broadcast protocols, a message can be transmitted to several subscribers simultaneously, rather than serially.

  • Scalability: Topics can be replicated across server clusters. This provides the necessary scalability for very large message throughputs. Multicast/broadcast protocols can also be scaled more effectively than point-to-point protocols.

Message brokers

A message broker is a central component, which is responsible for the secure delivery of messages (Linthicum 1999). The broker has logical ports for receiving and sending messages. It transports messages between the sender and the subscriber, and transforms them where necessary.

The most important tasks of a message broker, as shown in the preceding diagram, are implementing a hub-and-spoke architecture, the routing, and the transformation of messages.

  • Hub-and-spoke architecture: The broker acts as a central message hub with the senders and receivers arranged like spokes around it. Connections to the broker are made through adapter ports that support the relevant message format.

  • Message routing: The broker uses processing logic to route the messages. Routing decisions can be hardcoded, or can be specified in a declarative way. They are often based on the content of the message (content-based routing) or on specific values or attributes in the message header (attribute-based routing).

  • Message transformation: The broker logic transforms the message input format into the necessary message output format.
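The routing and transformation tasks above can be sketched together; this toy broker (all names are illustrative, not a vendor API) applies content-based routing predicates and transforms each message on delivery.

```python
class Broker:
    """Toy hub-and-spoke broker: content-based routing plus message transformation."""

    def __init__(self):
        self.routes = []  # list of (predicate, transform, destination) spokes

    def add_route(self, predicate, transform, destination):
        self.routes.append((predicate, transform, destination))

    def send(self, message):
        for predicate, transform, destination in self.routes:
            if predicate(message):                      # content-based routing decision
                destination.append(transform(message))  # transform on delivery

broker = Broker()
orders, alerts = [], []
broker.add_route(lambda m: m["type"] == "order",
                 lambda m: {**m, "routed": True}, orders)
broker.add_route(lambda m: m["priority"] == "high",
                 lambda m: m, alerts)

broker.send({"type": "order", "priority": "high", "id": 1})
print(len(orders), len(alerts))  # 1 1
```

Routing on a header field instead of the body (attribute-based routing) would simply mean predicates that inspect a separate header dictionary rather than the message content.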

The most important properties of a message broker, its quality attributes, are:

  • Availability: To provide high availability, brokers must be replicated and operated in clusters.

  • Failure handling: Brokers have different types of input ports that validate incoming messages to ensure that they have the correct format, and reject those with the wrong format. If one broker fails, the clients can send the message to another replicated broker.

  • Modifiability: Brokers separate transformation logic and routing logic from one another, and from senders and receivers. This improves modifiability, as the logic has no influence on senders and receivers.

  • Performance: Because of the hub-and-spoke approach, brokers can potentially be a bottleneck. This applies in particular in the case of a high volume of messages, large messages, and complex transformations. The throughput is typically lower than with simple reliable messaging.

  • Scalability: Broker clusters allow for high levels of scalability.

Messaging infrastructure

A messaging infrastructure provides mechanisms for sending, routing, and converting data, between different applications running on different operating systems with different technologies, as shown in the following diagram (Eugster et al. 2003):

A messaging infrastructure involves the following parties/components:

  • Producer: An application which sends messages to a local queue.

  • Consumer: An application which is interested in specific messages.

  • Local queue: The local interface of the messaging infrastructure. Each message sent to a local queue is received by the infrastructure and routed to one or more receivers.

  • Intermediate queue: In order to ensure that messages are delivered, the infrastructure uses intermediate queues, in case a message cannot be delivered, or has to be copied for several receivers.

  • Message management: Message management includes sending, routing, and converting data, together with special functions, such as guaranteed delivery, message monitoring, tracing individual messages, and error management.

  • Event management: The subscription mechanism is controlled through special events.

Enterprise Service Bus

An Enterprise Service Bus is an infrastructure that can be used to implement an EAI. The primary role of the ESB is to decouple client applications from services, as shown in the following diagram (Chappell 2004):

The encapsulation of services by the ESB means that the client application does not need to know anything about the location of the services, or the different communication protocols used to call them. The ESB enables the shared, enterprise-wide, and even inter-enterprise use of services, and separates business processes from the relevant service implementations (Lee et al. 2003).
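This decoupling can be sketched as a registry of logical service names (a toy illustration under assumed names, not a vendor API): the client addresses "customer.lookup" without ever learning where or how the service is implemented.

```python
class ServiceBus:
    """Toy ESB: clients address services by logical name only; the location and
    protocol of the actual endpoint stay hidden behind the bus."""

    def __init__(self):
        self.endpoints = {}  # logical service name -> callable endpoint

    def register(self, name, endpoint):
        self.endpoints[name] = endpoint

    def call(self, name, payload):
        # the client never learns where or how the service runs
        return self.endpoints[name](payload)

bus = ServiceBus()
bus.register("customer.lookup", lambda cid: {"id": cid, "name": "Alice"})
print(bus.call("customer.lookup", 7)["name"])  # Alice
```

Swapping the endpoint for a remote call (SOAP/HTTP, JMS, and so on) would change nothing on the client side, which is the point of the encapsulation.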

The core functions of an ESB

The major SOA vendors now offer specific Enterprise Service Bus products, which provide a series of core functions in one form or another, as shown in the following diagram:

The structure of an ESB

The following diagram shows the basic structure of an ESB in a vendor-neutral way:

The names used for the individual components vary between the different vendors of SOA products, and will differ from those shown in the above diagram, but the products provide the following functions as a minimum (Craggs 2003):

  • Routing and messaging as base services

  • A communication bus, which enables a wide variety of systems to be integrated using predefined adapters

  • Transformation and mapping services for a range of different conversions and transformations

  • Mechanisms for executing processes and rules

  • Monitoring functions for a selection of components

  • Development tools for modeling processes, mapping rules, and message transfers

  • A series of standardized interfaces, such as JMS (Java Message Service (Hapner et al. 2002)), JCA (Java Connector Architecture (JCASpec 2003)), and SOAP/HTTP

Middleware

In most cases, EAI systems are technologically realized through what is commonly termed middleware. Middleware is also described as a communication infrastructure. It allows communication between software components, regardless of the programming language in which the components were written, the protocols used for communication, and the platforms on which the components run (Thomson 1997). The different types of middleware are distinguished by the methods of communication used, and by the underlying base technology.

Middleware communication methods

Communication methods for middleware can be broken down into five categories:

  • Conversational (Dialog-Oriented): The components interact synchronously with one another. They always react instantly to the information being exchanged. This type of communication is generally used in real-time systems.

  • Request/reply: This is used when an application needs to call functions from another application. It corresponds to a call to a subroutine, with the important difference that the communication can take place over a network.

  • Message passing: This enables information to be exchanged in a simple and well-directed way using messages. Communication takes place in one direction only. If an application wants to react to an incoming message, its response must be placed in another message.

  • Message queuing: Information is exchanged in the form of messages which are sent through a queue, in other words, indirectly. Queuing allows the secure, planned, and prioritized delivery of messages. It is often used for the near real-time exchange of information between loosely coupled systems.

  • Publish/subscribe: Two roles are involved in non-directed communication: the publisher of a message sends the message only to the middleware. The subscriber subscribes to all the types of message that are of interest to him or her. The middleware ensures that all subscribers receive the corresponding messages from a publisher.
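The difference between the blocking request/reply method and the non-blocking message queuing method can be sketched as follows; the helper names are illustrative only:

```python
import queue

# Illustrative contrast between two of the methods above; all names invented.

def request_reply(service, request):
    # request/reply: the caller blocks until the service returns, like a
    # subroutine call that may cross the network
    return service(request)

class MessageQueue:
    # message queuing: the sender enqueues and continues; the loosely
    # coupled receiver picks messages up later, in order
    def __init__(self):
        self._q = queue.Queue()

    def send(self, message):
        self._q.put(message)           # does not wait for the receiver

    def poll(self):
        return None if self._q.empty() else self._q.get()

reply = request_reply(lambda r: r.upper(), "ping")   # caller waits for result
mq = MessageQueue()
mq.send("order-1")
mq.send("order-2")                                   # sender proceeds at once
```

The queue preserves ordering and decouples the lifetimes of sender and receiver, which is why message queuing suits the near real-time exchange between loosely coupled systems described above.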

These communication methods map to the common middleware types as follows:

Middleware type        | Communication     | Relationship | Synchronous/asynchronous | Interaction
-----------------------|-------------------|--------------|--------------------------|----------------
Peer-to-peer, API      | Conversational    | 1:1          | Synchronous              | Blocking
Database gateways      | Request/reply     | 1:1          | Synchronous              | Blocking
Database replication   | Request/reply     | 1:N          | Synchronous              | Blocking
                       | Message queue     | 1:N          | Asynchronous             | Non-blocking
Remote procedure calls | Request/reply     | 1:1          | Mostly synchronous       | Mostly blocking
Object request brokers | Request/reply     | 1:1          | Mostly synchronous       | Mostly blocking
Direct messaging       | Message passing   | 1:1          | Asynchronous             | Non-blocking
Message queue systems  | Message queue     | M:N          | Asynchronous             | Non-blocking
Message infrastructure | Publish/subscribe | M:N          | Asynchronous             | Non-blocking

Middleware base technologies

Middleware can be broken down into the following base technologies:

  • Data-oriented middleware: The integration or distribution of data to different RDBMS using the appropriate synchronization mechanisms.

  • Remote procedure call: The implementation of the classic client/server approach.

  • Transaction-oriented middleware: The transaction concept (ACID—Atomicity, Consistency, Isolation, Durability) is put into effect using this type of middleware. A transaction is a finite series of atomic operations which have either read or write access to a database.

  • Message-oriented middleware: The information is exchanged by means of messages, which are transported by the middleware from one application to the next. Message queues are used in most cases.

  • Component-oriented middleware: This represents different applications and their components as a complete system.

Routing schemes

Information can be routed in different ways within a network. Depending on the type of routing used, routing schemes can be broken down into the following four categories:

  • Unicast (1:1 relationship)

  • Broadcast (all destinations)

  • Multicast (1:N)

  • Anycast (1:N, most accessible)

Unicast

The unicast routing scheme sends data packets to a single destination. There is a 1:1 relationship between the network address and the network end point:

Broadcast

The broadcast routing scheme sends data packets in parallel to all the possible destinations in the network. If there is no support for this process, the data packets can be sent serially to all possible destinations. This produces the same results, but the performance is reduced. There is a 1:N relationship between the network address and the network end point.

Multicast

The multicast routing scheme sends data packets to a specific selection of destinations. The destination set is a subset of all the possible destinations. There is a 1:N relationship between the network address and the network end point:

Anycast

The anycast routing scheme distributes information to the destination computer which is nearest, or most accessible. There is a 1:N relationship between the network address and the network end point, but only one end point is addressed at any given time for the purpose of routing the information.
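The four routing schemes can be summarized as simple dispatch functions. This is a conceptual sketch only: the destination set and the cost function standing in for "most accessible" are assumptions for illustration:

```python
# Conceptual sketch only: destinations are plain strings, and the cost
# function standing in for "most accessible" is an assumption.

def unicast(destinations, packet, target):
    # 1:1 - a single destination
    return {target: packet} if target in destinations else {}

def broadcast(destinations, packet):
    # all destinations in the network
    return {d: packet for d in destinations}

def multicast(destinations, packet, group):
    # a specific subset of all possible destinations
    return {d: packet for d in destinations if d in group}

def anycast(destinations, packet, cost):
    # only the nearest/most accessible destination is addressed
    nearest = min(destinations, key=cost)
    return {nearest: packet}

nodes = {"a", "b", "c"}
```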

 

Integration architecture variants


The fundamental integration architecture variants are:

  • Point-to-point architecture: A collection of independent systems which are connected through a network.

  • Hub-and-spoke architecture: A further stage in the evolution of application and system integration, in which a central hub takes over responsibility for communications.

  • Pipeline architecture: In pipeline architecture, independent systems along the value-added chain are integrated using a message bus. The bus capability results in the distribution of the interfaces to the central bus throughout the communication network, which gives applications a local access to a bus interface.

  • Service-oriented architecture: The integration of different applications to form a functioning whole by means of distributed and independent service calls, which are orchestrated through an ESB and, if necessary, a Process Engine.

Point-to-point architecture

A point-to-point architecture is a collection of independent systems which are connected through a network. All the systems have equal rights, and can both use and provide services (Lublinsky 2002). This architecture can be found in many organizations, where application islands that have grown through time have been connected directly to each other.

As shown in the above diagram, in this architecture, there is no central database—each system has its own data storage.

New systems are connected directly with the existing ones, which over time leads to a highly complex set of interfaces. A point-to-point architecture with n applications can in theory have n*(n-1)/2 interfaces.
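This growth is easy to verify with the n*(n-1)/2 formula: five applications already allow up to 10 interfaces, and twenty applications up to 190.

```python
# Worked check of the n*(n-1)/2 claim: in the worst case every new
# application is wired directly to every existing one.

def max_interfaces(n):
    return n * (n - 1) // 2

growth = {n: max_interfaces(n) for n in (2, 5, 10, 20)}
# 2 applications: 1 interface; 5: 10; 10: 45; 20: 190
```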

It is easy to imagine how complex, error-prone, and difficult it can be to maintain such an architecture as more and more applications are added. Expanding the system is costly and, as the number of interfaces grows, operation becomes increasingly time consuming and expensive. A SWOT analysis is shown in the following table:

Strengths

  • Low startup and infrastructure costs

  • Autonomous systems

Weaknesses

  • Only practical with a small number of systems and connections

  • Replacing individual systems is a highly laborious and costly process

  • Very inflexible: not a basis for an SOA and, therefore, difficult to use to represent business processes

  • No overview of data

  • Limited reusability of components

  • Time-consuming and costly operation

Opportunities

  • Functions within the systems can be rapidly adapted to meet new requirements

Threats

  • High follow-up costs

  • Lack of standardization

Hub-and-spoke architecture

Hub-and-spoke architecture represents a further stage in the evolution of application and system integration, as shown in the following diagram (Gilfix 2003):

Its objective is to minimize the growing interface complexity by using a central integration platform to exchange messages between the systems. The central integration platform can transform messages, route them from one application to the next, and change the content of the messages. This architecture is often used for complex data distribution mechanisms. A SWOT analysis is shown in the following table:

Strengths

  • Reduction of the interface problem

  • Low follow-up costs

  • Compliance with standards

  • Autonomous systems

  • Simplified monitoring

Weaknesses

  • High startup and infrastructure costs

Opportunities

  • Individual systems can be integrated/replaced easily

Threats

  • With high transfer volumes, the central hub could become a performance bottleneck

  • Single point of failure

Pipeline architecture

In a pipeline architecture, independent systems along the value-added chain are integrated using a message bus, as in the following figure. The implementation of this architecture corresponds to that of the hub-and-spoke architecture, as the corresponding middleware products are normally installed and operated on central servers. The bus capability results in the distribution of the interfaces to the central bus throughout the communication network, which generally also gives applications local access to a bus interface (Ambriola, Tortora 1993).

Similarly to the hub-and-spoke architecture, this architecture also keeps interface problems to a minimum. The use of appropriate middleware components allows the communication between the systems to be standardized. The bus system is responsible for message distribution. The transformation and routing rules are stored in a central repository. Depending on the middleware product in use, business functions and rules can also be represented. A SWOT analysis is shown in the following table:

Strengths

  • Low follow-up costs

  • Very flexible architecture

  • Compliance with standards

  • Autonomous systems

Weaknesses

  • High startup and infrastructure costs

Opportunities

  • Individual systems can be integrated/replaced easily

Threats

  • With high transfer volumes, there is the risk of a performance bottleneck if bulk traffic is not separated from normal traffic (for example, by a separate bulk channel)

This form of architecture is ideal for:

  • Very high performance requirements (event-driven architecture)

  • 1:N data distribution (for example, broadcasting)

  • N:1 database (for example, data warehouse)

Service-oriented architecture

The core of a service-oriented architecture, and the main distinction between this form of architecture and those described earlier, is the fact that business processes and applications are no longer coded as complex program structures, but are orchestrated as independent, distributed service calls.

An ESB is used as the central integration component for service calls. It has similar properties to those of the integration platform in a hub-and-spoke architecture, or of the bus in a pipeline architecture. A SWOT analysis is shown in the following table:

Strengths

  • Low follow-up costs

  • Very flexible architecture

  • Compliance with standards

  • Supported by all major software houses

Weaknesses

  • High startup and infrastructure costs

  • Requires a comprehensive SOA strategy and governance

Opportunities

  • Individual systems can be implemented and orchestrated easily

Threats

  • Lack of focus on relevant business processes

 

Patterns for EAI/EII


Three basic patterns are used for the implementation of EAI and EII platforms:

  • Direct connection

  • Broker

  • Router

Direct connection

Direct connection represents the simplest type of interaction between two applications, and is based on a 1:1 topology, in other words, an individual point-to-point connection. It allows a pair of applications within an organization to communicate directly. Interactions between the source and the target applications can be as complex as necessary. Additional connection rules are defined for more complex point-to-point connections. Examples of connection rules include data mapping rules, security rules, and availability rules.

The direct connection pattern can be broken down into the following logical components:

  • The source applications consist of one or more applications, which want to initiate interaction with the target applications.

  • The connection is the line between the source and the target application, and represents a point-to-point connection between the two applications.

  • Connection rules are the business rules which relate to the connection, such as data mapping and security rules.

  • The target application is a new or existing (modified or unmodified) application, which provides the necessary business services.
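A minimal sketch of these components, assuming hypothetical mapping and security rules, might look as follows:

```python
# Hypothetical sketch of a direct connection: one point-to-point link whose
# connection rules (a data-mapping rule and a security rule, both invented
# for illustration) are applied before the target application is called.

def target_application(order):
    # stands in for the business service of the target application
    return {"status": "accepted", "id": order["order_id"]}

def connect(message, rules, target):
    for rule in rules:                 # connection rules: mapping, security
        message = rule(message)
    return target(message)             # the 1:1 point-to-point call

def map_fields(message):
    # data mapping rule: translate the source schema to the target schema
    return {"order_id": message["id"], "token": message["token"]}

def check_auth(message):
    # security rule: reject messages without the expected credential
    if message.pop("token") != "secret":
        raise PermissionError("security rule failed")
    return message

reply = connect({"id": 7, "token": "secret"}, [map_fields, check_auth],
                target_application)
```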

The advantages and disadvantages of the direct connection pattern are shown in the following table:

Advantages

  • Functions well in the case of applications with simple integration requirements and only a few backend applications

  • Loose coupling

  • Receivers do not need to be online

Disadvantages

  • Results in several point-to-point connections between each pair of applications, and therefore to spaghetti configurations

  • Does not support the intelligent routing of queries

  • Does not support the decomposition/re-composition of queries

Uses

Direct connection is used for the following purposes:

  • Reducing the latency of business events

  • Supporting the structured exchange of information within an organization

  • Supporting real-time one-way message flows

  • Supporting real-time request/reply message flows

  • Continued use of legacy investments

Broker

The broker pattern is based on the direct connection pattern, and extends it to a 1:N topology. It allows an individual request from a source application to be routed to several target applications, which reduces the number of 1:1 connections required. The connection rules take the form of broker rules. This allows the distribution rules to be kept separate from the application logic (the Separation of Concerns principle, or SoC). The broker is also responsible for the composition and decomposition of interactions. The broker pattern uses the direct connection pattern for the connection between the applications, and forms the basis for the publish/subscribe message flow:

The broker pattern can be broken down into the following logical components:

  • The source applications consist of one or more applications which want to interact with the target applications.

  • The broker component keeps the number of direct connections to a minimum. It also supports message routing, message enhancement, and the transformation, decomposition, and re-composition of messages.

  • The target applications consist of both new and existing (modified or unmodified) applications. These applications are responsible for implementing the necessary business services.
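A minimal sketch of the broker component, with hypothetical rule and application names, might look as follows. Note how one incoming request is decomposed into several target interactions, and how the broker rules live outside the applications:

```python
# Hypothetical sketch: broker rules are stored outside the applications, and
# one incoming request is decomposed into several target interactions.

class Broker:
    def __init__(self):
        self.rules = {}                # message type -> [(target, transform)]

    def add_rule(self, msg_type, target, transform):
        # broker rule: where to route, and how to transform on the way
        self.rules.setdefault(msg_type, []).append((target, transform))

    def dispatch(self, msg_type, message):
        # decomposition: one source request, several target interactions
        return [(target, transform(message))
                for target, transform in self.rules.get(msg_type, [])]

broker = Broker()
broker.add_rule("order", "billing", lambda m: {"amount": m["total"]})
broker.add_rule("order", "shipping", lambda m: {"address": m["ship_to"]})
calls = broker.dispatch("order", {"total": 99, "ship_to": "Basel"})
```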

The advantages and disadvantages of the broker pattern are shown in the following table:

Advantages

  • Allows for the interaction of several different applications.

  • Minimizes the impact on existing applications.

  • Makes routing services available, so that the source application no longer needs to know the target applications.

  • Provides transformation services, which enable the source and target applications to use different communication protocols.

  • Decomposition/re-composition services are available to allow a single request to be sent from one source to several target applications.

  • The use of the broker keeps the number of necessary modifications to a minimum when the location of a target application changes.

Disadvantages

  • Logic has to be implemented on the broker for routing and decomposition/re-composition tasks.

Uses

Broker is used for the following purposes:

  • An individual application should be able to interact with one or more target applications.

  • A hub-and-spoke architecture reduces complexity when compared with a point-to-point architecture.

  • The externalization of the routing, decomposition, and re-composition rules increases maintainability and flexibility.

  • Broker pattern is important when a request is processed from a source application and results in several interactions with the target systems.

  • The source system is decoupled from the target applications, and there is no dependency on the interfaces of these target applications.

Router

The router pattern is a variant of the broker pattern with several potential target applications, in which the message is always routed to only one target application. The router decides which target application will receive the interaction. While the broker pattern supports 1:N connections, the router pattern only allows 1:1 connections, as the router rules determine the target application in each case.

The router pattern as shown in the diagram can be broken down into the following logical components:

  • The source applications consist of one or more applications that want to interact with the target applications.

  • The router component provides all the business rules needed for processing the message, such as routing and transformation. It receives requests from several source applications, and routes them intelligently to the correct target application. The resulting integration is, in fact, a point-to-point connection between the source and the target.

  • The target applications consist of both new and existing (modified or unmodified) applications. These applications are responsible for implementing the necessary business services.
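The contrast with the broker can be sketched as follows: the router rules select exactly one target per message. All predicates and target names are illustrative:

```python
# Hypothetical sketch: router rules are an ordered list of predicates, and
# exactly one target application receives each message.

class Router:
    def __init__(self):
        self.rules = []                # ordered list of (predicate, target)

    def add_rule(self, predicate, target):
        self.rules.append((predicate, target))

    def route(self, message):
        for predicate, target in self.rules:
            if predicate(message):
                return target, message   # always a single 1:1 connection
        raise ValueError("no matching router rule")

router = Router()
router.add_rule(lambda m: m["country"] == "CH", "swiss-erp")
router.add_rule(lambda m: True, "default-erp")    # fallback rule
```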

The advantages and disadvantages of the router pattern are shown in the following table:

Advantages

  • Allows for the interaction of several different applications.

  • Minimizes the impact on existing applications.

  • Makes routing services available, so that the source application no longer needs to know the target applications.

  • Provides transformation services, which enable the source and target applications to use different communication protocols.

  • The use of the router keeps the number of necessary modifications to a minimum when the location of the target application is changed.

Disadvantages

  • No decomposition and re-composition of messages.

  • No possibility of sending several simultaneous requests to the target applications on the basis of the incoming request.

Uses

Router is used for the following purposes:

  • An individual application should be able to interact with one of several target applications

  • A hub-and-spoke architecture reduces complexity when compared with a point-to-point architecture

  • The externalization of the routing, decomposition, and re-composition rules increases maintainability and flexibility

  • Router pattern is important when a request is processed from a source application and results in an interaction with only one of several potential target systems

  • As with the Broker pattern, the source system is also decoupled from the target applications, and has no dependency on the interfaces of these applications

 

Patterns for data integration


Data integration is implemented using three fundamental patterns:

  • Federation

  • Population

  • Synchronization

Federation

The federation pattern is a simple data integration pattern that provides access to different data sources, and gives the calling application the impression that these sources are a single, logical data source. This is achieved as follows:

  1. Expose a single consistent interface to the application.

  2. Translate the interface to whatever interface is needed for the underlying data.

  3. Compensate for any differences in function between the different data sources.

  4. Allow data from different sources to be combined into a single result set that is returned to the user.

This is illustrated in the following diagram:

The federation pattern as shown in this diagram can be broken down into the following logical building blocks:

  • The calling applications have the need for information, but they don't possess the information.

  • The federation building block uses metadata to determine where the data required is stored, and in what format. The metadata repository allows the decomposition of a single query executed against the federation building block, into individual requests to different data sources. To the user (the calling application), the information model appears to be a single virtual repository. The data is accessed via suitable adapters for each target repository. The federation component sends an individual result to the calling application, and integrates several different formats into a shared federated schema.

  • The source applications have the information that is important for the calling applications.

The federation pattern supports structured and unstructured data, together with read-only and read/write accesses to the underlying data sources. Read/write accesses should be limited, wherever possible, to a single data source, as otherwise a two-phase commit is needed, which can be difficult in distributed databases.
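A minimal sketch of the federation building block, with two hypothetical source adapters and a metadata map, might look as follows:

```python
# Hypothetical sketch: metadata maps each field of the shared federated
# schema to the source that stores it; a query is decomposed into per-source
# requests and combined into a single result.

class Federation:
    def __init__(self, sources, metadata):
        self.sources = sources         # source name -> adapter(key) -> record
        self.metadata = metadata       # field name -> source name

    def query(self, key, fields):
        result = {}
        for field in fields:
            source = self.metadata[field]        # where is the data stored?
            result[field] = self.sources[source](key)[field]
        return result                            # one virtual repository

crm = lambda key: {"name": "Acme", "segment": "retail"}      # adapter stubs
billing = lambda key: {"balance": 120.0}
fed = Federation({"crm": crm, "billing": billing},
                 {"name": "crm", "segment": "crm", "balance": "billing"})
customer = fed.query("c-1", ["name", "balance"])
```

To the calling application, `customer` looks like the result of a single query, even though it was assembled from two independent repositories.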

Uses

Federation is used for the following purposes:

  • The data needed by an application is distributed across different databases (for historic, technical, or organizational reasons)

  • Federation is more effective than other data integration technologies, when:

    • Near real-time access is needed for rapidly changing data

    • Making a consolidated copy of the data is not possible for technical, legal, or other reasons

    • Read/write access must be possible

    • Reducing or limiting the number of copies of the data is a goal

  • It is possible to continue to make use of existing investments

Population

The population pattern has a very simple model. It gathers data from one or more data sources, processes the data in an appropriate way, and applies it to a target database. In its simplest form, the population pattern is based on the read dataset-process data-write dataset model. This corresponds to the classic ETL (Extract, Transform, and Load) process.

This is illustrated in the following diagram:

The population pattern can be broken down into the following logical components:

  • The target applications have a need for information, which they do not possess. Therefore, a copy from another data source in a source application is required.

  • The population component reads one or more data sources in the source application, and writes the data to a data source in the target application. The rules for extracting data from the source application can be as complex as necessary. They range from simple rules, such as read all data, to more complex rules where only specific fields in specific records can be read under certain conditions. The loading rules for the target database can vary from a simple overwrite of the data, to a more complex process of inserting new records and updating existing ones. The metadata is used to describe these rules.

  • The source applications have the important information needed by the target applications.
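The gather-process-apply model can be sketched in a few lines; the filter and transformation rules here are illustrative stand-ins:

```python
# Illustrative gather-process-apply run; the filter and transformation
# rules are invented stand-ins for real extraction and loading rules.

def gather(source):
    # extract: the simplest rule, "read all data"
    return list(source)

def process(rows):
    # transform: only specific records, with a derived field
    return [{"id": r["id"], "total": round(r["net"] * 1.08, 2)}
            for r in rows if r["net"] > 0]

def apply(rows, target):
    # load: insert new records, update existing ones
    for r in rows:
        target[r["id"]] = r
    return target

source = [{"id": 1, "net": 100.0}, {"id": 2, "net": -5.0}]
target = apply(process(gather(source)), {})
```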

Uses

Population is used for the following purposes:

  • A specialized copy of existing data (derived data) is needed:

    • Subsets of existing data sources

    • A modified version of an existing data source

    • Combinations of existing data sources

  • Only read access to the derived data in the target application is possible (or only a few write accesses).

  • In the case of a significant number of write accesses, the two-way synchronization pattern should be used.

  • The user must be provided with quick access to the information required, instead of being overwhelmed with irrelevant, incorrect, or otherwise useless information.

  • However, IT drivers often dictate the use of the population pattern. In other words, the copies of data are made for technical reasons. These drivers include:

    • Improved performance of user access

    • Load distribution across systems

Synchronization

The synchronization pattern (also known as the replication pattern) enables bidirectional update flows of data in a multi-copy database environment. The "two-way" synchronization aspect of this pattern is what distinguishes it from the "one-way" capabilities provided by the population pattern.

This is illustrated in the following diagram:

The synchronization pattern shown in this diagram can be broken down into the following logical components:

  • The target applications have a need for information, which they do not possess. Therefore, a copy from another data source in a source application is required.

  • At a simplistic level, the synchronization component can be compared to the population pattern, with the only difference being that the data flows in both directions. If the data elements flowing in both directions are fully independent, then two-way synchronization is no more than two separate instances of the population pattern. However, it is more common to find some overlap between the datasets flowing in either direction. In this case, conflict detection and resolution are needed.

  • The source applications have information which is relevant to the target applications.

Uses

Synchronization is used for the following purpose:

  • A specialized copy of existing data (derived data) is needed. This copy can take different forms:

    • Subsets of existing data sources

    • A modified version of an existing data source

    • Combinations of existing data sources

Multi-step synchronization

There is one variant of the synchronization pattern: the multi-step variant. It makes use of one instance of the population pattern, with its gather, process, and apply functions, for each of the two synchronization directions. An additional "reconcile" function is placed between the two data flows, and guarantees that there are no conflicts in the updates. If the opportunities for conflicts are minimal, this pattern can be constructed from existing population components. However, a specialized solution should be used for more complex situations.

The following diagram illustrates the "reuse" of the population pattern, once for each direction with the additional "reconcile" component in the middle.
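A minimal sketch of the reconcile step, assuming an illustrative last-writer-wins style policy in which side A's update prevails, might look as follows:

```python
# Illustrative reconcile step between the two population flows; the
# conflict policy (side A wins) is an assumption for the sketch.

def reconcile(changes_a_to_b, changes_b_to_a):
    # keys updated on both sides are conflicts and must be resolved
    # before either population run applies its changes
    conflicts = set(changes_a_to_b) & set(changes_b_to_a)
    for key in conflicts:
        del changes_b_to_a[key]        # policy: side A's update wins
    return changes_a_to_b, changes_b_to_a, conflicts

def populate(changes, target):
    # one population run: apply the gathered changes to the target copy
    target.update(changes)
    return target

a = {"x": 1, "y": 2}                   # copy A after local updates
b = {"x": 9, "z": 3}                   # copy B after local updates
to_b, to_a, conflicts = reconcile(dict(a), dict(b))
populate(to_b, b)
populate(to_a, a)                      # both copies now agree
```

If the two change sets were fully independent (no shared keys), `reconcile` would detect no conflicts, and the run would reduce to two separate instances of the population pattern, as described above.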

 

Patterns for service-oriented integration


Service-oriented integration is based on two fundamental patterns:

  • Process integration: The process integration pattern extends the 1:N topology of the broker pattern. It simplifies the serial execution of business services, which are provided by the target applications.

  • Workflow integration: This is basically a variant of the serial process pattern. It extends the capability of simple serial process orchestration to include support for user interaction during the execution of individual process steps.

Process integration

The process integration pattern extends the 1:N topology of the broker pattern seen in EAI. It simplifies the serial execution of business services, which are provided by the target applications, and therefore enables the orchestration of serial business processes, based on the interaction of the source application. The serial sequence is defined using process rules, which allows the process logic (the flow logic) to be decoupled from the domain logic of the individual applications. The rules define not only the control and data flow, but also the permitted call rules for each target application. Interim results (process data) are stored in individual results databases.

The process integration pattern can be broken down into three building blocks:

  • The source applications consist of one or more applications that want to interact with the target applications.

  • The serial process rules support the same services as the broker in the broker pattern, including routing queries, protocol conversion, message broadcasting, and message decomposition and re-composition. In addition, externalization of the process flow logic from the individual applications is also supported. The process logic is determined by serial process rules which, together with the control and data flow rules, define the execution rules for each target application. These rules are stored in a process rules database.

  • The target applications consist of both new and existing (modified or unmodified) applications. These applications are responsible for implementing the necessary business services.
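A minimal sketch of serial process rules, with hypothetical services and rule names, might look as follows. The flow logic lives in the ordered rule list, and interim results are collected as process data:

```python
# Hypothetical sketch: the flow logic is externalized into an ordered rule
# list, interim results are collected as process data, and each step calls
# one target service (both services here are invented stand-ins).

class SerialProcess:
    def __init__(self, rules):
        self.rules = rules             # ordered list of (name, service)

    def run(self, request):
        process_data = {"request": request}      # interim results store
        for name, service in self.rules:         # control flow from rules
            process_data[name] = service(process_data)
        return process_data

check_credit = lambda d: d["request"]["amount"] <= 500
book_order = lambda d: "booked" if d["check_credit"] else "rejected"

engine = SerialProcess([("check_credit", check_credit),
                        ("book_order", book_order)])
outcome = engine.run({"amount": 200})
```

Because the sequence is data, not code, reordering or extending the process means editing the rule list, not the applications, which is the externalization benefit described above.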

The advantages and disadvantages of the process integration pattern are shown in the following table:

Advantages

  • Improves the flexibility and responsiveness of an organization by implementing end-to-end process flows and externalizing process logic from individual applications.

  • Provides a foundation for Business Process Management that enables the effectiveness of business processes to be monitored and measured.

Disadvantages

  • Only direct, automated processing is supported; no user interaction is possible (refer to the workflow variant).

  • No parallel processing is possible.

Uses

Process integration is used for the following reasons:

  • Support for end-to-end process flows which use the services provided by the target applications

  • Improves the flexibility and responsiveness of IT by externalizing process logic from individual applications

Variants

There are two variants of this pattern:

The parallel process pattern extends the simple serial process orchestration provided by the serial process pattern by supporting the concurrent execution and orchestration of business service calls. The concurrent execution of sub-processes requires the sub-steps to be split up and brought together again, so that they can be executed in parallel. Different patterns are available for this purpose at an implementation level (for example, patterns for parallel computing), as are different architecture styles (for example, pipes-and-filters architectures). The interim results of a sub-step may or may not influence the overall results. It is also possible for the interim results of a sub-step to influence the execution of other sub-steps.

The external business rules variant adds the option of externalizing business rules from the serial process, into a business rule engine, where they can be evaluated. The process only reacts to the responses of the rule engine. The complex rule evaluations are carried out by the specialized rule engine. Externalizing the rules improves flexibility and responsiveness, because the business rules can be adapted much more easily and quickly.

Workflow integration

The workflow integration pattern represents an extension of the process integration pattern, as illustrated in the following diagram:

It extends the capability of simple serial process orchestration to include support for user interaction during the execution of individual process steps. As a result, it supports a classic workflow.

Variants

The parallel workflow integration pattern is a variant of the workflow integration pattern, and corresponds to the parallel process integration pattern which forms part of the process integration pattern. It extends the capability of parallel process orchestration to include support for user interaction during the execution of individual process steps. As a result, it supports a parallel workflow.

 

Event-driven architecture


Event-driven architecture (EDA) is one of the hot topics of the industry. These architectures are often wrongly referred to as the successors to SOAs (Mühl et al. 2006). In fact, the concepts involved in EDA are as old as IT itself. In addition, EDAs are growing rapidly in popularity, together with the integration architectures of SOA. However, both types of architecture can be used completely independently of one another, and can be combined orthogonally. From the perspective of integration, two aspects of EDA are of particular interest:

  • The symbiosis between EDA and SOA that has already been referred to, which allows SOA domains to be linked/integrated together on an event-driven basis.

  • The technology offered by EDA which enables events from one or more event streams on the data integration level to be consolidated into new information.

Introducing EDA

According to a study by Gartner (Gartner 2006), the success of companies such as Dell and Google is due to the fact that these organizations are able to identify market factors or market events in the global marketplace at an early stage, and follow them up consistently and quickly. Both examples are very close to the picture drawn by Gartner of an ideal organization: the real-time enterprise (RTE). An RTE is characterized by its highly-automated business processes and the shortest possible process runtimes (Nussdorfer, Martin 2003).

While SOA concepts within IT structures form the basis for the automation of business processes, the second step, which relates to the ideal image of the RTE, involves processing more fine-grained information about changes in the state of these business processes. The complexity of these state changes is increasing noticeably, just as the number of reaction interfaces in the business processes is. This is where the significance of EDA lies, because the observable changes in the state of the business processes can be modeled as events. Classic integration architectures, such as OLTP (Online Transaction Processing) or OLAP (Online Analytical Processing), are no longer able to meet the requirements for rapid and consistent action on the basis of event analyses (Zeidler 2007).

The above diagram illustrates the symbiosis between EDA and SOA, which is often referred to as SOA 2.0 (Carter 2007) or Next Generation SOA (Luckham 2002). An SOA or an SOA domain provides the technical services, independently of consumers. These services can be combined or orchestrated, and they form the building blocks for the business processes. If a service of this kind triggers an event, for example because of a change in its state, an orthogonal EDA extension can activate a new SOA domain. As a result, a service becomes an event-producing building block in an EDA. In contrast to the typical producer/consumer patterns of an SOA, the EDA largely uses a publish/subscribe mechanism. An event processor processes the events as they occur, and publishes the processed results via an event channel, which triggers the services of other SOA domains. Various types of event processor are used depending on the type of event processing required. These include Complex Event Processors (CEP), for example, which are described later in this chapter.

The SOA domains (to be integrated) should ideally be defined in such a way that they represent reusable services which can be used several times in business process chains of any length. The principle of loose coupling for the formation of such flexible business process chains is of decisive importance in the EDA, in the same way as it is in the SOA.

Event processing

The second aspect of integration that we want to highlight is of a more technical nature. It concerns the possibilities for event processing within an EDA concept, as shown in the following figure:

Event-processing technologies have been in day-to-day use in many industries for several years. Examples include algorithmic trading in stock markets and Radio Frequency Identification (RFID) in road charging systems.

There are three fundamental types of event processing:

  • Simple Event Processing (SEP)

  • Event Stream Processing (ESP)

  • Complex Event Processing (CEP)

Simple Event Processing (SEP)

Events occur either individually, or in streams. A single event can be regarded as a significant change of state in a message source and, in particular, as a business event. Events of this kind typically trigger processes in the systems which receive the message. This form of event processing corresponds exactly with the specification of the Java Message Service (JMS). A typical example of SEP is, therefore, JMS.
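The essence of SEP, in the spirit of a JMS topic, can be sketched in a few lines: one event is published, and each subscriber that receives it may trigger a process. The event content and handler are invented for illustration:

```python
class Topic:
    """Minimal stand-in for a JMS-style topic: a published event is
    delivered to every subscriber."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)

started = []
topic = Topic()
# A single business event (e.g. "order placed") triggers a process.
topic.subscribe(lambda evt: started.append(f"shipping for {evt['order_id']}"))
topic.publish({"order_id": 42})
```

In a real JMS deployment the topic, delivery guarantees, and subscriptions are provided by the messaging infrastructure rather than the application.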

Event Stream Processing (ESP)

ESP involves processing streams of incoming messages or events. Typical ESP systems have sensors which channel a large number of events, and use filters and other processing methods to influence the stream of messages or events. Individual events are less important, and instead the focus is on the event stream. Well-known examples include the systems which track stock market prices: one single fluctuation in prices is generally not particularly significant. A more informative overall trend can only be determined from several events.
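The stock price example can be sketched as a stream processor that ignores individual fluctuations and only reports a trend once enough events have arrived. The window size and trend rule are assumptions for illustration:

```python
from collections import deque

class TrendFilter:
    """Processes a stream of price events; a single event is not
    significant, only the trend across a window of events."""
    def __init__(self, window=3):
        self.prices = deque(maxlen=window)

    def on_event(self, price):
        self.prices.append(price)
        if len(self.prices) < self.prices.maxlen:
            return None  # not enough events for a trend yet
        first, last = self.prices[0], self.prices[-1]
        return "rising" if last > first else "falling or flat"

trend = TrendFilter(window=3)
results = [trend.on_event(p) for p in (100, 99, 101, 103)]
# results -> [None, None, "rising", "rising"]
```

The single drop from 100 to 99 never surfaces; only the windowed view of the stream produces an output, which is the defining characteristic of ESP.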

Complex Event Processing (CEP)

The third form of event processing is Complex Event Processing, which is part of ESP. In CEP there is a strong focus on identifying patterns in a large number of events and their (message) contents, which may be distributed across different data streams.

The CEP funnel model is illustrated in the following figure:

The CEP funnel model illustrates the process of compressing (large) volumes of events to produce condensed information. The sources of the events include business events (Viehmann 2008). One of the classic uses of CEP is in tracing credit card fraud. If two transactions are made using the same credit card within a short period of time, in locations a long distance apart, this geographical and chronological pattern, which indicates possible fraud, can easily be expressed in the model. Systems of this kind, which correlate thousands of events in order to filter out the few cases of fraud or misbehavior, are increasingly common.
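The credit card example above correlates two events across time and location. A toy version of that pattern check might look as follows; the speed threshold, coordinate scheme, and field names are assumptions for illustration, not a real CEP rule language:

```python
from math import dist

SPEED_LIMIT_KMH = 900  # assumed threshold: faster than an airliner is implausible

def is_fraud(tx1, tx2):
    """Correlate two card transactions: same card, and the travel speed
    implied by the distance and time gap is physically implausible."""
    if tx1["card"] != tx2["card"]:
        return False
    hours = abs(tx2["t"] - tx1["t"]) / 3600
    km = dist(tx1["pos"], tx2["pos"])  # toy 'distance' in km
    return hours == 0 or km / hours > SPEED_LIMIT_KMH

a = {"card": "1234", "t": 0,    "pos": (0, 0)}
b = {"card": "1234", "t": 1800, "pos": (5000, 0)}  # 5000 km in 30 minutes
```

A production CEP engine would evaluate such patterns continuously over thousands of events per second, distributed across several data streams, rather than on a single pair.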

 

Grid computing/Extreme Transaction Processing (XTP)


Grid computing and XTP are the new integration technologies, which are likely to become increasingly popular over the next few years.

  • Grid computing: An infrastructure for the integrated, collaborative use of resources. Grids can be broken down into Data Grids, In-Memory Data Grids, Domain Entity Grids, and Domain Object Grids on the basis of their primary functionality, and are used in a wide range of applications.

  • XTP: This is a distributed storage architecture, which allows for parallel application access. It is designed for distributed access to large, and very large, volumes of data.

Grid computing

Grid computing is the term used to describe all the methods that combine the computing power of a number of computers in a network, in a way that enables the (parallel) solution of compute-intensive problems (distributed computing), in addition to the simple exchange of data. Every computer in the grid is equal to all the others. Grids can exceed the capacity and the computing power of today's super computers at considerably lower cost, and are also highly scalable. The computing power of the grid can be increased by adding computers to the grid network, or combining grids to create meta grids.

Note

Definition of a grid

A grid is an infrastructure enabling the integrated, collaborative use of resources which are owned and managed by different organizations (Foster, Kesselmann 1999).

The following diagram illustrates the basic model of grid computing, with the network of computers forming the grid in the middle:

The main tasks of grids are:

  • Distributed caching and processing: Data is distributed across all the nodes in a grid. Different distribution topologies and strategies are available for this purpose. A data grid can be divided into separate sub-caches, which allows for the use of more effective access mechanisms involving pre-filtering. The distribution of the data across different physical nodes guarantees the long-term availability and integrity of the data, even if individual nodes fail. Automatic failover behavior and load balancing functionality are part of the grid infrastructure. Transaction security is also guaranteed throughout the entire grid.

  • Event-driven processing: The functionality of computational grids. Computing operations and transactions can take place in parallel across all the nodes in a grid. Simple event processing, similar to the trigger mechanism of databases, ensures that it is possible for the system to react to data changes. Individual pieces of data in the grid can be joined together to form more complex data constructs using the "in-memory views" and "in-memory materialized views" concepts.

Grids have the following features which allow more sophisticated Service Level Agreements (SLA) to be set up:

  • Predictable scalability

  • Continuous availability

  • Provision of a replacement connection in the case of a server failure (failover)

  • Reliability

Grids can be broken down into data grids, in-memory data grids, domain entity grids, and domain object grids on the basis of their primary functionality.

Data grids

A data grid is a system made up of several distributed servers which work together as a unit to access shared information and run shared, distributed operations on the data.

In-memory data grids

In-memory data grids are a variant of data grids in which the shared information is stored locally in memory in a distributed (often transactional) cache. A distributed cache is a collection of data or, more accurately, a collection of objects that is distributed (or partitioned) across any number of cluster nodes, in such a way that exactly one node in the cluster is responsible for each piece of data in the cache, and the responsibility is distributed among the cluster nodes.

Concurrent data accesses are handled cluster-wide by the grid infrastructure if a specific transactional behavior is required. The advantages include the high levels of performance made possible by low latency memory access. Today's 64-bit architectures and low memory prices allow larger volumes of data to be stored in memory, where they are available for low latency access.

However, if the memory requirements exceed the memory available, "overflow" strategies can be used to store data on a hard disk (for example, in the local filesystem or local databases). This will result in a drop in performance caused by higher latency. The latest developments, such as solid state disks, will in future allow a reasonable and cost-effective compromise in this area, and will be an ideal solution in scenarios of this kind.

Data loss caused by a server failing and, therefore, its area of the memory being lost, can be avoided by the redundant distribution of the information. Depending on the product, different distribution topologies and strategies can be selected or enhanced.

In the simplest case, the information is distributed evenly across all the available servers.

An in-memory data grid helps the application to achieve shorter response times by storing the user data in memory in formats which are directly usable by the application. This enables low latency storage access and avoids complex, time-consuming transformations and aggregations when the consumer accesses the data. Because the data in the grid is replicated, buffering can be used to accommodate database failures, and the availability of the system is improved. If a cluster node in the data grid fails, the data is still available on at least one other node, which also increases availability. Data will only be lost in the case of a total failure, and this can be counteracted by regular buffering to persistent storage (hard disk, solid state disk, and so on).

Domain entity grids

Domain entity grids distribute the domain data of the system (the applications) across several servers. As these are often coarse granular modules with a hierarchical structure, their data may have to be extracted from several different data sources before being made available on the grid across the entire cluster. The data grid takes on the role of an aggregator/assembler which gives the consumers cluster-wide, high-performance access to the aggregated entities. The performance can be further improved by the grid by initializing the domain data before it is actually used (pre-population).

Domain object grids

A domain object grid distributes the runtime components of the system (the applications) and their status (process data) across a number of servers. This may be necessary for reasons of fail-safety, and also because of the parallel execution of program logic. By adding additional servers, applications can be scaled horizontally. The necessary information (data) for the parallelized functions can be taken from shared data storage, (although this central access can become a bottleneck, which reduces the scalability of the system as a whole) or directly from the same grid or a different grid. It is important to take into account the possibilities of individual products or, for example, to combine several products (data grid and computing grid).

Distribution topologies

Different distribution topologies and strategies are available, such as replicated caches and partitioned caches (Misek, Purdy 2006).

Replicated caches

Data and objects are replicated to all the nodes in the cluster. As a result, the available memory of the smallest server acts as the limiting factor: this node determines how large the available data volume can be.

Advantages:

  • The maximum access performance is the same across all the nodes, as all the nodes access local memory, which is referred to as zero latency access.

Disadvantages:

  • Data distribution across all the nodes involves high levels of network traffic, and is time consuming. The same applies to data updates, which must be propagated across all the nodes.

  • The available memory of the smallest server determines the capacity limit. This node places a limit on the size of the available data volume.

  • In the case of transactional behavior, every node in the cluster must agree to a lock before it is granted.

  • In the case of a cluster error, all the stored information (data and locks) can be lost.

The disadvantages must be compensated for as far as possible by the grid infrastructure, and circumvented by taking appropriate measures. This should be made transparent to the programmer by using an API which is as simple as possible. The implementation could take the form of local read-only accesses without notification to the other cluster nodes. Operations with supervised concurrent access require communication with at least one other node. All the cluster nodes must be notified about update operations. An implementation of this kind results in very high performance and scalability, together with transparent failover and failback.
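The read-local/write-broadcast behavior of a replicated cache described above can be sketched as follows (a deliberately naive model: no locking, no failure handling):

```python
class ReplicatedCacheNode:
    """Each node holds a full copy of the data: reads are local
    ('zero latency access'), updates are propagated to every node."""
    def __init__(self, cluster):
        self.data = {}
        self.cluster = cluster
        cluster.append(self)

    def get(self, key):
        return self.data.get(key)       # local read, no network hop

    def put(self, key, value):
        for node in self.cluster:       # update must reach ALL nodes
            node.data[key] = value

cluster = []
n1, n2, n3 = (ReplicatedCacheNode(cluster) for _ in range(3))
n1.put("price", 42)                     # write fans out to n1, n2, n3
```

The `put` loop makes the scaling problem visible: every update costs one notification per node, so write-heavy replicated caches do not scale linearly as the cluster grows.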

However, it is important to take into consideration that replicated caches requiring a large number of data updates do not scale linearly in the case of potential cluster growth (adding nodes), which involves additional communication activities for each node.

Partitioned caches

Partitioned caches resolve the disadvantages of replicated caches, relating to memory and communications.

If this distribution strategy is used, several factors must be taken into account:

  • Partitioned: The data is distributed across the cluster in such a way that there are no overlaps of responsibility with regard to data ownership. One node is solely responsible for a specific part of the data, and holds it as a master dataset. Among other things, this brings the benefit that the size of the available memory and computing power increases linearly as the cluster grows. In addition, compared with replicated caches, it has the advantage that all the operations which are carried out on the stored objects require only a single network hop. In other words, in addition to the server that manages the master data, only one other server needs to be involved, and this stores the accompanying backup data in the case of a failover. This type of access to master and backup data is highly scalable, because it makes the best possible use of point-to-point connections in a switched network.

  • Load-balanced: Distribution algorithms ensure that the information in the cache is distributed in the best possible way across the available resources in the cluster, and therefore provide transparent load balancing (for the developer). In many products, the algorithms can be configured or replaced by in-house strategy modules. However, depending on the distribution and optimization strategy, this approach also has disadvantages. The dynamic nature of data distribution may cause data to be redistributed when the optimization strategy is activated, if another member is added to the cluster. In particular, in environments where temporary cluster members are highly volatile, frequent recalculations of the optimum distribution characteristics, and physical data redistribution with its accompanying network traffic, should be avoided. This can be achieved by identifying volatile cluster nodes within the grid infrastructure, and ensuring that they are not integrated into distribution strategies.

  • Location transparency: Although the information is distributed across the nodes in the cluster, the same API is used to access it. In other words, the programmer's access to the information is transparent: the programmer does not need to know where the information is physically located in the cluster. The grid infrastructure is responsible for adapting the data distribution as effectively as possible to access behavior. Heuristics, configurations, and exchangeable strategies are used for this purpose. As long as no specific distribution strategy needs to be created, the way in which the strategy functions in the background is unimportant.
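The three properties above — single ownership, hash-based distribution, and location transparency — can be sketched in a toy partitioned cache. The hash function and backup scheme are simplified assumptions; real products use configurable distribution strategies:

```python
import hashlib

class PartitionedCache:
    """Each key has exactly one owner node (plus one backup), determined
    by hashing, so any operation needs at most a single network hop."""
    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _owner(self, key):
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return digest % len(self.nodes)

    def put(self, key, value):
        owner = self._owner(key)
        backup = (owner + 1) % len(self.nodes)   # failover copy
        self.nodes[owner][key] = value
        self.nodes[backup][key] = value

    def get(self, key):
        # Location transparency: the caller never sees which node owns the key.
        return self.nodes[self._owner(key)].get(key)

cache = PartitionedCache(node_count=4)
cache.put("customer:7", {"name": "Miller"})
```

Because each key lives on exactly one owner (plus its backup), adding nodes increases both capacity and throughput roughly linearly, which is the advantage over the replicated topology.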

Agents

Agents are autonomous programs that are triggered by an application, and are executed on the information stored in the grid under the control of the grid infrastructure. Depending on the product, specific classes of programming APIs may need to be extended or implemented for this purpose. Alternatively, declarative options allow agent functionality of this kind to be established (for example, using aspect-oriented methods or pre-compilation steps). Predefined agents are often provided with particular products.

Execution patterns

Let's take a brief look at these execution patterns:

  • Targeted execution: Agents can be executed on one specific set of information in the data grid. The information set is identified using a unique key. It is the responsibility of the grid infrastructure to identify the best location in the cluster for the execution, on the basis of the runtime data available (for example, load ratios, node usage, network loads).

  • Parallel execution: Agents can be executed on a specific group of information sets, which are identifiable by means of a number of unique keys. As with the target execution, it's the responsibility of the grid infrastructure to identify the best location in the cluster for the execution, on the basis of the runtime data available (for example, load ratios, node usage, network loads).

  • Query-based execution: This is an extension of the parallel execution pattern. The number of information sets involved is not specified by means of the unique keys, but by formulating one or more filter functions in the form of a query object.

  • Data-grid-wide execution: Agents are executed in parallel on all the available information sets in the grid. This is a specialized form of the query-based execution pattern in which a NULL query object is passed, in other words, a non-exclusive filter condition.

  • Data grid aggregation: In addition to the scalar agents, cluster-wide aggregations can be run on the target data, so that computations can be carried out in (near) real-time. Products often provide predefined functionality for this purpose, including count, average, max, min, and so on.

  • Node-based execution: Agents can be executed on specific nodes in the grid. An individual node can be specified. However, agents can also be run on a defined subset of the available nodes, or on all the nodes in the grid.
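Several of these execution patterns can be illustrated with one simulated grid in which agents run where the data lives instead of moving the data to the caller. The grid model and agent signatures are invented for illustration:

```python
class DataGrid:
    """Simulated grid: data partitioned across nodes; agents are
    executed against the data held on each node."""
    def __init__(self, partitions):
        self.partitions = partitions  # one dict per node

    def run_agent(self, agent, keys=None, query=None):
        """Targeted/parallel execution (via keys), query-based execution
        (via a filter), or data-grid-wide execution (no filter at all)."""
        results = []
        for node in self.partitions:          # conceptually runs per node
            for key, value in node.items():
                if keys is not None and key not in keys:
                    continue
                if query is not None and not query(value):
                    continue
                results.append(agent(value))
        return results

grid = DataGrid([{"a": 10, "b": 20}, {"c": 30}])
total = sum(grid.run_agent(lambda v: v))                   # grid-wide aggregation
big = grid.run_agent(lambda v: v, query=lambda v: v > 15)  # query-based execution
```

Passing no `keys` and no `query` corresponds to the NULL query object mentioned above; `sum`, `count`, or `max` over the agent results corresponds to data grid aggregation.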

Uses

Grid technology can be used in a variety of different ways in architectures:

  • Distributed, transactional data cache (domain entities): Application data can be stored in a distributed cache in a linear scalable form, and with transactional access.

  • Distributed, transactional object cache (domain objects): Application objects (business objects) can be stored in a distributed cache in a linear scalable form and with transaction security.

  • Distributed, transactional process cache (process status): Process objects and their status can be stored in a distributed cache in a linear scalable form, and with transaction security.

  • SOA grid: This is a specialized form of the previous scenario. Business Process Execution Language (BPEL) processes are distributed throughout the cluster in serialized form (dehydration), and can be processed further on another server after being restored (hydration). This results in highly scalable BPEL processes.

  • Data access virtualization: Grids allow virtualized access to distributed information in a cluster. As already mentioned, the location of the data is transparent during the access, regardless of the size of the cluster, which can also change dynamically.

  • Storage access virtualization: Information is stored in a distributed cache in the format appropriate for the application, regardless of the type of source system and its access protocols or access APIs. This is particularly advantageous in cases where the information has to be obtained from distributed, heterogeneous source systems.

  • Data format virtualization: Information is stored in a distributed cache in the format appropriate for the application, regardless of the formats in the source system. This is particularly advantageous in cases where the information has to be obtained from distributed, heterogeneous source systems.

  • Data access buffers: The access to data storage systems (such as RDBMSs) is encapsulated and buffered so that it is transparent for the application. This allows any failover actions by the target system (for example, Oracle RAC) and the necessary reactions of the application to be decoupled. As a result, applications no longer need to be able to react to failover events on different target systems, as this takes place at grid level.

  • Maintenance window virtualization: As already described, data grids support dynamic cluster sizing. Servers can be added to and removed from the cluster at runtime. This makes it possible to migrate distributed applications gradually, without significant downtimes for the application, or even the entire grid. A server can be removed from the cluster, the application can be migrated to this server, and the server can then be returned to the cluster. This process can be repeated with every other server. Applications developed in future on the basis of open standards will reduce this problem.

  • Distributed master data management: In high-load environments, unacceptable bottlenecks may occur in central master data applications. Classic data replication can help to resolve this problem. However, it does involve the use of resources, and is not suitable for (near) real-time environments. Another solution is to distribute the master data across a data grid, provided that there is enough storage.

  • High performance backup and recovery: It is possible to perform long-running backups in several stages in order to improve performance. The data can be written in stages to an in-memory cache, and then at delayed intervals to persistent storage.

  • Notification service in an ESB: Grid technology replaces the message-based system used for notification in a service bus.

  • Complex real-time intelligence: This combines the functionality of CEP and data grids, and therefore enables highly scalable analysis applications which provide complex pattern recognition functions in real-time scenarios, to be made available to the business. In its simplest form, this is an event-driven architecture with CEP engines as consumers, in which the message transport and the pre-analysis and pre-filtering of fine granular individual events is based on grid technology. The infrastructure components of the grid are also responsible for load balancing, fail-safety, and the availability of historic data from data marts in the in-memory cache. The combination of a grid and CEP makes it possible to provide highly scalable, but easily maintained, analysis architectures for (near) real-time business information.

XTP (Extreme Transaction Processing)

As a result of the need for complex processing of large and very large volumes of data (for example, XML processing, or importing large files with format transformations), new distributed storage architectures with parallel application access functions have been developed in recent years.

A range of different cross-platform products and solutions is available, also known as "extreme transaction processing" or XTP. The term was coined by the Gartner Group, and describes a style of architecture which aims to allow for secure, highly scalable and high-performance transactions across distributed environments on commodity hardware and software.

Solutions of this kind are likely to play an increasingly important role in service-oriented and event-driven architectures in the future. Interoperability is a driving force behind XTP.

Distributed caching mechanisms and grid technologies with simple access APIs form the basis for easy, successful implementation (in contrast to the complex products widely used in scientific environments in the past). Although distributed cache products already play a major role in "high-end transaction processing" (an expression coined by Forrester Research), their position in the emerging Information-as-a-Service (IaaS) market is expected to become more prominent.

New strategies for business priority have been introduced by financial service providers in recent years. Banks are attempting to go beyond the limits of their existing hardware resources and develop increasingly high-performance applications, without having to invest in an exponential increase of their hardware and energy costs.

The growth of XTP in areas such as fraud detection, risk computation, and stock trade resolution is pushing existing systems to their performance limits. New systems that are to implement this challenging functionality require new architecture paradigms.

It is clear that SOA, coupled with EDA and XTP, represents the future for financial service infrastructures as a means of achieving the goal of running complex computations with very large volumes of data, under real-time conditions. XTP belongs to a special class of applications (extreme transaction processing platforms) that need to process, aggregate, and correlate large volumes of data while providing high performance and high throughput. Typically, these processes produce large numbers of individual events that must be processed in the form of highly volatile data. XTP-style applications ensure that transactions and computations take place in the application's memory, and do not rely on complex remote accesses to backend services, in order to avoid communication latency (low latency computation). This allows for extremely fast response rates while still maintaining the transactional integrity of the data.

The SOA grid (next generation, grid-enabled SOA) is a conceptual variant of the XTPP (Extreme Transaction Processing Platform). It provides state-aware, continuous availability for service infrastructures, application data, and process logic. It is based on an architecture that combines horizontally scalable, database-independent, middle-tier data caching with intelligent parallelization, and brings together process logic and cache data for low latency (data and process affinity). This enables the implementation of newer, simpler, and more efficient models for highly scalable, service-oriented applications that can take full advantage of the possibilities of event-driven architectures.

XTP and CEP

XTP and CEP are comparable, in that they both consume and correlate large amounts of event data to produce meaningful results.

Often, however, the amount of event data that needs to be captured and processed far exceeds the capacity of conventional storage mechanisms ("there just isn't a disk that can spin fast enough"). In these cases, the data can be stored in a grid. CEP engines can be distributed across this data and can access it in parallel. Analyses can be carried out, and business event patterns can be identified and analyzed in real-time. These patterns can then be processed further and evaluated using Business Activity Monitoring (BAM).

Solid State Disks and grids

Solid State Disk (SSD) technology is developing at high speed. Data capacities are increasing rapidly and, compared with conventional drives, the I/O rates are phenomenal. Until now, the price/performance ratio per gigabyte of storage has been the major obstacle to widespread use: an SSD currently costs around 12 times as much per gigabyte as a normal server disk. The major benefit for data centers is the very low energy consumption, which is significantly less than that of conventional disks.

Because of their low energy requirements, high performance, low latency, and the expectation of falling costs, SSDs are an attractive solution in blades or dense racks. One interesting question concerns the influence which SSDs may have on data grid technology.

Disk-based XTP systems can benefit from the introduction of an SSD drive. However, SSDs currently have a much lower storage capacity (128 GB versus 1 TB) than conventional disks. Nevertheless, this is more than the capacity of standard main memory, and SSDs are also less costly per gigabyte than memory. The capacity of SSDs is lower than that of conventional disks by a factor of 10, and higher than the capacity of memory by a factor of 8.

SSDs bridge the gap between memory-based and disk-based XTP architectures. SSD-based architectures are slightly slower than memory-based systems, but significantly faster than the fastest disk-based systems. The obvious solution is, therefore, to provide a hierarchical storage architecture in XTP systems, where the most volatile data is stored in memory, data accessed less often is stored on SSDs, and conventional disk-based storage is used for long-term persistent data. It also seems reasonable to store memory overflows from memory-based caching on SSDs.
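The hierarchical storage architecture described above can be sketched as a tiered store in which hot data lives in memory, overflow spills to an SSD tier, and cold data is archived to disk. The tiers here are simulated as dictionaries and the eviction policy (oldest entry first) is an assumption for illustration:

```python
class TieredStore:
    """Hierarchical XTP storage sketch: volatile data in memory,
    overflow to an 'SSD' tier, long-term data on 'disk'."""
    def __init__(self, memory_capacity):
        self.memory, self.ssd, self.disk = {}, {}, {}
        self.memory_capacity = memory_capacity

    def put(self, key, value):
        if len(self.memory) >= self.memory_capacity:
            # Overflow: evict the oldest memory entry to the SSD tier.
            old_key = next(iter(self.memory))
            self.ssd[old_key] = self.memory.pop(old_key)
        self.memory[key] = value

    def archive(self, key):
        # Move cold data to long-term persistent storage.
        if key in self.memory:
            self.disk[key] = self.memory.pop(key)
        elif key in self.ssd:
            self.disk[key] = self.ssd.pop(key)

    def get(self, key):
        for tier in (self.memory, self.ssd, self.disk):  # fastest tier first
            if key in tier:
                return tier[key]

store = TieredStore(memory_capacity=2)
for k, v in [("t1", 1), ("t2", 2), ("t3", 3)]:
    store.put(k, v)   # "t1" overflows to the SSD tier
```

Lookups always probe the fastest tier first, so the latency of a read depends on where in the hierarchy the data currently sits — exactly the compromise the text describes.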

 

Summary


At this point in time, you should have a basic understanding of the fundamental concepts of integration, and the terminology used with it. You should now understand:

  • The basic concepts used in the context of integration architecture

  • The different architecture variants, such as point-to-point, hub-and-spoke, pipeline, and SOA

  • What service-oriented integration is and why it is important

  • The different types of data integration and the accompanying patterns

  • The difference between Enterprise Application Integration (EAI) and Enterprise Information Integration (EII)

  • The concept of Event-Driven Architecture (EDA), the different types of event processing, and why they play an important role in integration

  • The integration technologies of the future: grid computing and extreme transaction processing (XTP)

In the next chapter, you will learn about the base technologies related to the implementation of solutions based on the Trivadis Integration Architecture Blueprint.

About the Authors

  • Guido Schmutz

    Guido Schmutz works for Trivadis, an Oracle Platinum Partner. He has more than 25 years of technology experience, including mainframes, integration, and SOA technologies in financial services, government, and logistics environments. At Trivadis, he is responsible for innovation in the areas of SOA, BPM, and application integration solutions and leads the Trivadis Architecture Board. He has longtime experience as a developer, coach, trainer, and architect in the areas of building complex Java EE and SOA-based solutions. Currently, he is focusing on the design and implementation of SOA and BPM projects using the Oracle SOA stack. A few other areas of interest for Guido are big data and fast data solutions and how to combine these emerging technologies into a modern information and software architecture. Guido is an Oracle ACE Director for Fusion Middleware and SOA and a regular speaker at international conferences, such as Oracle Open World, ODTUG, SOA & Cloud Symposium, UKOUG conference, and DOAG. He is also a coauthor of Oracle Service Bus 11g Development Cookbook, Do More with SOA Integration: Best of Packt, Service-Oriented Architecture: An Integration Blueprint, Spring 2.0 im Einsatz, Architecture Blueprints, and Integration Architecture Blueprint.

  • Peter Welkenbach

    Peter Welkenbach works as a consultant, senior architect, and trainer in the fields of requirements engineering, object-oriented methodologies, software engineering, and quality management. He has more than 20 years' experience of designing and implementing complex information systems for banks, automotive manufacturers, and pharmaceutical companies. For 10 years he has been a technology evangelist for Java technology and the use of the corresponding frameworks in customer projects. Peter Welkenbach is a course developer, author of numerous publications, and speaker at JAX and international Oracle conferences. He has been using Spring in numerous customer projects since it first appeared in summer 2003. His current focus is on enterprise architecture and lean architecture methodologies. In his current projects he works as an enterprise architect for a well-known German retailer.

  • Daniel Liebhart

    Daniel Liebhart has over 20 years of experience in the information technology field, giving him broad technical and business know-how. For 10 years he has been working in different management positions, leading IT professional services or product development. His expertise comprises the engineering, realization, and operation of complex, internationally operated IT systems for the telecommunications, finance, logistics, and chemical industries, as well as for public services. He has authored three books for Hanser Publications, is a passionate computer science engineer, has received several awards, and has worked for Trivadis, a leading independent IT service company operating in Germany, Austria, and Switzerland. He works as an assistant professor at the University of Applied Science in Zurich.


Latest Reviews

(2 reviews total)
Excellent collection, easy checkout.
The book seems accurate, and it may have an audience, but it didn't meet my requirements at all. If you're looking for a book that begins with a managerial level summary of the technology and a guide to choosing from today's mostly vendor-based SOA solutions, this book isn't going to work for you. In my opinion, the Packt summary led me to believe this would be the book for me. If you are a java developer preparing to enter the SOA field in, say, six months, this book won't be a bad guide for you. It is very weak on vendor based solutions, but it is organized and comprehensive talking about the Java technologies connected to SOA development (current at the time of its publishing).