SAP on Azure Implementation Guide: Move your business data to the cloud

Product type: Paperback
Published: Feb 2020
Publisher: Packt
ISBN-13: 9781838983987
Length: 242 pages
Edition: 1st Edition
Authors: Nick Morgan, Bartosz Jarkowski

SAP Data Hub

With the massive growth in the amount of data that is now being processed and stored, data management is a common and difficult issue. Multiple systems in your IT landscape often work in isolation, with no established data governance processes. If the source information is changed, there is no single place that can answer two basic questions: who changed it, and why? In large organizations, the structure of the data is an additional challenge. It is quite likely that you have multiple data marts, or even multiple data lakes, and you are probably also using several different analytics and visualization tools. Each solution is managed by a different team that uses different toolsets, so getting a holistic view of your data is a big challenge, if not impossible.

The SAP Data Hub, which is one of the newer SAP products, tries to address these issues and provides a common platform for data governance and data orchestration. It implements a data catalogue to keep track of available information sources and allows you to integrate and enrich them using pipeline processes.

Customers that use the Azure data platform capabilities can use SAP Data Hub as a data extraction tool to offload information stored in SAP ERP or SAP S/4HANA to Azure Data Lake Store (ADLS). This opens a world of possibilities: the extracted data can be a source for machine learning algorithms, or it can be blended with information coming from external applications and then analyzed using Power BI.
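To make the catalogue and pipeline ideas more concrete, the following is a minimal, hypothetical sketch in plain Python. It is not SAP Data Hub's API: `Catalog`, `run_pipeline`, the operators, the reference data, and the target path are all illustrative assumptions. The sketch chains operators over rows from an ERP-style extract, blends them with external reference data, and records the output location and lineage in a catalogue, which is the kind of record that can answer "how was this produced?".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Operator = Callable[[List[Row]], List[Row]]

@dataclass
class CatalogEntry:
    location: str
    lineage: List[str]  # names of the operators that produced the dataset

@dataclass
class Catalog:
    entries: Dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, name: str, location: str, lineage: List[str]) -> None:
        self.entries[name] = CatalogEntry(location, list(lineage))

def run_pipeline(rows: List[Row], operators: List[Operator], catalog: Catalog,
                 target: str, location: str) -> List[Row]:
    """Apply each operator in order, then record the output and its lineage."""
    for op in operators:
        rows = op(rows)
    catalog.register(target, location, [op.__name__ for op in operators])
    return rows

# Hypothetical external reference data used to enrich the ERP extract.
REGION_BY_COUNTRY = {"DE": "EMEA", "US": "AMER"}

def drop_cancelled(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["status"] != "CANCELLED"]

def add_region(rows: List[Row]) -> List[Row]:
    return [{**r, "region": REGION_BY_COUNTRY.get(r["country"], "UNKNOWN")} for r in rows]

# Hypothetical rows standing in for a sales-order extract from SAP ERP.
orders = [
    {"order": 1, "country": "DE", "status": "OPEN"},
    {"order": 2, "country": "US", "status": "CANCELLED"},
    {"order": 3, "country": "JP", "status": "OPEN"},
]
catalog = Catalog()
result = run_pipeline(orders, [drop_cancelled, add_region], catalog,
                      "sales_orders_enriched", "datalake/sales/orders_enriched")
print(result)
print(catalog.entries["sales_orders_enriched"].lineage)
```

In a real landscape, the extraction, the enrichment, and the write to ADLS would be operators in a Data Hub pipeline; the sketch only shows why keeping lineage next to the data location makes it traceable.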

Modern tools require a modern architecture model that implements resilience by default and simplifies the management of the solution. This is why SAP Data Hub is built on Docker containers orchestrated by Kubernetes. The core system processes are distributed across multiple nodes by default, which allows them to scale out dynamically. It is possible to run SAP Data Hub on-premises, but the solution fits well in Azure, where it can use the native Azure Kubernetes Service (AKS). You pay only for the compute resources you use and for the allocated storage. SAP officially supports running Data Hub workloads on AKS. Five main resources are required to run Data Hub in Azure:

  • Azure Kubernetes Service: Orchestrates a cluster of virtual machines and schedules containers to run
  • Azure Container Registry: Stores the container images
  • Load Balancers: Expose the services to the external network
  • Virtual Machines: Run the containers
  • Azure Storage: Stores the data volumes and disks

Each of these resources is required to successfully run SAP Data Hub on Azure and they should be taken into account during landscape planning.
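During landscape planning, the five resource types listed above can serve as a simple checklist. The following is a minimal sketch of that idea; the dictionary layout and the `missing_resources` helper are illustrative assumptions rather than an Azure or SAP tool.

```python
from typing import List, Set

# The five resource types the text lists as required for SAP Data Hub on Azure.
REQUIRED_RESOURCES = {
    "Azure Kubernetes Service": "orchestrates the VM cluster and schedules containers",
    "Azure Container Registry": "stores the container images",
    "Load Balancer": "exposes services to the external network",
    "Virtual Machines": "run the containers",
    "Azure Storage": "stores data volumes and disks",
}

def missing_resources(planned: Set[str]) -> List[str]:
    """Return required resource types absent from a landscape plan, sorted."""
    return sorted(set(REQUIRED_RESOURCES) - planned)

# Hypothetical, incomplete landscape plan.
plan = {"Azure Kubernetes Service", "Virtual Machines", "Azure Storage"}
for name in missing_resources(plan):
    print(f"Missing: {name} ({REQUIRED_RESOURCES[name]})")
```

For this plan, the sketch reports Azure Container Registry and Load Balancer as missing.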

Supported platforms

All SAP Data Hub releases from 2.3 onward are supported on Microsoft Azure. Each release is tested against specific Kubernetes versions, so it is recommended that you follow the requirements detailed in the relevant installation guide.

At the time of writing (September 2019), the current release, SAP Data Hub 2.6, is supported on Kubernetes releases 1.11.* to 1.13.*. AKS allows you to upgrade the Kubernetes version of a deployed cluster, so if a new SAP Data Hub release requires a higher version of Kubernetes, you can upgrade without reinstalling the entire solution.
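Before upgrading an AKS cluster, it can help to check the target Kubernetes version against the supported range. The following is a minimal sketch, assuming that support is defined per minor version, as the 1.11.* to 1.13.* notation in the text suggests; `SUPPORTED_MINORS` and `is_supported` are illustrative names, not part of any SAP or Azure tool.

```python
from typing import Set, Tuple

# Kubernetes minor versions the text lists for SAP Data Hub 2.6 (September 2019).
SUPPORTED_MINORS: Set[Tuple[int, int]] = {(1, 11), (1, 12), (1, 13)}

def is_supported(k8s_version: str) -> bool:
    """True if a version such as '1.13.10' or 'v1.12.8' is in a supported minor line."""
    parts = k8s_version.lstrip("v").split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) in SUPPORTED_MINORS

for candidate in ["1.11.9", "v1.13.10", "1.14.0"]:
    print(candidate, is_supported(candidate))
```

The patch level is deliberately ignored, because the `.*` wildcard in the supported range covers any patch release.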

System sizing

SAP Data Hub runs in a Kubernetes cluster in Azure. The minimal configuration that SAP suggests for production workloads is as follows:

  • Four worker nodes, each with:
    • Minimum 64 GB of memory
    • Minimum 8 CPUs
    • Minimum 100 GB of storage

In Azure, you can use the E8s_v3 VM, which fulfils the given requirements.

For non-production environments, the hardware requirements are relaxed:

  • Three worker nodes, each with:
    • Minimum 32 GB of memory
    • Minimum 4 CPUs
    • Minimum 100 GB of storage

In Azure, you can use the E4s_v3 VM, which fulfils the above requirements.

The recommended sizing indicates the minimal requirements to run SAP Data Hub. For a production workload, the number of concurrently running pipelines and the size of the SAP Vora store are the main drivers for a precise estimate of the required hardware.
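The sizing rules above reduce to simple arithmetic. The following sketch first checks whether a VM size meets the per-node minimums, and then makes a rough, memory-only estimate of the node count. The E8s_v3 (8 vCPUs, 64 GiB) and E4s_v3 (4 vCPUs, 32 GiB) figures are Azure's published sizes. The 128 GB OS disk, and every demand input to `estimate_nodes` (memory per pipeline, Vora store size, baseline), are placeholder assumptions that you must replace with measured values; they are not SAP guidance.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeSpec:
    vcpus: int
    memory_gb: int
    disk_gb: int

@dataclass(frozen=True)
class Profile:
    min_nodes: int
    min_node: NodeSpec

# Minimum figures quoted in the text (GB and GiB treated as equal for this rough check).
PRODUCTION = Profile(min_nodes=4, min_node=NodeSpec(vcpus=8, memory_gb=64, disk_gb=100))
NON_PRODUCTION = Profile(min_nodes=3, min_node=NodeSpec(vcpus=4, memory_gb=32, disk_gb=100))

def meets(node: NodeSpec, profile: Profile) -> bool:
    """True if a single worker node satisfies the per-node minimum of a profile."""
    m = profile.min_node
    return node.vcpus >= m.vcpus and node.memory_gb >= m.memory_gb and node.disk_gb >= m.disk_gb

def estimate_nodes(node: NodeSpec, profile: Profile, pipelines: int,
                   gb_per_pipeline: float, vora_store_gb: float, baseline_gb: float) -> int:
    """Memory-only node estimate; every demand input is a placeholder to be measured."""
    demand_gb = baseline_gb + pipelines * gb_per_pipeline + vora_store_gb
    return max(profile.min_nodes, math.ceil(demand_gb / node.memory_gb))

# Published vCPU/memory figures; the 128 GB OS disk is an assumption.
e8s_v3 = NodeSpec(vcpus=8, memory_gb=64, disk_gb=128)
e4s_v3 = NodeSpec(vcpus=4, memory_gb=32, disk_gb=128)

print(meets(e8s_v3, PRODUCTION), meets(e4s_v3, PRODUCTION), meets(e4s_v3, NON_PRODUCTION))
# Hypothetical demand: 40 concurrent pipelines at 4 GB each, 100 GB Vora store, 64 GB baseline.
print(estimate_nodes(e8s_v3, PRODUCTION, pipelines=40, gb_per_pipeline=4,
                     vora_store_gb=100, baseline_gb=64))
```

With these placeholder inputs, the demand is 324 GB, which rounds up to 6 E8s_v3 nodes. The estimate treats all node memory as usable and ignores CPU, so treat it as a lower bound: real nodes reserve some memory for the operating system and Kubernetes components.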

System deployment

Before running the system installation, you need to configure the Kubernetes environment in Azure correctly. This involves basic choices such as the VM size, but you also need to consider networking. As SAP Data Hub operates in conjunction with other systems in your landscape, you need to ensure that network connectivity is in place.

By default, Azure Kubernetes Service uses kubenet networking together with the virtual networks already available in the Azure platform. The pods running in the cluster are isolated in a separate logical network, which should not use an address space that is already deployed in your on-premises or Azure environment. Each pod requires an IP address to communicate with other pods, and there may be tens of pods in a single cluster, so drawing pod addresses from a range that is already in use would quickly exhaust the available addresses.
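You can check a proposed pod address range against the address spaces that are already in use with Python's standard `ipaddress` module. In the following sketch, the in-use ranges are hypothetical. The candidate `10.244.0.0/16` is the range commonly used as the kubenet pod CIDR default in AKS, but you should confirm this value for your own deployment.

```python
import ipaddress
from typing import List

def overlapping_ranges(pod_cidr: str, in_use: List[str]) -> List[str]:
    """Return every in-use address space that overlaps the proposed pod CIDR."""
    pods = ipaddress.ip_network(pod_cidr)
    return [cidr for cidr in in_use if pods.overlaps(ipaddress.ip_network(cidr))]

# Hypothetical address spaces already used on-premises and in Azure VNets.
IN_USE = ["10.0.0.0/16", "10.1.0.0/16", "192.168.10.0/24"]

print(overlapping_ranges("10.244.0.0/16", IN_USE))  # []
print(overlapping_ranges("10.1.128.0/17", IN_USE))  # ['10.1.0.0/16']
print(ipaddress.ip_network("10.244.0.0/16").num_addresses)  # 65536
```

An empty result means that the candidate range is safe with respect to the listed networks. Any returned range identifies a conflict to resolve before deployment.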

In AKS with kubenet networking, only cluster nodes receive an IP address from the virtual network. Network address translation (NAT) enables communication between pods and external systems. This design offers two benefits: it doesn't reduce the number of available IP addresses in the virtual network, and it doesn't block connections between SAP Data Hub and other solutions. This is illustrated in Figure 2-43:
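The address-saving benefit of NAT can be quantified. The following sketch compares the virtual network addresses consumed when only nodes take an address (the NAT model described above) with a contrasting design in which every pod also takes a virtual network address, as Azure CNI does. The subnet, node count, and pods per node are hypothetical. The constant 5 reflects the five addresses that Azure reserves in every subnet.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves five addresses in every subnet

def usable_ips(subnet: str) -> int:
    """Addresses left for resources after Azure's per-subnet reservation."""
    return ipaddress.ip_network(subnet).num_addresses - AZURE_RESERVED_PER_SUBNET

def vnet_ips_needed(nodes: int, pods_per_node: int, pods_get_vnet_ips: bool) -> int:
    """VNet addresses consumed: one per node, plus one per pod if pods are not behind NAT."""
    return nodes * (1 + (pods_per_node if pods_get_vnet_ips else 0))

subnet = "10.0.1.0/26"        # hypothetical node subnet
nodes, pods_per_node = 4, 30  # hypothetical cluster shape

nat = vnet_ips_needed(nodes, pods_per_node, pods_get_vnet_ips=False)
per_pod = vnet_ips_needed(nodes, pods_per_node, pods_get_vnet_ips=True)
print(usable_ips(subnet), nat, per_pod)  # 59 4 124
```

Under these assumptions, the NAT model fits easily into a /26 subnet, while giving each pod its own address would need a larger subnet.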

Diagram of the Azure Kubernetes Service in action, showing an Azure virtual network routing through a node subnet to two nodes, each running kubenet and using network address translation to route traffic to a collection of pods.

Figure 2-43: Azure Kubernetes service - network view (Source: Microsoft.com)

The next area to consider is how the system will be accessed. It is possible to expose the SAP Data Hub Launchpad to the internet, but as SAP Data Hub connects to other systems in your landscape, in most cases you will want to secure it inside a virtual network. An internal or public load balancer routes traffic to the correct service within the cluster, as shown in Figure 2-44:
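On AKS, a Kubernetes Service of type `LoadBalancer` receives a public load balancer by default. Adding the `service.beta.kubernetes.io/azure-load-balancer-internal: "true"` annotation requests an internal load balancer inside the virtual network instead. The following sketch builds such a manifest as JSON, which `kubectl apply -f` accepts. The service name, selector, and ports are hypothetical placeholders. The SAP Data Hub installer defines its own services, so this only illustrates the mechanism.

```python
import json
from typing import Dict

def load_balancer_service(name: str, selector: Dict[str, str], port: int,
                          target_port: int, internal: bool = True) -> dict:
    """Build a Service of type LoadBalancer; the annotation requests an internal Azure LB."""
    metadata: dict = {"name": name}
    if internal:
        metadata["annotations"] = {
            "service.beta.kubernetes.io/azure-load-balancer-internal": "true"
        }
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": metadata,
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }

# Hypothetical names and ports; adapt them to the service you want to expose.
svc = load_balancer_service("launchpad-internal", {"app": "launchpad"}, 443, 8443)
print(json.dumps(svc, indent=2))
```

Passing `internal=False` omits the annotation, which leaves AKS to provision a public load balancer.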

Internal Load Balancer routes internal traffic. Public Load Balancer routes traffic from outside the VNet.

Figure 2-44: Accessing SAP Data Hub

Application deployment in Kubernetes is different from what you may be used to with other solutions, because Kubernetes uses desired state configuration management. Effectively, you do not install an application; rather, you tell the cluster which objects you want to run and what their required configuration is. An object is a persistent entity in the cluster, for example, a pod, a service, or a volume. A Kubernetes object is a record of intent: the Kubernetes system constantly works to ensure that your object exists.³⁹
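The desired state idea can be illustrated with a toy reconciliation loop. The following sketch compares desired and actual replica counts and computes the starts and stops that are needed to close the gap. It is a simplified model of what Kubernetes controllers do, not Kubernetes code, and the object names are hypothetical.

```python
from typing import Dict

State = Dict[str, int]  # object name -> number of running replicas

def reconcile(desired: State, actual: State) -> State:
    """Compute replica changes: positive means start, negative means stop."""
    changes = {}
    for name in desired.keys() | actual.keys():
        delta = desired.get(name, 0) - actual.get(name, 0)
        if delta:
            changes[name] = delta
    return changes

def converge(desired: State, actual: State, max_rounds: int = 10) -> State:
    """Repeatedly apply the computed changes until actual state matches desired state."""
    actual = dict(actual)
    for _ in range(max_rounds):
        changes = reconcile(desired, actual)
        if not changes:
            break
        for name, delta in changes.items():
            actual[name] = actual.get(name, 0) + delta
            if actual[name] == 0:
                del actual[name]  # object no longer runs anywhere
    return actual

desired = {"launchpad": 2, "pipeline-a": 1}
actual = {"launchpad": 1, "pipeline-old": 1}
print(reconcile(desired, actual))
print(converge(desired, actual))
```

You declare only `desired`. The loop, not the operator, works out that one extra `launchpad` replica and a `pipeline-a` instance must start and that `pipeline-old` must stop.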

System resilience is a built-in capability of the Kubernetes framework. Selected stateless objects are deployed on multiple nodes, and traffic is routed to the active service. In case of a node failure, Kubernetes replaces the affected object by starting another instance on a healthy node.
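The failover behaviour can be sketched as a placement problem. In the following simplified model, every pod on a failed node is replaced by one on the least-loaded healthy node. Real Kubernetes creates a new pod with a new name and applies far richer scheduling rules. This toy version keeps the pod names, and all node and pod names are hypothetical.

```python
from typing import Dict, List

def reschedule(placement: Dict[str, str], failed_node: str,
               healthy_nodes: List[str]) -> Dict[str, str]:
    """Move pods off a failed node onto the least-loaded healthy node (ties: list order)."""
    load = {n: 0 for n in healthy_nodes}
    for node in placement.values():
        if node in load:
            load[node] += 1
    new_placement = {}
    for pod, node in placement.items():
        if node == failed_node:
            target = min(healthy_nodes, key=lambda n: load[n])
            load[target] += 1
            new_placement[pod] = target
        else:
            new_placement[pod] = node
    return new_placement

placement = {"web-1": "node-1", "web-2": "node-2", "web-3": "node-1", "api-1": "node-3"}
result = reschedule(placement, "node-1", ["node-2", "node-3"])
print(result)
```

Both pods from `node-1` end up on the remaining nodes, spread so that neither surviving node takes the whole load.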

Azure Monitor integrates well with Azure Kubernetes Service and provides insights into container performance. You can see processor and memory utilization and, as with other monitored resources, create a rule that notifies someone when a metric exceeds a set threshold. SAP Data Hub executes each pipeline as a separate pod, so Azure Monitor helps you understand the required capacity and determine the maximum load:
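A threshold rule usually evaluates an aggregate over a time window rather than a single sample, so a brief spike does not page anyone. The following sketch shows that logic with an average over the last few samples. It is not the Azure Monitor API, and the samples, the 80 percent threshold, and the window sizes are hypothetical.

```python
from statistics import mean
from typing import List

def should_alert(samples: List[float], threshold: float, window: int) -> bool:
    """Fire when the average of the last `window` samples exceeds the threshold."""
    if len(samples) < window:
        return False  # not enough data to evaluate the window yet
    return mean(samples[-window:]) > threshold

cpu_percent = [40, 50, 60, 95, 97]  # hypothetical node CPU utilization samples
print(should_alert(cpu_percent, threshold=80, window=3))
print(should_alert(cpu_percent, threshold=80, window=5))
```

The short window averages 84 percent and fires. The longer window averages 68.4 percent and stays quiet, which shows how the window length trades responsiveness against noise.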

Status of virtual machines in Azure Monitor for Containers.

Figure 2-45: Azure Monitor for containers (Source: Microsoft.com)

The preceding screenshot shows how Azure Monitor helps to manage the Kubernetes cluster. Information such as node status or the number of deployed containers is visible in the Azure portal, which provides a centralized view of a Kubernetes workload.
