
You're reading from  Multi-Cloud Strategy for Cloud Architects - Second Edition

Product type: Book
Published in: Apr 2023
Publisher: Packt
ISBN-13: 9781804616734
Edition: 2nd
Author: Jeroen Mulder
Jeroen Mulder is a certified enterprise and security architect who works at Fujitsu (Netherlands) as a Principal Business Consultant. Earlier, he was a Senior Lead Architect at Fujitsu, focusing on cloud and cloud-native technology, and was later promoted to Head of Applications and Multi-Cloud Services. Jeroen is interested in cloud technology, cloud infrastructure architecture, serverless and container technology, application development, and digital transformation using various DevOps methodologies and tools. He previously authored “Multi-Cloud Architecture and Governance”, “Enterprise DevOps for Architects”, and “Transforming Healthcare with DevOps4Care”.

Choosing the right platform for data

It is a cliché, but nonetheless very true: data is the new gold. There are good reasons why enterprise architecture frameworks put data first: one of the first things a business must do is analyse what data it should use and how to gain optimal benefit from it. No business can operate without data: it needs data to gain insight into markets and the demands of its customers. It needs data to drive the business.

You will find the term data-driven in almost every cloud assessment study. What does data-driven mean? A company makes decisions based on the analysis of data. Intuition, or decisions based purely on previous experience, are ruled out. Every action is supported by the analysis of data.

To enable a data-driven business, we need one thing above all: the data itself, typically in vast amounts and preferably in (near) real time. The collection of data is prerequisite number one. Prerequisite number two is that this data must...

Building and sizing a data platform

As with every service that we deploy in the cloud, we need a foundation to build the platform on. Hence, building a landing zone that can hold raw data is the first step. This landing zone should be an environment that serves only one purpose: to capture raw data. It is recommended to build this landing zone separately from core IT systems. It should be scalable, yet low-cost, since it will hold a lot of data. The issue with keeping data is that it might increase the cloud bill exponentially: data storage comes at a very low price per unit of data, but the catch is that we need a lot of these small units.
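The "low price per unit, many units" effect is easy to underestimate. The following back-of-the-envelope sketch illustrates it; the price per GB-month is an assumed figure for illustration only, as actual object-storage tiers differ per provider, tier, and region.

```python
# Back-of-the-envelope storage cost for a landing zone.
# The price per GB-month is an ASSUMPTION for illustration;
# real object-storage pricing varies by provider, tier, and region.

PRICE_PER_GB_MONTH = 0.02  # assumed low-cost object storage tier (USD)

def monthly_cost_usd(total_gb: float) -> float:
    """Estimate the monthly storage bill for a given volume in GB."""
    return total_gb * PRICE_PER_GB_MONTH

# A single gigabyte is cheap...
print(f"1 GB: ${monthly_cost_usd(1):,.2f}/month")
# ...but a petabyte of raw data is a million of those cheap units.
print(f"1 PB: ${monthly_cost_usd(1_000_000):,.2f}/month")
```

At the assumed rate, a petabyte already costs tens of thousands of dollars per month, which is why lifecycle policies and cheap archival tiers matter for a landing zone that only captures raw data.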

It is important to implement governance from the start. This includes defining and implementing guardrails for data classification and tagging.
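Such a guardrail can be as simple as refusing to ingest objects that lack mandatory governance tags. The sketch below shows the idea; the tag names and classification levels are assumptions for illustration, not a standard.

```python
# A minimal governance guardrail sketch: reject objects headed for the
# landing zone unless they carry the mandatory tags. Tag names and
# classification levels are illustrative ASSUMPTIONS, not a standard.

REQUIRED_TAGS = {"owner", "source-system", "classification"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}

def validate_tags(tags: dict) -> list:
    """Return a list of guardrail violations (empty means compliant)."""
    violations = [f"missing tag: {t}" for t in sorted(REQUIRED_TAGS - tags.keys())]
    cls = tags.get("classification")
    if cls is not None and cls not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {cls}")
    return violations

# A compliant object passes with no violations...
print(validate_tags({"owner": "finance", "source-system": "crm",
                     "classification": "internal"}))
# ...while an untagged one is flagged before it lands.
print(validate_tags({"owner": "finance"}))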

Once the landing zone has been established, data analysts can start using the data lake as a sandbox environment. This is the second stage. Analysts can start building prototypes of data models and...

Designing for interoperability and portability

Portability and interoperability should be driven by use cases and business cases, not pursued for their own sake. Following the Architecture Development Method (ADM) of TOGAF, there are four levels in IT systems that define the portability of systems: data, applications, platforms, and infrastructure.

  • Data represents information in a form that can be processed by computers. Data is stored in storage that is accessible to computers.

  • Applications are software that performs actions triggered by business requests.

  • Platforms support the applications.

  • Infrastructure is a collection of computation, storage, and network resources. In cloud computing, computation includes virtual machines, containers, and serverless functions.
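One way to reason about these four levels is to note that each has its own typical portability mechanism. The small model below is a sketch; the example mechanisms per layer are illustrative choices on my part, not TOGAF content.

```python
# A sketch of the four portability levels as a simple data model.
# The "portable_via" examples are illustrative ASSUMPTIONS, not TOGAF content.

from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    portable_via: str  # an example mechanism that aids portability

STACK = [
    Layer("data", "open formats such as Parquet or CSV"),
    Layer("applications", "container images"),
    Layer("platforms", "managed services exposing open APIs"),
    Layer("infrastructure", "infrastructure-as-code templates"),
]

for layer in STACK:
    print(f"{layer.name:>14}: portable via {layer.portable_via}")
```

Assessing each layer separately like this helps turn "we want portability" into concrete, per-layer design decisions.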

One important note at this point is that cloud computing blurs the demarcation between infrastructure, platforms...

Overcoming challenges of data gravity

Applications don’t just hold data; they also produce a lot of data that they share with other applications. Data attracts new data and services in other applications: as data accumulates, more and more applications and services will use it. Data and applications are attracted to each other, as in the law of gravity. In short: the amount of data will grow, partly autonomously, but mostly because data sources will be connected to other data sources.
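A common answer to data gravity is to move the computation to the data instead of the data to the computation. The following self-contained simulation compares shipping a whole dataset to the consumer against pushing the filter down to where the data lives; the record layout and sizes are made up for illustration.

```python
# Data gravity in miniature: compare transferring a full dataset with
# pushing the filter to the data's location and transferring only the
# matching records. Record layout and volumes are ASSUMPTIONS.

import json

# A "remote" dataset of 10,000 order records (1 in 4 is region "us").
DATASET = [{"id": i, "region": "eu" if i % 4 else "us", "amount": i % 100}
           for i in range(10_000)]

def ship_everything() -> int:
    """Naive approach: serialize and transfer the full dataset."""
    return len(json.dumps(DATASET).encode())

def push_down_filter(region: str) -> int:
    """Filter at the source; transfer only the matching records."""
    matches = [r for r in DATASET if r["region"] == region]
    return len(json.dumps(matches).encode())

full = ship_everything()
filtered = push_down_filter("us")
print(f"full transfer:     {full:,} bytes")
print(f"filtered transfer: {filtered:,} bytes")
```

The same principle underlies real services that execute queries next to the storage (predicate pushdown): the bytes that cross the network shrink to the result set, so the immovable dataset stops being a bottleneck.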

Access to this data is a strategic advantage, but it also presents a major challenge. Databases are becoming so large that it is almost impossible to move the data. This can tie companies to a certain location to hold that data. In addition, companies that use each other's data and services must stay close to each other in order to provide good service. By keeping data physically close together, it can be exchanged...

Managing the foundation for data lakes

Data engineers design, build, and manage the data pipelines, but the foundation of the data lake and data warehouse is the dedicated landing zone for the data platform. Typically, landing zones in the cloud are operated by cloud engineers, who take care of the compute, storage, and network resources.

Looking at management of data platforms, we can distinguish various roles:

  • Data architect or engineer: the architect and data engineer are often combined in one role. This role is responsible for the design, development, and deployment of the data pipelines. The engineer must have extensive knowledge of ETL or ELT principles and technologies, making sure that data from sources gets collected and transformed into usable datasets in data warehouses or other data products, where the data can be further analyzed. Data also needs to be validated, which is a required skill of the engineer too. In essence, the engineer makes sure that data that is ingested into warehouses...
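The extract-validate-transform-load flow that this role owns can be sketched in a few lines of plain Python. The field names and validation rules below are illustrative assumptions, and the "warehouse" is just an in-memory structure standing in for a real data warehouse.

```python
# A minimal ETL sketch: extract raw records, validate and transform them,
# and load the survivors into an in-memory "warehouse". Field names and
# validation rules are illustrative ASSUMPTIONS.

RAW_SOURCE = [
    {"customer": "acme",     "amount": "120.50", "currency": "EUR"},
    {"customer": "",         "amount": "99.00",  "currency": "EUR"},  # invalid: no customer
    {"customer": "umbrella", "amount": "oops",   "currency": "USD"},  # invalid: bad amount
]

def extract():
    """In practice: read from files, APIs, or message queues."""
    return list(RAW_SOURCE)

def transform(record):
    """Validate and normalize one record; return None if it fails validation."""
    if not record["customer"]:
        return None
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return {"customer": record["customer"],
            "amount": amount,
            "currency": record["currency"]}

def load(records):
    """Load the clean records into a (here: in-memory) warehouse table."""
    return {"fact_sales": records}

rows = [t for r in extract() if (t := transform(r)) is not None]
warehouse = load(rows)
print(warehouse["fact_sales"])  # only the one valid record survives
```

In an ELT variant, the raw records would be loaded into the platform first and the transform/validation step would run inside the warehouse; the engineer's responsibility for data quality is the same either way.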

Summary

In this chapter, we discussed the basic architecture principles for building and managing a data platform. We looked at data lakes that can hold vast amounts of raw data, and how we can build these lakes on top of cloud storage. The next step is to fetch the right data that is usable in data models. We must extract, transform, and load (ETL), or extract, load, and transform (ELT), the accurate datasets into environments where data analysts can work with this data. Typically, data warehouses are used for this.

We studied the various propositions for data operations of the major cloud providers: AWS, Azure, Google Cloud, Alibaba Cloud, and Oracle. Next, we discussed the challenges that come with building and operating data platforms: access to data and accuracy, but also privacy and compliance. Data gravity is another problem that we must solve. It's not easy to move huge amounts of data across platforms; hence, we must find other solutions to work with data in different...

Questions

  1. What does the term ETL mean?
  2. What would be the first step in building a data platform?
  3. True or false: data lakes are typically built on the common storage layers of major cloud providers, such as Azure Blob Storage and Amazon S3.
  4. What does Oracle’s GoldenGate do?

Further reading

  • Data Lake for Enterprises, by Tomcy John and Pankaj Misra, Packt Publishing