You're reading from  The Self-Taught Cloud Computing Engineer

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781805123705
Edition: 1st Edition
Author (1)
Dr. Logan Song

Dr. Logan Song is the enterprise cloud director and chief cloud architect at Dito. With 25+ years of professional experience, Dr. Song is highly skilled in enterprise information technologies, specializing in cloud computing and machine learning. He is a Google Cloud-certified professional solution architect and machine learning engineer, an AWS-certified professional solution architect and machine learning specialist, and a Microsoft-certified Azure solution architect expert. Dr. Song holds a Ph.D. in industrial engineering, an MS in computer science, and an ME in management engineering. Currently, he is also an adjunct professor at the University of Texas at Dallas, teaching cloud computing and machine learning courses.

Azure Cloud Database and Big Data Services

In the first part of the book, we discussed the AWS database and big data services, and in the second part, we covered their Google Cloud counterparts. In this third part, having discussed Microsoft Azure’s foundational cloud services in the last chapter, we now focus on the Azure database and big data services, which resemble the AWS and Google data services but have their own distinguishing features.

Like Amazon and Google, Microsoft provides many solid data storage and analytics services in its Azure cloud platform. In this chapter, we will cover the following topics:

  • Azure Cloud Data Storage: explores some basic concepts about Azure storage accounts and Azure Data Lake Storage
  • Azure Database Services: examines Azure database services such as Azure SQL Database, Azure NoSQL database solutions including Azure Table Storage and Cosmos DB, Azure data warehouses, and Azure Synapse Analytics
  • Azure...

Azure cloud data storage

When we launched Azure Cloud Shell in the previous chapter, you may have noticed that we first created an Azure storage account. An Azure storage account provides a unique namespace for your Azure cloud data, accessible from anywhere in the world over HTTP or HTTPS. Creating a storage account makes four storage services available in an all-in-one fashion: blobs, files, queues, and tables:

  • Azure blobs are an object storage service, like Amazon S3 or Google Cloud Storage (GCS)
  • Azure files let you manage file shares in the cloud, accessible from both Azure VMs and on-premises VMs
  • Azure queue storage is a messaging service, similar to Amazon Simple Queue Service (SQS), for storing large numbers of messages
  • Azure table storage stores structured NoSQL cloud data, with a key/attribute store and a schema-less design
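
As a rough illustration of the all-in-one design, each of the four services is normally reached through its own HTTPS endpoint derived from the account name. The following Python sketch builds those default public endpoints. The account name examplestore01 and the helper service_endpoints are hypothetical, and accounts that use custom domains or sovereign clouds will have different hostnames.

```python
import re

# The four services bundled into one storage account.
SERVICES = ("blob", "file", "queue", "table")


def service_endpoints(account_name: str) -> dict[str, str]:
    """Build the default public HTTPS endpoint for each storage service."""
    # Storage account names are 3-24 characters: lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]{3,24}", account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return {
        service: f"https://{account_name}.{service}.core.windows.net"
        for service in SERVICES
    }


# Hypothetical account name, used only for illustration.
endpoints = service_endpoints("examplestore01")
for service, url in endpoints.items():
    print(f"{service:5} -> {url}")
```

Because the account name becomes part of every hostname, it has to be globally unique, which is why the sketch validates it before building any URL.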

An Azure storage account provides all these storage and data services...

Azure cloud databases

While an Azure data lake stores raw data, an Azure database usually stores formatted data. Azure offers cloud database services categorized into relational databases and NoSQL databases.

Azure cloud relational databases

Like AWS and GCP, Azure offers three options for cloud relational database deployment and usage:

  • Azure SQL virtual machines: SQL Server built on Azure virtual machines. This is an Infrastructure-as-a-Service (IaaS) cloud service and thus you will have control of the database edition, version, and size. You will also be fully responsible for managing the virtual machine, including patching and other configuration management. More information is available at https://azure.microsoft.com/en-us/products/virtual-machines/sql-server.
  • Azure SQL managed instances: This is the best option for migrating on-premises SQL databases to the cloud. It serves the purpose of migrating many apps from on-premises to a fully managed Platform-as-a-Service...

Azure cloud big data services

Like Amazon and Google, Microsoft provides a full stack of big data cloud services, including the big data ETL service, Azure Data Factory (ADF); the big data processing service, Azure HDInsight; and the big data analytics service, Azure Databricks.

Azure ADF

ADF is a cloud-based data integration service for creating data-driven workflows that automatically move and transform data. Its core building block is the pipeline, a logical grouping of activities that together perform a data-driven task, such as the following:

  • Data movement – This takes an ingestion source, pulls it into the Azure cloud, and puts it into a data lake
  • Data transformation – This connects the data lake to Databricks, runs a stored procedure, and transforms data to produce a new dataset for further analytics
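
The idea of a pipeline as a group of dependent activities can be sketched in plain Python. This is a toy in-memory model, not the ADF SDK or its JSON format: the Activity and Pipeline classes, the activity names, and the dictionary standing in for the data lake are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Activity:
    """One step in a pipeline (hypothetical stand-in, not an ADF class)."""
    name: str
    run: Callable[[dict], None]  # reads from and writes to the shared "lake"
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Pipeline:
    """A logical grouping of activities that together perform one task."""
    name: str
    activities: list[Activity]

    def execution_order(self) -> list[str]:
        """Order activity names so each one follows its dependencies."""
        ordered: list[str] = []
        remaining = {a.name: a for a in self.activities}
        while remaining:
            ready = [n for n, a in remaining.items()
                     if all(d in ordered for d in a.depends_on)]
            if not ready:
                raise ValueError("circular or missing dependency")
            for n in ready:
                ordered.append(n)
                del remaining[n]
        return ordered

    def run(self, lake: dict) -> None:
        by_name = {a.name: a for a in self.activities}
        for name in self.execution_order():
            by_name[name].run(lake)


def copy_orders(lake: dict) -> None:
    # Data movement: pull rows from a (fake) source into the raw zone.
    lake["raw/orders"] = [
        {"sku": "A", "qty": 2},
        {"sku": "B", "qty": 5},
        {"sku": "A", "qty": 1},
    ]


def total_by_sku(lake: dict) -> None:
    # Data transformation: aggregate raw rows into a new dataset.
    totals: dict[str, int] = {}
    for row in lake["raw/orders"]:
        totals[row["sku"]] = totals.get(row["sku"], 0) + row["qty"]
    lake["curated/qty_by_sku"] = totals


# Activities are listed out of order on purpose; dependencies fix the order.
pipeline = Pipeline("orders_etl", [
    Activity("TransformOrders", total_by_sku, depends_on=["CopyOrders"]),
    Activity("CopyOrders", copy_orders),
])
lake: dict = {}
pipeline.run(lake)
print(pipeline.execution_order())  # ['CopyOrders', 'TransformOrders']
print(lake["curated/qty_by_sku"])  # {'A': 3, 'B': 5}
```

In the real service, each activity would delegate to a managed component (for example, a copy operation or a Databricks notebook) rather than a local function, but the dependency-driven ordering is the part this sketch is meant to convey.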

Essentially, ADF is a data ETL service integrating hybrid data at an enterprise level. More details about ADF are available at https://azure.microsoft.com/en-us/products/data-factory...

Summary

In this chapter, we learned about the Azure cloud database and big data services. We explored Azure Data Lake Storage, Azure cloud databases and Azure Synapse Analytics, data ETL tools such as ADF, data processing tools such as HDInsight, and data analytics tools such as Databricks. You should now have a working knowledge of data ingestion, storage, processing, and visualization in the Azure cloud.

In the next chapter, we will examine the machine learning services in Azure’s cloud.

Practice questions

Questions 1-3 are based on the following.

The data team for company ABC is building an Azure cloud data analytics platform, with the following objectives:

  • The team has two data scientists who are familiar with R, Scala, and Python, and two data engineers who are good at Python. Each team member needs a cluster.
  • The team needs to run notebooks that use Python, Scala, and SQL for their job workloads.
  • The team needs to optimize their work performance.

1. What platform fits a data scientist?

A. A High Concurrency Databricks cluster

B. A standard Databricks cluster

C. An ADF pipeline

D. The Azure Synapse platform

2. What platform fits a data engineer?

A. A High Concurrency Databricks cluster

B. A standard Databricks cluster

C. An ADF pipeline

D. The Azure Synapse platform

3. What platform fits the job workload?

A. A High Concurrency Databricks cluster

B. A standard Databricks cluster

C. An ADF pipeline

...

Answers to the practice questions

1. B

2. A

3. B

4. A

5. A

6. A

7. C

8. C
