You're reading from Hands-On Azure for Developers

Product type: Book
Published in: Nov 2018
Publisher: Packt
ISBN-13: 9781789340624
Edition: 1st Edition
Author (1)
Kamil Mrzygłód

Kamil Mrzygłód is a technical lead and technology advisor, working with multiple companies on designing and implementing Azure-based systems and platforms. He is a former Microsoft Azure Most Valuable Professional (MVP) and certified trainer, who shares his knowledge via various channels, including conference talks and open source projects and contributions. Kamil lives in Poland with his two cats and one dog, dedicating some of his time to video games, cooking, and traveling.

Big Data Storage - Azure Data Lake

Sometimes, we have to store unlimited amounts of data. That scenario covers most big data platforms, where having even a soft limit for the maximum capacity could cause problems with the active development and maintenance of our application. Thanks to Azure Data Lake, we have limitless possibilities when it comes to storing both structured and unstructured data, all with an efficient security model and great performance.

The following topics will be covered in this chapter:

  • Azure Data Lake Store fundamentals
  • Storing data in Azure Data Lake Store
  • Security features and concerns
  • Best practices for working with Azure Data Lake Store

Technical requirements

To perform the exercises in this chapter, you will need:

  • Access to an Azure subscription

Understanding Azure Data Lake Store

When considering your storage solution, you have to take into account the amount of data you want to store. Depending on your answer, you may choose a different option from the services available in Azure: Azure Storage, Azure SQL, or Azure Cosmos DB. There is also a variety of databases available as images for VMs (such as Cassandra or MongoDB); the ecosystem is quite rich, so everyone can find what they are looking for. The problem arises when you do not have an upper limit for the amount of data stored or, considering the characteristics of today's applications, when that amount grows so rapidly that it is impossible to declare a safe limit that we will never hit. For those kinds of scenarios, there is a separate kind of storage named data lakes. They allow you to store data in its natural format, so it does not imply any kind of...

Storing data in Azure Data Lake Store

Because Azure Data Lake Store is all about storing data, in this section of the chapter you will see how you can store different files, use permissions to restrict access to them, and organize your instance. The important thing to remember here is that you are not limited to big data tools when storing or accessing data within the service. If you can communicate using the Azure Data Lake Store protocol, you can operate on files using C#, JavaScript, or any other programming language.
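To make the point about language independence concrete, here is a minimal Python sketch of how a request to the service could be assembled. It assumes the WebHDFS-compatible REST endpoint that Azure Data Lake Store (Gen1) exposes; the account name, file path, and token below are hypothetical placeholders, and the sketch only builds the URL and headers without sending anything over the network.

```python
from urllib.parse import quote, urlencode


def build_adls_request(account: str, path: str, operation: str, token: str):
    """Return (url, headers) for a WebHDFS-style call against an ADLS account.

    Assumptions: `account` is the store name, `operation` is a WebHDFS
    operation such as LISTSTATUS or GETFILESTATUS, and `token` is an
    Azure Active Directory access token obtained elsewhere.
    """
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
    # Percent-encode the path but keep "/" so the folder structure survives.
    url = base + quote(path.lstrip("/")) + "?" + urlencode({"op": operation})
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


# Hypothetical account, path, and token, for illustration only.
url, headers = build_adls_request(
    "myaccount", "/raw/2018/11/events.json", "GETFILESTATUS", "<access-token>"
)
print(url)
```

Any HTTP client in any language could then send this request, which is why the choice of programming language matters so little here.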

Using the Azure portal to navigate

To start working with files in the Azure portal, click the Data explorer button:

Once you click on...

Security

Azure Data Lake Store offers a somewhat different security model than other storage options available in Azure. In fact, it offers a comprehensive solution that consists of authentication, authorization, network isolation, data protection, and auditing. As it is designed to be the very base of data-driven systems, it has to go beyond common capabilities when it comes to controlling who (or what) can access the stored information, and how. In this section, we will cover the different security features available and describe them in detail, so that you are familiar with them and know how to use them.

Authentication and authorization

To authenticate who or what can access data stored, Azure Data Lake Store uses Azure Active Directory to know...

Best practices

Azure Data Lake Store is a bit different when it comes to accessing stored data and performing reads and writes. As this service is designed for storing petabytes of data, it is important to know the best practices for doing so, to avoid problems such as slow reads and writes or the need to reorganize all files. This also includes security features (as discussed earlier), as they are an important part of the whole solution. In this section, we will focus on several pieces of advice regarding ADLS, so that you can use it deliberately and apply the best practices.
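As an illustration of planning the file layout up front, the following Python sketch builds a date-partitioned path for each incoming file, so files are spread across a folder hierarchy rather than accumulating in one place. The root/source/year/month/day layout and all names here are assumptions chosen for the example, not a structure prescribed by this chapter.

```python
from datetime import datetime
from pathlib import PurePosixPath


def partitioned_path(root: str, source: str, when: datetime, filename: str) -> str:
    """Build a /root/source/YYYY/MM/DD/filename path (hypothetical layout)."""
    return str(
        PurePosixPath(
            "/", root, source,
            f"{when:%Y}", f"{when:%m}", f"{when:%d}",
            filename,
        )
    )


# Hypothetical folder and file names, for illustration only.
print(partitioned_path("raw", "sensors", datetime(2018, 11, 5), "readings.csv"))
```

Deciding on such a convention before the first upload is one way to reduce the risk of having to reorganize all files later.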

Performance

One important feature of many storage solutions is their performance. In general, we expect that our databases will work without a problem whether the load...

Summary

In this chapter, you have learned a bit about Azure Data Lake Store, an Azure service designed to store an almost unlimited amount of data without affecting its structure. We have covered things such as data structure, security features, and best practices, so you should be able to get started on your own and build your very first solution based on this particular Azure component. Bear in mind that it can easily replace Azure Storage, for example; it all depends on your requirements and expectations. If you're looking for a more flexible security model, better performance, and better limits, ADLS is for you. This concludes this part of the book, which covered services for storing data, monitoring services, and performing communication between them. In the next chapter, you will learn more about scaling, performance, and maintainability in Azure.

...

Questions

  1. Which security model is better: managing security groups or individual entities, and why?
  2. What is the difference between RBAC and POSIX ACL?
  3. What is the maximum size of a file in ADLS?
  4. Which data structure is better: a single folder containing thousands of files or a hierarchy of folders containing several files each?
  5. Can Azure Data Lake Store be used with any programming language?
  6. What is the difference between ADLS and Azure Storage?
  7. How do you ensure that your solution based on ADLS is geo-redundant?