Simplify Big Data Analytics with Amazon EMR

Product type: Book
Published: Mar 2022
Publisher: Packt
ISBN-13: 9781801071079
Pages: 430
Edition: 1st Edition
Author: Sakti Mishra

Table of Contents (19 chapters)

Preface
1. Section 1: Overview, Architecture, Big Data Applications, and Common Use Cases of Amazon EMR
2. Chapter 1: An Overview of Amazon EMR
3. Chapter 2: Exploring the Architecture and Deployment Options
4. Chapter 3: Common Use Cases and Architecture Patterns
5. Chapter 4: Big Data Applications and Notebooks Available in Amazon EMR
6. Section 2: Configuration, Scaling, Data Security, and Governance
7. Chapter 5: Setting Up and Configuring EMR Clusters
8. Chapter 6: Monitoring, Scaling, and High Availability
9. Chapter 7: Understanding Security in Amazon EMR
10. Chapter 8: Understanding Data Governance in Amazon EMR
11. Section 3: Implementing Common Use Cases and Best Practices
12. Chapter 9: Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark
13. Chapter 10: Implementing Real-Time Streaming with Amazon EMR and Spark Streaming
14. Chapter 11: Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi
15. Chapter 12: Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA
16. Chapter 13: Migrating On-Premises Hadoop Workloads to Amazon EMR
17. Chapter 14: Best Practices and Cost-Optimization Techniques
18. Other Books You May Enjoy

Chapter 8: Understanding Data Governance in Amazon EMR

In previous chapters, you learned how to secure an EMR cluster with IAM policies and data encryption, and how to configure security groups to control network traffic to and from your cluster.

Beyond cluster-level security, you can also enable data-level security: build a centralized data catalog over your datasets, then define fine-grained permissions that control which user can access which database, table, or column in that catalog. Securing your data is as important as securing your infrastructure. When you put security controls on your data, you also need to consider whether the data offered for consumption is in a useful format, with proper data quality checks in place.
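To make the idea of fine-grained permissions concrete, here is a minimal in-memory sketch of how a grant at database, table, or column level might be evaluated. It is purely illustrative: the `Grant` and `Catalog` classes, the `can_read` method, and the principal and dataset names are all hypothetical and do not correspond to any AWS service API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One hypothetical permission entry; "*" means "everything at this level"."""
    principal: str
    database: str
    table: str = "*"
    columns: frozenset = frozenset({"*"})


@dataclass
class Catalog:
    """Toy stand-in for a centralized data catalog holding read grants."""
    grants: list = field(default_factory=list)

    def grant(self, principal, database, table="*", columns=("*",)):
        self.grants.append(Grant(principal, database, table, frozenset(columns)))

    def can_read(self, principal, database, table, column):
        # Access is allowed if any single grant covers the requested column.
        for g in self.grants:
            if g.principal != principal or g.database != database:
                continue
            if g.table not in ("*", table):
                continue
            if "*" in g.columns or column in g.columns:
                return True
        return False


catalog = Catalog()
# Column-level grant: the analyst sees only two columns of one table.
catalog.grant("analyst", "sales", "orders", ["order_id", "amount"])
# Database-level grant: the admin sees every table and column in "sales".
catalog.grant("admin", "sales")
```

With these grants, `catalog.can_read("analyst", "sales", "orders", "amount")` is allowed, while the same analyst asking for a column outside the grant, or for any table in another database, is denied. A real catalog service would add write and alter permissions, groups or roles, and auditing on top of this basic lookup.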

That brings us to the focus of this chapter, where we will dive deep into the following topics, which will help you implement data governance and granular permission management on your data catalog...
