Simplify Big Data Analytics with Amazon EMR

Product type: Book
Published: Mar 2022
Publisher: Packt
ISBN-13: 9781801071079
Pages: 430
Edition: 1st Edition
Author: Sakti Mishra

Table of Contents

Preface
Section 1: Overview, Architecture, Big Data Applications, and Common Use Cases of Amazon EMR
Chapter 1: An Overview of Amazon EMR
Chapter 2: Exploring the Architecture and Deployment Options
Chapter 3: Common Use Cases and Architecture Patterns
Chapter 4: Big Data Applications and Notebooks Available in Amazon EMR
Section 2: Configuration, Scaling, Data Security, and Governance
Chapter 5: Setting Up and Configuring EMR Clusters
Chapter 6: Monitoring, Scaling, and High Availability
Chapter 7: Understanding Security in Amazon EMR
Chapter 8: Understanding Data Governance in Amazon EMR
Section 3: Implementing Common Use Cases and Best Practices
Chapter 9: Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark
Chapter 10: Implementing Real-Time Streaming with Amazon EMR and Spark Streaming
Chapter 11: Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi
Chapter 12: Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA
Chapter 13: Migrating On-Premises Hadoop Workloads to Amazon EMR
Chapter 14: Best Practices and Cost-Optimization Techniques
Other Books You May Enjoy

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "For example, the following sample JSON specifies configurations for the core-site and mapred-site classifications and includes Hadoop and MapReduce properties with values that you plan to override in the cluster."

A block of code is set as follows:

    "Properties": {
      "mapred.tasktracker.map.tasks.maximum": "10",
      "mapreduce.map.sort.spill.percent": "0.80",
      "mapreduce.tasktracker.reduce.tasks.maximum": "20"
    }

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    "Classification": "core-site",
    "Properties": {
      "hadoop.security.groups.cache.secs": "500"
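
The two fragments above are excerpts from a larger configuration document. As a rough sketch of how they might fit together, the following Python snippet assembles them into a list of classification objects and serializes it to JSON. The property values are the sample values shown above. Placing the three MapReduce properties under mapred-site is an assumption based on the quoted sentence, and the helper name to_configurations_json is hypothetical, not part of any AWS library.

```python
import json

# Assumption: the three MapReduce properties belong to the "mapred-site"
# classification, as the quoted example sentence suggests.
configurations = [
    {
        "Classification": "core-site",
        "Properties": {"hadoop.security.groups.cache.secs": "500"},
    },
    {
        "Classification": "mapred-site",
        "Properties": {
            "mapred.tasktracker.map.tasks.maximum": "10",
            "mapreduce.map.sort.spill.percent": "0.80",
            "mapreduce.tasktracker.reduce.tasks.maximum": "20",
        },
    },
]


def to_configurations_json(configs):
    """Check the basic shape of each entry, then serialize to JSON.

    Hypothetical helper for illustration only.
    """
    for entry in configs:
        if not isinstance(entry.get("Classification"), str):
            raise ValueError("each entry needs a string 'Classification'")
        properties = entry.get("Properties", {})
        for key, value in properties.items():
            # The samples above write every value as a string, even numbers.
            if not isinstance(key, str) or not isinstance(value, str):
                raise ValueError(f"property {key!r} must map a string to a string")
    return json.dumps(configs, indent=2)


if __name__ == "__main__":
    print(to_configurations_json(configurations))
```

Saving the printed output to a file gives a single document that contains both the core-site override and the MapReduce overrides.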

Any command-line input or output is written as follows:

    aws emr create-cluster --instance-type m5.2xlarge --release-label emr-6.4.0 --security-configuration <mySecurityConfigName>

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "If you are creating a transient cluster that needs to execute a few steps and then auto terminate, then you can select Step execution for Launch mode."

Tips or Important Notes

Appear like this.
