You're reading from Simplify Big Data Analytics with Amazon EMR

Product type: Book
Published in: Mar 2022
Publisher: Packt
ISBN-13: 9781801071079
Edition: 1st Edition
Author (1)
Sakti Mishra

Sakti Mishra is an engineer, architect, author, and technology leader with over 16 years of experience in the IT industry. He is currently working as a senior data lab architect at Amazon Web Services (AWS). He is passionate about technologies and has expertise in big data, analytics, machine learning, artificial intelligence, graph networks, web/mobile applications, and cloud technologies such as AWS and Google Cloud Platform. Sakti has a bachelor’s degree in engineering and a master’s degree in business administration. He holds several certifications in Hadoop, Spark, AWS, and Google Cloud. He is also an author of multiple technology blogs, workshops, white papers and is a public speaker who represents AWS in various domains and events.

Using S3 versus HDFS for cluster storage

As you may have understood by now, EMR lets you choose either HDFS or EMRFS with S3 as the cluster's persistent storage. As explained previously, an EMR cluster has three types of nodes: the master node, core nodes, and task nodes.
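In practice, the storage layer a job uses is visible in the URI scheme of the paths it reads and writes: `hdfs://` for HDFS and `s3://` for EMRFS backed by S3. The following is a minimal sketch of that distinction, not part of any EMR API; the helper name `storage_layer`, the bucket name `example-bucket`, and the paths are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse


def storage_layer(uri: str) -> str:
    """Return which EMR storage layer a dataset URI points at.

    Hypothetical helper for illustration; it only inspects the URI scheme.
    """
    scheme = urlparse(uri).scheme.lower()
    if scheme == "hdfs":
        return "HDFS"
    if scheme == "s3":
        return "EMRFS + S3"
    raise ValueError(f"unrecognized storage scheme: {scheme!r}")


# Hypothetical paths: one on the cluster's HDFS, one in an S3 bucket.
print(storage_layer("hdfs:///user/hadoop/events/"))
print(storage_layer("s3://example-bucket/events/"))
```

The same job code can usually target either layer by changing only the path it is given, which is what makes the choice a deployment decision rather than a code change.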

Now, let's look at how these two storage layers differ and which problems each one solves.

HDFS as cluster-persistent storage

As you can see from the following diagram, there are multiple core nodes pointing to the master node, and each core node has its own CPU, memory, and HDFS storage:

Figure 2.4 – EMR node structure with HDFS as persistent storage

These are some properties to be aware of when your cluster uses HDFS as persistent storage:

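  • To be fault-tolerant, HDFS maintains multiple copies of the data across the core nodes: three by default in standard Hadoop, although EMR picks the default replication factor based on the number of core nodes.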
  • An EMR cluster is deployed in a single Availability Zone (AZ) of a Region, so a complete AZ failure...
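
The replication described in the first property above has a direct effect on how much data the core nodes can hold. The sketch below works through that arithmetic and also shows how a replication override could be expressed as an EMR configuration object using the `hdfs-site` classification and the `dfs.replication` property. The function names, node counts, and disk sizes are assumptions made for illustration, and the estimate ignores space reserved for the operating system, logs, and shuffle data.

```python
def usable_hdfs_capacity_gb(core_nodes: int,
                            disk_per_node_gb: float,
                            replication: int = 3) -> float:
    """Estimate how much unique data HDFS can hold.

    Every block is stored `replication` times, so usable capacity is the
    raw capacity divided by the replication factor.
    """
    if core_nodes < 1 or replication < 1:
        raise ValueError("core_nodes and replication must be at least 1")
    raw_capacity_gb = core_nodes * disk_per_node_gb
    return raw_capacity_gb / replication


def hdfs_site_override(replication: int) -> list:
    """Build an EMR configuration list that sets dfs.replication.

    The structure follows the Classification/Properties shape used by
    EMR configurations; property values are passed as strings.
    """
    return [{
        "Classification": "hdfs-site",
        "Properties": {"dfs.replication": str(replication)},
    }]


# Hypothetical cluster: 6 core nodes with 1,000 GB of HDFS storage each.
print(usable_hdfs_capacity_gb(core_nodes=6, disk_per_node_gb=1000))  # 2000.0
print(hdfs_site_override(2))
```

With three copies, 6,000 GB of raw disk holds only about 2,000 GB of unique data, which is one reason HDFS sizing has to account for replication up front.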