You're reading from AWS Certified Solutions Architect - Associate Guide

Product type: Book

Published in: Oct 2018

Publisher: Packt

ISBN-13: 9781789130669

Edition: 1st Edition

Tools: AWS

Concepts: IT Certification

Authors (2):

Gabriel Ramirez

Stuart Scott


Introducing Amazon Elastic MapReduce

The volume of data created by mankind is increasing massively. In the last two years, we have created more data than in the previous history of the human race, and unstructured data grows every second. This is why new paradigms must be used to properly manage it.

The term big data is used more and more frequently, but what exactly is big data? How big is big data? It all depends on the perspective. Imagine a small company that works with spreadsheets, accumulating data every year until the tool is no longer useful. At that point, the company needs a new strategy, such as relational databases and ERP software.

This same analogy works for big companies. Big data is using non-traditional methods to analyze vast amounts of information from different sources and types. Latency plays an important role in the big-data pipeline, because, depending on...

Technical requirements

You will need access to the CLI, Python 2.6.5 or higher, and an IAM user with sufficient permissions to create roles, EC2 instances, and related resources. An AdministratorAccess policy can be used.

Clustering in AWS

Clustering is a way to group compute resources physically. The closer the nodes are to each other, the better: proximity improves communication performance and lowers jitter. Clusters can be tightly or loosely coupled and have a master node that performs all the orchestration activities of the compute nodes. Every cluster in AWS is a single Availability Zone (AZ) concept. To gain resilience, a cluster can use specialized persistence services such as EFS, EBS, and Amazon S3.

There are two main groups of clusters in AWS, each one with a specific purpose:

Cluster HPC: This cluster is tightly coupled, and network performance is a major concern. In this model, we use higher-throughput instances, placement groups, jumbo frames, and single-AZ compute nodes, and they need strong orchestration mechanisms. Examples of these technologies are media transcoding services and fraud risk analysis:

Distributed...

Placement groups

Placement groups are a great way to get the best network performance (the highest packets per second between instances) and the lowest latency for intensive applications by physically co-locating instances on the same hardware.

Spread placement groups overcome the single-hardware limitation of a placement group by placing instances on distinct, distributed hardware, eliminating single points of failure.

Creating a placement group

To create a placement group, navigate to EC2, and select Placement groups and Create Placement Group, as follows:

The allocation of instances inside the placement group is a one-time-only action. If you want to modify the placement group by adding instances, you'll need to relaunch...
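The console steps above also have a programmatic counterpart. The following is a minimal sketch, assuming a boto3 EC2 client is passed in; the helper name `create_placement_group`, the `DryRunClient` stand-in, and the group name `hpc-group` are hypothetical and only serve to illustrate the call without AWS credentials.

```python
# Strategies discussed in this section; EC2 may accept others.
SUPPORTED_STRATEGIES = ("cluster", "spread")


def create_placement_group(ec2_client, name, strategy="cluster"):
    """Create a placement group through a boto3-style EC2 client.

    ec2_client is expected to expose create_placement_group(GroupName, Strategy),
    as the boto3 EC2 client does.
    """
    if strategy not in SUPPORTED_STRATEGIES:
        raise ValueError("unsupported strategy: %s" % strategy)
    request = {"GroupName": name, "Strategy": strategy}
    ec2_client.create_placement_group(**request)
    return request


class DryRunClient(object):
    """Hypothetical stand-in that records requests instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def create_placement_group(self, **kwargs):
        self.calls.append(kwargs)


# Dry run: nothing is sent to AWS here.
client = DryRunClient()
print(create_placement_group(client, "hpc-group"))
```

With real credentials, the stand-in would be swapped for `boto3.client("ec2")`; passing the client in keeps the helper easy to try out before touching an account.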

Elastic MapReduce

Elastic MapReduce (EMR) is a fully managed cluster platform for running big-data and analytics frameworks such as Apache Hadoop, Spark, HBase, Presto, Impala, Cascading, and Flink. Running Hadoop clusters is a complex and time-consuming task. EMR provisions the cluster and installs frequently used frameworks for data scientists, analysts, and engineers.

EMR provides the flexibility to bootstrap your cluster with a series of customer-defined steps that install and configure software and prepare your data to be processed. EMR can use the Hadoop Distributed File System (HDFS) on EBS volumes or EMRFS with Amazon S3 as the backing persistence service.

EMR clusters have a variety of use cases, from ETL and batch processing to real-time applications integrating Amazon Firehose or Apache Spark, and a wide range of connectors and integration architectures. Clusters on EMR can be...
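Hadoop, one of the frameworks EMR runs, is built around the MapReduce programming model. As a rough illustration of that model only (not of how EMR or Hadoop is implemented), the sketch below counts requests per client in plain Python. The log lines, their format, and the function names are assumptions made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (key, 1) pair per log line, keyed on the first field."""
    fields = line.split()
    if fields:
        yield fields[0], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical log lines: client address, method, path.
sample_logs = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /about.html",
    "10.0.0.1 GET /contact.html",
]
counts = run_job(sample_logs)
print(counts)
```

On a real cluster, the map and reduce functions run in parallel across nodes and the framework performs the shuffle; here everything runs in one process to keep the three phases visible.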

Summary

In this chapter, you have learned about some of the options available for clustering in AWS. We remarked on the differences between Cluster HPC and Distributed Grids, and we created a cluster with the CfnCluster framework.

We also discussed some of the networking optimizations available at the hypervisor and interface level, learned how to inspect for jumbo frame capabilities, performed a TCP benchmark between instances, and created a compute placement group.

We introduced EMR and learned how the MapReduce programming model works, creating an EMR cluster that aggregates logs from a public dataset.

Gabriel Ramirez is a passionate technologist with broad experience in the software industry. He currently works as an Authorized Trainer for Amazon Web Services and Google Cloud. He holds 9/9 AWS Certifications and does community work by organizing the AWS User Groups in Mexico.

Stuart Scott

Stuart Scott is the AWS content lead at Cloud Academy, where he has created over 40 courses reaching tens of thousands of students. His content focuses heavily on cloud security and compliance, specifically on how to implement and configure AWS services to protect, monitor, and secure customer data in an AWS environment. He has written numerous cloud security blogs for Cloud Academy and other AWS advanced technology partners. He has taken part in a series of cloud security webinars to share his knowledge and experience within the industry and to help those looking to implement a secure and trusted environment. In January 2016, Stuart was awarded 'Expert of the Year' by Experts Exchange for sharing his knowledge of cloud services with the community.
