Home Programming Geospatial Data Analytics on AWS

Geospatial Data Analytics on AWS

By Scott Bateman , Janahan Gnanachandran , Jeff DeMuth
books-svg-icon Book
Subscription FREE
eBook + Subscription €11.99
eBook €26.99
Print + eBook €33.99
READ FOR FREE Free Trial for 7 days. €11.99 p/m after trial. Cancel Anytime! BUY NOW BUY NOW BUY NOW
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
READ FOR FREE Free Trial for 7 days. €11.99 p/m after trial. Cancel Anytime! BUY NOW BUY NOW BUY NOW
Subscription FREE
eBook + Subscription €11.99
eBook €26.99
Print + eBook €33.99
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
  1. Free Chapter
    Chapter 1: Introduction to Geospatial Data in the Cloud
About this book
Managing geospatial data and building location-based applications in the cloud can be a daunting task. This comprehensive guide helps you overcome this challenge by presenting the concept of working with geospatial data in the cloud in an easy-to-understand way, along with teaching you how to design and build data lake architecture in AWS for geospatial data. You’ll begin by exploring the use of AWS databases like Redshift and Aurora PostgreSQL for storing and analyzing geospatial data. Next, you’ll leverage services such as DynamoDB and Athena, which offer powerful built-in geospatial functions for indexing and querying geospatial data. The book is filled with practical examples to illustrate the benefits of managing geospatial data in the cloud. As you advance, you’ll discover how to analyze and visualize data using Python and R, and utilize QuickSight to share derived insights. The concluding chapters explore the integration of commonly used platforms like Open Data on AWS, OpenStreetMap, and ArcGIS with AWS to enable you to optimize efficiency and provide a supportive community for continuous learning. By the end of this book, you’ll have the necessary tools and expertise to build and manage your own geospatial data lake on AWS, along with the knowledge needed to tackle geospatial data management challenges and make the most of AWS services.
Publication date:
June 2023
Publisher
Packt
Pages
276
ISBN
9781804613825

 

Introduction to Geospatial Data in the Cloud

This book is divided into four parts that will walk you through key concepts, tools, and techniques for dealing with geospatial data. Part 1 sets the foundation for the entire book, establishing key ideas that provide synergy with subsequent parts. Each chapter is further subdivided into topics that dive deep into a specific subject. This introductory chapter of Part 1 will cover the following topics:

  • Introduction to cloud computing and AWS
  • Storing geospatial data in the cloud
  • Building your geospatial data strategy
  • Geospatial data management best practices
  • Cost management in the cloud
 

Introduction to cloud computing and AWS

You are most likely familiar with the benefits that geospatial analysis can provide. Governmental entities, corporations, and other organizations routinely solve complex, location-based problems with the help of geospatial computing. While paper maps are still around, most use cases for geospatial data have evolved to live in the digital world. We can now create maps faster and draw more geographical insights from data than at any point in history. This phenomenon has been made possible by blending the expertise of geospatial practitioners with the power of Geographical Information Systems (GIS). Critical thinking and higher-order analysis can be done by humans while computers handle the monotonous data processing and rendering tasks. As the geospatial community continues to refine the balance of which jobs require manual effort and which can be handled by computers, we are collectively improving our ability to understand our world.

Geospatial computing has been around for decades, but the last 10 years have seen a dramatic shift in the capabilities and computing power available to practitioners. The emergence of the cloud as a fundamental building block of technical systems has offered needle-moving opportunities in compute, storage, and analytical capabilities. In addition to a revolution in the infrastructure behind GIS systems, the cloud has expanded the optionality in every layer of the technical stack. Common problems such as running out of disk space, long durations of geospatial processing jobs, limited data availability, and difficult collaboration across teams can be things of the past. AWS provides solutions to these problems and more, and in this book, we will describe, dissect, and provide examples of how you can do this for your organization.

Cloud computing provides the ability to rapidly experiment with new tools and processing techniques that would never be possible using a fixed set of compute resources. Not only are new capabilities available and continually improving but your team will also have more time to learn and use these new technologies with the time saved in creating, configuring, and maintaining the environment. The undifferenced heavy lifting of managing geospatial storage devices, application servers, geodatabases, and data flows can be replaced with time spent analyzing, understanding, and visualizing the data. Traditional this or that technical trade-off decisions are no longer binary proposals. Your organization can use the right tool for each job, and blend as many tools and features into your environment as is appropriate for your requirements. By paying for the precise amount of resources you use in AWS, it is possible to break free from restrictive, punitive, and time-limiting licensing situations. In some cases, the amount of an AWS compute resource you use is measured and charged down to the millisecond, so you literally don’t pay for a second of unused time. If a team infrequently needs to leverage a capability, such as a monthly data processing job, this can result in substantial cost savings by eliminating idle virtual machines and supporting technical resources. If cost savings are not your top concern, the same proportion of your budget can be dedicated to more capable hardware that delivers dramatically reduced timeframes compared to limited compute environments.

The global infrastructure of AWS allows you to position data in the best location to minimize latency, providing the best possible performance. Powerful replication and caching technologies can be used to minimize wait time and allow robust cataloging and characterization of your geospatial assets. The global flexibility of your GIS environment is further enabled with the use of innovative end user compute options. Virtual desktop services in AWS allow organizations to keep the geospatial processing close to the data for maximum performance, even if the user is geographically distanced from both. AWS and the cloud have continued to evolve and provide never-before-seen capabilities in geospatial power and flexibility. Over the course of this book, we will examine what these concepts are, how they work, and how you can put them to work in your environment.

Now that we have learned the story of cloud computing on AWS, let’s check out how we can implement geospatial data there.

 

Storing geospatial data in the cloud

As you learn about the possibilities for storing geospatial data in the cloud, it may seem daunting due to the number of options available. Many AWS customers experiment with Amazon Simple Storage Service (S3) for geospatial data storage as their first project. Relational databases, NoSQL databases, and caching options commonly follow in the evolution of geospatial technical architectures. General GIS data storage best practices still apply to the cloud, so much of the knowledge that practitioners have gained over the years directly applies to geospatial data management on AWS. Familiar GIS file formats that work well in S3 include the following:

  • Shapefiles (.shp, .shx, .dbf, .prj, and others)
  • File geodatabases (.gdb)
  • Keyhole Markup Language (.kml)
  • Comma-Separated Values (.csv)
  • Geospatial JavaScript Object Notation (.geojson)
  • Geostationary Earth Orbit Tagged Image File Format (.tiff)

The physical location of data is still important for latency-sensitive workloads. Formats and organization of data can usually remain unchanged when moving to S3 to limit the impact of migrations. Spatial indexes and use-based access patterns will dramatically improve the performance and ability of your system to deliver the desired capabilities to your users.

Relational databases have long been the cornerstone of most enterprise GIS environments. This is especially true for vector datasets. AWS offers the most comprehensive set of relational database options with flexible sizing and architecture to meet your specific requirements. For customers looking to migrate geodatabases to the cloud with the least amount of environmental change, Amazon Elastic Compute Cloud (EC2) virtual machine instances provide a similar capability to what is commonly used in on-premises data centers. Each database server can be instantiated on the specific operating system that is used by the source server. Using EC2 with Amazon Elastic Block Store (EBS) network-attached storage provides the highest level of control and flexibility. Each server is created by specifying the amount of CPU, memory, and network throughput desired. Relational database management system (RDBMS) software can be manually installed on the EC2 instance, or an Amazon Machine Image (AMI) for the particular use case can be selected from the AWS catalog to remove manual steps from the process. While this option provides the highest degree of flexibility, it also requires the most database configuration and administration knowledge.

Many customers find it useful to leverage Amazon Relational Database Service (RDS) to establish database clusters and instances for their GIS environments. RDS can be leveraged by creating full-featured database Microsoft SQL Server, Oracle, PostgreSQL, MySQL, or MariaDB clusters. AWS allows the selection of specific instance types to focus on memory or compute optimization in a variety of configurations. Multiple Availability Zone (AZ)-enabled databases can be created to establish fault tolerance or improve performance. Using RDS dramatically simplifies database administration, and decreases the time required to select, provision, and configure your geospatial database using the specific technical parameters to meet the business requirements.

Amazon Aurora provides an open source path to highly capable and performant relational databases. PostgreSQL or MySQL environments can be created with specific settings for the desired capabilities. Although this may mean converting data from a source format, such as Microsoft SQL Server or Oracle, the overall cost savings and simplified management make this an attractive option to modernize and right-size any geospatial database.

In addition to standard relational database options, AWS provides other services to manage and use geospatial data. Amazon Redshift is the fastest and most widely used cloud data warehouse and supports geospatial data through the geometry data type. Users can query spatial data in Redshift’s built-in SQL functions to find the distance between two points, interrogate polygon relationships, and provide other location insights into their data. Amazon DynamoDB is a fully managed, key-value NoSQL database with an SLA of up to 99.999% availability. For organizations leveraging MongoDB, Amazon DocumentDB provides a fully managed option for simplified instantiation and management. Finally, AWS offers the Amazon OpenSearch Service for petabyte-scale data storage, search, and visualization.

The best part is that you don’t have to choose a single option for your geospatial environment. Often, companies find that different workloads benefit from having the ability to choose the most appropriate data landscape. Combining Infrastructure as a Service (IaaS) workloads with fully managed databases and modern databases is not only possible but a signature of a well-architected geospatial environment. Transactional systems may benefit from relational geodatabases, while mobile applications may be more aligned with NoSQL data stores. When you operate in a world of consumption-based resources, there is no downside to using the most appropriate data store for each workload. Having familiarity with the cloud options for storing geospatial data is crucial in strategic planning, which we will cover in the next topic.

         
About the Authors
  • Scott Bateman

    Scott Bateman is a Principal Solutions Architect at AWS focused on customers in the energy industry. He holds a BS in Computer Science from Trinity University, and an Executive MBA from Quantic in addition to numerous AWS and Microsoft certifications. As part of the Geospatial Technical Field Community (TFC) group within AWS, Scott is able to regularly speak with customers about common challenges storing data, creating maps, tracking assets, optimizing driving routes, and better understanding facilities and property through remote sensing. When not working or writing, Scott enjoys snowboarding and international travel with his wife of over 2 decades, Angel, and 2 wonderful children Jackson and Emily.

    Browse publications by this author
  • Janahan Gnanachandran

    Janahan (Jana) is a Principal Solutions Architect at AWS. He partners with customers across industries to increase speed, agility, and drive innovation using the cloud for digital transformation. Jana holds a BE in Electronics and Communication Engineering from Anna University, India, and an MS in Computer Engineering from the University of Louisiana at Lafayette. Alongside designing scalable, secure, and cost-effective cloud solutions, Jana conducts workshops, training sessions, and public speaking engagements on cloud best practices, architectural best practices, and data and AI/ML strategies. When not immersed in cloud computing, he enjoys playing tennis or golf and traveling with his wife Dilakshana and their 2 kids, Jashwin and Dhruvish.

    Browse publications by this author
  • Jeff DeMuth

    Jeff Demuth is a solutions architect who joined Amazon Web Services (AWS) in 2016. He focuses on the geospatial community and is passionate about geographic information systems (GIS) and technology. Outside of work, Jeff enjoys traveling, building Internet of Things (IoT) applications, and tinkering with the latest gadgets.

    Browse publications by this author
Geospatial Data Analytics on AWS
Unlock this book and the full library FREE for 7 days
Start now