Spatial Analytics with ArcGIS

By Eric Pimpler

About this book

Spatial statistics has the potential to provide insight that is not otherwise available through traditional GIS tools. This book is designed to introduce you to the use of spatial statistics so you can solve complex geographic analysis problems.

The book begins by introducing you to the many spatial statistics tools available in ArcGIS. You will learn how to analyze patterns, map clusters, and model spatial relationships with these tools. Further on, you will explore how to extend the spatial statistics tools currently available in ArcGIS and, working through real-world examples, use the R programming language to create custom tools in ArcGIS through the R-ArcGIS Bridge.

At the end of the book, you will be presented with two exciting case studies where you will be able to practically apply all your learning to analyze and gain insights into real estate data.

Publication date: April 2017
Publisher: Packt
Pages: 290
ISBN: 9781787122581

 

Chapter 1. Introduction to Spatial Statistics in ArcGIS and R

Spatial statistics are a set of exploratory techniques for describing and modeling spatial distributions, patterns, processes, and relationships. Although spatial statistics are similar to traditional statistics, they also integrate spatial relationships into the calculations. In spatial statistics, proximity is important: things that are closer together tend to be more related than things that are far apart.

ArcGIS includes the Spatial Statistics Tools toolbox available for all license levels of its desktop software. Included with this toolbox are a number of toolsets that help analyze spatial distributions, patterns, clustering, and relationships in GIS datasets. This book will cover each of the toolsets provided with the Spatial Statistics Tools toolbox in ArcGIS to provide a comprehensive survey of the spatial statistics tools available to ArcGIS users.

R is a programming language and software environment for statistical computing and graphics, and it is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data analysts for developing statistical software and performing data analysis. In addition, R can be used for spatial statistical analysis and can be integrated with ArcGIS through the R-ArcGIS Bridge.

This book also contains an introductory chapter for the R programming language as well as a chapter that covers the installation of the R-ArcGIS Bridge and the creation of custom ArcGIS script tools written with R.

In this chapter, we will cover the following topics:

  • Introduction to spatial statistics
  • An overview of the Spatial Statistics Tools toolbox in ArcGIS
  • An overview of the integration between R and ArcGIS
 

Introduction to spatial statistics


Let's start with a definition of spatial statistics. The GIS dictionary (http://gisgeography.com/gis-dictionary-definition-glossary/) defines spatial statistics as the field of study concerning statistical methods that use space and spatial relationships (such as distance, area, volume, length, height, orientation, centrality, and/or other spatial characteristics of data) directly in their mathematical computations. Spatial statistics are used for a variety of different types of analyses, including pattern analysis, shape analysis, surface modeling and surface prediction, spatial regression, statistical comparisons of spatial datasets, statistical modeling and prediction of spatial interaction, and more. The many types of spatial statistics include descriptive, inferential, exploratory, geostatistical, and econometric statistics.

Spatial statistics are applicable across a wide range of environmental disciplines, including agriculture, geology, soil science, hydrology, ecology, oceanography, forestry, meteorology, and climatology, among others. Many socio-economic disciplines including epidemiology, crime analysis, real estate, planning, and others also benefit from spatial statistical analysis.

Spatial statistics can give answers to the following questions:

  • How are the features distributed?
  • What is the pattern created by the features?
  • Which are the clusters?
  • How do patterns and clusters of different variables compare to one another?
  • What is the relationship between sets of features or values?
 

An overview of the Spatial Statistics Tools toolbox in ArcGIS


The ArcGIS Spatial Statistics Tools toolbox is available for all license levels of ArcGIS Desktop, including Basic, Standard, and Advanced. The toolbox includes a number of toolsets, which are as follows:

  • The Analyzing Patterns toolset
  • The Mapping Clusters toolset
  • The Measuring Geographic Distributions toolset
  • The Modeling Spatial Relationships toolset

The Measuring Geographic Distributions toolset

The Measuring Geographic Distributions toolset in the Spatial Statistics Tools toolbox contains a set of tools that provide descriptive geographic statistics, including the Central Feature, Directional Distribution, Linear Directional Mean, Mean Center, Median Center, and Standard Distance tools. Together, these tools provide a set of basic statistical exploration techniques. These basic descriptive statistics are used only as a starting point in the analysis process. The following screenshot displays the output from the Directional Distribution tool for an analysis of crime data:

The Central Feature, Mean Center, and Median Center tools all provide similar functionality. Each creates a feature class containing a single feature that represents the centrality of a geographic dataset.

The Linear Directional Mean tool identifies the mean direction, length, and geographic center for a set of lines. The output of this tool is a feature class with a single linear feature.

The Standard Distance and Directional Distribution tools are similar, in that they both measure the degree to which features are concentrated or dispersed around the geometric center, but the Directional Distribution tool, also known as the Standard Deviational Ellipse, is superior as it also provides a measure of directionality in the dataset.
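To make these descriptive measures concrete, here is a minimal sketch in plain Python of three of them: the mean center, the central feature, and the standard distance. It is not the ArcGIS implementation. The function names and the sample coordinates are made up for illustration, and the sketch assumes unweighted point features in a projected (planar) coordinate system.

```python
import math

def mean_center(points):
    """Return the (x, y) mean center of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def central_feature(points):
    """Return the input point with the smallest total distance to all points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def standard_distance(points):
    """Return the root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

# Hypothetical sample data: four corners of a square plus its middle.
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]

print(mean_center(points))
print(central_feature(points))
print(standard_distance(points))
```

Note the difference between the first two functions: the mean center is a computed location that need not coincide with any feature, whereas the central feature is always one of the input features. A larger standard distance indicates that features are more dispersed around the center.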

The Analyzing Patterns toolset

The Analyzing Patterns toolset in the Spatial Statistics Tools toolbox contains a series of tools that help evaluate whether features or the values associated with features form a clustered, dispersed, or random spatial pattern. These tools generate a single result for the entire dataset in question. In addition, the result does not take the form of a map, but rather statistical output, as shown in the following screenshot:

Tools in this category generate inferential statistics: measures of how confident we can be that the observed pattern is clustered or dispersed rather than the result of random chance. Let's examine the following tools found in the Analyzing Patterns toolset:

  • Average Nearest Neighbor: This tool calculates the nearest neighbor index based on the average distance from each feature to its nearest neighboring feature. For each feature in a dataset, the distance to its nearest neighbor is computed. An average distance is then computed and compared to the average distance expected for a random distribution. The result is the ANN ratio, which in simple terms is observed/expected. If the ratio is less than 1, we can say that the data exhibits a clustered pattern, whereas a value greater than 1 indicates a dispersed pattern in our data.
  • Spatial Autocorrelation (Morans I): This tool measures spatial autocorrelation by simultaneously measuring feature locations and attribute values. If features that are close together have similar values, then that is said to be clustering. However, if features that are close together have dissimilar values, then they form a dispersed pattern. This tool outputs a Moran's I index value along with a z-score and a p-value.
  • Incremental Spatial Autocorrelation: This tool is similar to the previous tool, but it measures spatial autocorrelation for a series of distances and can create an optional line graph of those distances along with their corresponding z-scores. Because the newer Optimized Hot Spot Analysis tool serves a similar purpose, this tool is not used as frequently as it once was. It is still often used as an aid for choosing a distance value for other tools such as Hot Spot Analysis or Point Density.
  • High/Low Clustering (Getis-Ord General G): This looks for high value clusters and low value clusters. It is used to measure the concentration of high or low values for a given study area and return the Observed General G, Expected General G, z-score, and p-value. It is most appropriate when there is a fairly even distribution of values.
  • Multi-Distance Spatial Cluster Analysis (Ripleys K Function): This determines whether feature locations show significant clustering or dispersion. However, unlike the other spatial pattern tools that we've examined in this section, it does not take the value at a location into account. It only determines clustering by the location of the features. This tool is often used in fields such as environmental studies, health care, and crime where you are attempting to determine whether one feature attracts another feature.
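Two of the statistics described in this list are simple enough to sketch in a few lines of plain Python: the Average Nearest Neighbor ratio and the global Moran's I index. The sketch below is only an illustration under stated assumptions, not the ArcGIS implementation. The function names, the sample data, and the simple adjacency weights are all made up, and the z-scores and p-values that the tools also report are omitted.

```python
import math

def ann_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the distance
    expected for a random pattern, 0.5 / sqrt(n / area)."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

def morans_i(values, weights):
    """Global Moran's I for attribute values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical points: tightly packed versus spread out in a 100 x 100 study area.
clustered = [(0, 0), (0, 1), (1, 0), (1, 1)]
dispersed = [(0, 0), (100, 0), (0, 100), (100, 100)]
print(ann_ratio(clustered, area=10000))  # less than 1: clustered
print(ann_ratio(dispersed, area=10000))  # greater than 1: dispersed

# Hypothetical chain of four features; each feature neighbors the next one.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], chain))  # positive: similar values are neighbors
print(morans_i([1, 4, 1, 4], chain))  # negative: dissimilar values are neighbors
```

The ANN ratio depends strongly on the area value supplied, so the study area should be chosen deliberately. Moran's I likewise depends on how the spatial weights are defined; the binary adjacency matrix here is only one possible choice.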

The Mapping Clusters toolset

The Mapping Clusters toolset is probably the most well-known and commonly used toolset in the Spatial Statistics Tools toolbox, and for good reason. The output from these tools is highly visual and beneficial in the analysis of clustering phenomena. There are many examples of clustering: housing, businesses, trees, crimes, and many others. The degree of this clustering is also important. The tools in the Mapping Clusters toolset don't just answer the question "Is there clustering?"; they also take on the question "Where is the clustering?"

The Mapping Clusters toolset includes the following tools:

  • Hot Spot Analysis: This tool is probably the most popular tool in the Spatial Statistics Tools toolbox. Given a set of weighted features, it identifies statistically significant hot and cold spots using the Getis-Ord Gi* statistic, as shown in the output of real estate sales activity in the following screenshot:
  • Similarity Search: This tool is used to identify candidate features that are most similar or most dissimilar to one or more input features based on feature attributes. Dissimilarity searches can be just as important as similarity searches. For example, a community development organization, in its attempts to attract new businesses, might show that its city is dissimilar to competing cities when comparing crime.
  • Grouping Analysis: This tool groups features based on feature attributes, as well as optional spatial/temporal constraints. The output of this tool is a set of distinct groups in which the features within each group are as similar as possible and the groups themselves are as dissimilar from one another as possible. An example is displayed in the following screenshot. The tool is capable of multivariate analysis, and the output is a map and a report. The output map can have either contiguous groups or non-contiguous groups:
  • Cluster and Outlier Analysis: The final tool in the Mapping Clusters toolset is the Cluster and Outlier Analysis tool. This tool, in addition to performing hot spot analysis, identifies outliers in your data. Outliers are extremely relevant to many types of analyses. The tool starts by separating features and neighborhoods from the study area. Each feature is examined against every other feature to see whether it is significantly different from the other features. Likewise, each neighborhood is examined in relation to all other neighborhoods to see whether it is statistically different from them. An example of the output from the Cluster and Outlier Analysis tool is provided in the following screenshot:
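The Getis-Ord Gi* statistic mentioned for the Hot Spot Analysis tool can be sketched in plain Python as follows. This is a simplified illustration rather than the ArcGIS implementation: the function name, the sample values, and the binary weights (each feature plus its immediate neighbors along a chain) are assumptions made for the example. Each returned value is a z-score; strongly positive scores suggest hot spots and strongly negative scores suggest cold spots.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return a Gi* z-score for each feature.

    weights[i][j] is the spatial weight between features i and j. Gi*
    includes the feature itself, so weights[i][i] should be nonzero.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * v for wij, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical data: six features in a row, high values on the left, low on the right.
values = [10, 9, 8, 1, 2, 1]
n = len(values)
# Binary weights: each feature is a neighbor of itself and of adjacent features.
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

for value, z in zip(values, getis_ord_gi_star(values, weights)):
    print(value, round(z, 2))
```

A feature with a high value is not necessarily a hot spot on its own. Its score is high only when its neighbors also have high values, which is why the statistic is computed over a neighborhood rather than per feature.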

The Modeling Spatial Relationships toolset

The Modeling Spatial Relationships toolset contains a number of regression analysis tools that help you examine and/or quantify the relationships between features. They help measure how features in a dataset relate to each other in space.

The regression tools provided in the Spatial Statistics Tools toolbox model relationships among data variables associated with geographic features, allowing you to make predictions for unknown values or to better understand key factors influencing a variable you are trying to model. Regression methods allow you to verify relationships and to measure how strong those relationships are. The Exploratory Regression tool allows you to examine a large number of Ordinary Least Squares (OLS) models quickly, summarize variable relationships, and determine whether any combination of candidate explanatory variables satisfies all of the requirements of the OLS method.

There are two regression analysis tools in ArcGIS, which are as follows:

  • Ordinary Least Squares: This tool is a linear regression tool used to generate predictions or model a dependent variable in terms of its relationships to a set of explanatory variables. OLS is the best-known regression technique and provides a good starting point for spatial regression analysis. This tool provides a global model of a variable or process you are trying to understand or predict. The result is a single regression equation that depicts a positive or negative linear relationship. The following screenshot depicts partial output from the OLS tool:
  • Geographically Weighted Regression: Geographically Weighted Regression or GWR is a local form of linear regression for modeling spatially varying relationships. Note that this tool does require an Advanced ArcGIS license. GWR constructs a separate equation for each feature and is most appropriate when you have several hundred features. GWR creates an output feature class (shown in the following screenshot) and table. The output table contains a summary of the tool execution. When running GWR, you should use the same explanatory variables that you specified in your OLS model:

The Modeling Spatial Relationships toolset also includes the Exploratory Regression tool.

  • Exploratory Regression: This tool can be used to evaluate combinations of explanatory variables for OLS models that best explain the dependent variable. This data-mining tool does much of the work of finding well-suited variables and can save you a lot of time in identifying the right combination. The results of this tool are written to the progress dialog, result window, and an optional report file. An example of the output from the Exploratory Regression tool can be seen in the following screenshot:
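To illustrate what a global OLS model produces, the following plain Python sketch fits a single regression equation with one explanatory variable and reports how well it fits. It is a simplified illustration, not the ArcGIS OLS tool, which supports multiple explanatory variables and reports many more diagnostics. The function names and the sample data (square footage and sale price) are hypothetical.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Return the share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: square footage (explanatory) and sale price (dependent).
square_feet = [1000, 1500, 2000, 2500, 3000]
sale_price = [155000, 198000, 252000, 301000, 344000]

intercept, slope = ols_fit(square_feet, sale_price)
print(intercept, slope)
print(r_squared(square_feet, sale_price, intercept, slope))
```

The single slope applies to every feature in the study area, which is what makes the model global. GWR, by contrast, fits a separate equation for each feature, so its coefficients can vary from place to place.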
 

Integrating R with ArcGIS


The R Project for Statistical Computing, referred to simply as R, is a free software environment for statistical computing and graphics. It is also a programming language that is widely used among statisticians and data miners for developing statistical software and performing data analysis.

Although there are other programming languages for handling statistics, R has become the de facto language of statistical routines, offering a package repository with over 6,400 problem solving packages. It offers versatile and powerful plotting. It also has the advantage of treating tabular and multidimensional data as a labeled, indexed series of observations.

The R-ArcGIS Bridge is a free, open source R package that connects ArcGIS and R. It was released together with an R ArcGIS community website on GitHub, encouraging collaboration between the two communities. The package serves the following three purposes:

  • ArcGIS developers can now create custom tools and toolboxes that integrate ArcGIS and R
  • ArcGIS users can access R code through geoprocessing scripts
  • R users can access GIS data managed in traditional GIS ways

This book includes an introductory chapter on the R language along with a chapter detailing the installation of the R-ArcGIS Bridge and the creation of custom ArcGIS script tools using R. The R-ArcGIS Bridge enables the creation of custom ArcGIS tools that connect to GIS data sources, such as feature classes, and create statistical output using the R programming language, as shown in the following screenshot:

 

Summary


In this chapter, we introduced the topic of spatial statistics and described its basic characteristics. We also briefly reviewed the spatial statistics tools provided by ArcGIS Desktop. In later chapters, we will dive into these tools for a deeper understanding of the functionality they provide. In the next chapter, we'll examine the tools provided by the Measuring Geographic Distributions toolset.

About the Author

  • Eric Pimpler

    Eric Pimpler is the founder and owner of GeoSpatial Training Services and has over 20 years of experience in implementing and teaching GIS solutions using Esri, Google Earth/Maps, and open source technology. Currently, he focuses on ArcGIS scripting with Python and the development of custom ArcGIS Server web and mobile applications using JavaScript. He is the author of Programming ArcGIS 10.1 with Python Cookbook. Eric has a bachelor's degree in geography from Texas A&M University and a master's degree in applied geography with a specialization in GIS from Texas State University.

