Spatial statistics are a set of exploratory techniques for describing and modeling spatial distributions, patterns, processes, and relationships. Although spatial statistics are similar to traditional statistics, they also integrate spatial relationships into the calculations. In spatial statistics, proximity matters: things that are closer together tend to be more related than things that are farther apart.
ArcGIS includes the Spatial Statistics Tools toolbox, which is available for all license levels of its desktop software. The toolbox contains a number of toolsets that help analyze spatial distributions, patterns, clustering, and relationships in GIS datasets. This book covers each of these toolsets to provide a comprehensive survey of the spatial statistics tools available to ArcGIS users.
The R platform for data analysis is a programming language and software platform for statistical computing and graphics, and it is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data analysts for developing statistical software and data analysis. In addition, R can be used for spatial statistical analysis and can also be integrated with ArcGIS through the R-ArcGIS Bridge.
This book also contains an introductory chapter for the R programming language as well as a chapter that covers the installation of the R-ArcGIS Bridge and the creation of custom ArcGIS script tools written with R.
In this chapter, we will cover the following topics:
- Introduction to spatial statistics
- An overview of the Spatial Statistics Tools toolbox in ArcGIS
- An overview of the integration between R and ArcGIS
Let's start with a definition of spatial statistics. The GIS dictionary (http://gisgeography.com/gis-dictionary-definition-glossary/) defines spatial statistics as the field of study concerning statistical methods that use space and spatial relationships (such as distance, area, volume, length, height, orientation, centrality, and/or other spatial characteristics of data) directly in their mathematical computations. Spatial statistics are used for a variety of different types of analyses, including pattern analysis, shape analysis, surface modeling and surface prediction, spatial regression, statistical comparisons of spatial datasets, statistical modeling and prediction of spatial interaction, and more. The many types of spatial statistics include descriptive, inferential, exploratory, geostatistical, and econometric statistics.
Spatial statistics are applicable across a wide range of environmental disciplines, including agriculture, geology, soil science, hydrology, ecology, oceanography, forestry, meteorology, and climatology, among others. Many socio-economic disciplines, including epidemiology, crime analysis, real estate, and planning, also benefit from spatial statistical analysis.
Spatial statistics can give answers to the following questions:
- How are the features distributed?
- What is the pattern created by the features?
- Which are the clusters?
- How do patterns and clusters of different variables compare to one another?
- What is the relationship between sets of features or values?
The ArcGIS Spatial Statistics Tools toolbox is available for all license levels of ArcGIS Desktop, including Basic, Standard, and Advanced. The toolbox includes a number of toolsets, which are as follows:
- Analyzing Patterns toolset
- Mapping Clusters toolset
- Measuring Geographic Distributions toolset
- Modeling Spatial Relationships toolset
The Measuring Geographic Distributions toolset in the Spatial Statistics Tools toolbox contains a set of tools that provide descriptive geographic statistics, including the Central Feature, Directional Distribution, Linear Directional Mean, Mean Center, Median Center, and Standard Distance tools. Together, these tools support basic statistical exploration. These basic descriptive statistics are used only as a starting point in the analysis process. The following screenshot displays the output from the Directional Distribution tool for an analysis of crime data:
The Central Feature, Mean Center, and Median Center tools all provide similar functionality. Each creates a feature class containing a single feature that represents the centrality of a geographic dataset.
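To make the idea of centrality concrete, here is a minimal pure-Python sketch of three ways to summarize the center of a point set. The function names and sample coordinates are hypothetical, and the median center is approximated with Weiszfeld iterations, which is an assumption of this sketch and not necessarily how ArcGIS computes it.

```python
import math

def mean_center(points):
    """Average x and average y of all points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def central_feature(points):
    """The input point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def median_center(points, iterations=200, tol=1e-9):
    """Approximate the location that minimizes total distance to all points."""
    x, y = mean_center(points)  # start from the mean center
    for _ in range(iterations):
        wsum = xsum = ysum = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:
                continue  # estimate sits on a data point; skip to avoid dividing by zero
            wsum += 1.0 / d
            xsum += px / d
            ysum += py / d
        if wsum == 0.0:
            break
        nx, ny = xsum / wsum, ysum / wsum
        moved = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < tol:
            break
    return (x, y)

# Hypothetical sample: four points in a square plus one distant outlier.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10)]
print("Mean center:    ", mean_center(points))
print("Median center:  ", median_center(points))
print("Central feature:", central_feature(points))
```

In this sample the outlier pulls the mean center well outside the square, while the median center stays inside it, which is one reason to compare the measures before choosing one.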
The Linear Directional Mean tool identifies the mean direction, length, and geographic center for a set of lines. The output of this tool is a feature class with a single linear feature.
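A rough sketch of a directional mean follows. It assumes each line is reduced to its start and end points, averages direction with the circular mean (so that headings on either side of due east do not cancel into a misleading value), and uses the average of the line midpoints as the center. The function name, the sample lines, and these simplifications are assumptions for illustration only.

```python
import math

def linear_directional_mean(lines):
    """lines: list of ((x1, y1), (x2, y2)) pairs, start point to end point."""
    n = len(lines)
    sin_sum = cos_sum = length_sum = mid_x = mid_y = 0.0
    for (x1, y1), (x2, y2) in lines:
        theta = math.atan2(y2 - y1, x2 - x1)  # direction, counterclockwise from east
        sin_sum += math.sin(theta)
        cos_sum += math.cos(theta)
        length_sum += math.hypot(x2 - x1, y2 - y1)
        mid_x += (x1 + x2) / 2
        mid_y += (y1 + y2) / 2
    ccw_deg = math.degrees(math.atan2(sin_sum, cos_sum)) % 360
    return {
        "direction_ccw_from_east": ccw_deg,
        "compass_angle": (90 - ccw_deg) % 360,  # clockwise from north
        "mean_length": length_sum / n,
        "center": (mid_x / n, mid_y / n),
    }

# Hypothetical lines: one heading slightly north of east, one slightly south of east.
lines = [((0, 0), (10, 1)), ((0, 5), (10, 4))]
result = linear_directional_mean(lines)
print(result)
```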
The Standard Distance and Directional Distribution tools are similar, in that they both measure the degree to which features are concentrated or dispersed around the geometric center. The Directional Distribution tool, also known as the Standard Deviational Ellipse, goes further in that it also provides a measure of directionality in the dataset.
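The difference between the two measures can be sketched in a few lines of Python. This example assumes unweighted points and derives the ellipse from the principal axes of the coordinate covariance. The function name and sample data are hypothetical, and Esri's implementation may apply different conventions (for example, how the rotation is reported or how the axes are scaled), so the numbers are illustrative only.

```python
import math

def standard_distance_and_ellipse(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Variances and covariance of the coordinates around the mean center.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n

    # Standard distance: a single radius, with no sense of direction.
    standard_distance = math.sqrt(sxx + syy)

    # Ellipse: orientation of the major axis and spread along each axis.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    half_sum = (sxx + syy) / 2
    half_diff = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_sum + half_diff)
    minor = math.sqrt(max(half_sum - half_diff, 0.0))  # guard against tiny negative rounding
    return {
        "center": (mx, my),
        "standard_distance": standard_distance,
        "major_axis_sd": major,
        "minor_axis_sd": minor,
        "rotation_ccw_from_east": math.degrees(angle),
    }

# Hypothetical points stretched along a southwest-to-northeast diagonal.
diagonal = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
summary = standard_distance_and_ellipse(diagonal)
print(summary)
```

The standard distance reports only how spread out the points are, while the ellipse also shows that almost all of that spread lies along one axis.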
The Analyzing Patterns toolset in the Spatial Statistics Tools toolbox contains a series of tools that help evaluate whether features or the values associated with features form a clustered, dispersed, or random spatial pattern. These tools generate a single result for the entire dataset in question. In addition, the result does not take the form of a map, but rather statistical output, as shown in the following screenshot:
Tools in this category generate inferential statistics: rather than simply describing a pattern, they report how confident we can be that the pattern is clustered or dispersed rather than random. Let's examine the following tools found in the Analyzing Patterns toolset:
Average Nearest Neighbor: This tool calculates the nearest neighbor index based on the average distance from each feature to its nearest neighboring feature. For each feature in a dataset, the distance to its nearest neighbor is computed, and these distances are averaged. The observed average distance is then compared to the average distance expected under a random distribution. The result is the ANN ratio, which in simple terms is observed/expected. If the ratio is less than 1, the data exhibits a clustered pattern, whereas a value greater than 1 indicates a dispersed pattern.
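The observed/expected ratio can be sketched as follows. This example assumes the commonly cited expected distance for a random pattern, 0.5 / sqrt(n / A), and the standard error 0.26136 / sqrt(n^2 / A), where A is the study area. The function name and sample data are hypothetical, and the study area is passed in explicitly because the ratio is sensitive to it.

```python
import math

def average_nearest_neighbor(points, area):
    """Return (ANN ratio, z-score) for a list of (x, y) points within a study area."""
    n = len(points)
    # Distance from each point to its nearest neighbor.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)         # mean distance under a random pattern
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error

# Hypothetical data: an evenly spaced grid and a tight cluster.
grid = [(x, y) for x in range(3) for y in range(3)]
cluster = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)]

grid_ratio, grid_z = average_nearest_neighbor(grid, area=9)
cluster_ratio, cluster_z = average_nearest_neighbor(cluster, area=100)
print("Grid:   ", grid_ratio, grid_z)        # ratio above 1 suggests dispersion
print("Cluster:", cluster_ratio, cluster_z)  # ratio below 1 suggests clustering
```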
Spatial Autocorrelation: This tool measures spatial autocorrelation by simultaneously considering feature locations and attribute values. If features that are close together have similar values, the pattern is said to be clustered. If features that are close together have dissimilar values, they form a dispersed pattern. This tool outputs a Moran's I index value along with a z-score and a p-value.
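As an illustration of how location and value are combined, the following sketch computes a Moran's I index with a simple binary weighting scheme in which two features are neighbors when they lie within a fixed distance of each other. That weighting choice, the function name, and the sample grids are assumptions of this sketch. The z-score and p-value are omitted.

```python
import math

def morans_i(points, values, threshold):
    """Moran's I with binary weights: w_ij = 1 when points i and j are within threshold."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = 0.0
    cross_product = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                weight_sum += 1.0
                cross_product += dev[i] * dev[j]
    return (n / weight_sum) * (cross_product / sum(d * d for d in dev))

# Hypothetical 4 x 4 grid of features with two different value layouts.
grid = [(x, y) for x in range(4) for y in range(4)]
checkerboard = [(x + y) % 2 for x, y in grid]   # neighbors always differ
halves = [1 if x < 2 else 0 for x, y in grid]   # similar values sit together

i_checkerboard = morans_i(grid, checkerboard, threshold=1.0)
i_halves = morans_i(grid, halves, threshold=1.0)
print("Checkerboard:", i_checkerboard)  # negative: dispersed
print("Halves:      ", i_halves)        # positive: clustered
```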
Incremental Spatial Autocorrelation: This tool is similar to the previous tool, but it measures spatial autocorrelation for a series of distances and can create an optional line graph of those distances along with their corresponding z-scores. This tool is similar to the newer Optimized Hot Spot Analysis tool and isn't used as frequently anymore as a result. It is often used as a distance aid for other tools such as Hot Spot Analysis.
High/Low Clustering (Getis-Ord General G): This tool looks for clusters of high values and clusters of low values. It measures the concentration of high or low values for a given study area and returns the Observed General G, Expected General G, z-score, and p-value. It is most appropriate when there is a fairly even distribution of values.
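The relationship between the observed and expected General G values can be sketched like this. The example assumes binary distance-band weights and non-negative attribute values. The function name and the sample data are hypothetical, and the z-score and p-value are left out.

```python
import math

def general_g(points, values, threshold):
    """Return (observed G, expected G) using binary distance-band weights."""
    n = len(points)
    weighted = 0.0    # sum of w_ij * x_i * x_j over neighboring pairs
    total = 0.0       # sum of x_i * x_j over all pairs with i != j
    weight_sum = 0.0  # sum of w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            product = values[i] * values[j]
            total += product
            if math.dist(points[i], points[j]) <= threshold:
                weighted += product
                weight_sum += 1.0
    return weighted / total, weight_sum / (n * (n - 1))

# Hypothetical features along a line, with the high values next to each other.
locations = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
sales = [10, 10, 10, 1, 1, 1]

observed_g, expected_g = general_g(locations, sales, threshold=1.0)
print("Observed G:", observed_g)
print("Expected G:", expected_g)
```

An observed value above the expected value suggests that high values are concentrated together, while an observed value below it suggests that low values are.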
Multi-Distance Spatial Cluster Analysis (Ripleys K Function): This determines whether feature locations show significant clustering or dispersion. However, unlike the other spatial pattern tools that we've examined in this section, it does not take the value at a location into account. It only determines clustering by the location of the features. This tool is often used in fields such as environmental studies, health care, and crime where you are attempting to determine whether one feature attracts another feature.
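Because this kind of analysis depends only on feature locations, it can be sketched with nothing more than coordinates. The example below computes a common transformation of Ripley's K, often written L(d), without any edge correction. Under a random pattern L(d) is expected to be close to d, so larger values suggest clustering at that distance and smaller values suggest dispersion. The function name, study area, and sample points are assumptions for illustration.

```python
import math

def ripley_l(points, distance, area):
    """L(d) transformation of Ripley's K, with no edge correction."""
    n = len(points)
    # Count ordered pairs of distinct points that lie within the given distance.
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= distance
    )
    return math.sqrt(area * pairs / (math.pi * n * (n - 1)))

# Hypothetical 10 x 10 study area (area = 100).
clustered = [(1, 1), (1.2, 1), (1, 1.2), (1.2, 1.2),
             (8, 8), (8.2, 8), (8, 8.2), (8.2, 8.2)]
spaced = [(x, y) for x in (2, 5, 8) for y in (2, 5, 8)]

l_clustered = ripley_l(clustered, distance=0.5, area=100)
l_spaced = ripley_l(spaced, distance=1.0, area=100)
print("Clustered L(0.5):", l_clustered)  # well above 0.5
print("Spaced L(1.0):   ", l_spaced)     # below 1.0
```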
The Mapping Clusters toolset is probably the most well-known and commonly used toolset in the Spatial Statistics Tools toolbox, and for good reason. The output from these tools is highly visual and beneficial in the analysis of clustering phenomena. There are many examples of clustering: housing, businesses, trees, crimes, and many others. The degree of this clustering is also important. The tools in the Mapping Clusters toolset don't just answer the question "Is there clustering?"; they also take on the question "Where is the clustering?"
The tools in the Mapping Clusters toolset are among the most commonly used in the Spatial Statistics Tools toolbox:
Hot Spot Analysis: This tool is probably the most popular tool in the Spatial Statistics Tools toolbox. Given a set of weighted features, it identifies statistically significant hot and cold spots using the Getis-Ord Gi* statistic, as shown in the output of real estate sales activity in the following screenshot:
Similarity Search: This tool is used to identify candidate features that are most similar or most dissimilar to one or more input features based on feature attributes. Dissimilarity searches can be just as important as similarity searches. For example, a community development organization, in its attempts to attract new businesses, might show that its city is dissimilar to competing cities when comparing crime.
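One simple way to rank candidates by attribute similarity is sketched below. It standardizes each attribute across the target and all candidates, then scores each candidate by the sum of squared differences from the target, so a smaller score means a closer match. This scoring approach, the function name, and the city figures are all assumptions for illustration and are not taken from the tool's documentation.

```python
import statistics

def rank_by_similarity(target, candidates, fields):
    """Return candidates sorted from most similar to most dissimilar to the target."""
    pool = [target] + candidates
    scales = {}
    for field in fields:
        column = [feature[field] for feature in pool]
        spread = statistics.pstdev(column)
        scales[field] = spread if spread > 0 else 1.0  # constant fields contribute nothing

    def score(candidate):
        # Sum of squared differences in standardized attribute values.
        return sum(
            ((candidate[f] - target[f]) / scales[f]) ** 2 for f in fields
        )

    return sorted(candidates, key=score)

# Hypothetical cities described by population and a crime rate.
target_city = {"name": "A", "population": 100_000, "crime_rate": 30}
other_cities = [
    {"name": "B", "population": 110_000, "crime_rate": 32},
    {"name": "C", "population": 500_000, "crime_rate": 80},
    {"name": "D", "population": 95_000, "crime_rate": 70},
]

ranked = rank_by_similarity(target_city, other_cities, ["population", "crime_rate"])
print("Most similar:   ", ranked[0]["name"])
print("Most dissimilar:", ranked[-1]["name"])
```

Reading the ranked list from the end gives the dissimilarity search described above.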
Grouping Analysis: This tool groups features based on feature attributes, as well as optional spatial/temporal constraints. The output is a set of distinct groups in which the features within each group are as similar as possible and the groups themselves are as dissimilar from one another as possible. The tool is capable of multivariate analysis, and its output is a map and a report. The output map can have either contiguous or non-contiguous groups. An example is displayed in the following screenshot:
Cluster and Outlier Analysis: The final tool in the Mapping Clusters toolset is the Cluster and Outlier Analysis tool. This tool, in addition to performing hot spot analysis, identifies outliers in your data. Outliers are extremely relevant to many types of analyses. The tool starts by separating features and neighborhoods from the study area. Each feature is examined against every other feature to see whether it is significantly different from the other features. Likewise, each neighborhood is examined in relation to all other neighborhoods to see whether it is statistically different from other neighborhoods. An example of the output from the Cluster and Outlier Analysis tool is provided in the following screenshot:
The Modeling Spatial Relationships toolset contains a number of regression analysis tools that help you examine and/or quantify the relationships between features. They help measure how features in a dataset relate to each other in space.
The regression tools provided in the Spatial Statistics Tools toolbox model relationships among data variables associated with geographic features, allowing you to make predictions for unknown values or to better understand key factors influencing a variable you are trying to model. Regression methods allow you to verify relationships and to measure how strong those relationships are. The Exploratory Regression tool allows you to examine a large number of Ordinary Least Squares (OLS) models quickly, summarize variable relationships, and determine whether any combination of candidate explanatory variables satisfies all of the requirements of the OLS method.
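To show what verifying a relationship and measuring its strength can look like in practice, here is a minimal sketch of an ordinary least squares fit with a single explanatory variable. The function name and the sample numbers are hypothetical. A full analysis would typically involve several explanatory variables and additional diagnostics that this sketch does not attempt.

```python
def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares and report R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: an explanatory variable and a dependent variable.
explanatory = [1, 2, 3, 4, 5]
dependent = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_ols(explanatory, dependent)
print("Slope:    ", slope)      # sign gives the direction of the relationship
print("Intercept:", intercept)
print("R-squared:", r_squared)  # closer to 1 means a stronger linear relationship
```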
There are two regression analysis tools in ArcGIS, which are as follows:
Ordinary Least Squares: This tool is a linear regression tool used to generate predictions or model a dependent variable in terms of its relationships to a set of explanatory variables. OLS is the best-known regression technique and provides a good starting point for spatial regression analysis. This tool provides a global model of a variable or process you are trying to understand or predict. The result is a single regression equation that depicts a positive or negative linear relationship. The following screenshot depicts partial output from the OLS tool:
Geographically Weighted Regression: Geographically Weighted Regression, or GWR, is a local form of linear regression for modeling spatially varying relationships. Note that this tool does require an Advanced ArcGIS license. GWR constructs a separate equation for each feature and is most appropriate when you have several hundred features. GWR creates an output feature class (shown in the following screenshot) and table. The output table contains a summary of the tool execution. When running GWR, you should use the same explanatory variables that you specified in your OLS model:
The Modeling Spatial Relationships toolset also includes the Exploratory Regression tool.
Exploratory Regression: This tool can be used to evaluate combinations of explanatory variables for OLS models that best explain the dependent variable. This data-mining tool does much of the work of finding well-suited variables for you and can save a lot of time in finding the right combination. The results of this tool are written to the progress dialog, result window, and an optional report file. An example of the output from the Exploratory Regression tool can be seen in the following screenshot:
The R Project for Statistical Computing, often referred to simply as R, is a free software environment for statistical computing and graphics. It is also a programming language that is widely used among statisticians and data miners for developing statistical software and data analysis.
Although there are other programming languages for handling statistics, R has become the de facto language of statistical routines, offering a package repository with over 6,400 problem-solving packages. It offers versatile and powerful plotting. It also has the advantage of treating tabular and multidimensional data as a labeled, indexed series of observations.
The R-ArcGIS Bridge is a free, open-source R package that connects ArcGIS and R. It was released together with an R-ArcGIS community website on GitHub, encouraging collaboration between the two communities. The package serves the following three purposes:
- ArcGIS developers can now create custom tools and toolboxes that integrate ArcGIS and R
- ArcGIS users can access R code through geoprocessing scripts
- R users can access GIS data managed in traditional GIS ways
This book includes an introductory chapter on the R language along with a chapter detailing the installation of the R-ArcGIS Bridge and the creation of custom ArcGIS script tools using R. The R-ArcGIS Bridge enables the creation of custom ArcGIS tools that connect GIS data sources, such as feature classes, to the R programming language to create statistical output, as shown in the following screenshot:
In this chapter, we introduced the topic of spatial statistics and described its basic characteristics. We also briefly reviewed the spatial statistics tools provided by ArcGIS Desktop. In later chapters, we will dive into these tools for a deeper understanding of the functionality they provide. In the next chapter, we'll examine the tools provided by the Measuring Geographic Distributions toolset.