Spatial statistics are a set of exploratory techniques for describing and modeling spatial distributions, patterns, processes, and relationships. Although spatial statistics are similar to traditional statistics, they also integrate spatial relationships into the calculations. In spatial statistics, proximity is important. Things that are closer together are more related.

ArcGIS includes the **Spatial Statistics Tools** toolbox, which is available for all license levels of its desktop software. The toolbox contains a number of toolsets that help analyze spatial distributions, patterns, clustering, and relationships in GIS datasets. This book covers each of these toolsets to provide a comprehensive survey of the spatial statistics tools available to ArcGIS users.

R is a programming language and software environment for statistical computing and graphics, and it is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data analysts for developing statistical software and performing data analysis. In addition, R can be used for spatial statistical analysis and can be integrated with ArcGIS through the R-ArcGIS Bridge.

This book also contains an introductory chapter for the R programming language as well as a chapter that covers the installation of the R-ArcGIS Bridge and the creation of custom ArcGIS script tools written with R.

In this chapter, we will cover the following topics:

- Introduction to spatial statistics
- An overview of the Spatial Statistics Tools toolbox in ArcGIS
- An overview of the integration between R and ArcGIS

Let's start with a definition of spatial statistics. The GIS dictionary (http://gisgeography.com/gis-dictionary-definition-glossary/) defines spatial statistics as the field of study concerning statistical methods that use space and spatial relationships (such as distance, area, volume, length, height, orientation, centrality, and/or other spatial characteristics of data) directly in their mathematical computations. Spatial statistics are used for a variety of different types of analyses, including pattern analysis, shape analysis, surface modeling and surface prediction, spatial regression, statistical comparisons of spatial datasets, statistical modeling and prediction of spatial interaction, and more. The many types of spatial statistics include descriptive, inferential, exploratory, geostatistical, and econometric statistics.

Spatial statistics are applicable across a wide range of environmental disciplines, including agriculture, geology, soil science, hydrology, ecology, oceanography, forestry, meteorology, and climatology, among others. Many socio-economic disciplines, including epidemiology, crime analysis, real estate, and planning, also benefit from spatial statistical analysis.

Spatial statistics can give answers to the following questions:

- How are the features distributed?
- What is the pattern created by the features?
- Which are the clusters?
- How do patterns and clusters of different variables compare to one another?
- What is the relationship between sets of features or values?

The ArcGIS Spatial Statistics Tools toolbox is available for all license levels of ArcGIS Desktop, including basic, standard, and advanced. The toolbox includes a number of toolsets, which are as follows:

- The `Analyzing Patterns` toolset

- The `Mapping Clusters` toolset

- The `Measuring Geographic Distributions` toolset

- The `Modeling Spatial Relationships` toolset

The **Measuring Geographic Distributions** toolset in the `Spatial Statistics Tools` toolbox contains a set of tools that provide descriptive geographic statistics, including the `Central Feature`, `Directional Distribution`, `Linear Directional Mean`, `Mean Center`, `Median Center`, and `Standard Distance` tools. Together, this toolset provides a set of basic statistical exploration tools. These basic descriptive statistics are used only as a starting point in the analysis process. The following screenshot displays the output from the `Directional Distribution` tool for an analysis of crime data:

The **Central Feature**, `Mean Center`, and `Median Center` tools all provide similar functionality. Each creates a feature class containing a single feature that represents the centrality of a geographic dataset.
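
To make the idea of centrality concrete, here is a minimal sketch in plain Python. It is not the ArcGIS implementation: it assumes features are simple `(x, y)` tuples in a projected coordinate system, and the function names `mean_center` and `central_feature` are illustrative only. The mean center is the average of the coordinates, while the central feature is the existing feature with the smallest total distance to all other features.

```python
import math

def mean_center(points):
    """Return the average x and average y of a list of (x, y) tuples."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def central_feature(points):
    """Return the input point with the smallest total distance to all points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Hypothetical sample data: a tight group of five points plus one distant point.
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0), (10.0, 10.0)]

center = mean_center(points)        # pulled toward the distant point
central = central_feature(points)   # always one of the input points
print(center, central)
```

Note that the mean center need not coincide with any input feature, whereas the central feature always does. The median center, which is less sensitive to outliers, is normally found iteratively and is not sketched here.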

The **Linear Directional Mean** tool identifies the mean direction, length, and geographic center for a set of lines. The output of this tool is a feature class with a single linear feature.
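
The calculation can be sketched as follows, assuming each line is reduced to its start and end point and that direction is measured counterclockwise from east in degrees. These conventions, and the function name `linear_directional_mean`, are assumptions made for illustration and may differ from the output fields ArcGIS produces. Angles are averaged through their sines and cosines so that directions near 0 and 360 degrees do not cancel out.

```python
import math

def linear_directional_mean(lines):
    """lines: list of ((x1, y1), (x2, y2)) start/end pairs.

    Returns (mean direction in degrees, mean length, mean of the midpoints).
    """
    sum_sin = sum_cos = total_length = cx = cy = 0.0
    for (x1, y1), (x2, y2) in lines:
        theta = math.atan2(y2 - y1, x2 - x1)       # direction of this line
        sum_sin += math.sin(theta)
        sum_cos += math.cos(theta)
        total_length += math.hypot(x2 - x1, y2 - y1)
        cx += (x1 + x2) / 2                        # midpoint of this line
        cy += (y1 + y2) / 2
    n = len(lines)
    angle = math.degrees(math.atan2(sum_sin, sum_cos)) % 360
    return angle, total_length / n, (cx / n, cy / n)

# Hypothetical sample data: one line pointing east and one pointing north.
lines = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))]
angle, mean_length, center = linear_directional_mean(lines)
print(angle, mean_length, center)
```

For the two sample lines, the mean direction falls halfway between east and north, which is the behavior you would expect from a directional average.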

The **Standard Distance** and `Directional Distribution` tools are similar, in that they both measure the degree to which features are concentrated or dispersed around the geometric center, but the `Directional Distribution` tool, also known as the `Standard Deviational Ellipse`, is superior as it also provides a measure of directionality in the dataset.
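
As a rough illustration, standard distance can be computed as the square root of the summed coordinate variances around the mean center. The sketch below assumes unweighted `(x, y)` points and uses the hypothetical function name `standard_distance`. It also returns the separate spreads along x and y, which only hint at directionality: the actual Standard Deviational Ellipse additionally rotates its axes to fit the data, a step that is not reproduced here.

```python
import math

def standard_distance(points):
    """Return (standard distance, spread along x, spread along y)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    var_x = sum((x - mx) ** 2 for x, _ in points) / n
    var_y = sum((y - my) ** 2 for _, y in points) / n
    return math.sqrt(var_x + var_y), math.sqrt(var_x), math.sqrt(var_y)

# Hypothetical sample data: corners of a rectangle that is wider than it is tall.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
sd, sd_x, sd_y = standard_distance(points)
print(sd, sd_x, sd_y)
```

A single standard distance describes a circle around the mean center, so it cannot show that these points are spread twice as far east to west as north to south. The two axis values make that difference visible.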

The **Analyzing Patterns** toolset in the `Spatial Statistics Tools` toolbox contains a series of tools that help evaluate whether features or the values associated with features form a clustered, dispersed, or random spatial pattern. These tools generate a single result for the entire dataset in question. In addition, the result does not take the form of a map, but rather statistical output, as shown in the following screenshot:

Tools in this category generate what are known as inferential statistics: measures of how confident we can be that a pattern is clustered or dispersed rather than random. Let's examine the following tools found in the **Analyzing Patterns** toolset:

- `Average Nearest Neighbor`: This tool calculates the nearest neighbor index based on the average distance from each feature to its nearest neighboring feature. For each feature in a dataset, the distance to its nearest neighbor is computed, and these distances are averaged. The observed average distance is then compared to the average distance expected for a random distribution. The result is the ANN ratio, which in simple terms is the observed distance divided by the expected distance. If the ratio is less than 1, the data exhibits a clustered pattern, whereas a value greater than 1 indicates a dispersed pattern.

- `Spatial Autocorrelation (Morans I)`: This tool measures spatial autocorrelation by simultaneously measuring feature locations and attribute values. If features that are close together have similar values, the pattern is said to be clustered. However, if features that are close together have dissimilar values, they form a dispersed pattern. This tool outputs a Moran's I index value along with a z-score and a p-value.

- `Incremental Spatial Autocorrelation`: This tool is similar to the previous tool, but it measures spatial autocorrelation for a series of distances and can create an optional line graph of those distances along with their corresponding z-scores. This tool is similar to the new `Optimized Hot Spot` tool and isn't used as frequently anymore as a result. It is often used as a distance aid for other tools such as `Hot Spot Analysis` or `Point Density`.

- `High/Low Clustering (Getis-Ord General G)`: This tool looks for clusters of high values and clusters of low values. It is used to measure the concentration of high or low values for a given study area and returns the Observed General G, Expected General G, z-score, and p-value. It is most appropriate when there is a fairly even distribution of values.

- `Multi-Distance Spatial Cluster Analysis (Ripleys K Function)`: This tool determines whether feature locations show significant clustering or dispersion. However, unlike the other spatial pattern tools that we've examined in this section, it does not take the value at a location into account. It determines clustering only by the location of the features. This tool is often used in fields such as environmental studies, health care, and crime analysis, where you are attempting to determine whether one feature attracts another feature.
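
Two of the statistics described above are simple enough to sketch in a few lines of Python. The code below is an illustration under stated assumptions, not the ArcGIS implementation: the study area is passed in explicitly, the expected nearest neighbor distance uses the usual random-pattern formula `0.5 / sqrt(n / area)`, and Moran's I uses binary weights (1 for every other feature within a distance threshold, 0 otherwise). The function names `ann_ratio` and `morans_i` are hypothetical, and the z-scores and p-values that the tools report are not computed.

```python
import math

def ann_ratio(points, area):
    """Observed mean nearest neighbor distance divided by the expected one."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)   # expectation for a random pattern
    return observed / expected

def morans_i(points, values, threshold):
    """Moran's I with binary weights for neighbors within the threshold."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    return (n / weight_sum) * (cross / sum(d * d for d in dev))

# Hypothetical sample data.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]   # tight group
row = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]      # evenly spaced row

ratio = ann_ratio(square, area=100.0)                 # small group in a large area
i_trend = morans_i(row, [1.0, 2.0, 3.0, 4.0], 1.0)    # neighbors have similar values
i_alternating = morans_i(row, [1.0, 4.0, 1.0, 4.0], 1.0)  # neighbors differ
print(ratio, i_trend, i_alternating)
```

In this sketch, an ANN ratio below 1 corresponds to clustering, a positive Moran's I corresponds to similar values lying near each other, and a negative Moran's I corresponds to dissimilar values lying near each other.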

The **Mapping Clusters** toolset is probably the most well-known and commonly used toolset in the `Spatial Statistics Tools` toolbox, and for a good reason. The output from these tools is highly visual and beneficial in the analysis of clustering phenomena. There are many examples of clustering: housing, businesses, trees, crimes, and many others. The degree of this clustering is also important. The tools in the `Mapping Clusters` toolset don't just answer the question *Is there clustering?*, but they also take on the question of *Where is the clustering?*

Tools in the **Mapping Clusters** toolset are among the most commonly used in the `Spatial Statistics Tools` toolbox:

- `Hot Spot Analysis`: This tool is probably the most popular tool in the Spatial Statistics Tools toolbox. Given a set of weighted features, it identifies statistically significant hot and cold spots using the `Getis-Ord Gi*` statistic, as shown in the output of real estate sales activity in the following screenshot:

- `Similarity Search`: This tool is used to identify candidate features that are most similar or most dissimilar to one or more input features, based on the attributes of a feature. Dissimilarity searches can be just as important as similarity searches. For example, a community development organization, in its attempts to attract new businesses, might show that its city is dissimilar to other competing cities when comparing crime.

- `Grouping Analysis`: This tool groups features based on feature attributes, as well as optional spatial/temporal constraints. The output of this tool is a set of distinct groups in which the features within a group are as similar as possible and the groups themselves are as dissimilar from one another as possible. An example is displayed in the following screenshot. The tool is capable of multivariate analysis, and the output is a map and a report. The output map can have either contiguous groups or non-contiguous groups:

- `Cluster and Outlier Analysis`: The final tool in the `Mapping Clusters` toolset is the `Cluster and Outlier Analysis` tool. This tool, in addition to performing hot spot analysis, identifies outliers in your data. Outliers are extremely relevant to many types of analyses. The tool starts by separating features and neighborhoods from the study area. Each feature is examined against every other feature to see whether it is significantly different from the other features. Likewise, each neighborhood is examined in relation to all other neighborhoods to see whether it is statistically different from them. An example of the output from the `Cluster and Outlier Analysis` tool is provided in the following screenshot:
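
To show what a hot spot statistic measures, the following sketch computes a Getis-Ord Gi* z-score for every feature using plain Python. It is a simplified illustration, not the ArcGIS tool: it assumes a fixed distance band with binary weights in which each feature counts as its own neighbor, and the function name `getis_ord_gi_star` is hypothetical. Large positive scores suggest a feature sits among high values (a hot spot), and large negative scores suggest it sits among low values (a cold spot).

```python
import math

def getis_ord_gi_star(points, values, threshold):
    """Return one Gi* z-score per feature for a fixed distance band."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for p in points:
        # Binary weights: 1 for features within the band, including the feature itself.
        w = [1.0 if math.dist(p, q) <= threshold else 0.0 for q in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # Guard against constant values or a band that covers every feature.
        scores.append(numerator / denominator if denominator else 0.0)
    return scores

# Hypothetical sample data: high values at one end of a row, low values at the other.
points = [(float(i), 0.0) for i in range(6)]
values = [10.0, 9.0, 8.0, 1.0, 1.0, 1.0]
scores = getis_ord_gi_star(points, values, threshold=1.0)
print([round(z, 2) for z in scores])
```

Because the score for a feature depends on its neighbors as well as its own value, a single high value surrounded by low values would not register as a hot spot. That case is the kind of outlier the `Cluster and Outlier Analysis` tool is designed to flag.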

The **Modeling Spatial Relationships** toolset contains a number of regression analysis tools that help you examine and/or quantify the relationships between features. They help measure how features in a dataset relate to each other in space.

The regression tools provided in the **Spatial Statistics Tools** toolbox model relationships among data variables associated with geographic features, allowing you to make predictions for unknown values or to better understand key factors influencing a variable you are trying to model. Regression methods allow you to verify relationships and to measure how strong those relationships are. The `Exploratory Regression` tool allows you to examine a large number of `Ordinary Least Squares` models quickly, summarize variable relationships, and determine whether any combination of candidate explanatory variables satisfies all of the requirements of the OLS method.

There are two regression analysis tools in ArcGIS, which are as follows:

- `Ordinary Least Squares`: This tool is a linear regression tool used to generate predictions or model a dependent variable in terms of its relationships to a set of explanatory variables. OLS is the best-known regression technique and provides a good starting point for spatial regression analysis. This tool provides a global model of a variable or process you are trying to understand or predict. The result is a single regression equation that depicts a positive or negative linear relationship. The following screenshot depicts partial output from the OLS tool:

- `Geographically Weighted Regression`: Geographically Weighted Regression, or GWR, is a local form of linear regression for modeling spatially varying relationships. Note that this tool does require an Advanced ArcGIS license. GWR constructs a separate equation for each feature and is most appropriate when you have several hundred features. GWR creates an output feature class (shown in the following screenshot) and table. The output table contains a summary of the tool execution. When running GWR, you should use the same explanatory variables that you specified in your OLS model:
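
The difference between a global and a local model can be sketched with a single explanatory variable. The code below is a simplified illustration, not the ArcGIS implementation: `ols_fit` fits one equation to all features, while `gwr_fit` refits the same equation at every feature location using Gaussian distance weights with a fixed bandwidth. The function names, the Gaussian kernel, and the bandwidth value are all assumptions chosen for this example.

```python
import math

def weighted_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sum_w = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    my = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, my - slope * mx

def ols_fit(x, y):
    """Global model: every feature has the same weight."""
    return weighted_fit(x, y, [1.0] * len(x))

def gwr_fit(locations, x, y, bandwidth):
    """Local models: one (slope, intercept) pair per feature location."""
    results = []
    for here in locations:
        w = [math.exp(-0.5 * (math.dist(here, other) / bandwidth) ** 2)
             for other in locations]
        results.append(weighted_fit(x, y, w))
    return results

# Hypothetical sample data: ten features in a row. The response is 1 * x for the
# five western features and 3 * x for the five eastern features.
locations = [(float(i), 0.0) for i in range(10)]
x = [1.0, 2.0, 3.0, 4.0, 5.0] * 2
y = [xi * (1.0 if i < 5 else 3.0) for i, xi in enumerate(x)]

global_slope, global_intercept = ols_fit(x, y)
local = gwr_fit(locations, x, y, bandwidth=1.5)
print(global_slope, local[0][0], local[-1][0])
```

The global model averages the two regimes into one slope, while the local fits recover a lower slope in the west and a higher slope in the east. This is the kind of spatially varying relationship that GWR is intended to reveal.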

The **Modeling Spatial Relationships** toolset also includes the `Exploratory Regression` tool.

- `Exploratory Regression`: This tool can be used to evaluate combinations of explanatory variables for OLS models that best explain the dependent variable. This data-mining tool does much of the work of finding well-suited variables for you and can save a lot of time in finding the right combination. The results of this tool are written to the progress dialog, result window, and an optional report file. An example of the output from the `Exploratory Regression` tool can be seen in the following screenshot:

The R Project for Statistical Computing, often referred to simply as R, is a free software environment for statistical computing and graphics. It is also a programming language that is widely used among statisticians and data miners for developing statistical software and performing data analysis.

Although there are other programming languages for handling statistics, R has become the de facto language of statistical routines, offering a package repository with over 6,400 problem-solving packages. It offers versatile and powerful plotting. It also has the advantage of treating tabular and multidimensional data as a labeled, indexed series of observations.

The R-ArcGIS Bridge is a free, open source R package that connects ArcGIS and R. It was released together with an R ArcGIS community website on GitHub, encouraging collaboration between the two communities. The package serves the following three purposes:

- ArcGIS developers can now create custom tools and toolboxes that integrate ArcGIS and R
- ArcGIS users can access R code through geoprocessing scripts
- R users can access GIS data managed in traditional GIS ways

This book includes an introductory chapter on the R language along with a chapter detailing the installation of the R-ArcGIS Bridge and the creation of custom ArcGIS script tools using R. The R-ArcGIS Bridge enables the creation of custom ArcGIS tools that connect to GIS data sources, such as feature classes, and create statistical output with the R programming language, as shown in the following screenshot:

In this chapter, we introduced the topic of spatial statistics and described its basic characteristics. We also briefly reviewed the spatial statistics tools provided by ArcGIS Desktop. In later chapters, we will dive into these tools for a deeper understanding of the functionality they provide. In the next chapter, we'll examine the tools provided by the **Measuring Geographic Distributions** toolset.