Spatial Autoregression (Spatial Statistics Tools)

Summary

Estimates a global spatial regression model for a point or polygon feature class.

The assumptions of traditional linear regression models are often violated when using spatial data. When spatial autocorrelation is present in a dataset, coefficient estimates may be biased and lead to overconfident inference. This tool can be used to estimate a regression model that is robust in the presence of spatial dependence and heteroskedasticity, as well as to measure spatial spillovers. The tool uses Lagrange Multiplier (LM) diagnostic tests, also known as Rao score tests, to determine the most appropriate model. Based on the LM diagnostics, an ordinary least squares (OLS) model, spatial lag model (SLM), spatial error model (SEM), or spatial autoregressive combined model (SAC) is estimated.

Learn more about how Spatial Autoregression works

Illustration

Spatial Autoregression tool illustration

Usage

The tool accepts only point and polygon inputs.
The dependent variable must be continuous (not binary or categorical).
Explanatory variables must be continuous (not binary or categorical). Do not use binary variables (containing only the values 0 and 1), as they may violate model assumptions and cause an error.
The output of the tool includes a Moran's Scatter Plot of Residuals that can be used to identify autocorrelation in the model's residuals.
The spatial weights matrix cannot have more than 30 percent connectivity. To prevent biased estimates, the tool returns an error if this threshold is exceeded.
When using k nearest neighbors with a local weighting scheme, an adaptive bandwidth will be calculated if no bandwidth is provided.
A Spatial Durbin model can be estimated by fitting a SLM that includes each explanatory variable and its spatial lag. Use the Neighborhood Summary Statistics tool to calculate the spatial lags.
The models are estimated using the following methods related to heteroskedasticity and normality:
- SLM uses Spatial Two Stage Least Squares regression (S2SLS).
- SEM uses Generalized Method of Moments (GMM).
- SAC uses Generalized S2SLS (GS2SLS).
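
The connectivity limit and the spatial lags mentioned in the notes above can be illustrated outside the tool. The following is a minimal sketch in plain Python, not the tool's implementation. It assumes that connectivity means the share of nonzero entries in the weights matrix; the helper names (`row_standardize`, `connectivity`, `spatial_lag`), the four-feature neighbor list, and the `income` values are all hypothetical.

```python
# Illustrative only: hypothetical helpers, not part of arcpy.

def row_standardize(neighbors, n):
    """Build an n x n weights matrix in which each row's weights sum to 1."""
    w = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def connectivity(w):
    """Share of nonzero entries in the weights matrix (assumed definition)."""
    n = len(w)
    nonzero = sum(1 for row in w for value in row if value != 0.0)
    return nonzero / (n * n)

def spatial_lag(w, x):
    """Weighted average of each feature's neighboring values (W times x)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

# Four features arranged in a line, 0-1-2-3 (hypothetical contiguity).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
w = row_standardize(neighbors, 4)

income = [10.0, 20.0, 30.0, 40.0]
lag_income = spatial_lag(w, income)

print(lag_income)       # [20.0, 20.0, 30.0, 30.0]
print(connectivity(w))  # 0.375
```

In a Spatial Durbin setup, a lagged field such as `lag_income` would be supplied as an additional explanatory variable alongside `income`. Under the assumed definition, this toy matrix is 37.5 percent connected and would exceed the 30 percent limit, which is a side effect of having only four features.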

Label	Explanation	Data type
Input Features	The input features containing the dependent and explanatory variables.	Feature Layer
Dependent Variable	The numeric field that will be predicted in the regression model.	Field
Explanatory Variables	A list of fields that will be used to predict the dependent variable in the regression model.	Field
Output Features	The output feature class containing the predicted values of the dependent variable and the residuals.	Feature Class
Model Type	The model type that will be used for the estimation. By default, LM diagnostic tests will be used to determine the model that is the most appropriate for the input data. Auto-detect—LM diagnostic tests will be used to determine whether an OLS, SLM, SEM, or SAC will be estimated. This is the default. Spatial error model (SEM)—A SEM will be estimated regardless of the LM diagnostics. Spatial lag model (SLM)—A SLM will be estimated regardless of the LM diagnostics. Spatial autoregressive combined (SAC)—A SAC will be estimated regardless of the LM diagnostics.	String
Neighborhood Type (Optional)	Specifies how neighbors will be chosen for each input feature. To identify local spatial patterns, neighboring features must be identified for each input feature. Fixed distance band—Features within a specified distance of each feature will be considered neighbors. K nearest neighbors—The closest k features will be considered neighbors. The number of neighbors is specified using the Number of Neighbors parameter. Contiguity edges only—Polygon features that share an edge will be included as neighbors. Contiguity edges corners—Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features. Delaunay triangulation—Features whose Delaunay triangulation share an edge or corner will be included as neighbors. This is the default for point features. Get spatial weights from file—Neighbors and weights will be defined by a specified spatial weights file. The file is specified using the Weights Matrix File parameter.	String
Distance Band (Optional)	The distance within which features will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message.	Linear Unit
Number of Neighbors (Optional)	The number of nearest features that will be included as neighbors. The number does not include the focal feature. The default is 8.	Long
Weights Matrix File (Optional)	The path and file name of the spatial weights matrix file that defines spatial relationships among features.	File
Local Weighting Scheme (Optional)	Specifies the weighting scheme that will be applied to neighbors. Weights will always be row-standardized unless a spatial weights matrix file is provided. Unweighted—Neighbors will be assigned a weight equal to 1. This is the default. Bisquare—Neighbors will be weighted using a bisquare (quartic) kernel. Gaussian—Neighbors will be weighted using a Gaussian (normal distribution) kernel.	String
Kernel Bandwidth (Optional)	The bandwidth of the weighting kernel. If no value is provided, an adaptive kernel will be used. An adaptive kernel uses the maximum distance from a neighbor to a focal feature as the bandwidth.	Linear Unit

arcpy.stats.SAR(in_features, dependent_variable, explanatory_variables, out_features, model_type, {neighborhood_type}, {distance_band}, {number_of_neighbors}, {weights_matrix_file}, {local_weighting_scheme}, {kernel_bandwidth})

Name	Explanation	Data type
in_features	The input features containing the dependent and explanatory variables.	Feature Layer
dependent_variable	The numeric field that will be predicted in the regression model.	Field
explanatory_variables [explanatory_variables,...]	A list of fields that will be used to predict the dependent variable in the regression model.	Field
out_features	The output feature class containing the predicted values of the dependent variable and the residuals.	Feature Class
model_type	The model type that will be used for the estimation. By default, LM diagnostic tests will be used to determine the model that is the most appropriate for the input data. `AUTO`—LM diagnostic tests will be used to determine whether an OLS, SLM, SEM, or SAC will be estimated. This is the default. `ERROR`—A SEM will be estimated regardless of the LM diagnostics. `LAG`—A SLM will be estimated regardless of the LM diagnostics. `COMBINED`—A SAC will be estimated regardless of the LM diagnostics.	String
neighborhood_type (Optional)	Specifies how neighbors will be chosen for each input feature. To identify local spatial patterns, neighboring features must be identified for each input feature. `DISTANCE_BAND`—Features within a specified distance of each feature will be considered neighbors. `K_NEAREST_NEIGHBORS`—The closest k features will be considered neighbors. The number of neighbors is specified using the `number_of_neighbors` parameter. `CONTIGUITY_EDGES_ONLY`—Polygon features that share an edge will be included as neighbors. `CONTIGUITY_EDGES_CORNERS`—Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features. `DELAUNAY_TRIANGULATION`—Features whose Delaunay triangulation share an edge or corner will be included as neighbors. This is the default for point features. `GET_SPATIAL_WEIGHTS_FROM_FILE`—Neighbors and weights will be defined by a specified spatial weights file. The file is specified using the `weights_matrix_file` parameter.	String
distance_band (Optional)	The distance within which features will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message.	Linear Unit
number_of_neighbors (Optional)	The number of nearest features that will be included as neighbors. The number does not include the focal feature. The default is 8.	Long
weights_matrix_file (Optional)	The path and file name of the spatial weights matrix file that defines spatial relationships among features.	File
local_weighting_scheme (Optional)	Specifies the weighting scheme that will be applied to neighbors. Weights will always be row-standardized unless a spatial weights matrix file is provided. `UNWEIGHTED`—Neighbors will be assigned a weight equal to 1. This is the default. `BISQUARE`—Neighbors will be weighted using a bisquare (quartic) kernel. `GAUSSIAN`—Neighbors will be weighted using a Gaussian (normal distribution) kernel.	String
kernel_bandwidth (Optional)	The bandwidth of the weighting kernel. If no value is provided, an adaptive kernel will be used. An adaptive kernel uses the maximum distance from a neighbor to a focal feature as the bandwidth.	Linear Unit
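
The interaction between `local_weighting_scheme` and `kernel_bandwidth` can be sketched for a single focal feature. This is a minimal illustration in plain Python under stated assumptions, not the tool's implementation: the bisquare form is the standard quartic kernel, the Gaussian form `exp(-0.5 * (d / bandwidth)^2)` is an assumed parameterization, and the function names and distances are hypothetical.

```python
import math

def bisquare(d, bandwidth):
    """Bisquare (quartic) kernel: the weight falls to 0 at the bandwidth."""
    if d >= bandwidth:
        return 0.0
    return (1.0 - (d / bandwidth) ** 2) ** 2

def gaussian(d, bandwidth):
    """Gaussian kernel (assumed form): exp(-0.5 * (d / bandwidth)^2)."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)

def kernel_weights(distances, kernel, bandwidth=None):
    """Weight one focal feature's neighbors, then row-standardize."""
    if bandwidth is None:
        # Adaptive bandwidth: the distance to the farthest neighbor.
        bandwidth = max(distances)
    raw = [kernel(d, bandwidth) for d in distances]
    total = sum(raw)
    if total == 0.0:
        # Every neighbor received zero weight; nothing to standardize.
        return raw
    return [r / total for r in raw]

# Hypothetical distances from a focal feature to its three neighbors.
distances = [100.0, 200.0, 400.0]

print(kernel_weights(distances, bisquare))
print(kernel_weights(distances, gaussian))
```

In this sketch, the adaptive bandwidth equals the distance to the farthest neighbor, so the bisquare kernel gives that neighbor a weight of 0, while the Gaussian kernel keeps every neighbor's weight positive.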

Code sample

SAR example 1 (Python window)

The following Python window script demonstrates how to use the SAR function.

# Fit SAR model and auto-detect the regression model.
arcpy.stats.SAR(
    in_features=r"C:\data\data.gdb\house_price",
    dependent_variable="price",
    explanatory_variables=["crime", "income", "school_rate"],
    out_features=r"C:\data\data.gdb\house_price_SAR",
    model_type="AUTO",
    neighborhood_type="DELAUNAY_TRIANGULATION",
    distance_band=None,
    number_of_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None
)

SAR example 2 (stand-alone script)

The following stand-alone script demonstrates how to use the SAR function.

# Fit SAR model using SLM.

# Import modules
import arcpy

# Set the current workspace
arcpy.env.workspace = r"C:\data\data.gdb"

# Run SAR tool with Spatial Lag model
arcpy.stats.SAR(
    in_features=r"health_factors_CA",
    dependent_variable="Diabetes",
    explanatory_variables=["Drink", "Inactivity"],
    out_features=r"Diabetes_SAR",
    model_type="LAG",
    neighborhood_type="CONTIGUITY_EDGES_CORNERS",
    distance_band=None,
    number_of_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None
)

Environments

Output Coordinate System

Licensing information

Basic: Yes
Standard: Yes
Advanced: Yes