Bivariate Spatial Association (Lee's L) (Spatial Statistics Tools)

Summary

Calculates the spatial association between two continuous variables using the Lee's L statistic.

The Lee's L statistic characterizes both the degree of correlation and the degree of copatterning (similarity of spatial clustering) between the variables. The value will be between -1 and 1 and is conceptually similar to a correlation coefficient but is adjusted to account for spatial autocorrelation of the two variables. Lee's L values close to 1 indicate that the variables are highly positively correlated and that each variable has high positive spatial autocorrelation (high and low values of the variables each tend to cluster together). Values close to -1 indicate that the variables are highly negatively correlated and that each variable has high positive spatial autocorrelation. Values close to 0 indicate that the variables are uncorrelated, not spatially autocorrelated, or both.

The Lee's L statistic can be partitioned into a component for each input feature. These components, called local Lee's L statistics, show the local spatial association between the feature and its neighbors. They can be used to determine areas that have higher or lower spatial association than the global Lee's L statistic. The local statistics can also be classified into one of several categories based on the values of the neighbors of each feature. Both the global and local statistics are tested for statistical significance using permutations.
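The relationship between the global and local statistics can be illustrated with a small pure-Python sketch. It assumes the commonly published form of Lee's L for row-standardized weights in which every feature is its own neighbor; it is not the tool's implementation, and the function names (`row_standardize_with_self`, `lee_l`) and the six-feature data are hypothetical.

```python
import math


def row_standardize_with_self(neighbors):
    """Give every member of a neighborhood (including the feature) equal weight."""
    weights = []
    for i, nbrs in enumerate(neighbors):
        members = set(nbrs) | {i}  # the feature is always in its own neighborhood
        w = 1.0 / len(members)     # weights in each row sum to 1
        weights.append({j: w for j in members})
    return weights


def lee_l(x, y, weights):
    """Return (global L, list of local L values) for row-standardized weights."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    denom = math.sqrt(ss_x) * math.sqrt(ss_y)
    local = []
    for row in weights:
        # Weighted average of the centered values in the neighborhood
        lag_x = sum(w * (x[j] - mean_x) for j, w in row.items())
        lag_y = sum(w * (y[j] - mean_y) for j, w in row.items())
        local.append(n * lag_x * lag_y / denom)
    # The global statistic is the mean of the local statistics
    return sum(local) / n, local


# Six features in a row; each feature neighbors the adjacent features.
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
weights = row_standardize_with_self(neighbors)
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 12]

global_l, local_l = lee_l(x, y, weights)
print(round(global_l, 3))  # 0.743
```

Although `x` and `y` are perfectly correlated in this example, the global value is below 1 because the neighborhood averaging smooths each variable. Local values well above or below the global value flag the features where the association is stronger or weaker than average.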

Learn more about how Bivariate Spatial Association (Lee's L) works

Illustration

Usage

The two analysis variables must be continuous (not binary or categorical), and the variables should have a linear relationship. If the relationship is not linear, use the Transform Field tool to apply transformations to the analysis variables to linearize the relationship and rerun the tool with the transformed values.
The tool returns a variety of outputs that allow you to investigate the spatial association between the two analysis variables. The geoprocessing messages display the Lee's L statistic and the p-value, and the output feature class contains fields summarizing the local Lee's L statistics, p-values, and statistical significance results. When run in an active map, the output feature layer will draw based on the local spatial association categories: Not Significant, High-High, Low-Low, High-Low, and Low-High. For example, if the local Lee's L statistic is statistically significant at the 90 percent confidence level, the first analysis variable is higher than the mean value, and the second variable is lower than the mean value, the category will be High-Low.

Learn more about the outputs of the tool
The p-values for testing the global and local spatial associations for statistical significance are calculated using permutations.
Use at least 50 input features and include at least 8 neighbors for each feature.
The neighborhoods of each feature always include the feature. If a spatial weights file is used to define neighbors, a weight of 1 will be defined for the weight of a feature to itself, even if the spatial weights file does not have the weight defined. The weights of each neighborhood are row standardized so that they sum to 1.
The Random Number Generator environment can be used to reproduce the permutations and p-values. If no seed value is specified, the global and local p-values may change due to randomness. However, if the Parallel Processing Factor environment is set to a value larger than 1 (the default), the permutations will not be consistent, even with a fixed seed value of the random number generator.
Reversing the order of the two analysis variables will not change the global or local Lee's L statistics, but the p-values may change due to randomness of the permutations. The High-Low and Low-High categories will also reverse.
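The category logic described in the usage notes above can be sketched as follows. This is an illustration only: the function name `classify_association` is hypothetical, the 0.10 cutoff mirrors the 90 percent confidence example, and the handling of values exactly equal to the mean (treated here as Low) is an assumption. The function compares whatever values it is given with the means, whether those are a feature's own values or the weighted averages of its neighborhood.

```python
def classify_association(value1, value2, mean1, mean2, p_value, alpha=0.10):
    """Return a spatial association category for one feature (illustrative)."""
    if p_value >= alpha:
        # Not significant at the (1 - alpha) confidence level
        return "Not Significant"
    first = "High" if value1 > mean1 else "Low"    # assumption: ties count as Low
    second = "High" if value2 > mean2 else "Low"
    return f"{first}-{second}"


# First variable above its mean, second below its mean, significant p-value
print(classify_association(12.0, 3.0, mean1=10.0, mean2=5.0, p_value=0.04))
# High-Low

# Reversing the order of the analysis variables reverses the category
print(classify_association(3.0, 12.0, mean1=5.0, mean2=10.0, p_value=0.04))
# Low-High
```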

Parameters

Label	Explanation	Data type
Input Features	The input features containing the fields of the two analysis variables.	Feature Layer
Analysis Field 1	The field of the first analysis variable. The field must be numeric.	Field
Analysis Field 2	The field of the second analysis variable. The field must be numeric.	Field
Output Features	The output features containing the local Lee's L statistics, spatial association categories, p-values, and the weighted averages of the neighbors of each feature.	Feature Class
Neighborhood Type (Optional)	Specifies how neighbors of each feature will be determined. The feature is always included in the neighborhood, and all neighborhood weights are normalized to sum to 1. Fixed distance band—Features within a specified critical distance of each feature will be included as neighbors. This is the default for point features. K nearest neighbors—The closest k features will be included as neighbors. Contiguity edges only—Polygon features that share an edge will be included as neighbors. Contiguity edges corners—Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features. Delaunay triangulation—Points whose Delaunay triangulation (Thiessen polygons) share an edge or corner will be included as neighbors. Get spatial weights from file—Neighbors and weights will be defined by a spatial weights file.	String
Distance Band (Optional)	The distance band that will be used to determine neighbors around the focal feature. If no value is provided, the distance will be the shortest distance such that each feature has at least one other neighbor in its neighborhood. For polygons, the distance between centroids will be used to determine neighbors.	Linear Unit
Number of Neighbors (Optional)	The number of nearest features that will be included as neighbors of each feature. The value does not include the feature. For example, specifying 6 will use the feature and its six closest neighbors (seven features total). The default is 8. The value must be at least 2.	Long
Weights Matrix File (Optional)	The path and file name of the spatial weights matrix file that defines the neighbors and weights between features.	File
Local Weighting Scheme (Optional)	Specifies the weighting scheme that will be applied to neighbors when calculating spatial associations. Unweighted—Neighbors will not be weighted. This is the default. Bisquare—Neighbors will be weighted using a bisquare (quartic) kernel.	String
Kernel Bandwidth (Optional)	The bandwidth for the bisquare kernel. The bandwidth defines how quickly the weights decrease with distance. Larger bandwidths will provide comparatively larger weights to neighbors that are farther away from the feature. For the k nearest neighbors neighborhood, the default value (empty) will use an adaptive bandwidth equal to the distance to the (k+1)th neighbor of the focal feature. For the fixed distance band neighborhood, the default (empty) will use the same value as the distance band.	Linear Unit
Number of Permutations (Optional)	Specifies the number of permutations that will be used to create reference distributions when calculating global and local p-values. All p-values are calculated using two-sided hypothesis tests. 99—The analysis will use 99 permutations. With 99 permutations, the smallest possible p-value is 0.02, and all other p-values will be multiples of this value. 199—The analysis will use 199 permutations. With 199 permutations, the smallest possible p-value is 0.01, and all other p-values will be multiples of this value. 499—The analysis will use 499 permutations. With 499 permutations, the smallest possible p-value is 0.004, and all other p-values will be multiples of this value. 999—The analysis will use 999 permutations. With 999 permutations, the smallest possible p-value is 0.002, and all other p-values will be multiples of this value. This option is recommended for 90 percent confidence tests. This is the default. 4999—The analysis will use 4,999 permutations. With 4,999 permutations, the smallest possible p-value is 0.0004, and all other p-values will be multiples of this value. This option is recommended for 95 percent confidence tests. 9999—The analysis will use 9,999 permutations. With 9,999 permutations, the smallest possible p-value is 0.0002, and all other p-values will be multiples of this value. This option is recommended for 99 percent confidence tests.	Long
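The smallest possible p-values listed above follow from a two-sided permutation test: with N permutations, the observed statistic is ranked among N + 1 values, so the smallest two-sided p-value is 2 / (N + 1). The sketch below shows that arithmetic and one common way to compute a two-sided permutation p-value with a fixed seed. The names (`smallest_two_sided_p`, `permutation_p_value`, `cross_product`) and data are hypothetical, and the exact permutation scheme the tool uses is not described here, so shuffling one variable is an assumption.

```python
import random


def smallest_two_sided_p(num_permutations):
    """Smallest p-value a two-sided permutation test can report."""
    return 2 / (num_permutations + 1)


def permutation_p_value(observed, statistic, values, num_permutations, seed=None):
    """Two-sided pseudo p-value from a permutation reference distribution."""
    rng = random.Random(seed)  # a fixed seed makes the permutations reproducible
    shuffled = list(values)
    at_least = 1  # the observed value counts as one member of the distribution
    at_most = 1
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        permuted = statistic(shuffled)
        if permuted >= observed:
            at_least += 1
        if permuted <= observed:
            at_most += 1
    return min(1.0, 2 * min(at_least, at_most) / (num_permutations + 1))


for n in (99, 199, 499, 999, 4999, 9999):
    print(n, smallest_two_sided_p(n))

# Hypothetical data: hold x fixed and shuffle y
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]


def cross_product(permuted_y):
    return sum(a * b for a, b in zip(x, permuted_y))


observed = cross_product(y)
p = permutation_p_value(observed, cross_product, y, num_permutations=99, seed=42)
print(p)
```

Running the last call twice with the same seed returns the same p-value, which mirrors the role of the Random Number Generator environment. Without a seed, the p-value may vary from run to run.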

Derived output

Label	Explanation	Data type
Lee's L	The Lee's L statistic for the analysis variables.	Double
P-value	The p-value for the Lee's L statistic.	Double
Pearson Correlation	The Pearson correlation between the analysis variables.	Double

arcpy.stats.BivariateSpatialAssociation(in_features, analysis_field1, analysis_field2, out_features, {neighborhood_type}, {distance_band}, {num_neighbors}, {weights_matrix_file}, {local_weighting_scheme}, {kernel_bandwidth}, {num_permutations})

Name	Explanation	Data type
in_features	The input features containing the fields of the two analysis variables.	Feature Layer
analysis_field1	The field of the first analysis variable. The field must be numeric.	Field
analysis_field2	The field of the second analysis variable. The field must be numeric.	Field
out_features	The output features containing the local Lee's L statistics, spatial association categories, p-values, and the weighted averages of the neighbors of each feature.	Feature Class
neighborhood_type (Optional)	Specifies how neighbors of each feature will be determined. The feature is always included in the neighborhood, and all neighborhood weights are normalized to sum to 1. `DISTANCE_BAND`—Features within a specified critical distance of each feature will be included as neighbors. This is the default for point features. `K_NEAREST_NEIGHBORS`—The closest k features will be included as neighbors. `CONTIGUITY_EDGES_ONLY`—Polygon features that share an edge will be included as neighbors. `CONTIGUITY_EDGES_CORNERS`—Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features. `DELAUNAY_TRIANGULATION`—Points whose Delaunay triangulation (Thiessen polygons) share an edge or corner will be included as neighbors. `GET_SPATIAL_WEIGHTS_FROM_FILE`—Neighbors and weights will be defined by a spatial weights file.	String
distance_band (Optional)	The distance band that will be used to determine neighbors around the focal feature. If no value is provided, the distance will be the shortest distance such that each feature has at least one other neighbor in its neighborhood. For polygons, the distance between centroids will be used to determine neighbors.	Linear Unit
num_neighbors (Optional)	The number of nearest features that will be included as neighbors of each feature. The value does not include the feature. For example, specifying 6 will use the feature and its six closest neighbors (seven features total). The default is 8. The value must be at least 2.	Long
weights_matrix_file (Optional)	The path and file name of the spatial weights matrix file that defines the neighbors and weights between features.	File
local_weighting_scheme (Optional)	Specifies the weighting scheme that will be applied to neighbors when calculating spatial associations. `UNWEIGHTED`—Neighbors will not be weighted. This is the default. `BISQUARE`—Neighbors will be weighted using a bisquare (quartic) kernel.	String
kernel_bandwidth (Optional)	The bandwidth for the bisquare kernel. The bandwidth defines how quickly the weights decrease with distance. Larger bandwidths will provide comparatively larger weights to neighbors that are farther away from the feature. For the k nearest neighbors neighborhood, the default value (empty) will use an adaptive bandwidth equal to the distance to the (k+1)th neighbor of the focal feature. For the fixed distance band neighborhood, the default (empty) will use the same value as the distance band.	Linear Unit
num_permutations (Optional)	Specifies the number of permutations that will be used to create reference distributions when calculating global and local p-values. All p-values are calculated using two-sided hypothesis tests. `99`—The analysis will use 99 permutations. With 99 permutations, the smallest possible p-value is 0.02, and all other p-values will be multiples of this value. `199`—The analysis will use 199 permutations. With 199 permutations, the smallest possible p-value is 0.01, and all other p-values will be multiples of this value. `499`—The analysis will use 499 permutations. With 499 permutations, the smallest possible p-value is 0.004, and all other p-values will be multiples of this value. `999`—The analysis will use 999 permutations. With 999 permutations, the smallest possible p-value is 0.002, and all other p-values will be multiples of this value. This option is recommended for 90 percent confidence tests. This is the default. `4999`—The analysis will use 4,999 permutations. With 4,999 permutations, the smallest possible p-value is 0.0004, and all other p-values will be multiples of this value. This option is recommended for 95 percent confidence tests. `9999`—The analysis will use 9,999 permutations. With 9,999 permutations, the smallest possible p-value is 0.0002, and all other p-values will be multiples of this value. This option is recommended for 99 percent confidence tests.	Long
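To illustrate how the `BISQUARE` weighting scheme interacts with the `K_NEAREST_NEIGHBORS` neighborhood, the following sketch applies the standard bisquare kernel, w = (1 - (d / b)^2)^2 for d < b and 0 otherwise, with an adaptive bandwidth equal to the distance to the (k+1)th nearest neighbor. The function names and point coordinates are hypothetical, and the self weight of 1 before normalization is an assumption (it is the kernel value at distance 0). The tool's internal calculation may differ.

```python
import math


def bisquare_weight(distance, bandwidth):
    """Standard bisquare (quartic) kernel; zero at or beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1 - (distance / bandwidth) ** 2) ** 2


def knn_bisquare_weights(points, focal_index, k):
    """Row-standardized bisquare weights for a feature and its k nearest neighbors."""
    focal = points[focal_index]
    dists = sorted(
        (math.dist(focal, p), j) for j, p in enumerate(points) if j != focal_index
    )
    if len(dists) < k + 1:
        raise ValueError("need at least k + 1 other features for the bandwidth")
    bandwidth = dists[k][0]    # distance to the (k+1)th nearest neighbor
    raw = {focal_index: 1.0}   # assumption: the feature itself gets the kernel value at d = 0
    for d, j in dists[:k]:
        raw[j] = bisquare_weight(d, bandwidth)
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}  # weights sum to 1


# Five hypothetical points on a line, one unit apart
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
w = knn_bisquare_weights(points, focal_index=0, k=2)
print(w)
```

With `k=2` the bandwidth is the distance to the third nearest neighbor, so both neighbors receive positive weights that decrease with distance.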

Derived output

Name	Explanation	Data type
lee_l	The Lee's L statistic for the analysis variables.	Double
p_value	The p-value for the Lee's L statistic.	Double
corr	The Pearson correlation between the analysis variables.	Double

Code sample

BivariateSpatialAssociation example 1 (Python window)

The following Python window script demonstrates how to use the BivariateSpatialAssociation function.

# Calculate the Lee's L statistic using eight nearest neighbors
# and adaptive bandwidth.
arcpy.env.workspace = r"c:\data\project_data.gdb"
arcpy.stats.BivariateSpatialAssociation(
    in_features=r"myFeatureClass",
    analysis_field1="myAnalysisField1",
    analysis_field2="myAnalysisField2",
    out_features=r"myOutputFeatureClass",
    neighborhood_type="K_NEAREST_NEIGHBORS",
    distance_band=None,
    num_neighbors=8,
    weights_matrix_file=None,
    local_weighting_scheme="BISQUARE",
    kernel_bandwidth=None,
    num_permutations=9999
)

BivariateSpatialAssociation example 2 (stand-alone script)

The following stand-alone script demonstrates how to use the BivariateSpatialAssociation function.

# Calculate the Lee's L statistic for two analysis fields.

import arcpy

# Set the current workspace
arcpy.env.workspace = r"c:\data\project_data.gdb"

# Run tool

arcpy.stats.BivariateSpatialAssociation(
    in_features=r"myFeatureClass",
    analysis_field1="myAnalysisField1",
    analysis_field2="myAnalysisField2",
    out_features=r"myOutputFeatureClass",
    neighborhood_type="CONTIGUITY_EDGES_CORNERS",
    distance_band=None,
    num_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None,
    num_permutations=9999
)

# Print the messages. The messages include the Lee's L statistic, p-value,
# Pearson correlations, and spatial smoothing scalars.

print(arcpy.GetMessages())

Environments

Geographic Transformations, Output Coordinate System, Parallel Processing Factor, Random Number Generator

Licensing information

Basic: Yes
Standard: Yes
Advanced: Yes
