Generalized Linear Regression (GeoAnalytics Server Tools)

Summary

Performs generalized linear regression (GLR) to generate predictions or to model a dependent variable in terms of its relationship to a set of explanatory variables. This tool can be used to fit continuous (OLS), binary (logistic), and count (Poisson) models.

Legacy:

The ArcGIS GeoAnalytics Server extension is being deprecated in ArcGIS Enterprise. The final release of GeoAnalytics Server was included with ArcGIS Enterprise 11.3. This geoprocessing tool is available through ArcGIS Enterprise 11.3 and earlier versions.

Usage

This tool can be used in two operation modes. You can evaluate the performance of different models as you explore different explanatory variables and tool settings. Once a good model has been found, you can use it to predict values for a new dataset.
This tool does not support inputs with date only or time only fields.
Use the Input Features parameter with a field representing the phenomenon you are modeling (the Dependent Variable parameter value) and one or more fields representing the explanatory variables.
The Generalized Linear Regression tool also produces output features and diagnostics. Output feature layers are automatically added to the map with a rendering scheme applied to model residuals. An explanation of each output is provided below.
Ensure that you use the correct Model Type parameter option (Continuous, Binary, or Count) for the analysis to obtain accurate results of the regression analysis.
Model summary results and diagnostics are written to the messages window, and charts will be created below the output feature class. The diagnostics reported depend on the Model Type parameter value. The three model type options are as follows:
- Use the Continuous (Gaussian) model type if the dependent variable can accept a wide range of values such as temperature or total sales. Ideally, the dependent variable will be normally distributed.
- Use the Binary (Logistic) model type if the dependent variable can accept one of two possible values, such as success and failure or presence and absence. The field containing the dependent variable must be either a numeric field or a text field. If the field is numeric, it should contain only ones and zeros. If the field is text, it should contain only two distinct values. If you are using a text field, you must use the Map Dependent Variables parameter to map the distinct text values to ones and zeros. Both values must occur in the data; a field that contains only ones, only zeros, or only one of the two text values cannot be modeled.
- Use the Count (Poisson) model type if the dependent variable is discrete and represents the number of occurrences of an event such as a count of crimes. Count models can also be used if the dependent variable represents a rate and the denominator of the rate is a fixed value such as sales per month or number of people with cancer per 10,000 in the population. In the Count model, it is assumed that the mean and variance of the dependent variable are equal, and the values of the dependent variable cannot be negative or contain decimals.
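The model type guidance above can be expressed as a rough check on the dependent values. The following is a minimal sketch in plain Python, not part of arcpy; `suggest_model_type` is a hypothetical helper, and its rules (two text values or only 0/1 means binary, non-negative whole numbers means count, anything else means continuous) simplify the bullets above. It does not test the Poisson assumption that the mean equals the variance.

```python
from typing import Sequence, Union

Value = Union[int, float, str]


def suggest_model_type(values: Sequence[Value]) -> str:
    """Suggest a model_type keyword (BINARY, COUNT, or CONTINUOUS) for a dependent field."""
    distinct = set(values)
    if len(distinct) < 2:
        # The tool cannot solve when every value is the same.
        raise ValueError("dependent variable has no variation")
    if all(isinstance(v, str) for v in distinct):
        if len(distinct) == 2:
            # Two text values: map them to 0 and 1 with dependent_variable_mapping.
            return "BINARY"
        raise ValueError("a text dependent field must contain exactly two values")
    if any(isinstance(v, str) for v in distinct):
        raise ValueError("dependent field mixes text and numbers")
    if distinct == {0, 1}:
        # Only ones and zeros: logistic regression.
        return "BINARY"
    if all(v >= 0 and float(v).is_integer() for v in distinct):
        # Non-negative whole numbers: Poisson regression.
        return "COUNT"
    return "CONTINUOUS"
```

A count field that happens to hold only 0 and 1 is reported as binary here, so treat the result as a starting point rather than a rule.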
The Dependent Variable and Explanatory Variable(s) parameter values should be numeric fields containing a range of values. This tool cannot solve when a variable has the same value for every feature (if all the values for a field are 9.0, for example).
Features with one or more null values or empty string values in prediction or explanatory fields will be excluded from the output. You can modify values using the Calculate Field tool if necessary.
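Both of the rules above (constant fields cannot be solved, and features with null or empty values are excluded) can be checked before you run the tool. This sketch assumes the features have already been read into a list of Python dictionaries keyed by field name; `prepare_rows` is a hypothetical helper, not a GeoAnalytics function.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def prepare_rows(rows: Sequence[Row], fields: Sequence[str]) -> List[Row]:
    """Drop rows the tool would exclude, then check that every field varies."""
    # Keep only rows with a non-null, non-empty value in every modeled field.
    kept = [
        row for row in rows
        if all(row.get(f) is not None and row.get(f) != "" for f in fields)
    ]
    for f in fields:
        # A field with a single distinct value cannot be solved.
        if len({row[f] for row in kept}) < 2:
            raise ValueError(f"field {f!r} has the same value for every feature")
    return kept
```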
Review the over- and underpredictions evident in the regression residuals to see whether they provide information about potential missing variables from the regression model.
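For the Continuous model, a residual is the difference between the observed and the estimated dependent value. The sketch below fits ordinary least squares with a single explanatory variable in plain Python to show where the residuals come from. It assumes the residual is defined as observed minus predicted, so a negative value marks an overprediction and a positive value marks an underprediction; the GeoAnalytics implementation fits all explanatory variables at once on the server.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return the least-squares intercept and slope for one explanatory variable."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("explanatory variable has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Observed minus predicted: negative = overprediction, positive = underprediction."""
    intercept, slope = fit_simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

Mapping residuals this way makes clusters of same-sign values visible, which is the kind of pattern that can hint at a missing explanatory variable.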
You can use the regression model that has been created to make predictions for other features. Creating these predictions requires that each prediction feature has values for each of the explanatory variables provided. If the field names in the Input Features and Input Prediction Features parameter values do not match, use the Match Explanatory Variables parameter to pair them. When matching the explanatory variables, the fields from the Input Features and Input Prediction Features parameters must be of the same type (double fields must be matched with double fields, for example).
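The matching rules can be expressed as a small validation step that produces the `explanatory_variables_to_match` value table in its documented `[[Field from Prediction Locations, Field from Input Features], ...]` order. `build_match_table` is a hypothetical helper, and the field type strings (for example, `"Double"`) are assumed to be supplied by the caller.

```python
from typing import Dict, List


def build_match_table(
    pairs: Dict[str, str],
    prediction_types: Dict[str, str],
    input_types: Dict[str, str],
) -> List[List[str]]:
    """Return [[prediction_field, input_field], ...] after checking field types."""
    table = []
    for predict_field, input_field in pairs.items():
        p_type = prediction_types[predict_field]
        i_type = input_types[input_field]
        # Matched fields must share a type (double with double, for example).
        if p_type != i_type:
            raise TypeError(
                f"{predict_field} ({p_type}) cannot match {input_field} ({i_type})"
            )
        table.append([predict_field, input_field])
    return table
```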
The GeoAnalytics implementation of GLR has the following limitations:
- It is a global regression model and does not take the spatial distribution of data into account.
- Analysis does not apply Moran's I test on the residuals.
- Feature datasets (points, lines, polygons, and tables) are supported as input; rasters are not supported.
- You cannot classify values into multiple classes.
This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.
For optimal performance, make data available to GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to GeoAnalytics Server will be moved to GeoAnalytics Server before analysis begins. This means that the tool will take longer to run and, in some cases, moving the data from ArcGIS Pro to GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. It is recommended that you always share your data or create a big data file share.

Learn more about sharing data to your portal

Learn more about creating a big data file share through Server Manager

Parameters

Label	Explanation	Data type
Input Features	The layer containing the dependent and independent variables.	Record Set
Dependent Variable	The numeric field containing the observed values to be modeled.	Field
Model Type	Specifies the type of data that will be modeled. Continuous (Gaussian)—The Dependent Variable value is continuous. The Gaussian model will be used, and the tool will perform ordinary least squares regression. This is the default. Binary (Logistic)—The Dependent Variable value represents presence or absence. This can be either conventional ones and zeros, or string values mapped to zeros and ones in the Map Dependent Variables parameter. The Logistic regression model will be used. Count (Poisson)—The Dependent Variable value is discrete and represents events, for example, crime counts, disease incidents, or traffic accidents. The Poisson regression model will be used.	String
Explanatory Variable(s)	A list of fields representing independent explanatory variables in the regression model.	Field
Output Features Name	The name of the feature class that will be created containing the dependent variable estimates and residuals.	String
Generate Coefficient Table (Optional)	Specifies whether an output table with coefficient values will be generated. Checked—A table with coefficient values will be generated. Unchecked—A table with coefficient values will not be generated. This is the default.	Boolean
Input Prediction Features (Optional)	A layer containing features representing locations where estimates will be computed. Each feature in this dataset should contain values for all the explanatory variables specified. The dependent variable for these features will be estimated using the model calibrated for the input layer data.	Record Set
Match Explanatory Variables (Optional)	Matches the explanatory variables in the Input Prediction Features parameter to corresponding explanatory variables from the Input Features parameter. Value table columns: Predict Value—A list of variables from the prediction locations that correspond to the explanatory variables of the input features and that will be used to make predictions. Input Value—The list of variables from the input features that were used to build the GLR model.	Value Table
Map Dependent Variables (Optional)	Two strings representing the values used to map to 0 (absence) and 1 (presence) for binary regression. By default, 0 and 1 will be used. For example, to predict an arrest with field values of Arrest and No Arrest, enter No Arrest for `False Value (0)` and Arrest for `True Value (1)`. Value table columns: False Value (0)—A value used to represent absence (0) in binary regression. True Value (1)—A value used to represent presence (1) in binary regression.	Value Table
Data Store (Optional)	Specifies the ArcGIS Data Store where the output will be stored. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system. Spatiotemporal big data store—Output will be stored in a spatiotemporal big data store. This is the default. Relational data store—Output will be stored in a relational data store.	String

Derived output

Label	Explanation	Data type
Output	The output feature service containing the dependent variable estimates for each input feature.	Record Set
Output Predicted Features	An output layer containing the explanatory variables and the predicted dependent variable values.	Record Set
Output Table of Coefficients	An output table containing the coefficients from the model fit. The output is created when the Generate Coefficient Table parameter is checked.	Record Set

arcpy.geoanalytics.GeneralizedLinearRegression(input_features, dependent_variable, model_type, explanatory_variables, output_features_name, {generate_coefficient_table}, {input_features_to_predict}, {explanatory_variables_to_match}, {dependent_variable_mapping}, {data_store})

Name	Explanation	Data type
input_features	The layer containing the dependent and independent variables.	Record Set
dependent_variable	The numeric field containing the observed values to be modeled.	Field
model_type	Specifies the type of data that will be modeled. `CONTINUOUS`—The `dependent_variable` value is continuous. The Gaussian model will be used, and the tool will perform ordinary least squares regression. This is the default. `BINARY`—The `dependent_variable` value represents presence or absence. This can be either conventional ones and zeros, or string values mapped to zeros and ones in the `dependent_variable_mapping` parameter. The Logistic regression model will be used. `COUNT`—The `dependent_variable` value is discrete and represents events, for example, crime counts, disease incidents, or traffic accidents. The Poisson regression model will be used.	String
explanatory_variables [explanatory_variables,...]	A list of fields representing independent explanatory variables in the regression model.	Field
output_features_name	The name of the feature class that will be created containing the dependent variable estimates and residuals.	String
generate_coefficient_table (Optional)	Specifies whether an output table with coefficient values will be generated. `CREATE_TABLE`—A table with coefficient values will be generated. `NO_TABLE`—A table with coefficient values will not be generated. This is the default.	Boolean
input_features_to_predict (Optional)	A layer containing features representing locations where estimates will be computed. Each feature in this dataset should contain values for all the explanatory variables specified. The dependent variable for these features will be estimated using the model calibrated for the input layer data.	Record Set
explanatory_variables_to_match [[Field from Prediction Locations, Field from Input Features],...] (Optional)	Matches the explanatory variables in the `input_features_to_predict` parameter to corresponding explanatory variables from the `input_features` parameter—for example, `[["LandCover2000", "LandCover2010"], ["Income", "PerCapitaIncome"]]`. Value table columns: `Predict Value`—A list of variables from the prediction locations that correspond to the explanatory variables of the input features and that will be used to make predictions. `Input Value`—The list of variables from the input features that were used to build the GLR model.	Value Table
dependent_variable_mapping [dependent_variable_mapping,...] (Optional)	Two strings representing the values used to map to 0 (absence) and 1 (presence) for binary regression. By default, 0 and 1 will be used. For example, to predict an arrest with field values of Arrest and No Arrest, enter No Arrest for False Value (0) and Arrest for True Value (1). Value table columns: `False Value (0)`—A value used to represent absence (0) in binary regression. `True Value (1)`—A value used to represent presence (1) in binary regression.	Value Table
data_store (Optional)	Specifies the ArcGIS Data Store where the output will be stored. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system. `SPATIOTEMPORAL_DATA_STORE`—Output will be stored in a spatiotemporal big data store. This is the default. `RELATIONAL_DATA_STORE`—Output will be stored in a relational data store.	String
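The `dependent_variable_mapping` value table is ordered as `[[False Value (0), True Value (1)]]`. The sketch below shows the encoding this mapping describes; `map_binary` is a hypothetical helper that illustrates the rule, not the tool's internal code.

```python
from typing import List, Sequence


def map_binary(values: Sequence[str], mapping: List[List[str]]) -> List[int]:
    """Encode text values as 0/1 from a [[false_value, true_value]] value table."""
    false_value, true_value = mapping[0]
    codes = {false_value: 0, true_value: 1}
    # Every value in the field must be one of the two mapped strings.
    unknown = set(values) - set(codes)
    if unknown:
        raise ValueError(f"values not in the mapping: {sorted(unknown)}")
    encoded = [codes[v] for v in values]
    if len(set(encoded)) < 2:
        # Both 0s and 1s must be present for a binary model.
        raise ValueError("binary dependent variable needs both values")
    return encoded
```

With `[["NoArrest", "Arrest"]]`, "NoArrest" becomes 0 and "Arrest" becomes 1; reversing the pair would flip what the model treats as presence.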

Derived output

Name	Explanation	Data type
output	The output feature service containing the dependent variable estimates for each input feature.	Record Set
output_predicted_features	An output layer containing the explanatory variables and the predicted dependent variable values.	Record Set
coefficient_table	An output table containing the coefficients from the model fit. The output is created when the `generate_coefficient_table` parameter is set to `CREATE_TABLE`.	Record Set
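For the Binary (logistic) and Count (Poisson) models, coefficients are on the scale of the link function. A common way to read one is to exponentiate it: exp(b) is the multiplicative change in the odds (Binary) or in the expected count (Count) for a one-unit increase in that explanatory variable. This sketch assumes the coefficients have already been read into a dictionary with an `"Intercept"` key; those names are hypothetical and do not describe the coefficient table's actual schema.

```python
import math
from typing import Dict


def ratio_effects(coefficients: Dict[str, float]) -> Dict[str, float]:
    """exp(b) per variable: an odds ratio (BINARY) or a rate ratio (COUNT)."""
    return {name: math.exp(b) for name, b in coefficients.items() if name != "Intercept"}


def predict_probability(coefficients: Dict[str, float], x: Dict[str, float]) -> float:
    """Logistic prediction: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    eta = coefficients["Intercept"] + sum(coefficients[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-eta))
```

For the Continuous model no transformation is needed; each coefficient is already the change in the dependent variable per unit of the explanatory variable.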

Code sample

GeneralizedLinearRegression example (stand-alone script)

The following stand-alone script demonstrates how to use the GeneralizedLinearRegression function.

In this script, you create a model and predict if an arrest was made for given crimes.

# Description: Run GLR on crime data and predict whether an arrest was made for each reported crime.
#
# Requirements: ArcGIS GeoAnalytics Server

# Import system modules
import arcpy

# Set local variables
trainingDataset = "https://analysis.org.com/server/rest/services/Hosted/old_crimes/FeatureServer/0"
predictionDataset = "https://analysis.org.com/server/rest/services/Hosted/new_crimes/FeatureServer/0"
outputTrainingName = "training"

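# Run GLR on the training data and predict arrests for the new crimes.
# Match pairs are [prediction field, input field]. The mapping is
# [False Value (0), True Value (1)], so "NoArrest" -> 0 and "Arrest" -> 1.
arcpy.geoanalytics.GeneralizedLinearRegression(
    trainingDataset, "ArrestMade", "BINARY", ["CRIME_TYPE", "WARD", "DAY_OF_MONTH"], outputTrainingName,
    "NO_TABLE", predictionDataset, [["CRIME_TYPE", "CRIME_TYPE"], ["WARD", "WARD"], ["DAY_OF_MON", "DAY_OF_MONTH"]],
    [["NoArrest", "Arrest"]], "SPATIOTEMPORAL_DATA_STORE")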
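The script above passes every parameter positionally. The tool's Python parameters can also be passed by name, which makes it easier to switch between the two operation modes: evaluating a model without prediction features, then predicting once a good model has been found. `glr_kwargs` is a hypothetical helper; it assumes same-named fields in both datasets, which is why it matches each explanatory field to itself.

```python
from typing import Any, Dict, List, Optional

MODEL_TYPES = {"CONTINUOUS", "BINARY", "COUNT"}


def glr_kwargs(
    input_features: str,
    dependent_variable: str,
    model_type: str,
    explanatory_variables: List[str],
    output_features_name: str,
    input_features_to_predict: Optional[str] = None,
    dependent_variable_mapping: Optional[List[List[str]]] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments named after the tool's Python parameters."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type}")
    kwargs: Dict[str, Any] = {
        "input_features": input_features,
        "dependent_variable": dependent_variable,
        "model_type": model_type,
        "explanatory_variables": explanatory_variables,
        "output_features_name": output_features_name,
    }
    if input_features_to_predict is not None:
        # Prediction mode: pair each explanatory field with a same-named field.
        kwargs["input_features_to_predict"] = input_features_to_predict
        kwargs["explanatory_variables_to_match"] = [[f, f] for f in explanatory_variables]
    if dependent_variable_mapping is not None:
        if model_type != "BINARY":
            raise ValueError("dependent_variable_mapping applies only to BINARY models")
        kwargs["dependent_variable_mapping"] = dependent_variable_mapping
    return kwargs
```

The result can be unpacked into the tool call, for example `arcpy.geoanalytics.GeneralizedLinearRegression(**glr_kwargs(trainingDataset, "ArrestMade", "BINARY", ["WARD"], "model_eval"))`.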

Environments

Output Coordinate System, Extent, Current Workspace

Special cases

Output Coordinate System: The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless a different coordinate system is specified by this environment. For GeoAnalytics Tools, final results stored in the spatiotemporal big data store will be in WGS84.

Licensing information

Basic: Requires ArcGIS GeoAnalytics Server
Available with ArcGIS Enterprise 10.7
Standard: Requires ArcGIS GeoAnalytics Server
Available with ArcGIS Enterprise 10.7
Advanced: Requires ArcGIS GeoAnalytics Server
Available with ArcGIS Enterprise 10.7
