Generalized Linear Regression (GeoAnalytics Server Tools)

Summary

Performs generalized linear regression (GLR) to generate predictions or to model a dependent variable in terms of its relationship to a set of explanatory variables. This tool can be used to fit continuous (OLS), binary (logistic), and count (Poisson) models.
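For orientation, the three model families named above are usually written in the following textbook form, where $x_1, \dots, x_k$ are the explanatory variables and $\beta_0, \dots, \beta_k$ are the fitted coefficients. The symbols and the choice of link functions are assumptions taken from the standard GLM formulation; this page does not spell them out.

```latex
\begin{aligned}
\text{Continuous (Gaussian):}\quad & \mathbb{E}[y] = \beta_0 + \sum_{j=1}^{k} \beta_j x_j \\
\text{Binary (logistic):}\quad & \log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j, \qquad p = \Pr(y = 1) \\
\text{Count (Poisson):}\quad & \log \mu = \beta_0 + \sum_{j=1}^{k} \beta_j x_j, \qquad \mu = \mathbb{E}[y]
\end{aligned}
```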

Legacy:

The ArcGIS GeoAnalytics Server extension is being deprecated in ArcGIS Enterprise. The final release of GeoAnalytics Server was included with ArcGIS Enterprise 11.3. This geoprocessing tool is available through ArcGIS Enterprise 11.3 and earlier versions.

Usage

  • This tool can be used in two operation modes. You can evaluate the performance of different models as you explore different explanatory variables and tool settings. Once a good model has been found, you can fit the model to a new dataset.

  • This tool does not support inputs with date-only or time-only fields.

  • Use the Input Features parameter with a field representing the phenomena you are modeling (the Dependent Variable parameter value) and one or more fields representing the explanatory variables.

  • The Generalized Linear Regression tool also produces output features and diagnostics. Output feature layers are automatically added to the map with a rendering scheme applied to model residuals. An explanation of each output is provided below.

  • Ensure that you use the correct Model Type parameter option (Continuous, Binary, or Count) for the analysis; an incorrect option produces inaccurate regression results.

  • Model summary results and diagnostics are written to the messages window, and charts will be created below the output feature class. The diagnostics reported depend on the Model Type parameter value. The three model type options are as follows:

    • Use the Continuous (Gaussian) model type if the dependent variable can accept a wide range of values such as temperature or total sales. Ideally, the dependent variable will be normally distributed.

    • Use the Binary (Logistic) model type if the dependent variable can accept one of two possible values, such as success and failure or presence and absence. The field containing the dependent variable must be either a numeric field or a text field. If the field is numeric, it should contain only ones and zeros. If the field is text, it should contain only two distinct values. If you are using a text field, you must use the Map Dependent Variables parameter to map the distinct text values to ones and zeros. Both values must occur in the data; the model cannot be fit if every record has the same value.

    • Use the Count (Poisson) model type if the dependent variable is discrete and represents the number of occurrences of an event such as a count of crimes. Count models can also be used if the dependent variable represents a rate and the denominator of the rate is a fixed value such as sales per month or number of people with cancer per 10,000 in the population. In the Count model, it is assumed that the mean and variance of the dependent variable are equal, and the values of the dependent variable cannot be negative or contain decimals.

    The Dependent Variable and Explanatory Variable parameter values should be numeric fields containing a range of values. This tool cannot solve when variables have the same values (if all the values for a field are 9.0, for example).
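The requirements above can be checked before running the tool. The following is a minimal sketch in plain Python, not part of the tool itself: the function names `check_dependent` and `dispersion_ratio` and the model type strings are hypothetical, and the values are assumed to have been read from the dependent variable field already.

```python
from statistics import mean, variance


def check_dependent(values, model_type):
    """Return a list of problems for the chosen model type (empty if none found).

    Hypothetical helper: model_type is "continuous", "binary", or "count".
    """
    problems = []
    # The tool cannot solve when a variable has the same value everywhere.
    if len(set(values)) < 2:
        problems.append("no variation: every value is identical")
    if model_type == "binary":
        # Numeric binary fields should contain only ones and zeros.
        if not set(values) <= {0, 1}:
            problems.append("binary models need only ones and zeros")
    elif model_type == "count":
        # Count values cannot be negative or contain decimals.
        if any(v < 0 or v != int(v) for v in values):
            problems.append("count models need non-negative whole numbers")
    return problems


def dispersion_ratio(values):
    """Sample variance divided by mean; close to 1 when mean and variance agree."""
    return variance(values) / mean(values)


print(check_dependent([9.0, 9.0, 9.0], "continuous"))
print(check_dependent([3, 0, 2.5], "count"))
print(dispersion_ratio([1, 2, 3]))
```

A dispersion ratio far from 1 suggests that the equal mean and variance assumption of the Count model may not hold for the data; how far is too far is a judgment call that this sketch does not make.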

  • Features with one or more null values or empty string values in prediction or explanatory fields will be excluded from the output. You can modify values using the Calculate Field tool if necessary.
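To preview which features would be excluded, the null and empty string rule can be sketched as a simple filter. This is an illustration only: the helper `drop_incomplete`, the record layout, and the field names are hypothetical, and the tool applies its own exclusion internally.

```python
def drop_incomplete(records, fields):
    """Keep only records where every listed field is neither None nor an empty string."""
    return [
        r for r in records
        if all(r.get(f) is not None and r.get(f) != "" for f in fields)
    ]


# Hypothetical records; "sales" is the dependent field, "income" an explanatory field.
records = [
    {"id": 1, "sales": 120.0, "income": 54.2},
    {"id": 2, "sales": None, "income": 61.0},   # null value, excluded
    {"id": 3, "sales": 98.5, "income": ""},     # empty string, excluded
]

kept = drop_incomplete(records, ["sales", "income"])
print([r["id"] for r in kept])
```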

  • Review the over- and underpredictions evident in the regression residuals to see whether they provide information about potential missing variables from the regression model.
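As an illustration of what over- and underpredictions mean, the sketch below fits a one-variable least squares line in plain Python and labels each residual. It assumes the common convention that a residual is the observed value minus the predicted value, so a positive residual is an underprediction; the data and function names are made up, and the tool's own fitting routine is not shown here.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for a single explanatory variable."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus predicted for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up observations for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]

intercept, slope = fit_line(x, y)
res = residuals(x, y, intercept, slope)
# Positive residual: the model predicted too low (underprediction).
labels = ["under" if r > 0 else "over" for r in res]
print(intercept, slope, labels)
```

Runs of same-signed residuals that cluster in one part of the data are the kind of pattern that can hint at a missing explanatory variable.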

  • You can use the regression model that has been created to make predictions for other features. Creating these predictions requires that each prediction feature has values for each of the explanatory variables provided. If the field names in the Input Features and Input Prediction Features parameter values do not match, use the Match Explanatory Variables parameter to pair them. When matching the explanatory variables, the fields from the Input Features and Input Prediction Features parameters must be of the same type (double fields must be matched with double fields, for example).
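The matching rule can be sketched as a small check that pairs each explanatory field with a prediction field and verifies that the types agree. The function `match_fields`, the field names, and the type strings are hypothetical; in the tool, the pairing is supplied through the Match Explanatory Variables parameter.

```python
def match_fields(training_fields, prediction_fields, overrides=None):
    """Map each training field name to a prediction field name of the same type.

    training_fields / prediction_fields: dict of field name -> type name.
    overrides: dict of training name -> prediction name for fields whose names differ.
    """
    overrides = overrides or {}
    matched = {}
    for name, ftype in training_fields.items():
        target = overrides.get(name, name)
        if target not in prediction_fields:
            raise KeyError(f"no prediction field matched to '{name}'")
        if prediction_fields[target] != ftype:
            raise TypeError(
                f"'{name}' is {ftype} but '{target}' is {prediction_fields[target]}"
            )
        matched[name] = target
    return matched


# Hypothetical schemas: "income" is called "med_income" in the prediction layer.
training = {"income": "double", "pop": "long"}
prediction = {"med_income": "double", "pop": "long"}

pairs = match_fields(training, prediction, overrides={"income": "med_income"})
print(pairs)
```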

  • The GeoAnalytics implementation of GLR has the following limitations:

    • It is a global regression model and does not take the spatial distribution of data into account.

    • The analysis does not apply the Moran's I test to the residuals.

    • Feature datasets (points, lines, polygons, and tables) are supported as input; rasters are not supported.

    • You cannot classify values into multiple classes.

  • This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.

  • When running GeoAnalytics Server tools, the analysis is completed on GeoAnalytics Server. For optimal performance, make data available to GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to GeoAnalytics Server will be moved to GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool and, in some cases, moving the data from ArcGIS Pro to GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. It is recommended that you always share your data or create a big data file share.

    Learn more about sharing data to your portal

    Learn more about creating a big data file share through Server Manager

Parameters

Label Explanation Data type

Input Features

The layer containing the dependent and independent variables.

Record Set

Dependent Variable

The numeric field containing the observed values to be modeled.

Field

Model Type

Specifies the type of data that will be modeled.

  • Continuous (Gaussian): The Dependent Variable value is continuous. The Gaussian model will be used, and the tool will perform ordinary least squares regression. This is the default.

  • Binary (Logistic): The Dependent Variable value represents presence or absence. This can be either conventional ones and zeros, or string values mapped to zeros and ones in the Map Dependent Variables parameter. The Logistic regression model will be used.

  • Count (Poisson): The Dependent Variable value is discrete and represents events, for example, crime counts, disease incidents, or traffic accidents. The Poisson regression model will be used.

String

Explanatory Variable(s)

A list of fields representing independent explanatory variables in the regression model.

Field

Output Features Name

The name of the feature class that will be created containing the dependent variable estimates and residuals.

String

Generate Coefficient Table

(Optional)

Specifies whether an output table with coefficient values will be generated.

  • Checked: A table with coefficient values will be generated.

  • Unchecked: A table with coefficient values will not be generated. This is the default.

Boolean

Input Prediction Features

(Optional)

A layer containing features representing locations where estimates will be computed. Each feature in this dataset should contain values for all the explanatory variables specified. The dependent variable for these features will be estimated using the model calibrated for the input layer data.

Record Set

Match Explanatory Variables

(Optional)

Matches the explanatory variables in the Input Prediction Features parameter to corresponding explanatory variables from the Input Features parameter.

Value table columns:

  • Predict Value: A list of variables from the prediction locations that correspond to the explanatory variables of the input features and that will be used to make predictions.

  • Input Value: The list of variables from the input features that were used to build the GLR model.

Value Table

Map Dependent Variables

(Optional)

Two strings representing the values used to map to 0 (absence) and 1 (presence) for binary regression. By default, 0 and 1 will be used. For example, to predict an arrest with field values of Arrest and No Arrest, enter No Arrest for False Value (0) and Arrest for True Value (1).

Value table columns:

  • False Value (0): A value used to represent absence (0) in binary regression.

  • True Value (1): A value used to represent presence (1) in binary regression.

Value Table

Data Store

(Optional)

Specifies the ArcGIS Data Store where the output will be stored. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • Spatiotemporal big data store: Output will be stored in a spatiotemporal big data store. This is the default.

  • Relational data store: Output will be stored in a relational data store.

String
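To make the Map Dependent Variables parameter concrete, the sketch below encodes a text field as zeros and ones using the Arrest and No Arrest example from its description. The helper `map_dependent` is hypothetical and only mirrors the documented behavior; the tool performs the mapping itself when the parameter is set.

```python
def map_dependent(values, false_value, true_value):
    """Encode text values as 0 (absence) and 1 (presence) for a binary model."""
    mapping = {false_value: 0, true_value: 1}
    unexpected = set(values) - set(mapping)
    if unexpected:
        # A binary dependent field should contain only the two mapped values.
        raise ValueError(f"unmapped values: {sorted(unexpected)}")
    encoded = [mapping[v] for v in values]
    if len(set(encoded)) < 2:
        # Both zeros and ones must occur for the model to be fit.
        raise ValueError("the data must contain both values")
    return encoded


observed = ["Arrest", "No Arrest", "Arrest", "No Arrest"]
encoded = map_dependent(observed, false_value="No Arrest", true_value="Arrest")
print(encoded)
```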

Derived output

Label Explanation Data type

Output

The output feature service containing the dependent variable estimates for each input feature.

Record Set

Output Predicted Features

An output layer containing the explanatory variables and the predicted values of the dependent variable.

Record Set

Output Table of Coefficients

An output table containing the coefficients from the model fit. The output is created when the Generate Coefficient Table parameter is checked.

Record Set

Environments

Output Coordinate System, Extent, Current Workspace

Special cases

Output Coordinate System

The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless a different coordinate system is specified by this parameter. For GeoAnalytics Tools, final results will be stored in the spatiotemporal data store in WGS84.

Licensing information

  • Basic: Requires ArcGIS GeoAnalytics Server
    Available with ArcGIS Enterprise 10.7
  • Standard: Requires ArcGIS GeoAnalytics Server
    Available with ArcGIS Enterprise 10.7
  • Advanced: Requires ArcGIS GeoAnalytics Server
    Available with ArcGIS Enterprise 10.7