Select Random Sample (Data Reviewer Tools)

Summary

Selects a random sample of the input features or rows based on the specified sampling method.

The output is a selection made on the input layer in the map frame. The tool can also create a .json file that records the selected object IDs (OIDs) and the SQL expression used for the selection. The selection can be used in the Browse Features visual review tool and Run Data Checks tool workflows.

Usage

The Sample Method parameter has the following options:
- Fixed Number—The number of records selected will be based on the Number of Records parameter value.
- Percentage—The number of records selected will be based on the Percentage of Records parameter value.
- Auto Calculate—The number of records selected will be based on a calculation using the Confidence Level and Margin of Error parameter values.
The Sample Method parameter's Auto Calculate option uses the following equations and variables to calculate the number of records:

\[ \begin{align*} z &= \text{scipy.stats.norm.ppf}\left(1 - \frac{1 - \text{confidence\_level}}{2}\right) \\ n &= \left(\frac{z}{m}\right)^2 \cdot \left(p \cdot (1 - p)\right) \\ n' &= \frac{n \cdot N}{n + (N - 1)} \end{align*} \]
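- The z-statistic for the desired confidence level (z). The z-statistic is calculated from the confidence level using the scipy.stats module: z=scipy.stats.norm.ppf(1-(1-confidence_level)/2). Because norm.ppf takes a probability between 0 and 1, confidence_level in this expression is a fraction (for example, 0.95 for 95 percent).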
- The acceptable margin of error in the confidence interval (m).
- The probability (p) is set to 0.5 because there is no prior knowledge of what percentage of records will pass or fail. The term p(1 - p) in the variance equation is largest when p is 0.5, so this is the most conservative value to use.
- The population size (N) is the total number of records in a feature layer or table.
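The calculation above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: it uses statistics.NormalDist from the standard library in place of scipy.stats.norm.ppf (the two return the same quantile), the function name auto_sample_size is hypothetical, the confidence level and margin of error are assumed to be converted from percentages to fractions, and rounding the result up to a whole record is also an assumption.

```python
import math
from statistics import NormalDist


def auto_sample_size(confidence_level, margin_of_error, population_size, p=0.5):
    """Estimate n' for a population of N records (hypothetical helper).

    confidence_level and margin_of_error are fractions, for example
    0.95 and 0.05. Converting the tool's percentage inputs this way
    is an assumption.
    """
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    if population_size < 1:
        raise ValueError("population_size must be at least 1")

    # Two-sided z-statistic, the same quantile as
    # scipy.stats.norm.ppf(1 - (1 - confidence_level) / 2).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

    # Sample size for an effectively infinite population.
    n = (z / margin_of_error) ** 2 * (p * (1 - p))

    # Finite population correction.
    n_prime = (n * population_size) / (n + (population_size - 1))

    # Rounding up and capping at N are assumptions, not documented behavior.
    return min(population_size, math.ceil(n_prime))


print(auto_sample_size(0.95, 0.05, 10000))  # 370
```

For 10,000 records at a 95 percent confidence level and a 5 percent margin of error, the uncorrected size n is about 384.1 and the corrected size n' is about 369.97, which rounds up to 370.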
Random OIDs are selected using the random.sample(population, k) function from the Python random module, in which population is the list of OID values and k is the size of the sample.
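As a rough illustration of this step, the sketch below draws k OIDs without replacement and builds a SQL expression from them. The function names, the optional seed argument, the OBJECTID field name, and the IN (...) form of the expression are all assumptions made for the example; the expression the tool actually writes may differ.

```python
import random


def pick_random_oids(oids, k, seed=None):
    """Return k distinct OIDs chosen at random (hypothetical helper)."""
    population = list(oids)
    if not 0 <= k <= len(population):
        raise ValueError("k must be between 0 and the number of OIDs")
    # A dedicated generator keeps the optional seed from affecting
    # other users of the random module.
    rng = random.Random(seed)
    return sorted(rng.sample(population, k))


def oids_to_where_clause(oids, oid_field="OBJECTID"):
    """Build an IN expression for the given OIDs (assumed format)."""
    values = [str(int(oid)) for oid in oids]
    if not values:
        raise ValueError("at least one OID is required")
    return f"{oid_field} IN ({', '.join(values)})"


sample = pick_random_oids(range(1, 101), k=5)
print(oids_to_where_clause(sample))
```

Because random.sample draws without replacement, no OID appears twice in the result, and passing the same seed reproduces the same sample.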
The output of this tool is a random selection of records from the Input Rows parameter value based on the Sample Method parameter value.
Use the optional Output File parameter to create a .json file that includes the following:
- The date and time the tool was run
- The workspace the input is sourced from
- The name of the input feature layers or tables
- The total number of selected records
- The OIDs of the selected records
- The SQL expression that was used to make the selection
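The key names used in the .json file are not listed here, so the following sketch does not assume any. It loads a file with the standard json module and reports each top-level key with the type of its value, which is one way to find where the items above are stored. The function name describe_json_record is hypothetical.

```python
import json


def describe_json_record(path):
    """Map each top-level key in a .json file to its value's type name."""
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    if not isinstance(record, dict):
        # The file is valid JSON but its root is not an object.
        return {"<root>": type(record).__name__}
    return {key: type(value).__name__ for key, value in record.items()}
```

Calling describe_json_record(r"C:\USAData\Cities_Sample.json") on a file produced by the tool would list its keys without relying on their names.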
Any selection that exists on the Input Rows parameter value will be honored, regardless of whether the Use the selected records toggle button is turned on or off.
The feature layer or table must have an ObjectID field before running this tool.
If the Use the selected records toggle button is turned off, the Output File parameter value will record a random selection of features based on the entire dataset. However, if there is a definition query applied, only the features or rows matching the query will be selected in the map frame.

Label	Explanation	Data type
Input Rows	The data to which the selection will be applied.	Feature Layer; Table View
Sample Method	Specifies the sampling method that will be used. Fixed Number—The number of records selected will be based on the number of records parameter value. Percentage—The number of records selected will be based on the percentage of records parameter value. Auto Calculate—The number of records selected will be based on a calculation using the confidence level and margin of error parameter values.	String
Number of Records (Optional)	The number of records that will be selected. This parameter is active when the Sample Method parameter value is Fixed Number.	Long
Percentage of Records (Optional)	The percentage of records in the input that will be selected. This parameter is active when the Sample Method parameter value is Percentage.	Long
Confidence Level (Optional)	The level of confidence is the likelihood that a sample size is statistically significant, entered as a percentage such as 98 or 95. This parameter will be used to calculate the z-statistic (z). The z-statistic can be calculated using the `scipy.stats` module `z=scipy.stats.norm.ppf(1-(1-confidence_level)/2)`. This parameter is active when the Sample Method parameter value is Auto Calculate.	Long
Margin of Error (Optional)	The acceptable margin of error in the confidence level, entered as a percentage such as 8 or 5. This parameter uses the calculated z-statistic (z) to calculate the actual sample size (n') using the following equations: `n=((z/m)^2)(p(1-p))` and `n'=(n*N)/(n+(N-1))`. This parameter is active when the Sample Method parameter value is Auto Calculate.	Long
Output File (Optional)	The output `.json` file that will contain a record of the selected data.	File

Derived output

Label	Explanation	Data type
Updated Rows	The updated input with the selections applied.	Feature Layer; Table View

arcpy.Reviewer.SelectRandomSample(in_layer_or_view, sample_method, {number_of_records}, {percentage_of_records}, {confidence_level}, {margin_of_error}, {out_file})

Name	Explanation	Data type
in_layer_or_view	The data to which the selection will be applied.	Feature Layer; Table View
sample_method	Specifies the sampling method that will be used. `FIXED_NUMBER`—The number of records selected will be based on the number of records parameter value. `PERCENTAGE`—The number of records selected will be based on the percentage of records parameter value. `AUTO_CALCULATE`—The number of records selected will be based on a calculation using the confidence level and margin of error parameter values.	String
number_of_records (Optional)	The number of records that will be selected. This parameter is enabled when the `sample_method` parameter value is `FIXED_NUMBER`.	Long
percentage_of_records (Optional)	The percentage of records in the input that will be selected. This parameter is enabled when the `sample_method` parameter value is `PERCENTAGE`.	Long
confidence_level (Optional)	The level of confidence is the likelihood that a sample size is statistically significant, entered as a percentage such as 98 or 95. This parameter will be used to calculate the z-statistic (z). The z-statistic can be calculated using the `scipy.stats` module `z=scipy.stats.norm.ppf(1-(1-confidence_level)/2)`. This parameter is enabled when the `sample_method` parameter value is `AUTO_CALCULATE`.	Long
margin_of_error (Optional)	The acceptable margin of error in the confidence level, entered as a percentage such as 8 or 5. This parameter uses the calculated z-statistic (z) to calculate the actual sample size (n') using the following equations: `n=((z/m)^2)(p(1-p))` and `n'=(n*N)/(n+(N-1))`. This parameter is enabled when the `sample_method` parameter value is `AUTO_CALCULATE`.	Long
out_file (Optional)	The output `.json` file that will contain a record of the selected data.	File

Derived output

Name	Explanation	Data type
out_layer_or_view	The updated input with the selections applied.	Feature Layer; Table View

Code sample

SelectRandomSample example 1 (Python window)

The following Python window script demonstrates how to use the SelectRandomSample function.

import arcpy
arcpy.env.workspace = r"C:\USAData\Data.gdb"
arcpy.Reviewer.SelectRandomSample("Cities", "FIXED_NUMBER", number_of_records=35, out_file=r"C:\USAData\Cities_Sample.json")

SelectRandomSample example 2 (stand-alone script)

The following stand-alone script creates a random selection of features within the Cities feature layer.

# Name: SelectRandomSample_Example.py
# Description: Use the SelectRandomSample tool in ArcGIS Pro to select a random sample of features from a feature class.

# Import system modules
import arcpy

# Set environment workspace
arcpy.env.workspace = r"C:\USAData\Data.gdb"

# Set local variables
in_layer_or_view = "Cities"
sampling_method = "AUTO_CALCULATE"
confidence_level = 98
margin_of_error = 5
out_file = r"C:\USAData\Cities_Sample.json"

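# Generate a random sample of features
# Keyword arguments are required here: number_of_records and
# percentage_of_records come before confidence_level in the signature,
# so passing these values positionally would assign them to the wrong parameters.
arcpy.Reviewer.SelectRandomSample(in_layer_or_view, sampling_method, confidence_level=confidence_level, margin_of_error=margin_of_error, out_file=out_file)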

Environments

Current Workspace

Licensing information

Basic: Requires Data Reviewer
Standard: Requires Data Reviewer
Advanced: Requires Data Reviewer
