Select Random Sample (Data Reviewer Tools)
Summary
Selects a random sample of the input features or rows based on the specified sampling method.
The output is a selection made on the input layer in the map frame. The tool can also create a .json file that records the selected object IDs (OIDs) and the SQL expression used for the selection. The selection can be used in the Browse Features visual review tool and Run Data Checks tool workflows.
Usage
The Sample Method parameter has the following options:
Fixed Number—The number of records selected will be based on the Number of Records parameter value.
Percentage—The number of records selected will be based on the Percentage of Records parameter value.
Auto Calculate—The number of records selected will be based on a calculation using the Confidence Level and Margin of Error parameter values.
The Sample Method parameter's Auto Calculate option uses the following variables to calculate the number of records:
\[
\begin{align*}
z &= \text{scipy.stats.norm.ppf}\left(1 - \frac{1 - \text{confidence\_level}}{2}\right) \\
n &= \left(\frac{z}{m}\right)^2 \cdot \left(p \cdot (1 - p)\right) \\
n' &= \frac{n \cdot N}{n + (N - 1)}
\end{align*}
\]
The z-statistic for the desired confidence level (z). The z-statistic is calculated using the confidence level variable and the scipy.stats module: z = scipy.stats.norm.ppf(1 - (1 - confidence_level) / 2).
The acceptable margin of error in the confidence interval (m).
The probability (p) is set to 0.5 because there is no past knowledge about whether a certain percentage of records will pass or fail. Because the variance term p · (1 − p) is largest when p is 0.5, this is the most conservative value to use in the variance equation.
The population size (N) is the total number of records in a feature layer or table.
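The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function name `auto_sample_size` is hypothetical, the standard library's `statistics.NormalDist().inv_cdf` stands in for `scipy.stats.norm.ppf` so that the sketch has no third-party dependency, the confidence level and margin of error are passed as fractions (0.95, 0.05) rather than percentages, and rounding the result up is an assumption because no rounding rule is stated.

```python
import math
from statistics import NormalDist


def auto_sample_size(confidence_level, margin_of_error, population_size, p=0.5):
    """Return the finite-population-corrected sample size n', rounded up.

    confidence_level and margin_of_error are fractions, e.g. 0.95 and 0.05.
    """
    # Two-sided z-statistic; inv_cdf is the standard-library counterpart
    # of scipy.stats.norm.ppf for the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Sample size before accounting for the population size.
    n = (z / margin_of_error) ** 2 * (p * (1 - p))
    # Adjust for the finite population size N.
    n_adjusted = (n * population_size) / (n + (population_size - 1))
    # Assumption: round up so the margin of error is not exceeded.
    return math.ceil(n_adjusted)


print(auto_sample_size(0.95, 0.05, 10_000))  # 370
```

With a 95 percent confidence level and a 5 percent margin of error, the uncorrected size n is about 384 records; the correction for a population of 10,000 reduces it to 370.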
Random OIDs are selected using the Python random module, random.sample(population, k), in which population is the list of the OID values, and k is the size of the sample.
The output of this tool is a random selection of records from the Input Rows parameter value based on the Sample Method parameter value.
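The `random.sample(population, k)` call described above draws k distinct values without replacement. The sketch below shows the idea on a made-up list of OID values; the helper name `select_random_oids` and the `seed` argument are hypothetical and are included only so the example can be repeated.

```python
import random


def select_random_oids(oids, k, seed=None):
    """Return k distinct OIDs drawn at random from the list of OIDs."""
    # A dedicated Random instance keeps the seed local to this call.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no OID repeats.
    return rng.sample(oids, k)


# Hypothetical OID values 1 through 100; a real layer may have gaps.
population = list(range(1, 101))
selected = select_random_oids(population, k=10, seed=42)
print(sorted(selected))
```

`random.sample` raises `ValueError` if k is larger than the population, so a caller would need to cap k at the number of available records.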
Use the optional Output File parameter to create a .json file that includes the following:
The date and time the tool was run
The workspace the input is sourced from
The name of the input feature layers or tables
The total number of selected records
The OIDs of the selected records
The SQL expression that was used to make the selection
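As a rough illustration of the items listed above, the following sketch assembles a comparable record in Python. The key names, the `OBJECTID IN (...)` form of the SQL expression, the helper name `build_selection_record`, and the example workspace path are all assumptions made for illustration; the actual layout of the tool's .json file is not documented here.

```python
import json
from datetime import datetime


def build_selection_record(workspace, input_name, selected_oids, oid_field="OBJECTID"):
    """Build a dictionary holding the items the output file is said to include."""
    oids = sorted(selected_oids)
    oid_list = ", ".join(str(oid) for oid in oids)
    return {
        # All key names below are hypothetical.
        "run_time": datetime.now().isoformat(timespec="seconds"),
        "workspace": workspace,
        "input": input_name,
        "selected_count": len(oids),
        "selected_oids": oids,
        # Assumed form of the selection expression.
        "sql_expression": f"{oid_field} IN ({oid_list})",
    }


# Hypothetical workspace path and layer name.
record = build_selection_record(r"C:\data\sample.gdb", "Roads", [12, 3, 47])
print(json.dumps(record, indent=2))
```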
All selections made in the Input Rows parameter will be implemented, regardless of whether the Use the selected records toggle button is turned off.
The feature layer or table must have an ObjectID field before running this tool.
If the Use the selected records toggle button is turned off, the Output File parameter value will record a random selection of features based on the entire dataset. However, if there is a definition query applied, only the features or rows matching the query will be selected in the map frame.
Parameters
| Label | Explanation | Data type |
|---|---|---|
| Input Rows | The data to which the selection will be applied. | Feature Layer; Table View |
| Sample Method | Specifies the sampling method that will be used: Fixed Number, Percentage, or Auto Calculate. | String |
| Number of Records (Optional) | The number of records that will be selected. This parameter is active when the Sample Method parameter value is Fixed Number. | Long |
| Percentage of Records (Optional) | The percentage of records in the input that will be selected. This parameter is active when the Sample Method parameter value is Percentage. | Long |
| Confidence Level (Optional) | The level of confidence is the likelihood that a sample size is statistically significant, entered as a percentage such as 98 or 95. This parameter will be used to calculate the z-statistic (z). The z-statistic can be calculated using the scipy.stats module: z = scipy.stats.norm.ppf(1 - (1 - confidence_level) / 2). This parameter is active when the Sample Method parameter value is Auto Calculate. | Long |
| Margin of Error (Optional) | The acceptable margin of error in the confidence interval, entered as a percentage such as 8 or 5. This parameter uses the calculated z-statistic (z) to calculate the actual sample size (n') using the following equations: \( n = \left(\frac{z}{m}\right)^2 \cdot p \cdot (1 - p) \) and \( n' = \frac{n \cdot N}{n + (N - 1)} \). This parameter is active when the Sample Method parameter value is Auto Calculate. | Long |
| Output File (Optional) | The output .json file that records the selection. | File |
Derived output
| Label | Explanation | Data type |
|---|---|---|
| Updated Rows | The updated input with the selections applied. | Feature Layer; Table View |
Environments
Licensing information
- Basic: Requires Data Reviewer
- Standard: Requires Data Reviewer
- Advanced: Requires Data Reviewer