Detect Objects Using Deep Learning (Image Analyst Tools)

Summary

Runs a trained deep learning model on an input raster to produce a feature class containing the objects it finds. The features can be bounding boxes or polygons around the objects found or points at the centers of the objects.

This tool requires a model definition file containing trained model information. The model can be trained using the Train Deep Learning Model tool or with third-party training software such as PyTorch. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk), and it must contain the path to the Python raster function to be called to process each object and the path to the trained binary deep learning model file.

Usage

You must install the proper deep learning framework Python API (such as PyTorch) in the ArcGIS Pro Python environment; otherwise, an error will occur when you add the Esri model definition file to the tool. Obtain the appropriate framework information from the creator of the Esri model definition file.

To set up your machine to use deep learning frameworks in ArcGIS Pro, see Install deep learning frameworks for ArcGIS.
This tool calls a third-party deep learning Python API (such as PyTorch) and uses the specified Python raster function to process each object.
Sample use cases for this tool are available on the Esri Python raster function GitHub page. You can also write custom Python modules by following examples and instructions in the GitHub repository.
The Model Definition parameter value can be an Esri model definition JSON file (.emd), a JSON string, or a deep learning model package (.dlpk). A JSON string is useful when this tool is used on the server so you can paste the JSON string rather than upload the .emd file. The .dlpk file must be stored locally.
The tool can process input imagery that is in map space or in pixel space. Imagery in map space is in a map-based coordinate system. Imagery in pixel space is based on rows and columns with no rotation and no distortion. The reference system can be specified when generating the training data in the Export Training Data For Deep Learning tool using the Reference System parameter. If the model is trained in a third-party training software, the reference system must be specified in the .emd file using the ImageSpaceUsed parameter, which can be set to MAP_SPACE or PIXEL_SPACE.
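For a model trained outside ArcGIS, the reference system is recorded in the model definition itself. The sketch below builds a model definition as a JSON string in Python. Only `ImageSpaceUsed` and its two allowed values come from the text above; the other keys and the file names are illustrative assumptions, so check them against the .emd file supplied by the model's creator.

```python
import json

# Hypothetical model definition. Only "ImageSpaceUsed" and its allowed values
# (MAP_SPACE, PIXEL_SPACE) are taken from the text above; the remaining keys
# and file names are placeholders for illustration.
model_definition = {
    "ModelFile": "my_detector.pth",         # assumed: trained binary model file
    "InferenceFunction": "my_detector.py",  # assumed: Python raster function
    "ImageSpaceUsed": "PIXEL_SPACE",
}

ALLOWED_IMAGE_SPACES = {"MAP_SPACE", "PIXEL_SPACE"}
if model_definition["ImageSpaceUsed"] not in ALLOWED_IMAGE_SPACES:
    raise ValueError("ImageSpaceUsed must be MAP_SPACE or PIXEL_SPACE")

# Serialize to a JSON string, the form that can be pasted into the
# Model Definition parameter instead of uploading an .emd file.
emd_json = json.dumps(model_definition)
print(emd_json)
```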
For oriented imagery layers, the processing will always occur in pixel space. When pixel space is used for processing, the pixel space detections are preserved in the output table in the IShape field.
Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size. The batch_size value can be adjusted using the Arguments parameter.
Batch sizes are square numbers, such as 1, 4, 9, 16, 25, 64, and so on. If the input value is not a perfect square, the largest perfect square that does not exceed the input value is used. For example, if a value of 6 is specified, the batch size is set to 4.
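The rounding rule can be sketched in a few lines of Python. The helper name `effective_batch_size` is hypothetical and only mirrors the rule as described here; it is not part of the tool.

```python
import math


def effective_batch_size(requested: int) -> int:
    """Return the largest perfect square that does not exceed `requested`."""
    if requested < 1:
        raise ValueError("batch size must be at least 1")
    root = math.isqrt(requested)  # integer square root, rounded down
    return root * root


print(effective_batch_size(6))   # a requested value of 6 is reduced to 4
print(effective_batch_size(16))  # a perfect square is kept as is
```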
Use the Non Maximum Suppression parameter to identify and remove duplicate features from the object detection. To learn more about this parameter, see the Usage section of the Non Maximum Suppression tool. When the inputs are oriented imagery layers, the duplicates are retained with null ground geometries.
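To make the idea concrete, here is a minimal sketch of greedy nonmaximum suppression on axis-aligned boxes, using the intersection-over-union ratio that the Max Overlap Ratio parameter refers to. It illustrates the general technique only; the tool's internal implementation is not described in this topic, and the sketch ignores class values and polygon geometries. The function names and the box layout `(xmin, ymin, xmax, ymax)` are assumptions.

```python
def iou(a, b):
    """Intersection area over union area of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, max_overlap_ratio=0.0):
    """Keep the highest-confidence box among boxes that overlap too much.

    `detections` is a list of (box, confidence) pairs.
    """
    kept = []
    # Visit detections from most to least confident.
    for box, confidence in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not exceed the overlap limit
        # with any box that has already been kept.
        if all(iou(box, other) <= max_overlap_ratio for other, _ in kept):
            kept.append((box, confidence))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.6),    # overlaps the first box heavily
    ((20, 20, 30, 30), 0.8),  # separate object
]
kept = non_max_suppression(detections, max_overlap_ratio=0.0)
print(kept)
```

With a maximum overlap ratio of 0, any overlap at all marks the lower-confidence feature as a duplicate, so the second box is dropped.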
Use the Process candidate items only option for the Processing Mode parameter to detect objects only on selected images in the mosaic dataset. You can use the Compute Mosaic Candidates tool to find the image candidates in a mosaic dataset or image service that best represent the mosaic area.
This tool supports and uses multiple GPUs if available. To use a specific GPU, specify the GPU ID environment. When the GPU ID is not set, the tool uses all available GPUs. This is the default.

When the Processor Type environment is set to CPU, and the Parallel Processing Factor environment is unspecified, the tool will use a Parallel Processing Factor value of 50%.
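The processor-related environments can also be set from Python before the tool runs. The following is a sketch under assumptions: the helper `processing_env` is hypothetical, and the environment attribute names (`processorType`, `gpuId`, `parallelProcessingFactor`) and the value format for the GPU ID are given as understood from the arcpy environment settings and should be confirmed against the environment documentation. The settings are applied only when arcpy can be imported.

```python
def processing_env(processor_type, gpu_id=None, parallel_factor=None):
    """Build environment settings that mirror the defaults described above."""
    settings = {"processorType": processor_type}
    if processor_type == "GPU" and gpu_id is not None:
        # Restrict processing to one GPU; when omitted, all GPUs are used.
        settings["gpuId"] = gpu_id
    if processor_type == "CPU":
        # The tool falls back to 50% when no factor is specified;
        # writing it out makes the effective value explicit.
        settings["parallelProcessingFactor"] = parallel_factor or "50%"
    return settings


settings = processing_env("CPU")
print(settings)

try:
    import arcpy  # available only in an ArcGIS Pro Python environment
except ImportError:
    arcpy = None

if arcpy is not None:
    for name, value in settings.items():
        setattr(arcpy.env, name, value)  # assumed attribute names, see lead-in
```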

The input raster can be a single raster, multiple rasters in a mosaic dataset, an oriented imagery layer or dataset, an image service, a folder of images, or a feature class with images attached. For more information about attachments, see Add or remove file attachments.
For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
For more information about deep learning, see Deep learning using the ArcGIS Image Analyst extension.

Parameters

Label	Explanation	Data type
Input Raster	The input image that will be used to detect objects. The input can be a single raster, multiple rasters in a mosaic dataset, an image service, a folder of images, a feature class with image attachments, or an oriented imagery dataset or layer.	Raster Dataset; Raster Layer; Mosaic Layer; Image Service; Map Server; Map Server Layer; Internet Tiled Layer; Folder; Feature Layer; Feature Class; Oriented Imagery Layer
Output Detected Objects	The output feature class that will contain geometries circling the object or objects detected in the input image. If the feature class already exists, the results will be appended to the existing feature class.	Feature Class
Model Definition	This parameter can be an Esri model definition JSON file (`.emd`), a JSON string, or a deep learning model package (`.dlpk`). A JSON string is useful when this tool is used on the server so you can paste the JSON string rather than upload the `.emd` file. The `.dlpk` file must be stored locally. It contains the path to the deep learning binary model file, the path to the Python raster function to be used, and other parameters such as preferred tile size or padding.	File; String
Arguments (Optional)	The information from the Model Definition parameter will be used to populate this parameter. These arguments vary, depending on the model architecture. The following are supported model arguments for models trained in ArcGIS. ArcGIS pretrained models and custom deep learning models may have additional arguments that the tool supports. Padding—The number of pixels at the border of image tiles from which predictions will be blended for adjacent tiles. To smooth the output while reducing artifacts, increase the value. The maximum value of the padding can be half the tile size value. The argument is available for all model architectures. Confidence Threshold—The detections that have a confidence score higher than this threshold will be included in the result. The allowed values range from 0 to 1.0. The argument is available for all model architectures. Batch Size—The number of image tiles that will be processed in each step of the model inference. This depends on the memory of your graphics card. The argument is available for all model architectures. NMS Overlap—The maximum overlap ratio for two overlapping features, which is defined as the ratio of intersection area over union area. The default is 0.1. The argument is available for all model architectures. Exclude Padding Detections—If true, potentially truncated detections near the edges that are in the padded region of image chips will be filtered. The argument is available for SSD, RetinaNet, YOLOv3, DETReg, MMDetection, and Faster RCNN only. TTA Scales—Performs test time augmentation using different scales. Each scale means the pixel block will be processed with the scales provided. The default scale is 1, which means no scaling occurs. The different scales are separated by commas, for example, 0.9,1,1.1. In this case, the pixel block will be processed three times, first at the scale of 0.9, then with no scale change, and then with a scale of 1.1. Test Time Augmentation—Performs test time augmentation while predicting. If true, predictions of flipped and rotated orientations of the input image will be merged into the final output and their confidence values will be averaged. This may cause the confidence values to fall below the threshold for objects that are only detected in a few orientations (of the image). The argument is available for all model architectures. Tile Size—The width and height of image tiles into which the imagery will be split for prediction. The argument is only available for MaskRCNN. Merge Policy—The policy for merging augmented predictions. Available options are mean, max, and min. This is only applicable when test time augmentation is used. The argument is only available for MaskRCNN. Output Classified Raster—The path to the output raster. The argument is only available for MaXDeepLab. Value table columns: Name—The name of the function argument. Value—The value of the function argument.	Value Table
Non Maximum Suppression (Optional)	Specifies whether nonmaximum suppression will be performed. Nonmaximum suppression identifies duplicate objects and removes the duplicate features with the lower confidence values. Checked—Nonmaximum suppression will be performed and duplicate objects that are detected will be removed. When the inputs are oriented imagery layers, the duplicates are retained with null ground geometries. Unchecked—Nonmaximum suppression will not be performed. All objects that are detected will be in the output feature class. This is the default.	Boolean
Confidence Score Field (Optional)	The name of the field in the feature class that will contain the confidence scores as output by the object detection method. This parameter is required when the Non Maximum Suppression parameter is checked.	String
Class Value Field (Optional)	The name of the class value field in the input feature class. If no field name is provided, a `Classvalue` or `Value` field will be used. If these fields do not exist, all records will be identified as belonging to one class.	String
Max Overlap Ratio (Optional)	The maximum overlap ratio for two overlapping features, which is defined as the ratio of intersection area over union area. The default is 0.	Double
Processing Mode (Optional)	Specifies how all raster items in a mosaic dataset or an image service will be processed. This parameter is applied when the input raster is a mosaic dataset or an image service. Process as mosaicked image—All raster items in the mosaic dataset or image service will be mosaicked together and processed. This is the default. Process all raster items separately—All raster items in the mosaic dataset or image service will be processed as separate images. Process candidate items only—Only raster items with a value of 1 or 2 in the `Candidate` field of the input mosaic dataset's attribute table will be processed.	String
Use pixel space (Optional)	Specifies whether inferencing will be performed on images in pixel space. Checked—Inferencing will be performed in pixel space, and the output will be transformed back to map space. This option is useful when using oblique imagery or street-view imagery, which may cause the features to become distorted using map space. Unchecked—Inferencing will be performed in map space. This is the default.	Boolean
Objects of Interest (Optional)	Specifies the object names that will be detected by the tool. The available options will be based on the Model Definition parameter value. This parameter is only active when the model detects more than one type of object.	String

Derived output

Label	Explanation	Data type
Output Classified Raster	The output classified raster for pixel classification. The name of the raster dataset will be the same as the Output Detected Objects parameter value. This parameter is only applicable when the model type is Panoptic Segmentation.	Raster Dataset

DetectObjectsUsingDeepLearning(in_raster, out_detected_objects, in_model_definition, {arguments}, {run_nms}, {confidence_score_field}, {class_value_field}, {max_overlap_ratio}, {processing_mode}, {use_pixelspace}, {in_objects_of_interest})

Name	Explanation	Data type
in_raster	The input image that will be used to detect objects. The input can be a single raster, multiple rasters in a mosaic dataset, an image service, a folder of images, a feature class with image attachments, or an oriented imagery dataset or layer.	Raster Dataset; Raster Layer; Mosaic Layer; Image Service; Map Server; Map Server Layer; Internet Tiled Layer; Folder; Feature Layer; Feature Class; Oriented Imagery Layer
out_detected_objects	The output feature class that will contain geometries circling the object or objects detected in the input image. If the feature class already exists, the results will be appended to the existing feature class.	Feature Class
in_model_definition	The `in_model_definition` parameter value can be an Esri model definition JSON file (`.emd`), a JSON string, or a deep learning model package (`.dlpk`). A JSON string is useful when this tool is used on the server so you can paste the JSON string rather than upload the `.emd` file. The `.dlpk` file must be stored locally. It contains the path to the deep learning binary model file, the path to the Python raster function to be used, and other parameters such as preferred tile size or padding.	File; String
arguments [arguments,...] (Optional)	The information from the `in_model_definition` parameter will be used to set the default values for this parameter. These arguments vary, depending on the model architecture. The following are supported model arguments for models trained in ArcGIS. ArcGIS pretrained models and custom deep learning models may have additional arguments that the tool supports. `padding`—The number of pixels at the border of image tiles from which predictions will be blended for adjacent tiles. To smooth the output while reducing artifacts, increase the value. The maximum value of the padding can be half the tile size value. The argument is available for all model architectures. `threshold`—The detections that have a confidence score higher than this threshold will be included in the result. The allowed values range from 0 to 1.0. The argument is available for all model architectures. `batch_size`—The number of image tiles that will be processed in each step of the model inference. This depends on the memory of your graphics card. The argument is available for all model architectures. `nms_overlap`—The maximum overlap ratio for two overlapping features, which is defined as the ratio of intersection area over union area. The default is 0.1. The argument is available for all model architectures. `exclude_pad_detections`—If true, potentially truncated detections near the edges that are in the padded region of image chips will be filtered. The argument is available for SSD, RetinaNet, YOLOv3, DETReg, MMDetection, and Faster RCNN only. `tta_scales`—Performs test time augmentation using different scales. Each scale means the pixel block will be processed with the scales provided. The default scale is 1, which means no scaling occurs. The different scales are separated by commas, for example, 0.9,1,1.1. In this case, the pixel block will be processed three times, first at the scale of 0.9, then with no scale change, and then with a scale of 1.1. `test_time_augmentation`—Performs test time augmentation while predicting. If true, predictions of flipped and rotated orientations of the input image will be merged into the final output and their confidence values will be averaged. This may cause the confidence values to fall below the threshold for objects that are only detected in a few orientations (of the image). The argument is available for all model architectures. `tile_size`—The width and height of image tiles into which the imagery will be split for prediction. The argument is only available for MaskRCNN. `merge_policy`—The policy for merging augmented predictions. Available options are mean, max, and min. This is only applicable when test time augmentation is used. The argument is only available for MaskRCNN. `output_classified_raster`—The path to the output raster. The argument is only available for MaXDeepLab. Value table columns: `Name`—The name of the function argument. `Value`—The value of the function argument.	Value Table
run_nms (Optional)	Specifies whether nonmaximum suppression will be performed. Nonmaximum suppression identifies duplicate objects and removes the duplicate features with the lower confidence values. `NMS`—Nonmaximum suppression will be performed and duplicate objects that are detected will be removed. When the inputs are oriented imagery layers, the duplicates are retained with null ground geometries. `NO_NMS`—Nonmaximum suppression will not be performed. All objects that are detected will be in the output feature class. This is the default.	Boolean
confidence_score_field (Optional)	The name of the field in the feature class that will contain the confidence scores as output by the object detection method. This parameter is required when the `run_nms` parameter is set to `NMS`.	String
class_value_field (Optional)	The name of the class value field in the input feature class. If no field name is provided, a `Classvalue` or `Value` field will be used. If these fields do not exist, all records will be identified as belonging to one class.	String
max_overlap_ratio (Optional)	The maximum overlap ratio for two overlapping features, which is defined as the ratio of intersection area over union area. The default is 0.	Double
processing_mode (Optional)	Specifies how all raster items in a mosaic dataset or an image service will be processed. This parameter is applied when the input raster is a mosaic dataset or an image service. `PROCESS_AS_MOSAICKED_IMAGE`—All raster items in the mosaic dataset or image service will be mosaicked together and processed. This is the default. `PROCESS_ITEMS_SEPARATELY`—All raster items in the mosaic dataset or image service will be processed as separate images. `PROCESS_CANDIDATE_ITEMS_ONLY`—Only raster items with a value of 1 or 2 in the `Candidate` field of the input mosaic dataset's attribute table will be processed.	String
use_pixelspace (Optional)	Specifies whether inferencing will be performed on images in pixel space. `PIXELSPACE`—Inferencing will be performed in pixel space, and the output will be transformed back to map space. This option is useful when using oblique imagery or street-view imagery, which may cause the features to become distorted using map space. `NO_PIXELSPACE`—Inferencing will be performed in map space. This is the default.	Boolean
in_objects_of_interest [in_objects_of_interest,...] (Optional)	Specifies the objects that will be detected by the tool. The available options will be based on the `in_model_definition` parameter value. This parameter is only active when the model detects more than one type of object.	String
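The limits stated for `padding`, `threshold`, and `tta_scales` can be checked before the tool is run. The helper below is a hypothetical sketch that encodes only the constraints given in the table above (padding no larger than half the tile size, threshold between 0 and 1.0, comma-separated scales); it is not part of arcpy.

```python
def check_arguments(padding, tile_size, threshold, tta_scales="1"):
    """Validate argument values and return the parsed list of TTA scales."""
    if not 0 <= padding <= tile_size / 2:
        raise ValueError("padding must be between 0 and half the tile size")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1.0")
    # "0.9,1,1.1" means the pixel block is processed once per listed scale.
    scales = [float(value) for value in tta_scales.split(",")]
    return scales


scales = check_arguments(padding=64, tile_size=256, threshold=0.5,
                         tta_scales="0.9,1,1.1")
print(len(scales), "passes over each pixel block:", scales)
```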

Derived output

Name	Explanation	Data type
out_classified_raster	The output classified raster for pixel classification. The name of the raster dataset will be the same as the `out_detected_objects` parameter value. This parameter is only applicable when the model type is Panoptic Segmentation.	Raster Dataset

Code sample

DetectObjectsUsingDeepLearning example 1 (Python window)

This example creates a feature class based on object detection.

# Import system modules
import arcpy
from arcpy.ia import *

# Check out the ArcGIS Image Analyst extension license
arcpy.CheckOutExtension("ImageAnalyst")

DetectObjectsUsingDeepLearning("c:/detectobjects/moncton_seg.tif",
     "c:/detectobjects/moncton_seg.shp", "c:/detectobjects/moncton.emd",
     "padding 0; threshold 0.5; batch_size 4", "NO_NMS", "Confidence",
     "Class", 0, "PROCESS_AS_MOSAICKED_IMAGE")

DetectObjectsUsingDeepLearning example 2 (stand-alone script)

This example creates a feature class based on object detection.

# Import system modules
import arcpy
from arcpy.ia import *

# Check out the ArcGIS Image Analyst extension license
arcpy.CheckOutExtension("ImageAnalyst")

"""
Usage: DetectObjectsUsingDeepLearning( in_raster, out_detected_objects,
       in_model_definition, {arguments}, {run_nms}, {confidence_score_field},
       {class_value_field}, {max_overlap_ratio}, {processing_mode})
"""

# Set local variables
in_raster = "c:/classifydata/moncton_seg.tif"
out_detected_objects = "c:/detectobjects/moncton.shp"
in_model_definition = "c:/detectobjects/moncton_sig.emd"
model_arguments = "padding 0; threshold 0.5; batch_size 4"
run_nms = "NO_NMS"
confidence_score_field = "Confidence"
class_value_field = "Class"
max_overlap_ratio = 0
processing_mode = "PROCESS_AS_MOSAICKED_IMAGE"

# Run
DetectObjectsUsingDeepLearning( in_raster, out_detected_objects,
   in_model_definition, model_arguments, run_nms, confidence_score_field,
   class_value_field, max_overlap_ratio, processing_mode)
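
Both samples pass the model arguments as a single string of `name value` pairs separated by semicolons. A minimal sketch, assuming that format (it is inferred from the samples above, not from a documented grammar), of a helper that builds the string from a dictionary; `format_arguments` is a hypothetical name.

```python
def format_arguments(arguments):
    """Join argument names and values as 'name value; name value; ...'."""
    return "; ".join(f"{name} {value}" for name, value in arguments.items())


model_arguments = format_arguments(
    {"padding": 0, "threshold": 0.5, "batch_size": 4}
)
print(model_arguments)
```

Keeping the values in a dictionary makes it easier to change one argument, such as lowering `batch_size` after an out of memory error, without editing the string by hand.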

Environments

Cell Size, Current Workspace, Extent, Geographic Transformations, GPU ID, Mask, Output Coordinate System, Parallel Processing Factor, Processor Type, Scratch Workspace

Licensing information

Basic: Requires Image Analyst
Standard: Requires Image Analyst
Advanced: Requires Image Analyst
