FAQs about using a Parquet file in ArcGIS Pro

The following questions and answers provide detailed information about using an Apache Parquet file from a local folder connection or cloud storage connection in ArcGIS Pro.

The questions are grouped into the following topics: caches, cloud storage, data types, mapping, and sharing.

How big are the local persistent caches that are created for a Parquet file I use in ArcGIS Pro?

Because Parquet is a highly compressed storage format, the local persistent cache files that ArcGIS Pro creates are typically much larger than the original file.

For example, a 20 MB Parquet file containing 1 million point records may result in a cache size of 250 MB. The difference in size depends on the data contained in the Parquet file, such as the number of columns and the data and entity types.

The size difference between the file and the cache is not linear.
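As a rough illustration, the expansion factor for the example above can be computed directly. The figures are the ones quoted above, not a fixed ratio you can rely on:

```python
# Rough cache-size arithmetic for the example above.
# These figures are illustrative only; actual cache size depends on
# the columns, data types, and entity types in the Parquet file.
parquet_mb = 20   # compressed Parquet file with 1 million point records
cache_mb = 250    # observed persistent cache size

expansion = cache_mb / parquet_mb
print(f"Cache is {expansion:.1f}x the Parquet file size")  # prints "Cache is 12.5x the Parquet file size"
```

A different file with the same size on disk can expand by a very different factor, so measure with your own data rather than extrapolating from this ratio.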

Can I clear the local caches?

In-memory caches are cleared when you close ArcGIS Pro.

For persistent caches, you can delete all local persistent caches at any time. You can also configure automatic cache deletion. See Manage Parquet data caches for instructions.

Tip:

You can delete the local persistent caches and use the Create Parquet Cache geoprocessing tool or the CreateParquetCache ArcPy function to re-create only those that you still need.
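If you script the re-creation, a minimal sketch might look like the following. The document names the CreateParquetCache ArcPy function, but the module path and signature shown here are assumptions, and the file paths are placeholders; check the ArcPy documentation for the actual call.

```python
# Sketch: after deleting all persistent caches, re-create caches only
# for the Parquet files you still need.
# Placeholder paths -- replace with the files you actually use.
still_needed = [
    r"C:\data\roads.parquet",
    r"C:\data\parcels.parquet",
]

try:
    import arcpy  # available only inside an ArcGIS Pro installation

    for parquet_path in still_needed:
        # Assumed module path and signature; verify against the
        # CreateParquetCache documentation before running.
        arcpy.management.CreateParquetCache(parquet_path)
except ImportError:
    print("arcpy is not available outside ArcGIS Pro")
```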

Which cloud providers can I use to host the Parquet files I access individually to add to a map or scene?

You can create a cloud storage connection to Amazon Simple Storage Service (S3), Google Cloud Storage, Microsoft Azure Blob Storage, or Microsoft Azure Data Lake Storage Gen2.

What type of credentials can I use to create a cloud storage connection that accesses a Parquet file in a supported cloud storage location?

The Create Cloud Connection File geoprocessing tool documentation lists the supported credential types.

What resource-based policy permissions must I configure for an IAM role to allow ArcGIS Pro to use a Parquet file in an Amazon S3 bucket?

At a minimum, the IAM role requires the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "<statement-id>",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::<cache-bucket-name>/*",
                "arn:aws:s3:::<cache-bucket-name>"
            ]
        }
    ]
}

Replace the values inside the angle brackets (<>) with values specific to your IAM role and bucket.

The version of the policy document format shown above is 2012-10-17. If you change this version date, the document format may also need to change.
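If you generate the policy document programmatically, the JSON above can be built and sanity-checked with a short script. The bucket name and statement ID below are placeholders, as in the policy above:

```python
import json

# Build the minimum IAM policy shown above.
# "my-cache-bucket" and the Sid are placeholders -- substitute your
# own bucket name and statement ID.
bucket = "my-cache-bucket"
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowArcGISProParquetRead",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetObjectVersion",
            ],
            "Resource": [
                f"arn:aws:s3:::{bucket}/*",
                f"arn:aws:s3:::{bucket}",
            ],
        }
    ],
}

# Sanity check: all three required read actions are present.
actions = set(policy["Statement"][0]["Action"])
assert {"s3:ListBucket", "s3:GetObject", "s3:GetObjectVersion"} <= actions

print(json.dumps(policy, indent=4))
```

Note that `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` and `s3:GetObjectVersion` apply to the objects (`/*`), which is why both Resource entries are required.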

What spatial data types are supported in Parquet files used with ArcGIS?

The spatial field in the Parquet file must be defined in one of the supported spatial data types; see Parquet in ArcGIS Pro for the list.

Is there a way to display features in a map or scene in ArcGIS Pro based on the information stored in x,y,z fields in a Parquet file?

Run the XY Table To Point geoprocessing tool with the Parquet map layer as the input table to create a feature class in a supported output format. Then add the output feature class to the map or scene.
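The step above can be scripted with the XY Table To Point tool in ArcPy. The layer name, output path, and field names below are placeholders for your own data:

```python
# Sketch: create a point feature class from x,y,z fields in a Parquet
# map layer. All names and paths below are placeholders.
params = {
    "in_table": "parquet_layer",                 # Parquet layer in the map
    "out_feature_class": r"C:\data\gis.gdb\xyz_points",
    "x_field": "x",
    "y_field": "y",
    "z_field": "z",                              # optional; omit for 2D output
}

try:
    import arcpy  # available only inside an ArcGIS Pro installation

    arcpy.management.XYTableToPoint(**params)
except ImportError:
    print("arcpy is not available outside ArcGIS Pro")
```

The tool also accepts an optional coordinate system parameter; if the x,y values are not in WGS84, pass the appropriate spatial reference so the output points land in the right place.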

Can I aggregate features from a Parquet file into bins on the map?

Yes. If the Parquet file contains more than 10,000 rows, the feature layer that is added to the map will draw with geosquare bins. You can set a different scale threshold for the layer or disable binning. However, you cannot change to a different bin type, because only geosquare bins are supported.

Can I publish a web layer from the data in a Parquet file that I add to a map or scene from a folder or cloud storage connection in ArcGIS Pro?

Yes, you can publish hosted web layers, which copy all the data. See Parquet in ArcGIS Pro for a list of the types of web layers you can publish.

Can I include cached Parquet file data in packages, such as map packages or project packages?

No, not at this time.