Pod placement

In Kubernetes, newly created and unscheduled pods are automatically scheduled on nodes that meet their requirements. When your cluster contains different types of nodes, it can be important to control which nodes pods are scheduled on. There are four tools you can use to control pod placement:

Node labels are applied to a node to specify information about the node. A label can be a key or a key/value pair.
Node affinity is applied to a pod to constrain that pod to run on nodes with a specified label.
Taints are applied to a node to prevent pods from being scheduled on that node.
Tolerations are applied to a pod to allow that pod to ignore specified taints.

To learn more about node labels, node affinity, taints, and tolerations generally, refer to the Kubernetes documentation. For information about applying node affinity and tolerations to pods in ArcGIS Enterprise on Kubernetes, see Manage pod placement.

Benefits

Node labels and node affinity ensure that a pod is scheduled on the appropriate node. Taints and tolerations ensure that a node only has the appropriate pods scheduled on it. Using them in combination helps you enhance isolation, optimize resource allocation, and effectively meet compliance requirements within your Kubernetes cluster:

Isolate workloads with specialized requirements—Use labels and node affinity rules to ensure that certain pods are scheduled on dedicated nodes. Use taints to mark nodes with specific characteristics, such as high CPU or memory capacity, as dedicated for ArcGIS workloads. Apply tolerations on service pods so that they can be scheduled on those dedicated nodes.
Optimize resource allocation—Apply taints on nodes with limited resources to prevent resource overload, and define tolerations on service pods to match the resource constraints on these nodes. Combine node affinity with taints and tolerations to ensure that service pods are only scheduled on nodes that can meet their resource requirements.
Geolocation-based scheduling—For applications that require data locality or adherence to specific regulations, use node affinity to schedule service pods based on the geographic location of nodes. Taint nodes based on their physical location or data sovereignty regulations, and apply tolerations on service pods so that they can be scheduled on nodes that comply with the required location constraints.

Autoscaling enhances the use of node affinity and tolerations by dynamically adjusting the number of pods based on workload demands. This dynamic scaling ensures that pods are efficiently scheduled on nodes that meet specific requirements or have the necessary resources available, optimizing resource allocation. By combining autoscaling with node affinity and tolerations, Kubernetes clusters can achieve improved resource utilization, performance, and scalability—adapting to workload fluctuations while adhering to node constraints and preferences. To learn more about autoscaling, see Service scaling.

Manage pod placement

It is recommended to apply node labels before node affinity. This prevents pods from being stuck in a pending state because nodes do not yet have the matching label. Similarly, it is recommended to apply tolerations before tainting nodes. This avoids unintentionally evicting pods that do not yet have matching tolerations, which would disrupt your services.
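The ordering advice above can be checked before any change is made. The following is a minimal Python sketch, not part of ArcGIS Enterprise or Kubernetes: the function names and the dictionary shapes used for nodes and pods are hypothetical, and matching is simplified to exact key/value comparison.

```python
def nodes_with_label(nodes, key, value):
    """Names of nodes that already carry the label key=value."""
    return [name for name, labels in nodes.items() if labels.get(key) == value]


def pods_without_toleration(pods, taint_key, taint_value):
    """Names of pods that lack an exact-match toleration for a planned taint."""
    missing = []
    for name, tolerations in pods.items():
        tolerated = any(
            t.get("key") == taint_key and t.get("value") == taint_value
            for t in tolerations
        )
        if not tolerated:
            missing.append(name)
    return missing


# Hypothetical cluster state: node name -> labels, pod name -> tolerations.
nodes = {"node-a": {"gpu": "true"}, "node-b": {}}
pods = {
    "analysis-pod": [{"key": "workload", "value": "high-resource"}],
    "map-pod": [],
}

# Requiring gpu=true is only safe if at least one node already has the label.
print(nodes_with_label(nodes, "gpu", "true"))
# Pods listed here do not yet tolerate the planned taint.
print(pods_without_toleration(pods, "workload", "high-resource"))
```

An empty first list suggests labeling nodes before adding the affinity rule; a non-empty second list suggests adding tolerations before applying the taint.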

In most environments, you can group your workloads using node pools or node groups. It is recommended that you apply labels and taints to groups of nodes rather than individual nodes.

In ArcGIS Enterprise on Kubernetes, you can use ArcGIS Enterprise Manager to set node affinity and tolerations for the following deployments:

Node affinity configuration

To configure a node affinity rule, set the following values:

Type—The type of node affinity. The following are the available types:
- Preferred (PreferredDuringSchedulingIgnoredDuringExecution)—The pod prefers to be scheduled on a node that satisfies the rule.
- Required (RequiredDuringSchedulingIgnoredDuringExecution)—The pod must be scheduled on a node that satisfies the rule.
Key—The key of the node label that the rule should match.
Operator—The operator for the rule. The following are the available operators:
- In—The value of the node label must be in the list of values specified.
- Not in—The value of the node label must not be in the list of values specified.
- Exists—The node must have the specified label.
- Does not exist—The node must not have the specified label.
Value—The list of values to match against the node label. Not available when the Exists or Does not exist operator is selected.
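The operator semantics above can be illustrated with a small Python sketch. It is a simplified model rather than the Kubernetes scheduler: the function names are hypothetical, node labels are plain dictionaries, and the Preferred type is reduced to "matching nodes first", whereas the real scheduler combines weighted scores from many factors.

```python
def matches_rule(node_labels, key, operator, values=()):
    """Return True if a node's labels satisfy one node affinity rule."""
    if operator == "In":
        return key in node_labels and node_labels[key] in values
    if operator == "NotIn":
        # A node without the label also satisfies NotIn.
        return key not in node_labels or node_labels[key] not in values
    if operator == "Exists":
        return key in node_labels
    if operator == "DoesNotExist":
        return key not in node_labels
    raise ValueError(f"unsupported operator: {operator}")


def candidate_nodes(nodes, rule_type, key, operator, values=()):
    """Required keeps only matching nodes; Preferred ranks them first."""
    matching = [n for n, labels in nodes.items()
                if matches_rule(labels, key, operator, values)]
    if rule_type == "Required":
        return matching
    others = [n for n in nodes if n not in matching]
    return matching + others


# Hypothetical nodes: name -> labels.
nodes = {"node-b": {"zone": "east"}, "node-a": {"gpu": "true"}}

print(candidate_nodes(nodes, "Required", "gpu", "In", ["true"]))
print(candidate_nodes(nodes, "Preferred", "gpu", "In", ["true"]))
```

With a Required rule, a pod stays pending when no node matches; with a Preferred rule, it can still land on a non-matching node.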

Tolerations configuration

To configure a toleration, set the following values:

Effect—The taint effect that the toleration should match. The following are the available effects:
- No schedule—New pods are not scheduled to the tainted node without a matching toleration.
- Prefer no schedule—New pods try to avoid being scheduled on the tainted node without a matching toleration, but it is not guaranteed.
- No execute—Any pods without a matching toleration are evicted immediately after the node is tainted. New pods are not scheduled to the tainted node without a matching toleration.
Key—The key of the taint that the toleration should match.
Operator—The operator to use for the toleration. The following are the available operators:
- Equal—The pod tolerates a taint with the specified key and value.
- Exists—The pod tolerates any taint with the specified key.
Value—The value of the taint that the toleration should match if the operator is set to Equal. Not available when the Exists operator is selected.
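To make the matching rules above concrete, the following Python sketch models how a toleration is compared with a taint. It is an illustration under stated assumptions, not scheduler code: the function names are hypothetical, taints and tolerations are plain dictionaries, and the wildcard form of a toleration with an empty key is not handled.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint."""
    effect = toleration.get("effect")
    # A toleration without an effect matches taints of any effect.
    if effect and effect != taint["effect"]:
        return False
    if toleration.get("key") != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True  # any value of the key is tolerated
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    raise ValueError(f"unsupported operator: {operator}")


def can_schedule(tolerations, taints):
    """True if no NoSchedule or NoExecute taint is left untolerated."""
    for taint in taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft taint: avoided when possible, never blocking
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


taint = {"key": "workload", "value": "high-performance", "effect": "NoSchedule"}
equal_tol = {"key": "workload", "operator": "Equal",
             "value": "high-performance", "effect": "NoSchedule"}
exists_tol = {"key": "workload", "operator": "Exists", "effect": "NoSchedule"}

print(can_schedule([equal_tol], [taint]))   # Equal with the same value
print(can_schedule([exists_tol], [taint]))  # Exists ignores the value
print(can_schedule([], [taint]))            # no toleration at all
```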

Note:

Raster analysis and notebook service workloads that need to run on GPU-enabled nodes require specific node affinity and toleration settings. For more information on these settings, see Configure GPU-enabled nodes for raster analysis workloads and View and edit runtimes for notebook service workloads.

Scenarios

To better understand how managing pod placement on services can benefit your organization, review the following scenarios.

Scenario 1: Seasonal traffic surge for public mapping services

A public organization experiences a significant increase in traffic during a local festival. Users accessing the web map for event information experience delays due to high demand on the underlying map service. To address this, the organization administrator does the following:

Adds the key/value label high-performance: true to nodes with high CPU and memory resources.
Applies node affinity rules to prioritize scheduling the map service pods on nodes with high CPU and memory resources. This ensures that the map service can handle the surge in traffic:
- Type—Preferred
- Key—high-performance
- Operator—In
- Value—true
Applies a toleration to the pods for the critical map service, allowing them to be scheduled on nodes tainted for high-performance workloads:
- Effect—NoSchedule
- Key—workload
- Operator—Equal
- Value—high-performance
Taints high-performance nodes with workload=high-performance:NoSchedule to prevent less critical pods without the matching toleration from being scheduled on the node. This ensures there will be room on the high-performance nodes to schedule the pods for services related to the event.
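For readers who want to see what these settings correspond to in standard Kubernetes terms, the following Python sketch builds an equivalent pod spec fragment and prints it as JSON. It is an assumption that ArcGIS Enterprise Manager produces a fragment of this shape; the helper name node_affinity and the weight of 100 are hypothetical. The sketch uses the In operator with the value true, because Kubernetes accepts a list of values only with In and Not in.

```python
import json


def node_affinity(rule_type, key, operator, values=None, weight=100):
    """Build a Kubernetes nodeAffinity block from Type/Key/Operator/Value."""
    expression = {"key": key, "operator": operator}
    if values:
        expression["values"] = list(values)
    term = {"matchExpressions": [expression]}
    if rule_type == "Required":
        return {"requiredDuringSchedulingIgnoredDuringExecution":
                {"nodeSelectorTerms": [term]}}
    # Preferred rules carry a weight from 1 to 100 (100 is an assumption here).
    return {"preferredDuringSchedulingIgnoredDuringExecution":
            [{"weight": weight, "preference": term}]}


# Node side, done separately by the administrator, for example:
#   kubectl label nodes <node-name> high-performance=true
#   kubectl taint nodes <node-name> workload=high-performance:NoSchedule
pod_spec_fragment = {
    "affinity": {
        "nodeAffinity": node_affinity("Preferred", "high-performance", "In", ["true"])
    },
    "tolerations": [
        {"key": "workload", "operator": "Equal",
         "value": "high-performance", "effect": "NoSchedule"}
    ],
}

print(json.dumps(pod_spec_fragment, indent=2))
```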

Scenario 2: Data processing for environmental monitoring

An environmental agency is running a series of geospatial analyses to monitor changes in land use. The analysis requires significant computational resources, and the agency has dedicated nodes with GPUs for this purpose. To ensure that the geospatial analysis runs effectively without competing for resources with other services, the organization administrator:

Adds the key/value label gpu: true to GPU-enabled nodes.
Applies node affinity rules to schedule the analysis pods only on the GPU nodes:
- Type—Required
- Key—gpu
- Operator—In
- Value—true
Applies a toleration to the analysis pods, allowing them to be scheduled on the tainted GPU nodes:
- Effect—NoSchedule
- Key—workload
- Operator—Equal
- Value—high-resource
Taints GPU nodes with workload=high-resource:NoSchedule to prevent less resource-intensive pods without the matching toleration from being scheduled on the node. This ensures expensive GPU-enabled machines are only used for the analysis pods that need them.
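The combined effect of the label, the Required affinity rule, the toleration, and the taint in this scenario can be sketched as a small feasibility check in Python. This is a simplified model with hypothetical names and data shapes: affinity is reduced to the In operator, tolerations to exact matches, and taints are written as (key, value, effect) tuples.

```python
def is_feasible(pod, node):
    """Return True if the pod may be scheduled on the node."""
    # Required node affinity: every required label must have an allowed value.
    for key, allowed in pod["required_labels"].items():
        if node["labels"].get(key) not in allowed:
            return False
    # NoSchedule taints: each one needs an identical toleration on the pod.
    for taint in node["taints"]:
        if taint[2] == "NoSchedule" and taint not in pod["tolerations"]:
            return False
    return True


def feasible_nodes(pod, nodes):
    return [name for name, node in nodes.items() if is_feasible(pod, node)]


gpu_taint = ("workload", "high-resource", "NoSchedule")
nodes = {
    "gpu-node": {"labels": {"gpu": "true"}, "taints": [gpu_taint]},
    "general-node": {"labels": {}, "taints": []},
}
analysis_pod = {"required_labels": {"gpu": ["true"]}, "tolerations": [gpu_taint]}
other_pod = {"required_labels": {}, "tolerations": []}

print(feasible_nodes(analysis_pod, nodes))  # affinity keeps it on GPU nodes
print(feasible_nodes(other_pod, nodes))     # the taint keeps it off GPU nodes
```

The affinity rule alone would not keep other pods off the GPU node, and the taint alone would not keep the analysis pod on it, which is why the scenario uses both.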

Scenario 3: Resource optimization for shared feature services

A city's GIS department has numerous feature services that are not heavily used but collectively burden a single service deployment. The cluster contains premium nodes to ensure guaranteed quality of service for high priority services and nodes with lower resource availability for low priority services. To ensure these low-priority feature services are preferentially scheduled on the less expensive nodes, the organization administrator:

Adds the label key resource-constrained to the nodes with lower resource availability.
Applies node affinity rules for feature service pods to prioritize scheduling on nodes with lower resource availability:
- Type—Preferred
- Key—resource-constrained
- Operator—Exists
Applies tolerations on feature service pods to ensure they can be scheduled on tainted nodes despite constraints:
- Effect—PreferNoSchedule
- Key—resource-constrained
- Operator—Exists
Taints the nodes with lower resource availability with resource-constrained:PreferNoSchedule.

Scenario 4: Prevent data store disruption during cluster scaling

A national government has a service usage pattern where services are heavily used during daytime hours. This pattern requires a large number of cluster nodes to support all the pod replicas needed for these services. Because the services are not used at night, the organization would like to scale down the number of nodes to save on cloud compute costs. Terminating nodes where stateful, system-managed data store pods are running, however, creates a risk of disrupting ArcGIS Enterprise. To prevent this potential disruption, the organization administrator:

Creates a separate node group.
Adds the key/value label data-store: true to each node in the group.
Applies node affinity rules to ensure data store pods are scheduled on nodes in this group:
- Type—Required
- Key—data-store
- Operator—In
- Value—true
Applies a toleration to the stateful data store pods, allowing them to run on the tainted data store nodes:
- Effect—NoSchedule
- Key—workload
- Operator—Equal
- Value—data-store
Taints data store nodes with workload=data-store:NoSchedule to prevent stateless pods without the matching toleration from being scheduled on the node. This ensures there is room on the data store nodes to schedule all the stateful data store pods.
Does not scale down the data store node group when scaling down cluster nodes at night. Because all the stateful data store pods are in this node group, they are not affected by scaling down the other node groups.