Worker deployment and performance
This document outlines best practices for deploying and optimizing Workers to ensure high performance, reliability, and scalability.
Core tenets
Workers are the execution layer of Temporal applications. They poll task queues, execute Workflows and Activities, and report results back to the Temporal Server. As such, Worker deployments have the following core tenets:
- Stateless and ephemeral. Even though Workers retain a cache to speed up repeated executions of the same workloads, Workers are at their core stateless processes. All state that your applications rely on for durable execution lives in Temporal. Workers are designed to tolerate restarts and rescheduling.
- Horizontally scalable. The number of Workers must be adjustable based on workload demands.
- Observable and tunable. Effective Worker tuning requires collecting and acting on metrics, traces, and logs.
These core tenets inform all best practices recommended in the following sections.
Deployment model
This section covers best practices for Worker deployment models.
At least two Workers per task queue
Run at least two Workers for every task queue so that a single Worker restart, crash, or deployment rollout does not stall task processing. With only one Worker, any interruption leaves the task queue unpolled until that Worker recovers.
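A minimal sketch of a backlog-driven scaling heuristic that encodes the two-Worker floor. This is illustrative, not a Temporal API: the formula and the `tasks_per_worker` capacity estimate are assumptions you would calibrate against your own workloads.

```python
import math

def desired_worker_count(backlog: int, tasks_per_worker: int, min_workers: int = 2) -> int:
    """Estimate how many Workers a task queue needs.

    backlog: number of tasks currently waiting in the task queue.
    tasks_per_worker: rough per-Worker throughput capacity (an assumption
    you must measure for your own workloads).
    min_workers: never scale below this floor, so a single Worker restart
    cannot stall the task queue.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    return max(min_workers, math.ceil(backlog / tasks_per_worker))

# An empty backlog still keeps two Workers running:
# desired_worker_count(0, 100) -> 2
# desired_worker_count(1000, 100) -> 10
```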
Scope each Worker pool to a single application and environment
Because Workers must be horizontally scalable, it is best to deploy them in pools. A Worker pool is a group of Worker processes that runs a Temporal Application. We recommend dedicating each pool to a single application and environment. However, one application can have multiple pools.
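One way to keep pools scoped is to bake the application and environment into the task queue name each pool polls. The sketch below is only a naming convention example; the `<app>-<env>` scheme and the application names are assumptions, not anything Temporal requires.

```python
def task_queue_name(application: str, environment: str) -> str:
    """Build a task queue name scoped to one application and environment.

    The "<app>-<env>" scheme here is just an example convention; any scheme
    works as long as each Worker pool polls only its own application's queues.
    """
    return f"{application}-{environment}"

# Each pool is dedicated to one application and environment:
payments_prod = task_queue_name("payments", "prod")        # "payments-prod"
payments_staging = task_queue_name("payments", "staging")  # "payments-staging"
```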
Separate Workflow and Activity Worker pools if their resource needs differ significantly
Even within a single Temporal Application, there are often multiple Workflow and Activity types. If your application workloads are small and similar, a single Worker pool can handle all types. However, if your application has distinct workloads with different resource requirements or scaling characteristics, consider separating them into different Worker pools.
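In Temporal, a Workflow can schedule its Activities on a different task queue than its own, which is what makes a split possible: one pool runs only Workflow Tasks while another runs only Activity Tasks. The sketch below models the two pools as hypothetical dataclasses; the queue names, Workflow and Activity names, and resource figures are all illustrative, not Temporal SDK types.

```python
from dataclasses import dataclass, field

@dataclass
class WorkerPoolSpec:
    """Hypothetical deployment descriptor, not a Temporal SDK type."""
    task_queue: str
    cpu_cores: float
    memory_gb: float
    workflows: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)

# Workflow Workers: light on CPU, more memory for the Workflow cache.
workflow_pool = WorkerPoolSpec(
    task_queue="orders-workflows",
    cpu_cores=0.5,
    memory_gb=2.0,
    workflows=["OrderWorkflow"],
)

# Activity Workers: CPU-heavy work, scaled independently of the Workflow pool.
activity_pool = WorkerPoolSpec(
    task_queue="orders-activities",
    cpu_cores=4.0,
    memory_gb=1.0,
    activities=["render_invoice"],
)
```

Because each pool has its own spec, you can scale and size the two independently as their workloads diverge.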
Use one Kubernetes pod per Worker
Because workers are stateless and horizontally scalable, Kubernetes is a natural fit for deploying them. If you use Kubernetes to deploy Workers, we recommend using one pod per Worker instance. This approach simplifies resource allocation, scaling, and monitoring.
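A minimal sketch of a one-pod-per-Worker Deployment. The name, image, and resource figures are placeholders, not recommendations; size the requests and limits from your own measurements.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-worker            # hypothetical name
spec:
  replicas: 2                      # at least two Workers per task queue
  selector:
    matchLabels:
      app: payments-worker
  template:
    metadata:
      labels:
        app: payments-worker
    spec:
      containers:
        - name: worker             # one Worker process per pod
          image: example.com/payments-worker:1.0.0   # hypothetical image
          resources:
            requests:
              cpu: "1"             # placeholder; size from measurements
              memory: 1Gi
            limits:
              memory: 1Gi
```

With one Worker per pod, Kubernetes replica counts map directly to Worker counts, so scaling the Deployment scales the pool.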
Resource allocation and monitoring
This section covers best practices for allocating resources to Workers and monitoring their performance.
Monitor both CPU and memory usage
Worker processes are constrained by both CPU and memory. Monitor both metrics to ensure that Workers have sufficient resources to handle their workloads.
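The point of monitoring both metrics is to distinguish sustained pressure from transient spikes. The sketch below is illustrative alerting logic, not a Temporal or monitoring-system API; the 80% threshold and the three-sample window are assumptions to tune.

```python
def sustained_high_usage(samples: list[float], threshold: float = 0.8,
                         min_consecutive: int = 3) -> bool:
    """Return True if utilization (0.0-1.0) stayed above the threshold for
    min_consecutive samples in a row. Threshold and window are illustrative;
    tune them for your workloads."""
    run = 0
    for sample in samples:
        run = run + 1 if sample > threshold else 0
        if run >= min_consecutive:
            return True
    return False

cpu = [0.55, 0.85, 0.90, 0.88, 0.60]   # sustained CPU pressure
mem = [0.40, 0.45, 0.95, 0.50, 0.42]   # a single memory spike
# sustained_high_usage(cpu) -> True, sustained_high_usage(mem) -> False
```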
If CPU or memory usage is consistently high, determine whether the load comes from your application code or from Worker configuration. CPU-bound Activities, overly large concurrent task execution limits, and an oversized Workflow cache are common causes. Reduce the Worker's concurrency limits or cache size, or add Workers to the pool, then re-measure.
Latency metrics inform resource allocation
Monitor Worker latency metrics, such as Workflow Task and Activity Task schedule-to-start latencies, to identify bottlenecks. High schedule-to-start latency usually means tasks are waiting in the task queue because Workers are under-provisioned; add Workers or increase per-Worker concurrency. High end-to-end latency with low schedule-to-start latency instead points at the application code or its downstream dependencies.
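Latency decisions are usually made on a high percentile rather than the mean. The sketch below is illustrative, not a Temporal API: the nearest-rank percentile and the 0.5-second SLO are assumptions; in practice you would query these values from your metrics backend.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples (seconds)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def needs_more_workers(schedule_to_start_s: list[float], slo_s: float = 0.5) -> bool:
    """High schedule-to-start latency means tasks sat in the task queue
    waiting for a Worker; the 0.5 s SLO here is an assumption to tune."""
    return percentile(schedule_to_start_s, 95) > slo_s
```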
Starting points for production worker deployments
While optimal Worker deployment configurations depend on your specific application workloads, the following starting points are reasonable defaults for most production deployments: