Worker deployment and performance

This document outlines best practices for deploying and optimizing Workers to ensure high performance, reliability, and scalability.

Core tenets

Workers are the execution layer of Temporal applications. They poll task queues, execute Workflows and Activities, and report results back to the Temporal Server. As such, Worker deployments have the following core tenets:

  • Stateless and ephemeral. Even though Workers maintain a cache to speed up execution of repeated workloads, they are fundamentally stateless processes. All state that your application relies on for durable execution lives in Temporal, so Workers can tolerate restarts and rescheduling.

  • Horizontally scalable. The number of Workers must be adjustable based on workload demand.

  • Observable and tunable. Effective Worker tuning requires collecting and acting on metrics, traces, and logs.

These core tenets inform all best practices recommended in the following sections.
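The tenets above can be illustrated with a schematic poll-execute-report loop. This is not the Temporal SDK API, just a sketch: the in-process queue stands in for a Temporal Task Queue, and the handlers for registered Workflow and Activity implementations. The point is that all durable state lives outside the Worker process.

```python
import queue

def run_worker(tasks: "queue.Queue", handlers: dict) -> list:
    """Schematic poll-execute-report loop. All durable state lives
    outside the Worker (here, in the shared queue), so the process
    itself can be restarted or rescheduled at any time."""
    completed = []
    while not tasks.empty():
        task = tasks.get()                               # poll the task queue
        result = handlers[task["type"]](task["input"])   # execute the handler
        completed.append({"id": task["id"], "result": result})  # report back
    return completed

# Hypothetical handler registry and a single queued task.
handlers = {"greet": lambda name: f"Hello, {name}!"}
tasks = queue.Queue()
tasks.put({"id": 1, "type": "greet", "input": "Temporal"})
print(run_worker(tasks, handlers))  # → [{'id': 1, 'result': 'Hello, Temporal!'}]
```

Because the loop holds no state of its own, any number of such processes can poll the same queue, which is what makes horizontal scaling straightforward.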

Deployment model

This section covers best practices for Worker deployment models.

At least two Workers per Task Queue

Run at least two Workers for every Task Queue, ideally on separate hosts or availability zones, so that task processing continues when a single Worker crashes or is restarted during a deployment.

Scope each Worker pool to a single application and environment

Because Workers must be horizontally scalable, it is best to deploy them in pools. A Worker pool is a group of Workers that run the same Temporal Application. We recommend dedicating each pool to a single application and environment; a single application can, however, span multiple pools.

Separate Workflow and Activity Worker pools if their resource needs differ significantly

Even within a single Temporal Application, there are often multiple Workflow and Activity types. If your application workloads are small and similar, a single Worker pool can handle all types. However, if your application has distinct workloads with different resource requirements or scaling characteristics, consider separating them into different Worker pools.

Use one Kubernetes pod per Worker

Because Workers are stateless and horizontally scalable, Kubernetes is a natural fit for deploying them. If you use Kubernetes to deploy Workers, we recommend running one Worker instance per pod. This approach simplifies resource allocation, scaling, and monitoring.
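A minimal sketch of this layout, assuming a hypothetical order-worker application; the image name, labels, and resource numbers are placeholders to adapt to your own workloads:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-worker            # hypothetical pool: one application, one environment
spec:
  replicas: 3                   # one Worker per pod; scale horizontally via replicas
  selector:
    matchLabels:
      app: order-worker
  template:
    metadata:
      labels:
        app: order-worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/order-worker:1.0.0   # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              memory: 2Gi
```

With one Worker per pod, the pod's resource requests and limits map directly onto a single Worker process, which keeps scheduling and per-Worker metrics unambiguous.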

Resource allocation and monitoring

This section covers best practices for allocating resources to Workers and monitoring their performance.

Monitor both CPU and memory usage

Worker processes are constrained by both CPU and memory. Monitor both metrics to ensure that Workers have sufficient resources to handle their workloads.
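Assuming the Workers run on Kubernetes and Prometheus scrapes cAdvisor metrics, queries along these lines surface per-pod usage (the order-worker pod name is a placeholder):

```promql
# CPU usage per Worker pod (cores), from cAdvisor container metrics
sum(rate(container_cpu_usage_seconds_total{pod=~"order-worker-.*"}[5m])) by (pod)

# Working-set memory per Worker pod (bytes)
sum(container_memory_working_set_bytes{pod=~"order-worker-.*"}) by (pod)
```

Alerting when either value approaches the pod's requests or limits gives early warning before the Worker is throttled or evicted.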

If CPU or memory usage is consistently high, determine where the load comes from: profile the Worker process, check how many Workflow and Activity Tasks execute concurrently, and review the size of the Workflow cache, which trades memory for reduced replay work. If a single process is saturated, lower its concurrency limits or add Workers to the pool.

Latency metrics inform resource allocation

Monitor Worker latency metrics, such as Workflow Task and Activity Task Schedule-To-Start latencies, to identify bottlenecks. High latencies typically mean that tasks are waiting in the Task Queue longer than Workers can pick them up; in that case, add Workers to the pool or allocate more resources to each Worker.
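As one sketch of such a query, assuming the SDK's Prometheus-exported histograms (exact metric names vary by SDK version and exporter configuration), the 95th-percentile Activity Schedule-To-Start latency per Task Queue could be computed as:

```promql
# p95 Activity Schedule-To-Start latency, grouped by task queue
histogram_quantile(
  0.95,
  sum(rate(temporal_activity_schedule_to_start_latency_bucket[5m])) by (le, task_queue)
)
```

A sustained rise in this quantile on one Task Queue, while CPU and memory stay flat, usually points to too few Workers polling that queue rather than to resource exhaustion.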

Starting points for production worker deployments

While optimal Worker deployment configurations depend on your specific application workloads, the following recommendations provide a reasonable starting point: