Runtime Services

The Esper CLI manages the complete runtime stack for self-hosted deployments. The architecture separates concerns across specialized services for scalability and fault isolation.

Architecture Overview

Esper's runtime follows a pipeline architecture with clear service boundaries:

graph LR
  C[Client] --> S[esper-server]
  S --> IB[Ingestion Broker]
  IB --> E[Engine Worker]
  E --> MB[Mitigation Broker]
  MB --> S
  S --> C

Each service has distinct responsibilities:

  • esper-server: API gateway and control plane
  • Ingestion Broker: Request queuing and distribution
  • Engine Worker: Policy evaluation and state management
  • Mitigation Broker: Decision caching and enforcement

Control Plane Server

The control plane manages configuration and routes traffic.

Basic Operation

Start the server with default configuration:

esper server run

Default bindings:

  • HTTP: 0.0.0.0:8080
  • Metrics: 0.0.0.0:9090
  • Health: 0.0.0.0:8081

Configuration Management

Override defaults with environment files:

# Development configuration
esper server run --config ./configs/dev.env

# Production with specific overrides
esper server run --config ./configs/prod.env --port 8443 --tls

Configuration precedence (highest to lowest):

  1. Command-line flags
  2. Environment variables
  3. Config file
  4. Defaults
tip

Use esper server config to generate a complete configuration template with all available options.
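
The precedence order can be illustrated with a small shell sketch. Note that `ESPER_PORT` and the `PORT=` config-file key are hypothetical names chosen for illustration, not documented configuration keys:

```shell
# Illustrative resolver mirroring the documented precedence order.
resolve_port() {
  local flag_port="$1" config_file="$2"
  if [ -n "$flag_port" ]; then echo "$flag_port"; return; fi      # 1. command-line flag
  if [ -n "$ESPER_PORT" ]; then echo "$ESPER_PORT"; return; fi    # 2. environment variable
  if [ -f "$config_file" ]; then
    local v
    v=$(grep '^PORT=' "$config_file" | cut -d= -f2)
    if [ -n "$v" ]; then echo "$v"; return; fi                    # 3. config file
  fi
  echo 8080                                                       # 4. built-in default
}

unset ESPER_PORT
printf 'PORT=9000\n' > /tmp/esper-demo.env
resolve_port ""   /tmp/esper-demo.env   # config file beats the default
resolve_port 8443 /tmp/esper-demo.env   # flag beats everything
```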

Health Monitoring

The server exposes health endpoints for orchestration:

# Liveness check
curl http://localhost:8081/healthz

# Readiness check (includes dependency checks)
curl http://localhost:8081/readyz

# Detailed health with component status
curl http://localhost:8081/healthz/detailed

Response codes:

  • 200: Healthy
  • 503: Unhealthy or dependencies unavailable
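
Outside of an orchestrator, a bounded polling loop against the readiness endpoint is often enough to gate deployment steps. A minimal sketch:

```shell
# Poll a readiness endpoint until it returns 200, up to a bounded number of attempts.
wait_ready() {
  local url="$1" attempts="${2:-30}" code i
  for i in $(seq "$attempts"); do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    if [ "$code" = "200" ]; then return 0; fi
    sleep 1
  done
  return 1
}

# Usage once the server is starting:
# wait_ready http://localhost:8081/readyz 60 && echo "server is ready"
```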

Graceful Shutdown

The server shuts down cleanly in response to termination signals:

# Send SIGTERM for graceful shutdown
kill -TERM $(pgrep esper-server)

# Or use the CLI
esper server stop --graceful --timeout 30s

Shutdown sequence:

  1. Stop accepting new requests
  2. Wait for in-flight requests (up to timeout)
  3. Close database connections
  4. Flush metrics
  5. Exit
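
Wrapper scripts that supervise the process can mirror this sequence with a signal trap. The echoed steps below are placeholders standing in for real cleanup work, not actual CLI hooks:

```shell
# Illustrative trap handler following the documented shutdown order.
graceful_shutdown() {
  echo "1. stop accepting new requests"
  echo "2. drain in-flight requests (up to timeout)"
  echo "3. close database connections"
  echo "4. flush metrics"
}
trap 'graceful_shutdown; exit 0' TERM INT
```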

Engine Worker

The engine worker processes evaluation workloads from the ingestion broker.

Worker Configuration

Start workers with appropriate resource limits:

# Basic worker
esper worker run engine

# Production worker with tuning
esper worker run engine \
  --concurrency 16 \
  --batch-size 100 \
  --memory-limit 4G

Key parameters:

  • --concurrency: Parallel evaluation threads
  • --batch-size: Events per processing batch
  • --memory-limit: Maximum heap size
  • --state-backend: State storage (redis|memory|postgres)

Scaling Workers

Deploy multiple workers for horizontal scaling:

# Start worker pool
for i in {1..4}; do
  esper worker run engine \
    --worker-id "worker-$i" \
    --config ./engine.env &
done

# Monitor worker status
esper worker status

# Scale based on queue depth
esper worker autoscale \
  --min 2 \
  --max 10 \
  --target-queue-depth 1000
caution

Workers must have unique IDs when running multiple instances on the same host.

State Management

Workers maintain hot state for entity tracking and rate limiting:

# Configure Redis state backend
export ESPER_STATE_BACKEND=redis
export ESPER_REDIS_URL=redis://localhost:6379/0

# Configure state TTL and cleanup
esper worker run engine \
  --state-ttl 3600 \
  --state-cleanup-interval 300

State backends comparison:

Backend    Use Case          Pros                    Cons
Memory     Development       Fast, simple            No persistence
Redis      Production        Fast, shared state      Requires Redis
Postgres   High durability   Persistent, queryable   Slower

Performance Tuning

Optimize worker performance for your workload:

# CPU-bound workloads (complex policies)
esper worker run engine \
  --concurrency $(nproc) \
  --batch-size 50 \
  --evaluation-timeout 100ms

# Memory-bound workloads (large state)
esper worker run engine \
  --concurrency 4 \
  --memory-limit 8G \
  --state-cache-size 100000

# I/O-bound workloads (external enrichment)
esper worker run engine \
  --concurrency 32 \
  --io-threads 16 \
  --connection-pool-size 100

Broker Services

Brokers provide durable queuing and service decoupling.

Ingestion Broker

The ingestion broker queues incoming requests for processing.

# Run with default configuration
cargo run --manifest-path esper-rs/Cargo.toml \
--package esper-ingestion-broker --bin main

# Production configuration
BROKER_PORT=8082 \
REDIS_URL=redis://localhost:6379 \
MAX_QUEUE_SIZE=1000000 \
BATCH_TIMEOUT_MS=100 \
cargo run --release --package esper-ingestion-broker --bin main

Queue management:

# View queue metrics
curl http://localhost:8082/metrics | grep queue

# Inspect queue depth
curl http://localhost:8082/api/queue/status

# Pause ingestion (for maintenance)
curl -X POST http://localhost:8082/api/queue/pause

# Resume ingestion
curl -X POST http://localhost:8082/api/queue/resume
info

The ingestion broker applies backpressure: when the queue is full, new requests are rejected with HTTP 503.
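
Clients should treat a 503 from the broker as a signal to back off. A sketch of exponential backoff around a submission command, where `send_event` is a hypothetical stand-in for the real call (e.g. a curl POST that prints the HTTP status code):

```shell
# Retry a command while it reports HTTP 503, doubling the delay after each attempt.
with_backoff() {
  local max_attempts="$1"; shift
  local delay=1 attempt code
  for attempt in $(seq "$max_attempts"); do
    code=$("$@")
    if [ "$code" != "503" ]; then echo "$code"; return 0; fi
    sleep "$delay"
    delay=$((delay * 2))
  done
  echo "gave up after $max_attempts attempts" >&2
  return 1
}

# Usage: with_backoff 5 send_event
```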

Mitigation Broker

The mitigation broker caches and serves policy decisions.

# Run with configuration
cargo run --manifest-path esper-rs/Cargo.toml \
--package esper-mitigation-broker --bin main

# Production with persistence
BROKER_PORT=8083 \
REDIS_URL=redis://localhost:6379 \
CACHE_TTL_SECONDS=900 \
PERSISTENCE_ENABLED=true \
cargo run --release --package esper-mitigation-broker --bin main

Cache operations:

# Query active mitigations
curl http://localhost:8083/api/mitigations/active

# Get mitigation for entity
curl http://localhost:8083/api/mitigations/entity/<entity-id>

# Clear mitigation cache (careful!)
curl -X POST http://localhost:8083/api/cache/clear \
-H "Authorization: Bearer $ADMIN_TOKEN"

Broker High Availability

Deploy brokers in HA configuration:

# docker-compose-ha.yml
version: "3.8"
services:
  ingestion-broker-1:
    image: esperr/ingestion-broker
    environment:
      REDIS_URL: redis://redis:6379
      CLUSTER_MODE: "true"
      NODE_ID: broker-1

  ingestion-broker-2:
    image: esperr/ingestion-broker
    environment:
      REDIS_URL: redis://redis:6379
      CLUSTER_MODE: "true"
      NODE_ID: broker-2

  haproxy:
    image: haproxy
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
    ports:
      - "8082:8082"
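
The compose file above mounts a haproxy.cfg. A minimal sketch that round-robins across the two broker containers might look like the following (container names and the 8082 listen port are assumptions taken from the compose example):

```
# haproxy.cfg (illustrative)
defaults
  mode http
  timeout connect 5s
  timeout client  30s
  timeout server  30s

frontend ingestion
  bind *:8082
  default_backend brokers

backend brokers
  balance roundrobin
  server broker-1 ingestion-broker-1:8082 check
  server broker-2 ingestion-broker-2:8082 check
```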

Local Stack Orchestration

Docker Compose Deployment

Run the complete stack with Docker Compose:

# Start all services
docker-compose up -d

# Scale workers
docker-compose up -d --scale engine-worker=4

# View logs
docker-compose logs -f engine-worker

# Stop gracefully
docker-compose down --timeout 30

Example docker-compose.yml:

version: "3.8"
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: esper
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data

  redis:
    image: redis:7
    command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru

  esper-server:
    build: .
    command: esper server run
    environment:
      DATABASE_URL: postgresql://postgres:${DB_PASSWORD}@postgres/esper
      REDIS_URL: redis://redis:6379
    depends_on:
      - postgres
      - redis
    ports:
      - "8080:8080"

  ingestion-broker:
    build: ./esper-rs
    command: cargo run --package esper-ingestion-broker --bin main
    environment:
      REDIS_URL: redis://redis:6379
    depends_on:
      - redis

  engine-worker:
    build: .
    command: esper worker run engine
    environment:
      DATABASE_URL: postgresql://postgres:${DB_PASSWORD}@postgres/esper
      REDIS_URL: redis://redis:6379
    depends_on:
      - postgres
      - redis
      - ingestion-broker
    deploy:
      replicas: 2

  mitigation-broker:
    build: ./esper-rs
    command: cargo run --package esper-mitigation-broker --bin main
    environment:
      REDIS_URL: redis://redis:6379
    depends_on:
      - redis

volumes:
  postgres-data:

Process Management

Use process managers for production deployments:

Systemd

# /etc/systemd/system/esper-server.service
[Unit]
Description=Esper Control Plane Server
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=esper
WorkingDirectory=/opt/esper
ExecStart=/opt/esper/bin/esper server run --config /etc/esper/server.env
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Manage with systemctl:

# Enable and start
sudo systemctl enable esper-server
sudo systemctl start esper-server

# Check status
sudo systemctl status esper-server

# View logs
journalctl -u esper-server -f

# Reload configuration
sudo systemctl reload esper-server
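
Workers fit naturally into a systemd template unit, one instance per worker ID. A hedged sketch following the conventions of the server unit above (the unit name and paths are illustrative):

```
# /etc/systemd/system/esper-worker@.service
[Unit]
Description=Esper Engine Worker %i
After=network.target esper-server.service

[Service]
Type=simple
User=esper
WorkingDirectory=/opt/esper
ExecStart=/opt/esper/bin/esper worker run engine --worker-id worker-%i
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Start four instances with `sudo systemctl start esper-worker@{1..4}`; the `%i` instance name keeps worker IDs unique per the caution above.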

Supervisor

# /etc/supervisor/conf.d/esper.conf
[program:esper-server]
command=/opt/esper/bin/esper server run
directory=/opt/esper
user=esper
autostart=true
autorestart=true
stdout_logfile=/var/log/esper/server.log
stderr_logfile=/var/log/esper/server-error.log
environment=PATH="/opt/esper/bin:%(ENV_PATH)s"

[program:engine-worker]
command=/opt/esper/bin/esper worker run engine
process_name=%(program_name)s_%(process_num)02d
numprocs=4
directory=/opt/esper
user=esper
autostart=true
autorestart=true
stdout_logfile=/var/log/esper/worker-%(process_num)02d.log

Monitoring & Observability

Metrics Collection

All services expose Prometheus metrics:

# prometheus.yml
scrape_configs:
  - job_name: esper-server
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: engine-workers
    static_configs:
      - targets: ["localhost:9091", "localhost:9092"]

  - job_name: brokers
    static_configs:
      - targets: ["localhost:8082", "localhost:8083"]

Key metrics to monitor:

# Request rate
rate(esper_requests_total[5m])

# Policy evaluation latency
histogram_quantile(0.99, rate(esper_evaluation_duration_seconds_bucket[5m]))

# Queue depth
esper_queue_depth{queue="ingestion"}

# Worker utilization
esper_worker_utilization_ratio
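
These queries translate directly into Prometheus alerting rules. A hedged sketch; the thresholds and durations are placeholders to tune for your workload:

```
# alerts.yml (illustrative)
groups:
  - name: esper
    rules:
      - alert: IngestionQueueBacklog
        expr: esper_queue_depth{queue="ingestion"} > 100000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ingestion queue depth above 100k for 5 minutes"

      - alert: SlowPolicyEvaluation
        expr: histogram_quantile(0.99, rate(esper_evaluation_duration_seconds_bucket[5m])) > 0.5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "p99 policy evaluation latency above 500ms"
```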

Distributed Tracing

Enable tracing for request flow visibility:

# Configure OpenTelemetry
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
export OTEL_SERVICE_NAME=esper-server
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

esper server run --tracing

Logging

Configure structured logging:

# JSON logging for production
export LOG_FORMAT=json
export LOG_LEVEL=info

# Verbose logging for debugging
export LOG_LEVEL=debug
export LOG_INCLUDE_CALLER=true

# Log to file
esper server run 2>&1 | tee -a /var/log/esper/server.log

Log aggregation with Fluentd:

# fluent.conf
<source>
  @type tail
  path /var/log/esper/*.log
  pos_file /var/log/td-agent/esper.pos
  tag esper.*
  format json
</source>

<match esper.**>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name esper
  type_name logs
</match>
note

Use correlation IDs to trace requests across services. The CLI automatically propagates X-Request-ID headers.

Troubleshooting

Common Issues

Service won't start

Error: Cannot bind to port 8080: Address already in use

Solution: Check for conflicting services with lsof -i :8080.

Worker connection failures

Error: Cannot connect to ingestion broker: Connection refused

Solution: Verify broker is running and check firewall rules.

State inconsistency

Warning: State divergence detected

Solution: Clear Redis state and restart workers.

Debug Mode

Enable comprehensive debugging:

# Maximum verbosity
export LOG_LEVEL=trace
export ESPER_DEBUG=true
export RUST_BACKTRACE=full

# Debug specific subsystem
export ESPER_DEBUG_SUBSYSTEM=evaluation

# Capture debug output
esper worker run engine 2>&1 | tee debug.log

Performance Profiling

Profile runtime performance:

# CPU profiling
esper worker run engine --cpuprofile cpu.prof

# Memory profiling
esper worker run engine --memprofile mem.prof

# Analyze profiles
go tool pprof -http=:8080 cpu.prof