ScaleGuide — Kubernetes Autoscaling, Explained Visually.

Horizontal Pod Autoscaler (HPA)

Scale the number of pod replicas based on CPU, memory, or custom metrics.

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a Deployment (or another scalable workload such as a StatefulSet) based on observed metrics such as CPU utilization, memory usage, or custom metrics. It is the most commonly used autoscaler in Kubernetes.

How It Works

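HPA runs a control loop every 15 seconds by default (configurable through the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period). On each pass it queries the resource metrics API, which Metrics Server provides, for current utilization and calculates the desired replica count using the formula: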

text
desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue))

Example: 3 replicas at 80% CPU, target 50%
= ceil(3 * (80 / 50)) = ceil(4.8) = 5 replicas

The HPA then updates the replica count on the target Deployment. A stabilization window (5 min for scale-down by default) prevents rapid flapping.
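
The calculation above can be sketched in Python. This is a simplified model, not the controller's actual code: the function name desired_replicas and its parameters are illustrative, the 10% tolerance assumes the controller's default setting, and the clamping step mirrors the minReplicas/maxReplicas fields of the HPA spec.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified model of the HPA replica calculation (hypothetical helper)."""
    ratio = current_value / target_value
    # Within the tolerance band, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured min/max bounds.
    return max(min_replicas, min(max_replicas, wanted))

# The worked example from the text: 3 replicas at 80% CPU, target 50%.
print(desired_replicas(3, 80, 50, max_replicas=20))  # 5
```

The tolerance check explains why a workload sitting at 52% against a 50% target does not gain a replica: the ratio 1.04 is inside the band, so nothing changes.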

Info: Every container must have resources.requests defined for the metrics being tracked; otherwise HPA will show <unknown>.
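
To see why requests matter, recall that a Utilization target is a percentage of the requested amount, not of the limit. The sketch below is a simplified model under that assumption: the helper cpu_utilization_percent is hypothetical, and the real controller additionally handles unready pods and missing samples.

```python
def cpu_utilization_percent(pods):
    """pods: list of (usage_millicores, request_millicores) tuples.

    Returns usage as a percentage of requests across all pods, or None
    when any request is missing (modeling the <unknown> case).
    """
    if not pods or any(request is None for _, request in pods):
        return None
    total_usage = sum(usage for usage, _ in pods)
    total_request = sum(request for _, request in pods)
    return 100 * total_usage // total_request

print(cpu_utilization_percent([(200, 250), (150, 250)]))   # 70
print(cpu_utilization_percent([(200, 250), (150, None)]))  # None
```

With a 250m request and a 50% target, the autoscaler aims for roughly 125m of average CPU usage per pod.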

When to Use

  • Stateless web apps, REST APIs, microservices
  • Load correlates with CPU, memory, or request rate
  • Variable or unpredictable traffic patterns
  • You need fast scale-out (seconds to minutes)

When NOT to Use

  • Singleton workloads that can't run multiple replicas
  • Databases or stateful workloads (use VPA)
  • You need scale-to-zero (use KEDA)
  • I/O-bound workloads where CPU doesn't reflect load

Real-World Example

Netflix-style Streaming Service

A video transcoding service scales from 25 to 250 pods during peak evening hours based on a custom metric, active_streams_per_pod. When the average exceeds the target of 200 streams per pod, HPA adds replicas. The stabilization window ensures gradual scale-down after midnight, preventing premature termination of active streams.
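
The scale-down behavior in this scenario can be illustrated with a toy model of the stabilization window. It assumes the controller acts on the highest recommendation seen within the window; the class ScaleDownStabilizer and its method are hypothetical names, and timestamps are plain seconds rather than real clock values.

```python
from collections import deque

class ScaleDownStabilizer:
    """Toy model: remember recent recommendations, act on the highest one."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, recommendation):
        self.history.append((now, recommendation))
        # Drop recommendations that have aged out of the window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)

stabilizer = ScaleDownStabilizer(window_seconds=300)
print(stabilizer.stabilize(0, 250))    # 250: peak load
print(stabilizer.stabilize(60, 100))   # 250: the peak is still in the window
print(stabilizer.stabilize(301, 100))  # 100: the peak has aged out
```

A brief dip in traffic therefore does not remove pods; the lower recommendation has to persist for the whole window before it takes effect.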

Step-by-Step Implementation

1. Ensure Metrics Server is installed

bash
# Check if metrics-server is running
kubectl get deployment metrics-server -n kube-system

# If not installed
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

2. Deploy with resource requests

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
      - name: web-api
        image: myregistry/web-api:1.4.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"       # REQUIRED for HPA
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

3. Create the HPA

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
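
The effect of the two Percent policies above can be estimated with a short Python sketch. It is a simplified model: the helper names are hypothetical, the rounding (up for scale-up, down for scale-down) is an assumption about the controller's behavior, stabilization windows are ignored, and the starting replica count is assumed to be at least 1.

```python
import math

def next_allowed(current, desired, up_percent=100, down_percent=25):
    """Replica count permitted after one policy period (simplified model)."""
    if desired > current:
        ceiling = math.ceil(current * (1 + up_percent / 100))
        return min(desired, ceiling)
    floor = int(current * (1 - down_percent / 100))
    return max(desired, floor)

def periods_to_reach(start, desired):
    """Count 60-second periods needed to move from start to desired."""
    periods, current = 0, start
    while current != desired:
        current = next_allowed(current, desired)
        periods += 1
    return periods

print(periods_to_reach(2, 20))  # 4: 2 -> 4 -> 8 -> 16 -> 20
print(periods_to_reach(20, 2))  # 7: 20 -> 15 -> 11 -> 8 -> 6 -> 4 -> 3 -> 2
```

Under these assumptions the configuration doubles capacity each minute on the way up but sheds at most a quarter of the pods per minute on the way down, which is the asymmetry most services want.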

4. Verify

bash
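# Apply the HPA manifest (assumes it was saved as hpa.yaml)
kubectl apply -f hpa.yaml
# Watch current vs. target utilization and the replica count change over time
kubectl get hpa web-api-hpa --watch
# Inspect conditions and scaling events when something looks wrong
kubectl describe hpa web-api-hpa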
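
For scripted checks, the same information can be read from the JSON form of the object (kubectl get hpa web-api-hpa -o json). The sketch below parses such output in Python; the helper summarize_hpa is hypothetical, and the sample document is a hand-written, trimmed-down illustration rather than real cluster output.

```python
import json

def summarize_hpa(hpa_json):
    """Extract replica counts and CPU utilization from HPA JSON text."""
    hpa = json.loads(hpa_json)
    status = hpa.get("status", {})
    cpu = None
    for metric in status.get("currentMetrics") or []:
        resource = metric.get("resource", {})
        if metric.get("type") == "Resource" and resource.get("name") == "cpu":
            cpu = resource.get("current", {}).get("averageUtilization")
    return {
        "current": status.get("currentReplicas"),
        "desired": status.get("desiredReplicas"),
        "cpu_percent": cpu,  # None corresponds to <unknown> in kubectl output
    }

# Hypothetical sample resembling: kubectl get hpa web-api-hpa -o json
sample = '''{"status": {"currentReplicas": 3, "desiredReplicas": 5,
  "currentMetrics": [{"type": "Resource", "resource": {"name": "cpu",
    "current": {"averageUtilization": 80, "averageValue": "200m"}}}]}}'''
print(summarize_hpa(sample))
```

A cpu_percent of None in this summary is the scripted equivalent of the <unknown> symptom and usually points at missing resource requests or an unavailable Metrics Server.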

Common Pitfalls

Pitfall | Symptom | Fix
Missing resource requests | HPA shows <unknown> for metrics | Add resources.requests.cpu to every container
Metrics Server not installed | "unable to get metrics" error | Install Metrics Server in kube-system
Pod flapping | Replicas oscillate rapidly | Add stabilizationWindowSeconds to behavior
Memory as primary metric | Unnecessary scaling (GC behavior) | Use CPU or custom metrics as primary
Insufficient cluster capacity | New pods stuck in Pending | Pair with Cluster Autoscaler