ScaleGuide — Kubernetes Autoscaling, Explained Visually.

A/B Testing Deployment

Route user segments to different versions to measure business impact.

An A/B testing deployment runs two application versions side by side, sends a defined user segment to each, and compares how the segments behave. Unlike a canary deployment, which validates technical health, A/B testing measures user behavior: conversion rates, engagement, and revenue.

How It Works

Traffic is routed by user attributes (geography, device, user ID hash, cookie) rather than by a random percentage of requests, so a given user consistently sees the same version. Both versions run simultaneously, each instrumented with analytics. Once enough data has been collected to reach statistical significance, the winning version is promoted.

Tip: A/B testing is both a deployment strategy and a product methodology. It requires coordination between engineering (traffic routing), data science (experiment design), and product (success metrics). The technical deployment is just one piece.

When to Use

  • UI/UX changes where business metrics matter more than error rates
  • Pricing or checkout flow experiments
  • Feature launches where user reception is uncertain
  • You have an analytics platform for experiment analysis

When NOT to Use

  • Backend infrastructure changes with no user-facing impact
  • Bug fixes (just deploy them)
  • You lack analytics infrastructure for measuring outcomes
  • Legal/compliance changes that must apply to all users

Real-World Examples

Amazon - Checkout Button Placement

Amazon tested checkout button placement across 50 million users. Version A had the button above the fold, Version B below. The A/B test ran for 2 weeks and found a 3.2% conversion lift with the above-fold placement, translating to hundreds of millions in additional revenue.

Uber - Surge Pricing Display

Uber A/B tested surge pricing display formats: multiplier (2.3x) vs. flat fare estimate ($34.50). The flat fare format showed 18% higher ride acceptance rates, leading to a global rollout.

Step-by-Step Implementation

1. Deploy both versions with distinct labels

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
      version: v1
  template:
    metadata:
      labels:
        app: checkout
        version: v1
    spec:
      containers:
      - name: checkout
        image: myregistry/checkout:1.0.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
      version: v2
  template:
    metadata:
      labels:
        app: checkout
        version: v2
    spec:
      containers:
      - name: checkout
        image: myregistry/checkout:2.0.0-experiment
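
The Ingress in step 2 points at a Service named checkout-v2 on port 80, which the Deployments above do not create by themselves. A minimal sketch of one Service per version follows. The targetPort of 8080 is an assumption, since the original manifests do not declare a container port; adjust it to whatever the checkout container actually listens on.

```yaml
# One Service per version, selecting pods by the labels set in the Deployments.
# Assumption: the checkout containers listen on 8080; change targetPort to match.
apiVersion: v1
kind: Service
metadata:
  name: checkout-v1
spec:
  selector:
    app: checkout
    version: v1
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: checkout-v2
spec:
  selector:
    app: checkout
    version: v2
  ports:
  - port: 80
    targetPort: 8080
```

Selecting on both app and version keeps each Service bound to exactly one Deployment, so the routing layer decides which version a user sees.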

2. Route by header or cookie (NGINX Ingress)

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkout-ab
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-User-Group"
    nginx.ingress.kubernetes.io/canary-by-header-value: "experiment"
spec:
  rules:
  - host: checkout.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: checkout-v2
            port:
              number: 80
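
ingress-nginx treats an Ingress annotated with canary: "true" as an alternative backend for an existing non-canary Ingress on the same host and path, so the manifest above is expected to sit next to a primary Ingress. A minimal sketch of that primary Ingress is shown here. The name checkout-main and the ingress class nginx are hypothetical, and it assumes a Service named checkout-v1 exposes the v1 pods on port 80.

```yaml
# Hypothetical primary Ingress: default (non-experiment) traffic goes to checkout-v1.
# The name "checkout-main" and the ingress class "nginx" are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkout-main
spec:
  ingressClassName: nginx
  rules:
  - host: checkout.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: checkout-v1
            port:
              number: 80
```

With both Ingresses applied, requests carrying the header X-User-Group: experiment should reach checkout-v2 and all other requests stay on checkout-v1. For cookie-based routing, the nginx.ingress.kubernetes.io/canary-by-cookie annotation takes a cookie name; requests whose cookie value is always are sent to the canary backend, and never keeps them on the primary.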

3. Analyze results

bash
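# Check sample sizes and statistical significance first.
# This is typically done in an analytics platform (Amplitude, Mixpanel, etc.).

# Once the experiment concludes, promote the winner (here: v2).
# Scale the winner up and wait until it is ready before removing the loser.
kubectl scale deployment checkout-v2 --replicas=6
kubectl rollout status deployment/checkout-v2

# Repoint default (non-experiment) traffic at checkout-v2 before this step;
# otherwise users still routed to checkout-v1 lose their backend.
kubectl scale deployment checkout-v1 --replicas=0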

Common Pitfalls

| Pitfall | Symptom | Fix |
|---|---|---|
| Insufficient sample size | Results are not statistically significant | Calculate required sample size before starting; run longer if needed |
| User experience leakage | Users see both versions across sessions | Ensure sticky routing via cookies or user ID hash |
| Too many concurrent experiments | Confounding variables, unreliable results | Limit overlapping experiments; use a proper experiment framework |
| Ignoring segment bias | Results skewed by non-representative segments | Randomize user assignment; validate that segment demographics match |