A/B Testing Deployment

Route user segments to different versions to measure business impact.

An A/B testing deployment serves two application versions side by side and compares how each user segment responds. Unlike a canary release, which validates technical health, A/B testing measures user behavior: conversion rates, engagement, and revenue.

How It Works

Traffic routing is based on user attributes (geography, device, user ID hash, cookie) rather than a random per-request percentage, so a given user consistently sees the same version. Both versions run simultaneously, each instrumented with analytics. Once enough data has been collected to reach statistical significance, the winning version is promoted.
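
The user ID hash approach mentioned above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `assign_variant`, the experiment name used as a salt, and the 50/50 default split are all assumptions made for the example.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'experiment'.

    Hypothetical helper: the names and the default split are illustrative.
    """
    # Salt the hash with the experiment name so the same user can land in
    # different groups across unrelated experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "experiment" if bucket < treatment_share else "control"


if __name__ == "__main__":
    # The same user always gets the same answer, which keeps routing sticky.
    print(assign_variant("user-42", "checkout-button"))
```

Because the assignment depends only on the user ID and the experiment name, any component that knows both can recompute it without shared state. One plausible use is to have an edge service send the result as the routing header or cookie.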

Tip: A/B testing is both a deployment strategy and a product methodology. It requires coordination between engineering (traffic routing), data science (experiment design), and product (success metrics). The technical deployment is just one piece.

When to Use

UI/UX changes where business metrics matter more than error rates
Pricing or checkout flow experiments
Feature launches where user reception is uncertain
You have an analytics platform for experiment analysis

When NOT to Use

Backend infrastructure changes with no user-facing impact
Bug fixes (just deploy them)
You lack analytics infrastructure for measuring outcomes
Legal/compliance changes that must apply to all users

Real-World Examples

Amazon - Checkout Button Placement

Amazon tested checkout button placement across 50 million users. Version A had the button above the fold, Version B below. The A/B test ran for 2 weeks and found a 3.2% conversion lift with the above-fold placement, translating to hundreds of millions in additional revenue.

Uber - Surge Pricing Display

Uber A/B tested surge pricing display formats: multiplier (2.3x) vs. flat fare estimate ($34.50). The flat fare format showed 18% higher ride acceptance rates, leading to a global rollout.

Step-by-Step Implementation

1. Deploy both versions with distinct labels

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
      version: v1
  template:
    metadata:
      labels:
        app: checkout
        version: v1
    spec:
      containers:
      - name: checkout
        image: myregistry/checkout:1.0.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
      version: v2
  template:
    metadata:
      labels:
        app: checkout
        version: v2
    spec:
      containers:
      - name: checkout
        image: myregistry/checkout:2.0.0-experiment

2. Route by header or cookie (NGINX Ingress)

yaml

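# This canary Ingress only takes effect alongside a primary Ingress for the
# same host and path that routes default traffic to the checkout-v1 Service.
# Both Services are assumed to exist; they are not shown here.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkout-ab
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    # Requests carrying "X-User-Group: experiment" are routed to checkout-v2.
    # To route by cookie instead, use the canary-by-cookie annotation.
    nginx.ingress.kubernetes.io/canary-by-header: "X-User-Group"
    nginx.ingress.kubernetes.io/canary-by-header-value: "experiment"
spec:
  rules:
  - host: checkout.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: checkout-v2
            port:
              number: 80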

3. Analyze results

bash

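# Check sample sizes and statistical significance first.
# This is typically done in an analytics platform (Amplitude, Mixpanel, etc.).

# Once the experiment concludes, promote the winner (v2 in this example).
# Scale the winner up before removing the loser so capacity never drops.
kubectl scale deployment checkout-v2 --replicas=6
# Repoint the primary Ingress at the checkout-v2 Service before this step;
# otherwise users outside the experiment lose their backend.
kubectl scale deployment checkout-v1 --replicas=0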
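
When no analytics platform is at hand, a significance check on conversion counts can be approximated with a two-proportion z-test. The sketch below uses only the Python standard library. The function name and the conversion counts in the usage example are made up for illustration, and a real experiment may call for a different test, such as a sequential or Bayesian analysis.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rate.

    Hypothetical helper: version A is the control, version B the experiment.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("cannot test when the pooled rate is 0 or 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Illustrative counts only: 500/10000 conversions on v1, 600/10000 on v2.
    z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
    print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value only says the difference is unlikely under equal conversion rates. Whether the lift justifies promotion is still a product decision.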

Common Pitfalls

Pitfall	Symptom	Fix
Insufficient sample size	Results are not statistically significant	Calculate required sample size before starting; run longer if needed
User experience leakage	Users see both versions across sessions	Ensure sticky routing via cookies or user ID hash
Too many concurrent experiments	Confounding variables, unreliable results	Limit overlapping experiments; use a dedicated experimentation framework
Ignoring segment bias	Results skewed by non-representative segments	Randomize user assignment; validate segment demographics match
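
For the first pitfall, the required sample size can be estimated before the experiment starts. The following sketch uses the standard normal-approximation formula for comparing two proportions. The function name, the default significance level of 0.05, and the default power of 0.8 are conventional assumptions, not values taken from this guide.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline: float, mde: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per version for a two-sided test.

    baseline: current conversion rate, e.g. 0.05
    mde: minimum detectable effect as an absolute difference, e.g. 0.01
    Hypothetical helper; defaults are common conventions, not requirements.
    """
    if mde == 0:
        raise ValueError("mde must be non-zero")
    p1 = baseline
    p2 = baseline + mde
    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the per-group Bernoulli variances.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


if __name__ == "__main__":
    # Illustrative inputs: 5% baseline, detect an absolute lift of 1 point.
    print(required_sample_size(baseline=0.05, mde=0.01))
```

Halving the detectable effect roughly quadruples the required sample, which is why small expected lifts need long-running experiments.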
