ScaleGuidev2
NewsSandbox
ScaleGuide — Kubernetes Autoscaling, Explained Visually.
DocsVisualize

Autoscaling

Horizontal Pod AutoscalerVertical Pod AutoscalerCluster AutoscalerKEDA

Deployment Strategies

Blue-Green DeploymentCanary DeploymentRolling UpdateRecreate DeploymentA/B Testing DeploymentShadow (Dark) Deployment

PostgreSQL

Prerequisites & SetupWhy PostgreSQL?Backend ConnectionsPractice ExamplesOfficial Docs Summary

Code Sandbox

SQL QueriesK8s ManifestsDeploy Configs

Official Docs Summary

Condensed PostgreSQL official docs: essential data types, critical configuration, backup/restore, and replication.

The official PostgreSQL documentation is comprehensive but dense (3000+ pages). This section condenses the most important topics: data types, configuration, backup/restore, and replication.

Essential Data Types

TypeUse CaseExample
SERIAL / BIGSERIALAuto-incrementing IDsid SERIAL PRIMARY KEY
UUIDDistributed unique IDsgen_random_uuid()
VARCHAR(n) / TEXTVariable-length stringsname VARCHAR(100)
NUMERIC(p,s)Exact decimal (money, finance)price NUMERIC(10,2)
TIMESTAMPTZDate/time with timezonecreated_at TIMESTAMPTZ DEFAULT NOW()
JSONBSemi-structured data with indexingmetadata JSONB DEFAULT '{}'
ARRAYLists of values in a single columntags TEXT[]
BOOLEANTrue/false flagsactive BOOLEAN DEFAULT true
INET / CIDRIP addresses and networksclient_ip INET
TipUse TIMESTAMPTZ (with timezone) instead of TIMESTAMP for all time data. PostgreSQL stores both as UTC internally, but TIMESTAMPTZ correctly converts on input/output.

Critical Configuration Parameters

text
# postgresql.conf - Key settings

# Memory
shared_buffers = '256MB'          # 25% of available RAM
work_mem = '64MB'                 # Per-operation sort/hash memory
effective_cache_size = '768MB'    # OS cache estimate (50-75% of RAM)

# WAL & Checkpoints
wal_level = 'replica'             # Required for replication
max_wal_size = '1GB'
checkpoint_completion_target = 0.9

# Connections
max_connections = 200             # Coordinate with pooler
listen_addresses = '*'            # Bind to all interfaces

# Logging
log_min_duration_statement = 1000 # Log queries > 1 second
log_statement = 'ddl'             # Log DDL statements
WarningDo not set shared_buffers too high. Going above 40% of RAM can hurt performance because it competes with the OS page cache. Start at 25% and benchmark.

Backup & Restore

Logical Backup (pg_dump)

bash
# Backup a single database
pg_dump -U postgres -Fc mydb > mydb.dump

# Restore from custom format
pg_restore -U postgres -d mydb mydb.dump

# Restore specific table
pg_restore -U postgres -d mydb -t users mydb.dump

Physical Backup (pg_basebackup)

bash
# Full physical backup for point-in-time recovery
pg_basebackup -U replicator -D /backup/base -Ft -z -P

Replication

Streaming Replication

Physical byte-for-byte replication. The replica is an exact copy of the primary. Best for read replicas and high availability.

text
# On primary (postgresql.conf)
wal_level = 'replica'
max_wal_senders = 3
wal_keep_size = '1GB'

# On replica
primary_conninfo = 'host=primary-host user=replicator password=secret'

Logical Replication

Table-level replication that decodes WAL into logical changes. Allows replicating specific tables and cross-version replication.

sql
-- On publisher
CREATE PUBLICATION my_pub FOR TABLE users, orders;

-- On subscriber
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=publisher-host dbname=mydb user=replicator'
  PUBLICATION my_pub;
PitfallSymptomFix
No monitoring on replica lagStale reads, eventual consistency surprisesMonitor pg_stat_replication.replay_lag
No backup testingBackup is corrupt when you need itRegularly restore backups to a test environment
shared_buffers too largePerformance degrades under memory pressureKeep at 25% of RAM, let OS cache handle the rest
Missing indexesSeq scans on large tables, slow queriesUse EXPLAIN ANALYZE and pg_stat_user_tables to find missing indexes

Real-World Stories

Failure

GitLab - The Accidental DELETE That Took Down Production

On January 31, 2017, a GitLab engineer ran rm -rf on a PostgreSQL data directory on the wrong server -- the primary instead of a replica. Of their 5 backup strategies, only one (LVM snapshots) worked, and it was 6 hours old. pg_dump backups were failing silently due to a configuration error. Logical replication was set up but not tested. GitLab lost 6 hours of production data. The incident led to their famous "postmortem live stream" and a complete overhaul of backup verification.

Success

Crunchy Data - Automated PITR Saves a Healthcare Company

A healthcare SaaS company accidentally ran an UPDATE without a WHERE clause, corrupting 2 million patient records. Using Crunchy Data's PostgreSQL setup with continuous WAL archiving, they performed point-in-time recovery (PITR) to exactly 30 seconds before the bad UPDATE. Full recovery took 22 minutes with zero data loss. The key was having archive_mode = on and archive_command shipping WAL segments to S3 every 60 seconds.

Success

Zalando - Patroni Cluster Survives AWS AZ Failure

When an entire AWS availability zone went down in 2019, Zalando's PostgreSQL clusters (managed by Patroni) automatically promoted replicas in healthy AZs within 10 seconds. Their e-commerce platform served 35 million customers with no visible impact. The setup used synchronous replication across AZs with Patroni handling automatic failover, proving that PostgreSQL can match the high-availability guarantees of proprietary databases.