Migrating PostgreSQL in Kubernetes: A Homelab Christmas Adventure

Introduction

In a production environment, upgrading databases is arguably one of the most critical and nerve-wracking tasks a DevOps or SRE team has to face. We all know someone (or are someone) who has a war story about a "simple" upgrade turning into a lost weekend spent recovering data.

In the era of Kubernetes, this pressure is compounded because the release velocity is so fast; you're often looking at a major version upgrade every 1 to 1.5 years. That is exactly why, in every company I’ve worked for, I always suggest using a managed database service (like AWS RDS). Offload that pain to the cloud provider so you can sleep at night.

However, in my Homelab (Datahub.local), I don't have that luxury (or the budget!). So, this holiday season, it was finally time to bite the bullet and upgrade my PostgreSQL cluster running on CloudNativePG (CNPG).

The Challenge: It's Not Just One Version

[Infographic: CNPG Upgrade]

Before diving into the "how," it's important to understand that with CloudNativePG, you aren't just managing one version. You are juggling two:

1. The PostgreSQL version
2. The OS version of the container image (usually Debian-based)

Both have different End-of-Life (EOL) dates that you need to watch out for.

PostgreSQL EOLs:

| Version | Release Date | EOL        |
|---------|--------------|------------|
| 18      | 2025-09-25   | 2030-11-14 |
| 17      | 2024-09-26   | 2029-11-08 |
| 16      | 2023-09-14   | 2028-11-09 |

Debian OS EOLs:

| Name                 | Status     | EOL        |
|----------------------|------------|------------|
| Trixie (stable)      | Supported  | 2030-06-30 |
| Bookworm (oldstable) | Supported  | 2028-06-30 |
| Bullseye             | Deprecated | 2026-08-31 |
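
Since an image is only really supported while both its Postgres version and its OS are, the date that matters for a given tag is the earlier of the two. Here is a minimal sketch of that arithmetic using the dates from the tables above; the helper name `effective_eol` is made up for illustration.

```python
from datetime import date

# EOL dates copied from the two tables above.
POSTGRES_EOL = {
    16: date(2028, 11, 9),
    17: date(2029, 11, 8),
    18: date(2030, 11, 14),
}
DEBIAN_EOL = {
    "bullseye": date(2026, 8, 31),
    "bookworm": date(2028, 6, 30),
    "trixie": date(2030, 6, 30),
}


def effective_eol(pg_major: int, debian: str) -> date:
    """Return the earlier of the PostgreSQL and Debian EOL dates."""
    return min(POSTGRES_EOL[pg_major], DEBIAN_EOL[debian])


if __name__ == "__main__":
    for pg_major, debian in [(16, "bookworm"), (17, "bookworm"), (17, "trixie")]:
        print(f"Postgres {pg_major} on {debian}: supported until {effective_eol(pg_major, debian)}")
```

By that measure, Postgres 16 on Bookworm runs out in June 2028 because of the OS, while Postgres 17 on Trixie lasts until November 2029, this time limited by Postgres itself.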

Depending on what you need to upgrade, the difficulty level changes considerably.

More info can be found in the CNPG PostgreSQL Container Images documentation.

Rolling Update: Level Easy (Postgres Only)

If you stick to the same OS, upgrading Postgres is trivial. You can go from Postgres 16 to 17 just by changing the image tag.

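Rolling Update means CNPG replaces one pod at a time (replicas first, then the primary) while keeping the cluster available, with no data movement needed. One caveat: that describes minor version bumps. For a major version change such as 16 to 17, recent CNPG releases (1.26 and later, as far as I know) run an offline in-place upgrade with pg_upgrade instead, so expect a short outage while it runs.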

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster
spec:
  # Changing from 16 to 17 on the same OS (Bookworm)
  imageName: ghcr.io/cloudnative-pg/postgresql:17-system-bookworm
  instances: 3
  storage:
    size: 1Gi

Blue-Green Update: Level Advanced (Postgres + OS)

This is where I found myself. I wanted to move from Postgres 16 on Bookworm to Postgres 17 on Trixie.

Here's the catch: you cannot do a direct rolling upgrade if the underlying OS changes. The glibc version or other system libraries might differ, and a different glibc can change collation (text sort order), which can silently break indexes on text columns. That makes the data directory risky to reuse in place.

Blue-Green Update is the solution: you deploy an entirely new cluster alongside the old one, migrate data, then switch traffic over. This approach is safer when OS components differ, and it gives you a quick rollback option if issues arise.
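
The rule of thumb from these two sections fits in a tiny decision helper. This is only a sketch of the reasoning in this post, not anything CNPG ships: the function names are made up, and it assumes tags shaped like the ones used here (`<major>-<flavor>-<debian>`).

```python
def parse_tag(tag: str) -> tuple[int, str]:
    """Split a tag such as '17-system-bookworm' into (17, 'bookworm')."""
    parts = tag.split("-")
    if len(parts) < 2:
        raise ValueError(f"unexpected tag format: {tag!r}")
    major = int(parts[0].split(".")[0])  # tolerate '17.2-...' style tags
    return major, parts[-1]


def upgrade_path(current_tag: str, target_tag: str) -> str:
    """Pick the approach described in this post for a pair of image tags."""
    cur_major, cur_os = parse_tag(current_tag)
    new_major, new_os = parse_tag(target_tag)
    if new_major < cur_major:
        raise ValueError("PostgreSQL major version downgrades are not supported")
    if cur_os != new_os:
        # The OS changes: build a new cluster and import the data.
        return "blue-green"
    if current_tag == target_tag:
        return "nothing-to-do"
    # Same OS: changing the image tag on the existing cluster is enough.
    return "change-image-tag"


if __name__ == "__main__":
    print(upgrade_path("16-system-bookworm", "17-system-bookworm"))
    print(upgrade_path("16-system-bookworm", "17-system-trixie"))
```

My own move, 16 on Bookworm to 17 on Trixie, is the second case, which is why the rest of this post is about blue-green.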

My Migration Strategy

Since I wanted the latest and greatest (and to reset the clock on those EOL dates), I opted for the advanced path. Here is the workflow I used to migrate with zero downtime for my apps.

Step 1: The Abstraction Layer

First, I created a Kubernetes Service of type ExternalName. It acts as a pointer: cluster DNS answers lookups for it with a CNAME to whatever externalName says. I configured all my applications to talk to this Service (pg-cluster-entrypoint) instead of the actual CNPG service name.

This is crucial because it decouples the "address" my apps use from the actual cluster running behind the scenes.

apiVersion: v1
kind: Service
metadata:
  name: "pg-cluster-entrypoint"
spec:
  type: ExternalName
  # Pointing to the CURRENT Postgres 16 cluster
  externalName: 'pg-cluster-rw.postgres.svc.cluster.local'
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: postgres
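
On the application side, the only thing that matters is that the connection string names the entrypoint Service and never a specific cluster. A minimal sketch of what that could look like; `build_dsn` and the credentials are hypothetical, and the short hostname assumes the app runs in the same namespace as the Service.

```python
from urllib.parse import quote

# The ExternalName Service from the manifest above, not a CNPG service name.
ENTRYPOINT_HOST = "pg-cluster-entrypoint"


def build_dsn(user: str, password: str, dbname: str,
              host: str = ENTRYPOINT_HOST, port: int = 5432) -> str:
    """Build a PostgreSQL connection URI, percent-encoding special characters."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(build_dsn("app", "p@ss/word", "appdb"))
```

Because the hostname is the one stable piece, nothing in this string has to change when the cluster behind it does.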

Step 2: The New Cluster

Next, I spun up a completely new cluster running Postgres 17 on Trixie. I used the bootstrap.initdb.import feature, which runs pg_dump and pg_restore under the hood, to pull data directly from the old cluster. More info can be found in the database import section of the CloudNativePG docs.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster-17
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:17-system-trixie
  instances: 3
  bootstrap:
    initdb:
      import:
        type: monolith
        databases:
          - "*"
        roles:
          - "*"
        source:
          externalCluster: old-pg-cluster
        pgDumpExtraOptions:
          - --no-privileges
  storage:
    size: 1Gi
  externalClusters:
    - name: old-pg-cluster
      connectionParameters:
        # Connect to the OLD cluster
        host: pg-cluster-rw.postgres.svc.cluster.local
        user: postgres
        dbname: postgres
        sslmode: require
      password:
        name: pg-cluster-superuser
        key: password

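This creates a fresh cluster that starts life as a logical copy of the old one. Keep in mind that the import is a one-off snapshot, not ongoing replication: anything written to the old cluster after the import starts never reaches the new one, so pause writes (or accept that gap) before you switch. And because of --no-privileges, GRANTs are not carried over and need to be reapplied.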
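
Before switching anything, it is worth a sanity check that the copy is complete. One simple approach is to compare per-table row counts from both clusters. The sketch below only does the comparison; how you collect the counts (for example a `SELECT count(*)` per table on each side) is left out, and `diff_row_counts` and the sample numbers are made up for illustration.

```python
from typing import Optional


def diff_row_counts(
    old: dict[str, int], new: dict[str, int]
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Return {table: (old_count, new_count)} for tables that differ.

    A table missing on one side shows up with None for that side.
    """
    mismatches = {}
    for table in sorted(old.keys() | new.keys()):
        old_count, new_count = old.get(table), new.get(table)
        if old_count != new_count:
            mismatches[table] = (old_count, new_count)
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts gathered from the old and new clusters.
    old_counts = {"public.users": 10, "public.orders": 5}
    new_counts = {"public.users": 10, "public.orders": 4}
    for table, (before, after) in diff_row_counts(old_counts, new_counts).items():
        print(f"{table}: old={before} new={after}")
```

An empty result does not prove the data is identical, but a non-empty one is a cheap early warning that writes landed on the old cluster after the import.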

Step 3: The Switchover

Once the new cluster was healthy and the import had finished, the magic happened. I simply updated the ExternalName service to point to the new cluster.

apiVersion: v1
kind: Service
metadata:
  name: "pg-cluster-entrypoint"
spec:
  type: ExternalName
  # Update DNS to point to the NEW Postgres 17 cluster
  externalName: 'pg-cluster-17-rw.postgres.svc.cluster.local'
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: postgres

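My applications started resolving the DNS name to the new cluster, and the migration was complete. One thing to watch: an ExternalName change only affects new connections, so long-lived or pooled connections stay on the old cluster until the app reconnects (restarting the app pods forces that). I verified everything was working, and then decommissioned the old cluster.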
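
The switchover and the rollback are the same operation with a different target, which is what makes this pattern comfortable. As a sketch, the helper below builds the JSON merge patch for either direction; the function names are made up, and the `postgres` namespace and `<cluster>-rw` naming are taken from the manifests above.

```python
import json


def rw_service_host(cluster: str, namespace: str = "postgres") -> str:
    """DNS name of a cluster's read-write Service, as used in the manifests above."""
    return f"{cluster}-rw.{namespace}.svc.cluster.local"


def external_name_patch(cluster: str, namespace: str = "postgres") -> str:
    """JSON merge patch that repoints an ExternalName Service at the given cluster."""
    return json.dumps({"spec": {"externalName": rw_service_host(cluster, namespace)}})


if __name__ == "__main__":
    print("switch over:", external_name_patch("pg-cluster-17"))
    print("roll back:  ", external_name_patch("pg-cluster"))
```

Either patch can be applied with `kubectl patch service pg-cluster-entrypoint --type merge -p '<patch>'`, although editing the manifest and re-applying it, as I did, works just as well and keeps the change in version control.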

Conclusion

While upgrading databases in a Homelab doesn't carry the same financial risk as production, it’s a great playground to practice these "Advanced Level" patterns. Using an ExternalName service as a stable interface allowed me to swap the underlying infrastructure without having to reconfigure a single application.

The process was surprisingly smooth, and now I'm set with the latest Postgres and OS versions for years to come. Hope this helps anyone else planning their holiday upgrades!

Happy New Year 2026! 🎉