
PTS 2026: What Actually Happened

Saturday morning in Vienna. We had set out for a 10K — a good way to shake off four days of sitting in a room staring at manifests. We took a wrong turn somewhere around the Prater, failed to correct it, and finished at 14K instead. Nobody minded. The extra kilometres took us through streets we wouldn’t have found otherwise, past the football stadium and through a neighbourhood we had no particular reason to be in. Finishing tired is still finishing.

That’s a fair summary of the week too. I came in with four things I wanted to tackle. Three shipped cleanly. One turned out to be ten.


tl;dr

  • The hz cluster is live: a Hetzner/Cloudfleet parallel production environment running all seven MetaCPAN apps, with resource requests tuned from real Datadog P95 data and secrets sourced from 1Password via External Secrets Operator on day one — the new cluster never ran SealedSecrets.
  • The venue (die Hauswirtschaft) was genuinely excellent. The best coffee of the trip was at Balthasar. There was a one-hour walk for Apfelstrudel at Café Sperl that was completely worth it.
  • Four focused, uninterrupted days produced ten shipped items from a list of four. The room does something that four months of async doesn’t — not because async is broken, but because some decisions need zero latency and no context switching.
  • Sunday afternoon: people beginning to leave, goodbyes that are quick because everyone has flights. The week becomes past tense before it’s actually over.

The Work

The pre-event post had the full list: ESO migration, ConfigMap generator, Datadog instrumentation, and the Hetzner/Cloudfleet migration. The migration was the one I’d classified as “get started, not done.” That classification was wrong in an instructive way.

The SealedSecrets Migration

The friction with Sealed Secrets is hard to explain until you’ve actually operated it across multiple clusters. The model is sensible — encrypt a secret with the cluster’s public key, commit the encrypted blob to git, the controller decrypts it at runtime. The problem surfaces when you have two clusters, because you have two public keys, which means two encrypted blobs per secret, which means every secret rotation is a ceremony across two environments. And because the entire file is encrypted, you can’t look at a SealedSecret and understand anything about the shape of the configuration — the non-secret bits are as opaque as the secret bits. It’s secure in the sense that nobody can read your API key, but it’s also secure in the sense that nobody can read the field name next to your API key.

We moved to External Secrets Operator backed by 1Password. The model is fundamentally different: secrets live in 1Password with access controls and audit trails you’d want anyway, and ESO syncs them into Kubernetes as native Secret objects at runtime. The git repo never sees a secret, not even an encrypted one. Rotating a credential means rotating it in 1Password; the operator picks it up.

The migration itself was Thursday’s main story. Every SealedSecret across the platform, migrated in sequence: cert-manager (Cloudflare DNS credentials), ArgoCD (GitHub OAuth), loki, kube-prometheus, kube-thanos, and then the application secrets for web, api, test-smoke, and backpan-syncer.

The debugging was real. Three consecutive fixes to get the 1Password SDK remoteRef format right — the provider accepts an op:// URI, but the format for item and field references isn’t exactly what the documentation implies, and the singular-field edge case behaves differently from the multi-field case. Then a separate issue: once the ArgoCD GitHub OAuth secret was live from ESO, the repo access credentials needed to switch to GitHub App authentication. That one cost a couple of hours.
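For readers who haven’t used ESO, the shape of the resource looks roughly like this. Everything here — the store name, the 1Password item, the field names — is illustrative, and the remoteRef shown is the generic ESO form, not necessarily the exact format the 1Password provider wanted (that format is precisely what took three fixes to get right):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets            # hypothetical name
spec:
  refreshInterval: 1h          # how often ESO re-syncs from the backend
  secretStoreRef:
    name: onepassword          # a ClusterSecretStore backed by 1Password
    kind: ClusterSecretStore
  target:
    name: api-secrets          # the native Secret object ESO creates
  data:
    - secretKey: GITHUB_TOKEN  # key in the resulting Kubernetes Secret
      remoteRef:
        key: metacpan-api      # 1Password item (illustrative)
        property: github_token # field within that item (illustrative)
```

The git repo holds only this pointer; the secret value itself never appears in version control, encrypted or otherwise.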

Once all five app ExternalSecrets were proven on the DO cluster, they got lifted from environments/prod/ into each app’s base/, with the namespace injected by the Kustomize overlay. This is the part that matters structurally: the DO kustomize build output was byte-identical before and after, which meant I could verify the refactor hadn’t changed anything before using it as the foundation for the hz overlays.
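The resulting layout can be sketched as a Kustomize overlay — paths and the namespace value are illustrative, not copied from the actual repo:

```yaml
# apps/api/environments/prod-hz/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: api        # namespace injected by the overlay,
                      # so base/ stays environment-agnostic
resources:
  - ../../base        # includes the ExternalSecret lifted out of prod/
```

The byte-identical check is just comparing `kustomize build` output for the prod overlay before and after the refactor — if the diff is empty, the move into base/ changed nothing.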

The notable outcome: the hz cluster launched without ever having SealedSecrets in it. Secrets were live from 1Password from day one. The placeholder phase — where the hz environment scaffold had “NOT FUNCTIONAL” stub secrets — lasted less than 24 hours.

Standing Up the hz Cluster

The week’s centrepiece was a full parallel production stack on Hetzner (prod-hz), duplicating all seven MetaCPAN apps alongside the existing DigitalOcean cluster. Blue/green arrangement: DO keeps serving live traffic, hz runs as a validated shadow until cutover.

This is different from “copy the prod config and adjust the endpoint.” Every app got a proper environments/prod-hz/ Kustomize overlay and a separate ArgoCD AppProject and Application targeting the hz cluster’s ArgoCD instance. The platform layer — ArgoCD itself, the Hetzner CSI driver, the CloudNativePG operator, and the CloudFleet node autoprovisioner — was installed via a consistent vendored Helm pattern: Makefile + values.yaml + vendor/ directory checked into git, so the installation is reproducible without pulling from the internet at apply time.
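The vendored pattern is worth a sketch. Chart name, repo URL, and version below are placeholders, not the actual platform config:

```make
# Illustrative vendored-chart Makefile; values are placeholders.
CHART_REPO    := https://argoproj.github.io/argo-helm
CHART         := argo-cd
CHART_VERSION := 7.0.0

# Re-vendor the chart into git; run once per upgrade, commit the result.
vendor:
	helm pull $(CHART) --repo $(CHART_REPO) --version $(CHART_VERSION) \
	  --untar --untardir vendor

# Render from the vendored copy — no network access at apply time.
template:
	helm template $(CHART) vendor/$(CHART) -f values.yaml
```

Upgrades become reviewable diffs against vendor/ instead of an opaque version bump resolved at deploy time.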

Resource requests from real data. Rather than copying prod’s resource requests and hoping, I wrote a Python script that queried Datadog for P95 CPU and memory usage across all workloads and generated resource requests from that. The hz cluster launched with tuned numbers from the start. There’s a scripts/README.md documenting the methodology so it can be repeated as traffic patterns change — because traffic patterns do change.
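The methodology lives in scripts/README.md, not here, but the shape of such a script can be sketched. Everything below — the metric name, the query, the 20% headroom factor, the floor — is an assumption for illustration, not the actual script:

```python
import math


def p95_request(samples, headroom=1.2, minimum=0.05):
    """Turn raw usage samples into a resource request:
    95th-percentile usage plus headroom, floored at a minimum.
    Units are whatever the samples are in (assumed pre-normalized)."""
    if not samples:
        return minimum
    ordered = sorted(samples)
    # Nearest-rank percentile: 1-based rank ceil(0.95 * n)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return max(minimum, round(ordered[idx] * headroom, 3))


def fetch_cpu_samples(query, start, end):
    """Pull a timeseries from Datadog's v1 metrics query API.
    Needs DD_API_KEY / DD_APP_KEY in the environment and the
    third-party `requests` package."""
    import os
    import requests  # pip install requests

    resp = requests.get(
        "https://api.datadoghq.com/api/v1/query",
        params={"from": start, "to": end, "query": query},
        headers={
            "DD-API-KEY": os.environ["DD_API_KEY"],
            "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        },
        timeout=30,
    )
    resp.raise_for_status()
    series = resp.json().get("series", [])
    # pointlist entries are [timestamp, value] pairs
    return [v for s in series for _, v in s["pointlist"] if v is not None]


if __name__ == "__main__":
    import time

    now = int(time.time())
    samples = fetch_cpu_samples(
        # Hypothetical query; the real script iterates per workload.
        "avg:kubernetes.cpu.usage.total{kube_deployment:web}",
        now - 7 * 86400,
        now,
    )
    print(f"suggested CPU request: {p95_request(samples)}")
```

A week of samples per workload, P95 plus headroom, and you get requests grounded in observed behaviour instead of copied guesses.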

Geographic constraints baked in. All hz workloads are pinned to Germany via nodeAffinity requiring topology.kubernetes.io/region In [fsn1, nbg1] (Falkenstein and Nuremberg). This matches the CloudFleet NodePool restriction and ensures nothing accidentally schedules outside the expected region. Web gets additional topologySpreadConstraints spreading pods evenly across availability zones within Germany. backpan-syncer, grep, and test-smoke are pinned specifically to Falkenstein, where the persistent storage lives.
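As a pod-template fragment, the pinning looks roughly like this — label selectors and the app label are illustrative:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/region
              operator: In
              values: [fsn1, nbg1]  # fsn1 = Falkenstein, nbg1 = Nuremberg
topologySpreadConstraints:          # web only: even spread across zones
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web                    # hypothetical label
```

The required nodeAffinity is a hard constraint; the spread constraint is soft (ScheduleAnyway), so a zone outage degrades balance without blocking scheduling.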

The CloudFleet/Hetzner choice deserves a sentence. Hetzner has excellent price-to-performance, especially compared to what MetaCPAN’s current DigitalOcean footprint costs. Hetzner doesn’t offer managed Kubernetes, which matters for a volunteer organization — running the control plane ourselves means someone gets paged when it breaks, and that person has a day job. CloudFleet handles the Kubernetes layer the way a managed provider would, on Hetzner hardware instead of AWS.

The Unglamorous Remainder

These are the items that sit in every backlog and never get scheduled — until you have four focused days in a room:

PodDisruptionBudgets on web. Without a PDB, Kubernetes can evict all web pods simultaneously during a node drain or cluster upgrade. Added maxUnavailable: 1 to both web and web-search deployments in the base. This guarantees at least two pods remain serving traffic during voluntary disruptions regardless of cluster or environment. It should have been there already.
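The whole resource is a few lines — name and selector here are illustrative, one PDB per deployment:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  maxUnavailable: 1     # voluntary evictions proceed one pod at a time
  selector:
    matchLabels:
      app: web          # must match the deployment's pod labels
```

Note that maxUnavailable caps only voluntary disruptions (drains, upgrades); it does nothing for node crashes, which is what the replica count and topology spread are for.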

ArgoCD noise reduction. Two persistent pain points, both resolved globally rather than per-application. ServerSideDiff set as a cluster-wide default — eliminates spurious diffs in the ArgoCD UI caused by ExternalSecret operator-injected fields that aren’t in the git manifests. And ignoreDifferences rules for four resource types that were causing perpetual OutOfSync: PVC fields that Kubernetes fills in after binding, eight ESO webhook-injected fields on ExternalSecret, Karpenter CRD defaults on NodePool, and DatadogAgent fields not yet in the installed operator CRD version. Both fixes go in the shared ArgoCD config patch, covering both clusters.
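A sketch of where those two fixes live, from memory of ArgoCD’s config conventions — treat the exact keys and the PVC field list as assumptions rather than the actual patch:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
data:
  controller.diff.server.side: "true"  # ServerSideDiff for every app
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  # Per-kind ignoreDifferences; jsonPointers name fields ArgoCD
  # should exclude from the diff.
  resource.customizations.ignoreDifferences.PersistentVolumeClaim: |
    jsonPointers:
      - /spec/volumeName        # filled in by the binder after creation
      - /spec/storageClassName  # defaulted if omitted in git
```

Because both ConfigMaps are part of the shared ArgoCD config patch, the fix applies to every Application on both clusters rather than being repeated per app.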

CI autodiscovery. The validate-manifests workflow had a hardcoded list of Kustomize overlays to build and validate. It had drifted in both directions: three deleted overlays still listed (causing failures), and six new overlays missing from validation (silently skipped). Replaced with find-based discovery — platform overlays are directories named do or hz at depth 2 under platform/, app overlays are directories under apps/<app>/environments/<env>. Adding or removing a component no longer requires touching the workflow.
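The discovery rules translate directly into find invocations. The scaffold below is a stand-in for the real repo layout, built in a temp directory so the sketch is self-contained:

```shell
#!/bin/sh
set -eu

# Demo scaffold mirroring the described layout (names are illustrative)
tmp=$(mktemp -d); cd "$tmp"
mkdir -p platform/argocd/do platform/argocd/hz \
         apps/web/environments/prod apps/web/environments/prod-hz

# Platform overlays: directories named do or hz at depth 2 under platform/
find platform -mindepth 2 -maxdepth 2 -type d \
  \( -name do -o -name hz \) | sort

# App overlays: apps/<app>/environments/<env>
find apps -mindepth 3 -maxdepth 3 -type d -path '*/environments/*' | sort
```

In the workflow, each discovered directory is then fed to `kustomize build`; a new overlay is validated the moment its directory exists, and a deleted one simply stops appearing.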

Legacy monitoring stack removed. loki, kube-prometheus, kube-thanos, and vector-agent definition files removed from the repository. The old self-hosted stack is retired in favour of Datadog. This also unblocked the CI fix above — the deleted directories were causing kustomize build failures.

Image sync. The set-image workflow that runs on image automation was updated to update both prod and prod-hz in the same commit. Previously only prod was updated; prod-hz would have drifted immediately after launch.

The ConfigMap generator work from the pre-event plan got deprioritised — the ESO migration turned out to be larger than expected, and the sequencing meant it made more sense to complete that foundation first.


The Place

die Hauswirtschaft

The venue was die Hauswirtschaft, and I want to be specific about why it worked: it’s in a neighbourhood worth being in. Not a conference centre surrounded by parking lots. An actual street, with actual things on it, that you’d walk through for reasons unrelated to attending a tech event.

The food was organized and genuinely good. Friday’s lunch was a vegan red lentil dal with rice — not a concession to dietary needs, just a well-made dish. That’s a higher bar than most tech event catering manages. The staff were accommodating in the way that only happens when a venue actually wants you there rather than tolerating you.

What We Ate

Pre-trip research had flagged Balthasar and Mochi Ramen Bar. Both delivered.

Balthasar was the best coffee of the trip. I lost my AirFly on the airplane (left it plugged into the seat on the Frankfurt connection), so Wednesday afternoon was an unexpected detour to the Apple Store followed by a proper stop at Balthasar — Schwedischer Knopf and a coffee that made the AirFly replacement feel worth it.

Mochi Ramen Bar appeared twice. Both times excellent. On cool days in a city you don’t quite know yet, a bowl of ramen is its own kind of orientation.

Café Sperl. Olaf had seen it on a PBS special and was determined to go. So Friday afternoon became a one-hour walk across the city for Apfelstrudel. Café Sperl is the kind of place that exists because Vienna decided that sitting down with coffee and something pastry-adjacent deserves an institution. The walk was completely worth it. The Apfelstrudel was completely worth it.

Geier was breakfast both mornings I went — BIO Dinkelseele on Saturday, and then Sunday the Mohnkrone mit Waldviertler Graumohn, which was the best pastry of the trip. The Waldviertler Graumohn is a grey poppy seed variety grown in the region — earthier and less sweet than the black poppy filling you’d get elsewhere — and the Mohnkrone carried enough of it that the flavour held through to the last bite rather than being decoration.

The attendees’ dinner was at Stuwer — I went Austrian: white asparagus soup, schnitzel, Kaiserschmarrn. No notes.

Pho Mi handled group dinner on Thursday — good noodles, accommodated the group without fuss.

The Runs

5K Thursday morning to get ready for the day. Saturday we set out for 10K, took a wrong turn somewhere around the Prater, and finished 14K — past the football stadium, through the neighbourhood we had no particular reason to be in. You finish tired and also more located than you started.


The People

What four days in a room produces is different from four months of async — not categorically better, because async is how this project runs and it works. But some decisions need a feedback loop that’s measured in seconds, not hours. In a room, someone says “does this approach break anything on your side?” and gets an answer before they’ve finished the sentence. On Slack, that’s a thread that goes quiet and reopens the next morning with a question about what was meant.

Olaf Alders led the Hetzner migration and is the reason the hz cluster exists as a coherent thing rather than a pile of manifests. He also initiated the Café Sperl expedition. Ranguard was working on the broader platform bootstrap — the ArgoCD ignoreDifferences configuration and the platform-level sync policy decisions that needed to hold consistently across both clusters. Other MetaCPAN developers were meanwhile moving additional Perl tools into the cluster — the kind of parallel progress that only happens when everyone’s in the same room with no calendar conflicts.

The clearest example of why the room mattered: during the ESO migration debugging, while I was wiring up the hz cluster’s ExternalSecrets, Olaf and Ranguard were able to verify the DO cluster’s behaviour in real time — checking whether a config change on one side would affect the other, and confirming which environment had what state at any given moment. The blue/green architecture meant decisions about which cluster got which config had to be made carefully and in sequence, and the coordination latency would have been brutal over Slack. There was also a moment early in the week where Olaf’s existing familiarity with the Hetzner setup — specifically the CloudFleet instance pool configuration he’d been living with — meant we could validate the geographic constraints and instance sizing decisions on the spot rather than discovering a misalignment days later in a code review.

Sunday started slower. People began leaving. The event wound down the way these events always do — not all at once, but in a way you feel before you can see it. There’s a specific kind of exhaustion at the end of a summit like this: tired from the work, tired from the socializing, tired from being somewhere unfamiliar. And also satisfied in a way that’s hard to point at precisely.


Sunday

Slower morning. A random coffee shop near the hotel — just okay, nothing notable. Brief stop to get bags. Then back to Geier for the Mohnkrone, because some things are worth repeating.

Dinner that evening was at Indus — excellent Indian, and it had been long enough since I’d had any that the meal felt like a proper rediscovery. Then a bit further on foot to the Marriott Garten Café for cheesecake: the St-Germain Marille Cheesecake — elderflower and apricot, winner of the hotel’s April tasting, and quite good. Vienna to the end.

Saying goodbye at a summit like this is its own thing. Some of these people you see once a year, if that. There’s a specific rhythm to it — you’re mid-conversation and suddenly everyone has a flight or a train to catch and a city that isn’t theirs to leave. The goodbyes are brief not because there isn’t more to say, but because the logistics don’t leave room for it. And then you’re in an airport or on a train and the week is already becoming a thing that happened rather than a thing that’s happening.

I came in with four goals. I left with ten shipped items. The week was bigger than planned — not because the plan was wrong, but because the room contained more information than the plan did, and we used it.

The extra distance is the part you remember.


This was the wrapup I promised in the pre-event post. PTS 2026 was held April 23–27 in Vienna. Thanks to the sponsors making it possible: Deriv, Booking.com, MongoDB, Fastly, and the individual donors who keep the Perl toolchain funded.
