Heading to PTS 2026

This is the 16th Perl Toolchain Summit. That number is remarkable in a way that’s easy to walk past — the Perl community has been gathering a small, focused group of toolchain maintainers in a room every single year since 2008, and the output has been disproportionate to the headcount. The Oslo Consensus in 2008 established how the CPAN toolchain would evolve. Lancaster in 2013 did the same for distribution metadata. Last year in Leipzig, the group shipped Test::CVE, prototyped MFA for PAUSE, cut Perl core runtime by 13%, and kept the next-generation CPAN client work moving forward.

Four days. A few dozen people. Consistently more output per person-hour than most engineering teams manage in a quarter.

I’m going to Vienna this week as a MetaCPAN infrastructure contributor, and I want to write down what we’re actually planning to do before I get there — partly to think it through, partly because I’ll want to compare the plan to reality when I write the wrapup post.

The ecosystem argument, briefly

Perl’s contributor base is shrinking. That’s just true. New account signups on PAUSE peaked around 450 a year in 2012 and dropped to 108 in 2025, of which only 65 made a first release. The people maintaining the ecosystem now are a smaller, more concentrated group carrying a larger fraction of the load.

But the ecosystem itself — 220,000+ CPAN modules across 45,500+ distributions — isn’t going anywhere. It’s the dependency graph for decades of production software: financial systems, bioinformatics pipelines (BioPerl had a hand in sequencing the human genome), sysadmin tooling that runs quietly in corners of infrastructure nobody’s touched since 2009. Booking.com trains 350+ engineers in Perl. The skill is scarce enough that Perl developers in the US average $110k — not because anyone’s writing new Perl enthusiastically, but because the old Perl is critical and there are fewer people who understand it every year.

MetaCPAN sits at the centre of this. It’s the primary discovery interface for all of CPAN — the web UI at metacpan.org and the free REST API that dozens of tools and services depend on. After the traditional CPAN mirror network consolidated onto Fastly CDN, MetaCPAN also became a first-class CPAN mirror, not just an index. It’s infrastructure now, not just a search engine.

Which is why getting its operational footing right matters, and why PTS is the right venue to make progress.

Moving off DigitalOcean

The biggest infrastructure story for this PTS is a platform migration. Olaf Alders is leading MetaCPAN off DigitalOcean and onto Hetzner, with Cloudfleet managing the Kubernetes layer.

DigitalOcean has been good to open-source projects in the past. The problem is the distinction between hosting an open-source project (a website, a repo, a small public tool) and hosting open-source infrastructure: something that functions as a shared public utility and scales with use rather than with a paying customer's growth. Their community support is geared toward small projects and simple hosting; when you're running major shared infrastructure, the donation and discount model neither scales with the scope of what you're actually operating nor is flexible enough to close the difference.

Hetzner is a German cloud provider with a well-earned reputation for density and price-to-performance, especially on bare metal. Cloudfleet manages Kubernetes on top of Hetzner in a way that’s comparable to how a managed Kubernetes provider would operate — except on Hetzner hardware instead of AWS or GCP. Hetzner doesn’t offer a managed Kubernetes product, and for a volunteer organization operating on a best-efforts basis, running the Kubernetes control plane ourselves isn’t realistic — we need the cluster to self-heal without requiring a person to be paged. Cloudfleet fills that gap, and the Hetzner community tutorial on managed Kubernetes with Cloudfleet laid out exactly the model we needed.

What we get from the migration, at minimum: better economics at MetaCPAN’s scale, and a more coherent operational model where the Kubernetes layer is managed rather than hand-maintained. What I want to understand better before Sunday in Vienna: what we don’t know yet about running on this stack, and where the gaps are between the plan and what’s actually needed to keep metacpan.org up.

Getting secrets right

MetaCPAN’s current approach to secrets in Kubernetes is Sealed Secrets — you encrypt secrets with a cluster public key, commit the encrypted blobs to git, and the controller decrypts them at runtime. It’s a reasonable solution to the “don’t put plaintext secrets in git” problem. We’ve been using it for a while.
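For context, the committed artifact looks roughly like this (a sketch; the resource names and namespace are illustrative, not MetaCPAN's actual manifests):

```yaml
# Produced by piping a plain Secret through kubeseal, e.g.:
#   kubeseal --format yaml < secret.yaml > sealed-secret.yaml
# Only this encrypted form is committed to git; the controller in the
# cluster holds the private key and decrypts it into a real Secret.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: api-credentials        # illustrative name
  namespace: metacpan          # illustrative namespace
spec:
  encryptedData:
    # Opaque ciphertext: the values are all encrypted, so you can't
    # tell at a glance which ones are real secrets and which are
    # effectively just configuration.
    DB_PASSWORD: AgBy3i4OJSWK...
  template:
    metadata:
      name: api-credentials    # the Secret the controller will create
```

Because the ciphertext is bound to a specific cluster's key pair, each cluster needs its own sealed copy of the same logical secret, which is exactly the multi-cluster friction described below.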

The friction shows up operationally. With multiple clusters, storing and managing a sealed secret per cluster gets complicated fast. The bigger issue is that Sealed Secrets requires the entire file to be encrypted — there’s no way to separate the bits that are genuinely secret from the bits that are just configuration. That makes it hard for developers to see what the configuration actually looks like, to track the history of changes, or to edit a value without going through the seal/unseal cycle. Secure keys sitting in a password manager are just easier to reason about than encrypted blobs in git.

We’re moving to External Secrets Operator backed by 1Password. The model is different in a meaningful way: secrets live in 1Password (where they belong — with access controls, audit trails, and rotation that most teams already manage there), and ESO syncs them into Kubernetes as native Secret objects at runtime. The git repo never sees a secret at all, not even an encrypted one. When you rotate a credential, you rotate it in 1Password, and the operator picks it up.
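A minimal sketch of what that wiring looks like with ESO's 1Password Connect provider. All names, the Connect host, and the vault and item titles here are hypothetical; the actual resources will differ:

```yaml
# A cluster-wide store pointing at a 1Password Connect server.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: onepassword
spec:
  provider:
    onepassword:
      connectHost: https://onepassword-connect.example.internal  # hypothetical
      vaults:
        infrastructure: 1      # vault name -> lookup priority
      auth:
        secretRef:
          connectTokenSecretRef:
            name: onepassword-connect-token
            namespace: external-secrets
            key: token
---
# Syncs one 1Password item field into a native Kubernetes Secret.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-credentials        # illustrative name
  namespace: metacpan
spec:
  refreshInterval: 1h          # rotate in 1Password; ESO re-syncs on this cadence
  secretStoreRef:
    name: onepassword
    kind: ClusterSecretStore
  target:
    name: api-credentials      # the native Secret ESO creates and owns
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: metacpan-db       # 1Password item title (hypothetical)
        property: password     # field within that item
```

Note that the only thing in git is this pointer at 1Password; the secret material itself never appears in the repo, encrypted or otherwise.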

The practical wins: onboarding is easier (new contributors need 1Password access, not a sealing key), rotation is less ceremony, and the audit trail is in 1Password rather than inferred from git commit history. The migration itself is the work — identifying everything currently sealed, getting it into 1Password, wiring up the ESO resources, and validating that nothing breaks during the switchover.

Config drift and the ConfigMap generator

Alongside the secrets migration, there’s a quieter config management problem. MetaCPAN’s Kubernetes ConfigMaps are currently hand-authored YAML — someone edits the ConfigMap directly in the manifest, commits it, and hopes the change flows through correctly.

This is fine until it isn’t. Config drift happens when the source of truth for a configuration value is ambiguous. Is the YAML in git current? Was there a manual kubectl apply that didn’t get committed? Which environment has the canonical value?

We’re moving to Kustomize’s ConfigMap generator — a feature that generates ConfigMaps from source files rather than from inline YAML. The generator tracks changes properly, the source files have a clear home in the repo, and it integrates naturally with the Kustomize overlay structure we’re already using for environment-specific configuration. When you change a config value, you change the source file, the generator produces a new ConfigMap with a new hash-suffixed name, and deployments roll automatically.
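A small sketch of what this looks like in a kustomization.yaml (the file and key names here are made up for illustration):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
  - name: metacpan-web-config
    files:
      - config/app.conf        # the source file has a clear home in the repo
    literals:
      - LOG_LEVEL=info
# Kustomize emits the ConfigMap as metacpan-web-config-<content hash>
# and rewrites every reference to it in the rendered manifests, so a
# change to config/app.conf produces a new name and rolls the
# deployments that mount it.
```

The hash-suffixed name is what makes the rollout automatic: a Deployment's pod template changes when the referenced ConfigMap name changes, which is exactly the "config changes are deterministic and tracked" property.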

It’s not a dramatic change, but “config changes are deterministic and tracked” is a property worth having.

Datadog as the new monitoring layer

MetaCPAN got a generous community donation that's funding a DataDog deployment. I want to be honest about what this is replacing and why.

In Leipzig last year we worked on Loki for log aggregation and Thanos for long-term metrics storage. That work wasn't wasted; logs and metrics are foundational. But Thanos is heavyweight, and supporting it means continually growing our own hosting footprint. With a DataDog donation in hand, we can get APM, tracing, and infrastructure monitoring without adding more infrastructure to manage. Honestly, there's a lot we're currently blind on, the classic "don't know what you don't know" problem. We have some monitoring now but no alerting, and stitching together a full observability stack across multiple systems is overhead a small volunteer team shouldn't be carrying when a funded alternative exists. The Leipzig work gave us the raw data; DataDog gives us the ability to ask questions of it without wiring up queries across multiple systems. And because it's funded, we can spend PTS wiring it in rather than debating the build-vs-buy question.
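Getting the agent running is mostly Helm chart configuration. A hypothetical values file for the datadog/datadog chart, assuming the API key is delivered as a pre-created Secret (which fits the ESO model above) rather than committed anywhere:

```yaml
# Illustrative Helm values for the Datadog agent; key names follow the
# datadog/datadog chart, but the Secret name and site are assumptions.
datadog:
  apiKeyExistingSecret: datadog-api-key  # Secret created out of band, not in git
  site: datadoghq.eu           # or datadoghq.com, depending on the account
  logs:
    enabled: true
    containerCollectAll: true  # ship container logs without per-pod annotations
  apm:
    portEnabled: true          # accept traces from instrumented applications
clusterAgent:
  enabled: true                # cluster-level checks, events, and metrics
```

Starting from a values file like this is what makes "agent running and initial dashboards configured" a realistic four-day goal: the data starts flowing first, and the dashboards and alerting can be iterated on remotely afterward.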

What “done” looks like by Sunday

My honest list for Vienna, roughly in order of confidence:

The ESO + 1Password migration is the thing I most want to get to a working state by Sunday. It’s well-defined enough to execute if we have the right people in the room. This is primarily my work, though Olaf and Ranguard will be collaborating on the broader migration and observability pieces while other developers work on moving additional Perl tools into the cluster.

The Kustomize ConfigMap generator work is smaller in scope and should be completable, assuming we’re not blocked by the secrets migration.

DataDog instrumentation can start during PTS even if it’s not fully deployed — getting the agent running and initial dashboards configured gives us something to iterate on remotely afterward.

The Hetzner/Cloudfleet migration is the one I’d put in the “get started, not done” category. Platform migrations of this scope don’t finish in four days, but PTS is the right place to make the architectural decisions that the rest of the migration depends on.

Olaf has already tested the new infrastructure on Cloudfleet/Hetzner, so the plan is to actually migrate off DigitalOcean during this summit — not prototype, migrate. After that, the work continues: shutting down remaining services still running on bare metal at other ISPs, getting the MetaCPAN API integrated into the Kubernetes cluster, and wiring up autoscaling that responds to real request load. Longer term, the goal is a standard hosting infrastructure that other Perl infrastructure projects can use — shared hosting capacity for the toolchain, not just MetaCPAN. Leipzig moved the pieces into place; Vienna is where we start making it real.

I’ll write the wrapup post from Vienna on Sunday or shortly after. The question I’ll be answering then is the same one from every PTS: how much of the plan actually happened, and what did we figure out that we didn’t know we needed to figure out?


Thanks to the sponsors making PTS 2026 possible: Deriv, Booking.com, MongoDB, Fastly, and the individual donors who keep the Perl toolchain funded.
