The road to v3

RunsOn v3 started with a forced decision.

On March 31, 2026, AWS announced that App Runner would move to maintenance mode and stop accepting new customers after April 30, 2026. Existing App Runner services keep running, but AWS does not plan to add new features.

RunsOn v2 used App Runner for the control plane. It was a good fit for a long time: cheap, managed, simple to deploy, and good enough to receive GitHub webhooks, process queues, and launch EC2 runners.

But once App Runner’s future changed, we had two options: do the smallest possible migration, or use the moment to clean up the stack properly.

We chose the second option.

v3 is not just v2 without App Runner

RunsOn v3 replaces the App Runner control plane with the new Flex runtime on ECS/Fargate.

Flex is the current incarnation of RunsOn: ephemeral GitHub Actions runners, launched on demand in your AWS account, with flexible instance selection through workflow labels.

That label model is still the core of the product. A workflow can ask for x64 or arm64, more CPUs, more memory, private networking, custom volumes, GPU runners, spot capacity, on-demand capacity, or a specific runner shape. RunsOn resolves that request into EC2 capacity for the job, starts a fresh runner, and terminates it when the job is done.
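To make the label model concrete, here is a minimal sketch of how a label request could be turned into a runner shape. The `key=value/key=value` label syntax, the `RunnerRequest` type, its defaults, and the `parse_labels` helper are all illustrative assumptions, not RunsOn's actual label grammar or implementation.

```python
from dataclasses import dataclass


# Hypothetical request shape; field names and defaults are illustrative only.
@dataclass(frozen=True)
class RunnerRequest:
    arch: str = "x64"
    cpu: int = 2
    ram_gb: int = 8
    spot: bool = True


def parse_labels(label: str) -> RunnerRequest:
    """Parse a hypothetical 'key=value/key=value' label into a request."""
    fields = {}
    for part in label.split("/"):
        if not part:
            continue  # tolerate empty segments such as a trailing slash
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed label segment: {part!r}")
        fields[key] = value

    arch = fields.get("arch", "x64")
    if arch not in ("x64", "arm64"):
        raise ValueError(f"unsupported arch: {arch!r}")

    return RunnerRequest(
        arch=arch,
        cpu=int(fields.get("cpu", 2)),
        ram_gb=int(fields.get("ram", 8)),
        spot=fields.get("spot", "true") == "true",
    )


request = parse_labels("arch=arm64/cpu=4/ram=16/spot=false")
print(request)
```

In a real system the parsed request would then be matched against available EC2 instance types. That matching step is left out here because it depends on details the post does not describe.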

That part stays. What changes in v3 is the control plane around it.

The public ingress path now uses API Gateway and Lambda. The runtime service runs on ECS/Fargate. The stack still launches fresh EC2 runners for GitHub Actions jobs, but the system is cleaner, easier to reason about, and no longer tied to a service entering maintenance mode.

It also gave us room to remove a lot of historical baggage.

CloudFormation gets simpler

CloudFormation remains the default deployment path for RunsOn.

That matters. A lot of users choose RunsOn because they can install it quickly in their own AWS account without designing a runner platform from scratch. The default path should stay boring: deploy the stack, register the GitHub App, start running jobs.

Over time, the v2 CloudFormation template had accumulated too many knobs: legacy App Runner tuning, external networking paths, debug toggles, queue-age alarms, dashboard switches, default admins, runner disk defaults, and outputs that mostly existed for compatibility.

v3 cuts that down.

App sizing is now one shared preset: small, medium, high, or xhigh. The same idea applies across Flex, CloudFormation, and Terraform: pick the runtime size that matches your expected concurrency instead of tuning CPU, memory, and queue settings separately.

Daily minutes alarms are replaced with AppBudgetDailyUsd, defaulting to $10/day. Dashboard creation is built in. The embedded networking shape is fixed. Deprecated App Runner parameters and outputs are gone.

The public ingress story is also better. v3 adds an integrated managed WAF option for webhook ingress, including GitHub setup callback protection and admin route gating. It also lets you fully disable public admin routes when setup and dashboard access should not be exposed through public ingress.

CloudFormation should be the easy path. If you need deep infrastructure control, that belongs somewhere else.

Terraform gets the advanced path

That somewhere else is Terraform, or OpenTofu.

In v3, module consumers should use the Flex module directly:

source = "runs-on/runs-on/aws//flex"

The Terraform internals have been reorganized around explicit runner and control-plane modules. That includes runner networking, runner extras, runner compute, the control-plane runtime, and Flex itself.

This split makes the intent clearer. CloudFormation is for the default product experience. Terraform is for teams that need to plug RunsOn into existing platform infrastructure: existing VPCs, GHES, IAM permission boundaries, custom app or runner policies, user-managed WAF behavior, and deeper validation in CI.

Both paths are supported. They just no longer pretend to solve the same problem.

What v3 improves for runners

The biggest v3 work is architectural, but there are user-facing improvements too.

Runner lifecycle handling is better. Completed jobs clean up faster. Launch retries now use staged backoff. Manual reruns are safer. Active pool instances are protected from being terminated just because they are old.
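As a rough illustration of what staged backoff means, the sketch below retries a launch with delays that grow stage by stage rather than on every attempt. The stage sizes, delays, and function names are made-up values for illustration, not RunsOn's actual retry settings or code.

```python
import time
from typing import Callable, List, Sequence, Tuple

# (retries in this stage, delay in seconds before each retry).
# Illustrative values only, not RunsOn's real configuration.
STAGES: Sequence[Tuple[int, float]] = ((3, 1.0), (3, 10.0), (2, 60.0))


def backoff_schedule(stages: Sequence[Tuple[int, float]] = STAGES) -> List[float]:
    """Expand the stages into a flat list of delays, one per retry."""
    return [delay for count, delay in stages for _ in range(count)]


def launch_with_retries(
    launch: Callable[[], bool],
    stages: Sequence[Tuple[int, float]] = STAGES,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call launch() until it succeeds and return the number of attempts used.

    Raises RuntimeError once the initial attempt and every retry have failed.
    """
    delays = backoff_schedule(stages)
    # One initial attempt plus one retry per delay in the schedule.
    for attempt in range(1, len(delays) + 2):
        if launch():
            return attempt
        if attempt <= len(delays):
            sleep(delays[attempt - 1])
    raise RuntimeError("launch failed after all retry stages")
```

Passing `sleep` as a parameter keeps the schedule testable without waiting: a test can substitute `list.append` and inspect the recorded delays.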

There is also optional Bedrock support. When enabled, runner instances get the permissions needed for Bedrock-compatible AI agents, so tools like Claude Code or OpenCode can run inside GitHub Actions jobs using credentials from your own AWS account.

Observability improves as well: runner-side AWS profile setup, better OTEL job summaries, and support for selectively disabling OTEL logs or traces.

None of those changes would justify a major version on their own. Together with the control-plane migration and stack cleanup, they make v3 a much cleaner base for the next phase.

Flex is not the end of the story

Flex is still the right model for most teams: define what a job needs through labels, let RunsOn resolve the best matching EC2 capacity, and pay only for the runners you actually use.

But we are also working on another product dedicated to running fleets at the organization or enterprise level.

That is a different problem. Flex is job-driven: a workflow asks for capacity, RunsOn launches it. Fleet-level orchestration is about managing larger pools of capacity across many teams, repositories, policies, and usage patterns.

v3 prepares the ground for that split. Flex gets simpler and sharper. The next product can focus on fleet operations without forcing that complexity into the default RunsOn install.

Migration is real

v3 is a breaking release. Treat it like a migration, not a routine stack update.

For most existing users, the clean path is blue-green:

1. Deploy a fresh v3 stack.
2. Register the new GitHub App.
3. Test representative repositories.
4. Move traffic over.
5. Keep the old stack around briefly as a rollback point.
6. Delete it only once you trust the new setup.

Those last two steps matter. Early v3 releases might be a bit bumpy. The point of keeping the old stack around is not ceremony; it gives you a simple rollback path while you confirm your workflows, labels, networking, permissions, cache behavior, and observability all work as expected.

If you had a heavily customized v2 CloudFormation stack, read the v3 migration guide first. Some of those settings are intentionally gone from CloudFormation and now belong in Terraform/OpenTofu.

The point of v3

App Runner forced the timing.

But v3 is not just a reaction to App Runner.

It is the version where I want RunsOn to draw a cleaner line: simple defaults for most users, Terraform for advanced infrastructure teams, Flex for flexible per-job runner selection, and a separate path coming for larger fleet-level operations.

I also want to say thank you to everyone already running RunsOn today. You are the reason the product got this far, and the reason I can make these kinds of calls with real production feedback instead of guesses.

I hope you follow along for v3. The first releases may need a bit of sanding, and I am very open to feedback if a removed option turns out to be important for real-world setups.

But the bias for v3 is intentional: remove more, keep the core cleaner, and re-add specific options when there is a clear reason. That is better than carrying every old knob forever just because it once existed.