
Rate limits and scale

How RunsOn paces AWS and GitHub API calls to protect shared quotas at high volume, what you can tune, and the AWS quotas you must raise yourself.

At tens of thousands of jobs per day, two sets of limits come into play: your AWS account’s EC2 API throttling and service quotas, and GitHub’s API rate limits. RunsOn paces itself to stay within both, so a busy stack does not exhaust quotas shared with the rest of your account. This page covers what RunsOn does automatically, the few knobs you can turn, and the quotas only you can raise.

What RunsOn does automatically

You do not configure any of this — it is built in.

  • Paces every AWS API call. EC2 launch, terminate, start/stop, and CreateTags calls each go through a token-bucket limiter before hitting AWS. Defaults are intentionally low (for example, ~2 RunInstances/CreateFleet calls per second) because RunsOn cannot know how many other stacks or tools share the account’s EC2 API quota. The launch rate scales up automatically with the AppSize preset.
  • Meters GitHub API usage against the secondary rate limit. Calls are budgeted as points — 1 for reads (GET), 5 for writes (POST/PATCH/PUT/DELETE) — and held just under GitHub’s 15-points/second ceiling. Each GitHub App installation gets its own budget, and critical-path calls (like generating a runner’s registration token) take priority over background cleanup.
  • Retries transient failures. GitHub 5xx responses are retried with backoff, and GitHub secondary-rate-limit responses honor the Retry-After header within the runner’s boot window. EC2 throttling (RequestLimitExceeded) is retried by the AWS SDK on top of the pacing above.
  • Falls back when capacity runs out. If a spot launch hits your account’s spot quota (MaxSpotInstanceCountExceeded), RunsOn raises an alert, briefly snoozes spot, and launches on-demand instead so jobs still run. See Spot pricing.
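The pacing described in the first two bullets can be illustrated with a small token-bucket sketch. This is not RunsOn's implementation: the class name, the `clock` parameter, and the burst capacity are assumptions made for illustration. Only the rates (~2 launch calls per second, 15 points per second) and the point costs (1 for reads, 5 for writes) come from the text above.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size (an assumption here)
        self.clock = clock        # injectable clock, handy for testing
        self.tokens = capacity
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, cost=1):
        """Seconds until `cost` tokens are available (0.0 if available now)."""
        self._refill()
        if self.tokens >= cost:
            return 0.0
        return (cost - self.tokens) / self.rate

    def try_acquire(self, cost=1):
        """Take `cost` tokens if available and report whether the call may proceed."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Point costs as described above: 1 for reads, 5 for writes.
GITHUB_POINTS = {"GET": 1, "POST": 5, "PATCH": 5, "PUT": 5, "DELETE": 5}


def github_cost(method):
    """Points charged against the secondary rate limit for one HTTP method."""
    return GITHUB_POINTS[method.upper()]
```

A launch limiter in this sketch would be `TokenBucket(rate=2, capacity=2)`, and a per-installation GitHub budget would be something like `TokenBucket(rate=15, capacity=15)` with each call charged `github_cost(method)`. Giving critical-path calls priority over background cleanup would need a queue in front of the bucket, which is left out here.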

What you can tune

AppSize — the main throughput lever

AppSize (CloudFormation) / app_size (Terraform) sets the control-plane worker concurrency, the ECS task CPU/memory, and the EC2 launch rate-limit assumptions together. It is the single knob for overall throughput. Current presets:

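| Preset | Webhook workers | Provisioning workers | Registration workers |
|--------|-----------------|----------------------|----------------------|
| small  | 4               | 4                    | 2                    |
| medium | 8               | 8                    | 4                    |
| high   | 20              | 20                   | 10                   |
| xhigh  | 40              | 40                   | 20                   |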

AppGithubApiStrategy

Set AppGithubApiStrategy to conservative to spend less of your GitHub API rate-limit budget: RunsOn stops eagerly de-registering finished runners (GitHub reaps them within 24h anyway). Recommended once you launch tens of thousands of jobs per day.
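To get a rough sense of what eager de-registration costs, the point figures quoted earlier can be turned into back-of-the-envelope arithmetic. The sketch below assumes one write call (5 points) per finished runner, which is a hypothetical figure and not something this page states. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
POINTS_PER_SECOND = 15  # GitHub secondary-limit ceiling cited on this page
WRITE_COST = 5          # points per POST/PATCH/PUT/DELETE


def deregistration_points_per_day(jobs_per_day, calls_per_job=1):
    """Points spent per day on eager runner de-registration.

    Assumes `calls_per_job` write calls per finished runner (hypothetical).
    """
    return jobs_per_day * calls_per_job * WRITE_COST


def share_of_daily_ceiling(points):
    """Fraction of the theoretical daily maximum (15 points/s sustained all day)."""
    return points / (POINTS_PER_SECOND * SECONDS_PER_DAY)
```

Under these assumptions, 50,000 jobs per day would spend 250,000 points on de-registration, roughly 19% of the 1,296,000 points available if the ceiling were sustained for a full day. Real traffic is bursty, so the share consumed at peak can be much higher than this daily average suggests.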

Advanced concurrency overrides

The per-stage worker counts can be overridden independently of AppSize via the RUNS_ON_APP_WEBHOOK_CONCURRENCY, RUNS_ON_APP_PROVISIONING_CONCURRENCY, and RUNS_ON_APP_REGISTRATION_CONCURRENCY environment variables (passed through extra_env_vars on the Terraform module). Prefer AppSize unless you have a specific bottleneck to target.
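The relationship between the AppSize presets and the per-stage overrides can be sketched as a simple lookup. How RunsOn resolves these values internally is not described here, so treat the function below as an assumption: it starts from the preset's worker counts and lets any of the three environment variables replace a single stage. `resolve_concurrency` and the dictionary names are hypothetical.

```python
import os

# Worker counts per preset, taken from the AppSize table on this page.
PRESETS = {
    "small": {"webhook": 4, "provisioning": 4, "registration": 2},
    "medium": {"webhook": 8, "provisioning": 8, "registration": 4},
    "high": {"webhook": 20, "provisioning": 20, "registration": 10},
    "xhigh": {"webhook": 40, "provisioning": 40, "registration": 20},
}

# Environment variables named on this page, keyed by the stage they override.
ENV_VARS = {
    "webhook": "RUNS_ON_APP_WEBHOOK_CONCURRENCY",
    "provisioning": "RUNS_ON_APP_PROVISIONING_CONCURRENCY",
    "registration": "RUNS_ON_APP_REGISTRATION_CONCURRENCY",
}


def resolve_concurrency(app_size, env=None):
    """Return per-stage worker counts: preset defaults, then env overrides."""
    env = os.environ if env is None else env
    workers = dict(PRESETS[app_size])  # copy so the preset table is not mutated
    for stage, var in ENV_VARS.items():
        raw = env.get(var)
        if raw:  # unset or empty means "keep the preset value"
            workers[stage] = int(raw)
    return workers
```

For example, `resolve_concurrency("medium", {"RUNS_ON_APP_REGISTRATION_CONCURRENCY": "8"})` keeps the medium webhook and provisioning counts and raises only the registration workers to 8, which matches the "target a specific bottleneck" use case.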

What you must do in your AWS account

RunsOn cannot raise your account’s quotas. That is up to you, and quotas that are too low are the usual cause of “launches are slow” or “jobs fall back to on-demand”.

Symptoms and where to look

| Symptom | Likely cause | Where to look |
|---------|--------------|---------------|
| RequestLimitExceeded in logs, bursts launch slowly | EC2 API request-rate quota too low for the AppSize | Troubleshooting → RequestLimitExceeded |
| Jobs run on-demand when you expected spot | Spot quota hit (MaxSpotInstanceCountExceeded) | SNS alerts, Spot pricing |
| Queue depth climbing under load | Worker concurrency saturated | Raise AppSize; CloudWatch dashboard |
| GitHub API errors at very high volume | Secondary rate limit | Set AppGithubApiStrategy=conservative |