About RunsOn

Author 👋

Cyril Rohr (France)

Why RunsOn?

I like GitHub Actions, but:

  • some longer workflows (>5min) would merit faster runners.
  • running out of free minutes can get expensive pretty quickly.

Current landscape:

  • official runners: great for simple jobs, but can be costly. Larger runners are both very costly and slow. Good concurrency for standard runners.

  • third-party runners (buildjet, warpbuild, ubicloud, etc.): support official images, but you have to trust a 3rd party (they’re not certified). Most of them are only 50% cheaper, and/or slow as well. Not flexible in terms of hardware or image. Concurrency can be pricey or not available.

  • (artisanal) self-hosted runners: tradeoff between low cost and max concurrency, plus maintenance, potential leakage between workflows, and security issues for public repos.

  • (productized) self-hosted runners with ARC: heavy, require some expertise with k8s, no official image, only moderate flexibility in terms of hardware. Relevant if using autoscaled pods, otherwise can be costly.

What I wanted

Core features

  • cheap!
  • fast hardware
  • 1:1 workflow compatibility with existing GitHub Actions
  • fast boot
  • infinite concurrency if I want to
  • on-premises: I don’t want to share sensitive secrets with a 3rd party
  • good network throughput for all those downloads / uploads

Nice to have

  • fast caches
  • one-click fire and forget install
  • ability to use a specific base image, to preload software, precompilations, etc.

Solution

GitHub Webhook -> RunsOn -> EC2
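
The first hop of that pipeline can be sketched in a few lines. This is a minimal illustration, not RunsOn's actual code: it assumes the app subscribes to GitHub's `workflow_job` webhook event, validates the `X-Hub-Signature-256` header, and only reacts to `queued` jobs carrying a given label. The function names and the `example-label` value are hypothetical.

```python
import hashlib
import hmac
import json


def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header (HMAC-SHA256 of the raw body)."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected, signature_header)


def should_launch(body: bytes, runner_label: str = "example-label") -> bool:
    """Return True when a workflow_job event asks for a new runner.

    `runner_label` is a hypothetical label; the real label scheme is an assumption.
    """
    event = json.loads(body)
    job = event.get("workflow_job", {})
    # Only freshly queued jobs need an instance; in_progress/completed do not.
    return event.get("action") == "queued" and runner_label in job.get("labels", [])
```

Everything after this decision (instance type selection, AMI selection, runner registration) would happen on the RunsOn -> EC2 side and is not shown here.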

Architecture

For now: no warm pool or clever shenanigans, stay with the most stupid thing that could work, and see how far that can go:

  • instances auto-terminate when job finishes, even if RunsOn app is down, so no risk of overage.
  • CloudWatch integration, for graphing consumed minutes (soon: CloudWatch monitoring for CPU/RAM usage).
  • huge: integrated S3 cache (with VPC S3 gateway, so free traffic) => UNLIMITED cache. Can also be used to cache Docker layers.

Timings:

  • from GitHub to RunsOn receiving the webhook: 1-3s delay.
  • from RunsOn to Launching instance: ~5s (instance type selection, AMI selection, runner registration with GitHub).
  • from Launching to Starting: ~15s (boot, pull AMI, etc.)
  • from Starting to Accepting workflow job: ~10s (network, cloud-init, runner binary init + sync with GitHub).

All-in: from 30 to 50s depending on underlying AWS load. Hard to improve upon, unless mixed with warm pools of machines.
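
As a quick sanity check, the per-stage figures listed above can simply be added up. The sketch below copies those numbers (the stage names are shortened for illustration); the nominal sum lands at the low end of the 30-50s range, and the remainder is presumably the AWS load variance mentioned above.

```python
# (stage, low seconds, high seconds), copied from the timings listed above
STAGES = [
    ("GitHub -> RunsOn webhook", 1, 3),
    ("RunsOn -> Launching", 5, 5),
    ("Launching -> Starting", 15, 15),
    ("Starting -> Accepting job", 10, 10),
]


def total_range(stages):
    """Sum the low and high estimates across all stages."""
    low = sum(stage[1] for stage in stages)
    high = sum(stage[2] for stage in stages)
    return low, high


low, high = total_range(STAGES)
print(f"nominal queue-to-start time: {low}-{high}s")  # prints 31-33s
```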

Other providers: anywhere from 10s (GitHub) to multiple minutes (GitHub!).