About
Connect with me: cyrilrohr.com - linkedin.com/in/cyrilrohr - x.com/crohr
I've been making Dev/Ops related products since 2013. Some of my projects:
- PullPreview.com (2018) - Full-fidelity preview environments for GitHub pull requests.
- Packager.io (2013) - Linux DEB/RPM packaging for modern web applications.
How it came about
As part of my freelance jobs at various companies, I've implemented a lot of GitHub Actions workflows. I like GitHub Actions a lot, but:
- some longer workflows (>5min) would merit faster runners.
- running out of free minutes can get expensive pretty quickly.
- a lot of interesting features (static IPs, arm runners, etc.) are sometimes only available with higher-tier plans.
Current landscape:
- official runners: great for simple jobs, but can be costly. Larger runners are both very costly and slow. Good concurrency for standard runners.
- third-party runners (BuildJet, WarpBuild, Ubicloud, etc.): support official images, but you have to trust a 3rd party (they're not certified). Most of them are only 50% cheaper, and/or slow as well. Not flexible in terms of hardware or image. Concurrency can be pricey / not available.
- (artisanal) self-hosted runners: tradeoff between cheap <> max concurrency, maintenance, potential leakage between workflows, security issues for public repos.
- (productized) self-hosted runners with ARC: heavy, require some expertise with k8s, no official image, ~flexibility in terms of hardware. Relevant if using autoscaled pods, otherwise can be costly.
What I wanted
Core features
- cheap!
- fast hardware
- 1-1 workflow compatibility with existing GitHub Actions workflows
- fast boot
- infinite concurrency if I want to
- on-premise, don't want to share sensitive secrets with a 3rd party
- good network throughput for all those downloads / uploads
Nice to have
- fast caches
- one-click fire and forget install
- ability to use a specific base image, to preload software, precompilations, etc.
Solution
- ❌ official runners (expensive, and slow)
- ❌ third-party runners (3rd party, lack of concurrency, can be slow (network and/or hardware))
- ❌ (artisanal) self-hosted (maintenance, lack of concurrency, lack of image) (but can be great!)
- ❌ (productized) self-hosted (maintenance, lack of image, manual config of app credentials)
- ✅ RunsOn - KISS. Faster, 10x cheaper:
Keep it stupid simple
For now: no warm pool or clever shenanigans, stay with the most stupid thing that could work, and see how far that can go:
- instances auto-terminate when job finishes, even if RunsOn app is down, so no risk of overage.
- CloudWatch integration, for graphing consumed minutes + Prometheus metrics.
- huge: integrated S3 cache (with VPC S3 gateway, so free traffic) => UNLIMITED cache. Can also be used to cache docker layers.
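The auto-termination bullet above implies that the shutdown decision has to live on the instance itself, not in the RunsOn app. The source does not describe the actual mechanism, so the following is only a minimal sketch of one way to get that property: wrap the job so that termination always runs afterwards. `run_job` and `terminate` are hypothetical callables introduced for illustration, not part of RunsOn.

```python
from typing import Callable


def run_then_terminate(run_job: Callable[[], int],
                       terminate: Callable[[], None]) -> int:
    """Run the job, then always call terminate, even if the job raises.

    Both arguments are hypothetical placeholders: run_job would start the
    runner and block until the job ends, terminate would shut the machine down.
    """
    try:
        return run_job()
    finally:
        # Runs on success and on failure, with no call back to a control app.
        terminate()


if __name__ == "__main__":
    # Harmless stand-ins so the sketch can be run anywhere.
    exit_code = run_then_terminate(
        run_job=lambda: 0,
        terminate=lambda: print("terminating instance"),
    )
    print(f"job exit code: {exit_code}")
```

On EC2, `terminate` could plausibly be a plain OS shutdown on an instance whose instance-initiated shutdown behavior is set to terminate; that detail is likewise an assumption, not something stated here.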
Timings:
- from GitHub to RunsOn receiving the webhook: 1-3s delay.
- from RunsOn to Launching instance: ~5s (instance type selection, AMI selection, runner registration with GitHub).
- from Launching to Starting: ~15s (boot, pull AMI, etc.)
- from Starting to Accepting workflow job: ~10s (network, cloud-init, runner binary init + sync with GitHub).
All-in: from 30 to 40s depending on underlying AWS load. Hard to improve upon, unless combined with warm pools of machines.
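As a quick sanity check, the per-stage figures above can be summed to see how they relate to the 30-40s total. This is just arithmetic on the numbers quoted in the list. Treating each "~" figure as a fixed value is an assumption, and the stage names are made up for the example.

```python
# Stage latencies in seconds as (low, high), taken from the list above.
# "~" figures are treated as fixed points, which is a simplifying assumption.
STAGES = {
    "github_to_webhook": (1, 3),
    "webhook_to_launching": (5, 5),
    "launching_to_starting": (15, 15),
    "starting_to_accepting_job": (10, 10),
}


def total_latency(stages):
    """Return the (low, high) total by summing each stage's bounds."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_latency(STAGES)
print(f"nominal queue-to-start time: {low}-{high}s")  # prints 31-33s
```

The nominal sum sits at the low end of the quoted range. The rest of the 30-40s window is the variance attributed to AWS load.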
Other third parties: anywhere from 10s (GitHub) to multiple minutes (GitHub!).
Surprisingly much better than a solution based on e.g. a stock installation of the semi-officially supported Actions Runner Controller or the Philips Terraform project.
The code
The code is partly open-source, and available on GitHub. Full source code is available with a Sponsorship license.
Thanks
Thanks to the early adopters, and especially Alan for being the first sponsor of this project and helping me uncover and fix the initial scalability issues.
Usage
RunsOn is now used across tens of companies, big and small, and is powering builds for millions of GitHub Actions workflows per month, well ahead of many other third-party providers. Check it out!