
GitHub Actions are slow and expensive, what are the alternatives?

Note: this is a recording of a similar talk given at a DevOps meetup on May 16 in Rennes, France. You’ll find a generated transcript summary below, but you probably want to watch the video instead.

Introduction

Hello everyone, and thanks for coming to this presentation on GitHub Actions and how to make your workflows faster and 10x cheaper. But first, a brief primer on GitHub Actions, and especially the good parts.

GitHub Actions: The Good Parts

GitHub Actions is a way to run workflows automatically whenever you push code, open a pull request, or trigger almost any other event on your repository.

It has very high adoption, a flexible workflow syntax, and a wide choice of target platforms (Linux x64, macOS, Windows), which makes it quite versatile and really useful.

GitHub Actions: The Bad Parts

Here are some major issues with GitHub Actions:

  • Performance and Cost: The default runners on GitHub are pretty weak, sporting just two cores, and they are both slow and expensive: over $300 a month if used non-stop. On the other hand, alternatives like Buildjet, Warpbuild, and UbiCloud offer quicker and cheaper services.

  • Caching and Compatibility Issues: GitHub’s caching tops out at around 100 MB/s, which can bog down workflows involving large files. Also, there’s no full support for ARM64 runners yet (they’re still in beta), slowing down builds that need multiple architectures.

  • Resource Optimization and Time Waste: GitHub’s weaker machines mean you often have to spend a lot of time fine-tuning your test suites to get decent run times. This eats up a lot of engineering hours that could be saved by switching to more robust runners from other providers or by setting up your own.
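The $300/month figure above is easy to sanity-check. GitHub bills private-repo Linux 2-core runners at $0.008 per minute (its published pay-as-you-go rate at the time of writing), so a runner kept busy around the clock costs roughly:

```shell
# GitHub-hosted Linux 2-core runner: $0.008/min (private repos, pay-as-you-go)
# Running non-stop for a 30-day month:
awk 'BEGIN { printf "$%.2f/month\n", 0.008 * 60 * 24 * 30 }'
# prints $345.60/month
```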

Solution: Self-Hosted Runners

Self-hosted runners offer a practical solution for those looking to speed up their builds and reduce costs. By setting up your own machines and configuring them with GitHub’s runner agent, you can achieve faster build times at a lower price.

When using non-official runners, you can choose among three levels:

  • artisanal on-premise
  • productized on-premise
  • third-party providers

Artisanal on-premise

This approach, which I’ll call ‘artisanal on-premise’, involves taking a few of your own servers and registering them with GitHub. It’s cost-effective and manageable for a small number of machines, but it has limitations: limited concurrency, maintenance requirements, security risks, and a lack of environment consistency with GitHub’s official runners.
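Once a machine is registered with GitHub’s runner agent, routing jobs to it is just a matter of labels in the workflow. A minimal sketch (the build step is a placeholder):

```yaml
jobs:
  build:
    # Route this job to any registered self-hosted runner matching these labels
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v4
      - run: make test  # placeholder build command
```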

Productized on-premise

For a more robust setup, consider the ‘productized on-premise’ approach. This involves the same self-hosting principles, but requires additional software, like the Actions Runner Controller or the Philips Terraform project, to help manage the runners. This setup offers better hardware flexibility and scalability, as it can dynamically adjust the number of virtual machines based on demand. However, it requires more expertise to maintain, and still lacks full image compatibility with GitHub’s official runners, necessitating custom Docker images or AMIs.
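As a sketch, installing the Actions Runner Controller in its runner-scale-set mode looks roughly like this (chart paths follow the ARC quickstart; `arc-runner-set`, the org URL, and the PAT are placeholders, and the installation name becomes the label you target with `runs-on:`):

```shell
# Install the controller (assumes a working Kubernetes cluster and helm)
helm install arc \
  --namespace arc-systems --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# Install a runner scale set bound to your org (placeholders below)
helm install arc-runner-set \
  --namespace arc-runners --create-namespace \
  --set githubConfigUrl=https://github.com/your-org \
  --set githubConfigSecret.github_token=YOUR_PAT \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
```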

Third-Party Providers

The final option is to use third-party providers for more affordable machines. These providers handle maintenance, so you just pay for the service. Most support official images, and they typically offer a 50% cost reduction. However, using these services means you’ll need to share your repository content and secrets, which could be exposed if there’s a security breach. The hardware options are limited; you can choose the number of CPUs but not specific details like the processor type, disk space, or GPU. Additionally, if you need more than 64 CPUs concurrently, extra fees may apply. Often, these services are hosted in locations with suboptimal network speeds.
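Switching to a third-party provider is usually a one-line change of the `runs-on:` label. The exact label scheme varies by provider; the BuildJet-style label below is illustrative only:

```yaml
jobs:
  build:
    # Before: runs-on: ubuntu-latest
    # After: provider-specific label (check your provider's docs)
    runs-on: buildjet-4vcpu-ubuntu-2204
    steps:
      - uses: actions/checkout@v4
      - run: make test  # placeholder build command
```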

Market Overview

Here’s a quick overview of the market options for GitHub Actions alternatives:

  • Third-Party SaaS: There’s a wide variety of third-party services available, with new options emerging almost monthly.
  • Fully On-Premise: Options include the Actions Runner Controller and the Philips Terraform modules. AWS CodeBuild is a newer addition that allows you to run managed runners within your AWS infrastructure.
  • Hybrid Providers: These offer a mix of on-premise and SaaS solutions. You provide the hardware hosted in your infrastructure, but management is handled through their control plane.

While searching for a cost-effective and efficient self-hosted solution, I found the fully on-premise options challenging to set up, slow to start, and prone to lengthy queuing times. Additionally, AWS CodeBuild, despite its advantages, is costly and comes with its own set of limitations.

Introducing RunsOn

I’ve been developing RunsOn, a new software aimed at creating a more affordable and efficient on-premise GitHub Action Runner. Here’s a quick rundown:

  • Accessibility: RunsOn is free for individual and non-commercial use, with paid licenses available for commercial projects.
  • Core Features Desired:
    • Speed and Cost Efficiency: I aimed for faster and cheaper runners.
    • Scalability: Ability to handle numerous jobs concurrently without limitations.
    • Compatibility: Seamless integration with existing GitHub workflows by ensuring full image compatibility with official runners.
    • Low Maintenance: Minimal engineering oversight required, automating most of the operational aspects.
  • Additional Nice-to-Have Features:
    • Flexible instance type selection (CPU, RAM, disk, GPU).
    • Support for both x64 and arm64 architectures, and potentially macOS.
    • Enhanced handling of caches and Docker layers.
    • Easy installation and upgrades.

Overall, the goal is to make RunsOn a robust, user-friendly solution that enhances the efficiency of running automated workflows.

Core Features

  • Speed: To enhance speed, select providers with superior CPUs, like Hetzner, AWS, or OVH. RunsOn uses AWS for its diverse instance choices and spot pricing, with instances scoring around 3,000 in the CPU PassMark benchmark.
  • Cost Efficiency: For cost savings, consider using Hetzner for artisanal setups or AWS EC2 spot instances for productized solutions. Spot instances can be up to 75% cheaper than on-demand prices, fitting well with the short-lived nature of most CI jobs. Utilize the Create Fleet API from EC2 to minimize spot interruptions by selecting instances from the least interrupted pools.
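For reference, a minimal CreateFleet request of the kind described above might look like this with the AWS CLI. The launch template name and instance types are placeholders; `price-capacity-optimized` is the allocation strategy that favors the cheapest, least-interrupted spot pools:

```shell
# Sketch: request one spot instance from the least-interrupted pools
# (launch template "ci-runner" is a placeholder)
cat > fleet.json <<'EOF'
{
  "Type": "instant",
  "SpotOptions": { "AllocationStrategy": "price-capacity-optimized" },
  "TargetCapacitySpecification": {
    "TotalTargetCapacity": 1,
    "DefaultTargetCapacityType": "spot"
  },
  "LaunchTemplateConfigs": [{
    "LaunchTemplateSpecification": { "LaunchTemplateName": "ci-runner", "Version": "$Latest" },
    "Overrides": [ { "InstanceType": "m7a.large" }, { "InstanceType": "m6a.large" } ]
  }]
}
EOF
aws ec2 create-fleet --cli-input-json file://fleet.json
```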

Scalability and Compatibility

Key points on scalability for RunsOn:

  • Simple and Dynamic: RunsOn launches an ephemeral EC2 instance for each job, which is terminated after the job completes. This approach keeps the system simple and responsive.
  • Concurrency Limits: The only limits to how many jobs you can run concurrently are your AWS quotas.
  • Optimized Queuing Times: By optimizing base AMIs and using provisioned network throughput for EBS, RunsOn achieves queuing times of around 30 seconds. This is competitive with GitHub’s 12 seconds and better than many third-party providers.
  • Stable Performance Under Load: Extensive testing with clients, such as Alan, shows that even with bursts of 100 or more jobs, the queuing times remain stable.

Compatibility with Official Runners

So basically, I wanted to do just this: change one line, and my workflow should still work. This is probably one of the hardest parts because you have to make compatible OS images, in my case for EC2, and nobody did this, or nobody published it at least.
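Concretely, the “change one line” goal looks like this. The label format shown is an assumption based on RunsOn’s documentation at the time; check the current docs for the exact scheme:

```yaml
jobs:
  build:
    # Before: runs-on: ubuntu-latest
    # After (RunsOn label scheme, shown as documented at the time of writing):
    runs-on: [runs-on, runner=2cpu-linux-x64, run-id=${{ github.run_id }}]
```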

So in my case, thankfully, GitHub publishes the Packer templates for their Runner images on Azure, so I just ported them for AWS, and this is now available for anyone to use. You can find the links here.

Low Maintenance

The final feature is low maintenance. As you can see, the architecture diagram has changed a bit since the last slide, but basically RunsOn uses managed, cheap services everywhere. There is one CloudFormation stack which provisions an SQS queue, an SNS alert topic, CloudWatch logs and metrics, and some S3 buckets. The RunsOn server itself runs on AWS App Runner, which is a really cheap way to run containers on AWS (I recommend you check it out). On each VM, a small RunsOn agent launches to configure the machine and register it with GitHub. With a reasonable number of jobs, that whole stack costs only about one or two dollars a month, which is pretty impressive.

Nice-to-Have Features

Here’s a quick overview of the additional features and real-world results of RunsOn:

  • Flexible Instance Selection: Users can customize their VM specifications such as CPU, RAM, and disk space directly in their workflows.
  • Architecture Support: RunsOn supports both ARM64 and AMD64 architectures. macOS support is currently unavailable due to licensing constraints.
  • Easy Installation and Upgrades: RunsOn offers a one-click installation and upgrade process using a template URL.
  • Enhanced Caching: By leveraging AWS’s S3, RunsOn provides up to five times faster caching and supports caching Docker layers.
  • Custom Images and Full SSH Access: Users can preload software on custom AMIs and have full SSH access to their Runners, with options for private networking and static EIPs.
  • Real-World Impact: RunsOn has significantly reduced costs and increased speed for clients, handling up to 500,000 jobs across various users, from small to large-scale operations.

Future Work

Future enhancements for RunsOn include:

  • Cost Transparency: We plan to make CI costs more visible to developers to highlight the financial and environmental impacts of running multiple jobs.
  • Efficiency Monitoring: Introducing reports to help determine if your Runners are sized appropriately, ensuring you’re not overpaying for unused resources.

Conclusion

  • RunsOn has gained significant traction, attracting many customers and interest from competitors.
  • It stands out as a top fully on-premise solution.
  • Explore more at runs-on.com and check out github.com/runs-on/runs-on.

How to verify that VPC traffic to S3 is going through your S3 gateway?

Gateway endpoints for Amazon S3 are a must-have whenever your EC2 instances send and receive traffic from S3, because they allow the traffic to stay within the AWS network, which means better security, bandwidth, and throughput, and lower costs. They can easily be created and added to your VPC route tables.

But how do you verify that traffic is indeed going through the S3 gateway, and not crossing the public internet?

Using traceroute, you can probe the routes and see whether you are directly hitting the S3 servers (i.e. no intermediate gateway). In this example, the instance is running from a VPC located in us-east-1:

Terminal window
$ traceroute -n -T -p 443 s3.us-east-1.amazonaws.com
traceroute to s3.us-east-1.amazonaws.com (52.216.215.72), 30 hops max, 60 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 52.216.215.72 0.890 ms 0.916 ms 0.892 ms
Terminal window
$ traceroute -n -T -p 443 s3.amazonaws.com
traceroute to s3.amazonaws.com (52.217.139.232), 30 hops max, 60 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 52.217.139.232 0.268 ms 0.275 ms 0.252 ms

Both outputs produce the expected result, i.e. no intermediary gateway. This is what would happen if you were accessing a bucket located in the us-east-1 region.

Let’s see what happens if we try to access an S3 endpoint located in another region:

Terminal window
$ traceroute -n -T -p 443 s3.eu-west-1.amazonaws.com
traceroute to s3.eu-west-1.amazonaws.com (52.218.25.211), 30 hops max, 60 byte packets
1 * * *
2 240.4.88.37 0.275 ms 240.0.52.64 0.265 ms 240.4.88.39 0.215 ms
3 240.4.88.49 0.205 ms 240.4.88.53 0.231 ms 240.4.88.51 0.206 ms
4 100.100.8.118 1.369 ms 100.100.6.96 0.648 ms 240.0.52.57 0.233 ms
5 240.0.228.5 0.326 ms * *
6 240.0.32.16 0.371 ms 240.0.48.30 0.362 ms *
7 * 240.0.228.31 0.251 ms *
8 * * *
9 * * 240.0.32.27 0.392 ms
10 * * *
11 * 242.0.154.49 1.321 ms *
12 * * 52.93.28.131 1.491 ms
13 * * 100.100.6.108 1.286 ms
14 100.92.212.7 67.909 ms 52.218.25.211 67.356 ms 67.929 ms

As you can see, the route is completely different and, as expected, does not go straight to the S3 endpoint.

TL;DR: make sure your route tables are correct, and only access S3 buckets located in the same region.
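Besides traceroute, you can confirm that the gateway endpoint is actually wired into your route tables directly from the AWS CLI. A sketch, where the region and VPC ID are placeholders:

```shell
# 1. Find the S3 gateway endpoint and the route tables it is associated with
aws ec2 describe-vpc-endpoints \
  --filters Name=service-name,Values=com.amazonaws.us-east-1.s3 \
            Name=vpc-endpoint-type,Values=Gateway \
  --query 'VpcEndpoints[].{Id:VpcEndpointId,RouteTables:RouteTableIds}'

# 2. Check that your subnets' route tables contain a prefix-list route
#    (routes to a gateway endpoint use a DestinationPrefixListId)
aws ec2 describe-route-tables \
  --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'RouteTables[].Routes[?DestinationPrefixListId]'
```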

GitHub Action runner images (AMI) for AWS EC2

As part of the RunsOn service, we automatically maintain and publish replicas of the official GitHub runner images as AWS-formatted images (AMIs).

New images are automatically released within 48h of the official upstream release, and are slightly trimmed to remove outdated software and (mostly useless) caches.

This means the disk usage is below 30GB, making it sufficiently small to boot in around 40s. The runner binary is also preinstalled in the images.

Supported regions:

  • us-east-1
  • eu-west-1

AMIs can be found using the following search:

  • name: runs-on-ubuntu22-full-x64-*
  • owner: 135269210855
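With the AWS CLI, the search above for the most recent matching AMI looks like:

```shell
# Find the latest RunsOn Ubuntu 22 x64 AMI in the current region
aws ec2 describe-images \
  --owners 135269210855 \
  --filters 'Name=name,Values=runs-on-ubuntu22-full-x64-*' \
  --query 'sort_by(Images, &CreationDate)[-1].{Id:ImageId,Name:Name}'
```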

You can find more details on https://github.com/runs-on/runner-images-for-aws.

Use GitHub App manifests to simplify your GitHub App registration flow

Clicking through the GitHub UI to register a GitHub App with the correct permissions and settings is tedious and error-prone.

I recently discovered that you can automate all this with GitHub App manifests, which contain all the required details about your app. You can then submit the manifest to GitHub and dynamically exchange it for the App ID, webhook secret, and private key.

On the client front, you simply need a way to POST the manifest to GitHub, using a state parameter that you generate:

<form action="https://github.com/settings/apps/new?state=abc123" method="post">
  Register a GitHub App Manifest: <input type="text" name="manifest" id="manifest"><br>
  <input type="submit" value="Submit">
</form>
<script>
  const input = document.getElementById("manifest");
  input.value = JSON.stringify({
    "name": "Octoapp",
    "url": "https://www.example.com",
    "hook_attributes": {
      "url": "https://example.com/github/events"
    },
    "redirect_url": "https://example.com/redirect",
    "callback_urls": [
      "https://example.com/callback"
    ],
    "public": true,
    "default_permissions": {
      "issues": "write",
      "checks": "write"
    },
    "default_events": [
      "issues",
      "issue_comment",
      "check_suite",
      "check_run"
    ]
  });
</script>

Upon submission, GitHub will redirect back to your site with a temporary code that you can then exchange for the app secrets.
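The exchange itself is a single call to GitHub’s app-manifests conversion endpoint; `CODE` below stands for the temporary code from the redirect (it is single-use and expires after one hour):

```shell
# Exchange the temporary code for the app credentials
curl -s -X POST \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/app-manifests/CODE/conversions
# The JSON response contains id, pem (private key), webhook_secret,
# client_id and client_secret
```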

For RunsOn GitHub Action runners, I am using this feature so that you only need two clicks after installation to get your private RunsOn app registered in your own GitHub organization with all the correct settings.

With Probot, you can describe your GitHub App manifest in YAML, so this would look like:

# The list of events the GitHub App subscribes to.
default_events:
- workflow_job
- meta
default_permissions:
# required for registering runners
administration: write
# required to access config file
single_file: write
# required to access collaborators and repository metadata
metadata: read
# Organization members and teams.
# https://developer.github.com/v3/apps/permissions/#permission-on-members
members: read
# required to access workflow runs
actions: read
single_file_paths:
- ./.github/runs-on.yml
name: runs-on
# Set to true when your GitHub App is available to the public or false when it is only accessible to the owner of the app.
public: false

For more details, you can have a look at the Probot doc about this.

How to set up Docker with NVIDIA GPU support on Ubuntu 22

This took a bit of a search for me, so here it is in case it’s useful:

Terminal window
cd /tmp/
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-drivers-545 nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
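For reference, the `nvidia-ctk runtime configure` step above simply registers the NVIDIA runtime in `/etc/docker/daemon.json`; after it runs, that file should contain something like (shown with Docker’s documented `runtimeArgs` key; the exact output may vary by toolkit version):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```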

Then you should be able to run the following Docker container with GPU support enabled:

Terminal window
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

And get the following output:

Fri Dec  8 14:49:32 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08     CUDA Version: 12.3    |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   29C    P0             25W /  70W  |      5MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+