GitHub Actions are slow and expensive, what are the alternatives?

Note: this is a recording of a similar talk given at a DevOps meetup on May 16 in Rennes, France. You’ll find a generated transcript summary below, but you probably want to watch the video instead.

Introduction

Hello everyone, and thanks for coming to this presentation on GitHub Actions and how to make it faster and 10x cheaper. But first, a brief primer on GitHub Actions, and especially the good parts.

GitHub Actions: The Good Parts

GitHub Actions is a way to run workflows automatically whenever you push code, open a pull request, or do almost anything else on your repository.

It has very high adoption, a flexible workflow syntax, and a large choice of architectures: you can run workflows targeting Linux x64, macOS, and Windows, so it’s quite versatile and really useful.
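
For instance, a minimal workflow along these lines runs a test job on every push and pull request, across the three OS families (the `make test` step is just a placeholder for your own command):

```yaml
# .github/workflows/ci.yml -- minimal sketch
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: make test   # placeholder for your real test command
```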

GitHub Actions: The Bad Parts

Here are some major issues with GitHub Actions:

  • Performance and Cost: The default GitHub-hosted runners are pretty weak, sporting just two slow cores, yet they cost over $300 a month if used non-stop. On the other hand, alternatives like BuildJet, WarpBuild, and Ubicloud offer quicker and cheaper runners.

  • Caching and Compatibility Issues: GitHub’s caching tops out at 100MB/s, which can bog down workflows involving large files (see the cache sketch after this list). Also, there’s no full support for ARM64 runners yet (they’re still in beta), slowing down builds that need multiple architectures.

  • Resource Optimization and Time Waste: GitHub’s weaker machines mean you often have to spend a lot of time fine-tuning your test suites to get decent run times. This eats up a lot of engineering hours that could be saved by switching to more robust runners from other providers or by setting up your own.
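
As a concrete illustration of the caching point above, a typical `actions/cache` step looks like the sketch below; the path and key assume an npm project, so adjust them to your own dependency cache. Every restore and save goes through the cache backend capped at roughly 100MB/s, which is why large caches hurt:

```yaml
# Sketch of a dependency cache step inside a job (npm paths assumed for illustration)
steps:
  - uses: actions/checkout@v4
  - uses: actions/cache@v4
    with:
      path: ~/.npm
      key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
      restore-keys: |
        npm-${{ runner.os }}-
```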

Solution: Self-Hosted Runners

Self-hosted runners offer a practical solution for those looking to speed up their builds and reduce costs. By setting up your own machines and configuring them with GitHub’s runner agent, you can achieve faster build times at a lower price.

When using non-official runners, you can choose among three levels:

  • artisanal on-premise
  • productized on-premise
  • third-party providers

Artisanal on-premise

This approach, which I’ll call ‘artisanal on-premise’, involves using a few of your own servers and registering them with GitHub. It’s cost-effective and manageable for a small number of machines, but it has limitations: limited concurrency, maintenance requirements, security risks, and a lack of environment consistency with GitHub’s official runners.
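
Once a machine is registered with GitHub’s runner agent, jobs are routed to it by label. On the workflow side that looks roughly like this (`self-hosted`, `linux` and `x64` are the default labels the agent assigns at registration time):

```yaml
jobs:
  build:
    # Runs on any registered machine carrying these labels
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v4
      - run: make build   # placeholder build command
```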

Productized on-premise

For a more robust setup, consider the ‘productized on-premise’ approach. This involves the same self-hosting principles but requires additional software, such as the Actions Runner Controller or the Philips Terraform project, to help manage the runners. This setup offers better hardware flexibility and scalability, as it can dynamically adjust the number of virtual machines based on demand. However, it requires more expertise to maintain and still lacks full image compatibility with GitHub’s official runners, necessitating custom Docker images or AMIs.
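
For reference, with the (legacy, summerwind-flavoured) Actions Runner Controller the runner pool is declared as a Kubernetes resource, roughly like the sketch below; the repository name and label are placeholders, and the newer GitHub-maintained charts use different resource kinds:

```yaml
# Rough RunnerDeployment sketch for the legacy Actions Runner Controller
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: ci-runners
spec:
  replicas: 2                        # can also be driven by an autoscaler
  template:
    spec:
      repository: my-org/my-repo     # placeholder repository
      labels:
        - self-hosted-k8s            # label your workflows would target
```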

Third-Party Providers

The final option is to use third-party providers for more affordable machines. These providers handle maintenance, so you just pay for the service. Most support official images, and they typically offer a 50% cost reduction. However, using these services means you’ll need to share your repository content and secrets, which could be exposed if there’s a security breach. The hardware options are limited; you can choose the number of CPUs but not specific details like the processor type, disk space, or GPU. Additionally, if you need more than 64 CPUs concurrently, extra fees may apply. Often, these services are hosted in locations with suboptimal network speeds.
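
Switching to such a provider usually amounts to changing the `runs-on` label; the exact label format varies by provider, so the BuildJet-style label below is only one example:

```yaml
jobs:
  test:
    # Instead of: runs-on: ubuntu-latest
    runs-on: buildjet-4vcpu-ubuntu-2204   # provider-specific label (example only)
    steps:
      - uses: actions/checkout@v4
      - run: make test
```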

Market Overview

Here’s a quick overview of the market options for GitHub Actions alternatives:

  • Third-Party SaaS: There’s a wide variety of third-party services available, with new options emerging almost monthly.
  • Fully On-Premise: Options include the Actions Runner Controller and the Philips Terraform project. AWS CodeBuild is a newer addition that lets you run managed runners within your AWS infrastructure.
  • Hybrid Providers: These offer a mix of on-premise and SaaS solutions. You provide the hardware hosted in your infrastructure, but management is handled through their control plane.

While searching for a cost-effective and efficient self-hosted solution, I found the fully on-premise options challenging to set up, slow to start, and prone to lengthy queuing times. Additionally, AWS CodeBuild, despite its advantages, is costly and comes with its own set of limitations.

Introducing RunsOn

I’ve been developing RunsOn, a new software aimed at creating a more affordable and efficient on-premise GitHub Actions Runner. Here’s a quick rundown:

  • Accessibility: RunsOn is free for individual and non-commercial use, with paid licenses available for commercial projects.
  • Core Features Desired:
    • Speed and Cost Efficiency: I aimed for faster and cheaper runners.
    • Scalability: Ability to handle numerous jobs concurrently without limitations.
    • Compatibility: Seamless integration with existing GitHub workflows by ensuring full image compatibility with official runners.
    • Low Maintenance: Minimal engineering oversight required, automating most of the operational aspects.
  • Additional Nice-to-Have Features:
    • Flexible instance type selection (CPU, RAM, disk, GPU).
    • Support for both x64 and arm64 architectures, and potentially macOS.
    • Enhanced handling of caches and Docker layers.
    • Easy installation and upgrades.

Overall, the goal is to make RunsOn a robust, user-friendly solution that enhances the efficiency of running automated workflows.

Core Features

  • Speed: To enhance speed, select providers with superior CPUs like Hetzner, AWS, or OVH. RunsOn uses AWS for its diverse instance choices and spot pricing, with runners scoring 3,000 on the CPU PassMark benchmark.
  • Cost Efficiency: For cost savings, consider Hetzner for artisanal setups, or AWS EC2 spot instances for productized solutions. Spot instances can be up to 75% cheaper than on-demand prices, which fits well with the short-lived nature of most CI jobs. Use EC2’s CreateFleet API to minimize spot interruptions by selecting instances from the least-interrupted pools (see the fleet request sketch below).
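
To make the CreateFleet point concrete, here is a rough sketch of the kind of request involved, written as `--cli-input-yaml` input for the AWS CLI. The launch template ID and instance types are placeholders, and `price-capacity-optimized` is the allocation strategy that steers the fleet toward cheap, rarely interrupted spot pools:

```yaml
# fleet.yaml -- sketch for: aws ec2 create-fleet --cli-input-yaml file://fleet.yaml
Type: instant                                    # one-shot request, nothing to maintain
TargetCapacitySpecification:
  TotalTargetCapacity: 1
  DefaultTargetCapacityType: spot
SpotOptions:
  AllocationStrategy: price-capacity-optimized   # prefer cheap, least-interrupted pools
LaunchTemplateConfigs:
  - LaunchTemplateSpecification:
      LaunchTemplateId: lt-0123456789abcdef0     # placeholder launch template
      Version: "$Default"
    Overrides:                                   # candidate pools to pick from
      - InstanceType: m7a.xlarge
      - InstanceType: m7i.xlarge
      - InstanceType: m6a.xlarge
```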

Scalability and Compatibility

Key points on scalability for RunsOn:

  • Simple and Dynamic: RunsOn launches an ephemeral EC2 instance for each job, which is terminated after the job completes. This approach keeps the system simple and responsive.
  • Concurrency Limits: The only limits to how many jobs you can run concurrently are your AWS quotas.
  • Optimized Queuing Times: By optimizing base AMIs and using provisioned network throughput for EBS, RunsOn achieves queuing times of around 30 seconds. This is competitive with GitHub’s 12 seconds and better than many third-party providers.
  • Stable Performance Under Load: Extensive testing with clients, such as Alan, shows that even with bursts of 100 or more jobs, the queuing times remain stable.

Compatibility with Official Runners

So basically, I wanted to do just this: change one line, and my workflow should still work. This is probably one of the hardest parts, because you have to build compatible OS images, in my case for EC2, and nobody had done this, or at least nobody had published it.
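
In practice that means a change like the one below, with the rest of the workflow left untouched; the target label here is illustrative only, since the exact RunsOn label syntax is documented separately:

```yaml
jobs:
  test:
    # Before:
    # runs-on: ubuntu-latest
    # After (illustrative RunsOn-style label, not the authoritative syntax):
    runs-on: runs-on,runner=16cpu-linux-x64
    steps:
      - uses: actions/checkout@v4
      - run: make test
```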

Thankfully, GitHub publishes the Packer templates for the runner images it builds for Azure, so I just ported them to AWS, and they are now available for anyone to use. You can find the links here.

Low Maintenance

The final feature is low maintenance. As you can see, the architecture diagram has changed a bit since the last slide, but basically RunsOn uses cheap managed services everywhere. A single CloudFormation stack provisions an SQS queue, an SNS alert topic, CloudWatch logs and metrics, and some S3 buckets. The RunsOn server itself runs on AWS App Runner, which is a really cheap way to run containers on AWS (I recommend you check it out). On each VM, a small RunsOn agent launches to configure the machine and register it with GitHub. For a reasonable number of jobs, the whole stack costs only about one or two dollars a month, which is pretty impressive.
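
Just to give a feel for the shape of that stack, here is a toy CloudFormation skeleton declaring the same kinds of resources; it is not RunsOn’s actual template:

```yaml
# Toy skeleton only -- illustrates the resource types mentioned above
AWSTemplateFormatVersion: "2010-09-09"
Description: Queue, alert topic and bucket of the kind a runner control plane needs
Resources:
  JobQueue:
    Type: AWS::SQS::Queue
  AlertTopic:
    Type: AWS::SNS::Topic
  CacheBucket:
    Type: AWS::S3::Bucket
```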

Nice-to-Have Features

Here’s a quick overview of the additional features and real-world results of RunsOn:

  • Flexible Instance Selection: Users can customize their VM specifications such as CPU, RAM, and disk space directly in their workflows (see the sketch after this list).
  • Architecture Support: RunsOn supports both ARM64 and AMD64 architectures. macOS support is currently unavailable due to licensing constraints.
  • Easy Installation and Upgrades: RunsOn offers a one-click installation and upgrade process using a template URL.
  • Enhanced Caching: By leveraging AWS’s S3, RunsOn provides up to five times faster caching and supports caching Docker layers.
  • Custom Images and Full SSH Access: Users can preload software on custom AMIs and have full SSH access to their Runners, with options for private networking and static EIPs.
  • Real-World Impact: RunsOn has significantly reduced costs and increased speed for clients, handling up to 500,000 jobs across various users, from small to large-scale operations.
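
As an illustration of the instance-selection and architecture points above, a job spec could look roughly like this; the key names (cpu, ram, disk, arch, spot) mirror the feature list rather than the official RunsOn documentation, so treat them as hypothetical:

```yaml
jobs:
  build:
    # Hypothetical spec-style label; check the RunsOn docs for the real key names
    runs-on: runs-on,cpu=8,ram=32,disk=150,arch=arm64,spot=true
    steps:
      - uses: actions/checkout@v4
      - run: make build
```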

Future Work

Future enhancements for RunsOn include:

  • Cost Transparency: We plan to make CI costs more visible to developers to highlight the financial and environmental impacts of running multiple jobs.
  • Efficiency Monitoring: Introducing reports to help determine if your Runners are sized appropriately, ensuring you’re not overpaying for unused resources.

Conclusion