Blog

Changelog v2.1.0 - new Server and Agent, shared SQS queue, and more

RunsOn v2.1.0 has just been released 🎉.

Main changes

NodeJS => Go

I switched the server to Go, for better concurrency control. NodeJS allowed me to put something out quickly and test the waters. But now that more and more people are using RunsOn, large clients (> 10k jobs a day) were running into hard-to-troubleshoot concurrency issues due to the way NodeJS works. Go has a much better concurrency model, and I think it’s a better fit for the project anyway.

Before: [screenshot]

After: [screenshot]

If you are coming from a previous v2 version, the upgrade can be done in-place.

Agent and Server no longer public with the base license

The Agent and Server source code now lives in separate private repositories, which are added as submodules of runs-on/runs-on. Only the CloudFormation template and base AMIs are public.

A Sponsorship license will give you access to everything, so that you or your security team can review all the code, and choose to build from source if needed. Other licenses only get the compiled agent and server binaries.

The reason for this change is two-fold:

  • make it more difficult for the competition to see how the sausage is made, especially now that RunsOn beats the majority of the competition in terms of concurrency, speed, hardware availability, and pricing.

  • nudge larger clients into buying the more expensive license: until now there was no real incentive to buy a more expensive license. I could put some more advanced features into the more expensive tier, but my current view is to provide the best self-hosted runner solution out there, irrespective of the company size. I also didn’t want to use volume-based pricing, since I like to keep billing simple and predictable for users.

Hopefully this will strike a good balance between keeping RunsOn affordable to everyone, and still being sustainable. Please let me know if you have any feedback about this, nothing is written in stone yet.

Features

  • use an SQS FIFO queue to handle pending job workflows. If your AppRunner service needs to scale up horizontally, this queue will now be shared across all instances, instead of each having its own in-memory queue. This also helps avoid losing jobs if an AppRunner instance goes down. A nice side effect is that the queue comes with integrated CloudWatch monitoring, so that you can see the number of pending jobs and the maximum delay.

  • allow disabling cost reports: a new CloudFormation stack parameter, CostReportsEnabled, lets you disable the generation and sending of cost reports, if you prefer to look at your costs in Cost Explorer or by other means anyway.

  • allow specifying the disk size for the default and large runner templates: 2 new CloudFormation parameters are now present, to specify the disk size of the default and large runner templates. In your job definition, simply indicate an hdd size and RunsOn will use the default template if hdd <= default size, or the large template if hdd > default size.
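
To make the template selection rule concrete, here is a minimal sketch in Python (not the server's actual Go code). The function and parameter names are hypothetical, and both sizes are assumed to be expressed in the same unit. What happens when the requested size exceeds the large template's disk size is not described here, so the sketch does not model it.

```python
def pick_template(requested_hdd: int, default_disk_size: int) -> str:
    """Return which runner template a job would get for a requested disk size.

    Hypothetical helper: both arguments are assumed to use the same unit.
    """
    if requested_hdd <= 0:
        raise ValueError("hdd size must be a positive number")
    # Requests up to and including the default size use the default template;
    # anything larger uses the large template.
    return "default" if requested_hdd <= default_disk_size else "large"
```

For example, with a default size of 40, `pick_template(40, 40)` selects the default template and `pick_template(41, 40)` selects the large one.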

Fixes

  • remove the AppWorkflowQueueSize parameter from the CF stack. It’s no longer needed, as we align on the EC2 rate-limit for now.

  • bring back the default runner and image: you can specify runs-on: runs-on, and it will work again. Likewise, if you don’t specify an image, ubuntu22-full-x64 will be used by default.

Breaking changes

  • older runner definitions (e.g. runner=2cpu-linux) are no longer supported. You must now use either runner=2cpu-linux-x64 or runner=2cpu-linux-arm64.
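
As an illustration of what this means for your labels, here is a small Python sketch that parses a runs-on value and rejects a runner name without an architecture suffix. It is not RunsOn's own parser: the function name is hypothetical, and it assumes every runner name ends in -x64 or -arm64, which may not hold for custom runner names.

```python
ARCH_SUFFIXES = ("-x64", "-arm64")  # suffixes named in the breaking change


def parse_runs_on(value: str) -> dict:
    """Parse e.g. 'runs-on,runner=2cpu-linux-arm64' into {'runner': '2cpu-linux-arm64'}."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if not parts or parts[0] != "runs-on":
        raise ValueError("the first label must be 'runs-on'")
    options = {}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        if not sep or not key or not val:
            raise ValueError(f"expected key=value, got {part!r}")
        options[key] = val
    runner = options.get("runner")
    # Old-style names such as '2cpu-linux' carry no architecture suffix.
    if runner is not None and not runner.endswith(ARCH_SUFFIXES):
        raise ValueError(f"runner {runner!r} must end with -x64 or -arm64")
    return options
```

With this sketch, `parse_runs_on("runs-on,runner=2cpu-linux")` raises a `ValueError`, while the -x64 and -arm64 variants parse cleanly.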

Deprecations

  • the base and docker variants of the images, as they stand, are no longer useful, since the boot time of the full images is now considerably faster. They will most likely be removed in a future version, or rebuilt as a much lighter version of the full images.

Misc

  • setup flow design has changed a bit.

Changelog v2.0.13 - multi-az, multi-region, and much more

RunsOn v2.0.13 has just been released 🎉.

Warning: this is a major release bump, with a new VPC being created. You are advised to upgrade either during a quiet time (no runner running, otherwise the old VPC cannot be destroyed), or simply create a new stack with that template, follow the configuration process, and then Pause the previous AppRunner service until you validate that everything is going fine. Doing it this way will allow you to easily roll back to the previous version by just removing the new stack and clicking Resume on the previous AppRunner service.

Main changes

  • Replaces RunInstances call with CreateFleet, to reduce the number of API calls and increase the chances of finding a spot instance.
  • Multi-AZ support (3 AZs by default for the stack). The stack no longer asks for an AZ choice.
  • capacity-optimized-prioritized allocation, so that it selects the instance type from the pool with the least risk of being interrupted
  • Modify launch sequence so that instance retrieves boot details from the S3 bucket (no more user-data)
  • Make RunsOn region aware (with region label), allowing deployments of RunsOn in multiple regions
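
For readers unfamiliar with CreateFleet, the sketch below builds a request of the general shape the first three items describe: one launch template, several instance types spread across several subnets, and the capacity-optimized-prioritized Spot allocation strategy. It is written in Python for illustration and is not RunsOn's actual request. The function name, the "instant" fleet type, the "$Latest" template version, and the rule that earlier instance types get a higher priority are all assumptions.

```python
def build_fleet_request(launch_template_id, instance_types, subnet_ids):
    """Build keyword arguments for an EC2 CreateFleet call (illustrative only)."""
    overrides = []
    for priority, instance_type in enumerate(instance_types):
        for subnet_id in subnet_ids:
            overrides.append({
                "InstanceType": instance_type,
                "SubnetId": subnet_id,  # one subnet per AZ gives multi-AZ coverage
                "Priority": float(priority),  # lower number = higher priority
            })
    return {
        "Type": "instant",  # assumption: synchronous, one-off request
        "SpotOptions": {"AllocationStrategy": "capacity-optimized-prioritized"},
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
            "Overrides": overrides,
        }],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": 1,  # one instance per workflow job
            "DefaultTargetCapacityType": "spot",
        },
    }
```

With boto3, such a dictionary could be passed as `ec2.create_fleet(**request)`. A single call then covers every instance type and subnet combination, where RunInstances would have needed one attempt per combination.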

General improvements

  • Default runner types are now separated into -x64 and -arm64 variants (simplifies configuration, no need to explicitly specify image), e.g. runs-on: runs-on,runner=2cpu-linux-arm64
  • Implement new rate limiters for EC2 RunInstances and TerminateInstances operations, as well as for workflow queuing. All are configurable.
  • New ubuntu22 full images, with some more cleanup of legacy software to reduce image sizes, and use of an agent to launch the runner earlier, instead of waiting for the execution of the cloud-final service. Current timings (from workflow job created to workflow job running) with full image: x64=39s, arm64=34s
  • Add timings for when the workflow job was created on GitHub, when the workflow job webhook got received, when the workflow started to be scheduled, and when the instance was seen as pending by AWS

Fixes

  • Fix default alarm. Make threshold configurable.
  • Stack no longer requires extended IAM permissions.

Misc

  • Truncate CloudWatch dimension values to 250 chars.
  • Change runner name format (runs-on--<INSTANCE_ID>--<RANDOM>), so that it contains the instance id.
  • No more success email when service is up, since you could receive those whenever the service is scaled up by AppRunner.
  • No more cost email when service is up. Wait 24h before the first one.
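
The new runner name format makes it easy to get from a runner shown in GitHub back to its EC2 instance. Here is a small Python sketch of building and parsing such names. The helper names are hypothetical, and the shape of the random part (8 hex characters here) is an assumption, since only the overall runs-on--<INSTANCE_ID>--<RANDOM> layout is given above.

```python
import re
import secrets

# Matches runs-on--<INSTANCE_ID>--<RANDOM>; the allowed characters are assumed.
RUNNER_NAME_RE = re.compile(
    r"^runs-on--(?P<instance_id>i-[0-9a-f]+)--(?P<suffix>[A-Za-z0-9]+)$"
)


def make_runner_name(instance_id, suffix=None):
    """Build a runner name; a random hex suffix is generated when none is given."""
    suffix = suffix or secrets.token_hex(4)
    return f"runs-on--{instance_id}--{suffix}"


def instance_id_from_runner_name(name):
    """Return the EC2 instance id embedded in a runner name, or None if no match."""
    match = RUNNER_NAME_RE.match(name)
    return match.group("instance_id") if match else None
```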

Breaking changes

  • Stack requires a VPC and subnet change, so perform the upgrade in a quiet time.
  • Runners no longer default to the 2cpu-linux x64 runner. You always need to specify a runner label as a base.
  • Specifying an image or runner label that does not exist will now raise an error, instead of silently falling back to the default image or runner specification.

Changelog v1.7.3 - now in eu-central-1 and us-west-2

RunsOn v1.7.3 has just been released 🎉.

What’s Changed

  • Official support for Frankfurt (eu-central-1) and Oregon (us-west-2) regions.
  • Disable AWS SDK retries for RunInstances API calls, to avoid rate limit issues.
  • Add m7i as an additional family type for default runners. Since m7a/c7a instances are in short supply, this should help make the onboarding for new users easier.

Changelog v1.7.2 - Streamline install procedure

RunsOn v1.7.2 has just been released 🎉.

What’s Changed

  • instant reload after first setup
  • fix templates
  • fix request limit exceeded errors for RunInstances API and DescribeInstanceTypes API
  • check license key
  • remove nodemon from prod
  • no longer log health check requests
  • unify logging formats, tag all lines with workflow job details for easy troubleshooting
  • allow runner config to define image, spot, ssh settings
  • specify IMDSv2 (closes #24).
  • add run-id, job-name and job-id to instance tags
  • publish consumed minutes across many dimensions:
    • Repository
    • WorkflowName
    • WorkflowJobConclusion
    • WorkflowJobName
    • InstanceType
    • InstanceLifecycle
    • ImageId
    • RunnerId

Changelog v1.6.2 - Restore launch queue size to sane limit

RunsOn v1.6.2 has just been released 🎉.

This is mostly a maintenance release, but important for the users who are launching a lot of runners in a short time.

By default EC2 has pretty aggressive rate limits set on the RunInstances API (2/s, with some burst allowed), and if you go over that limit, your runner will fail to start and RunsOn will send you an email alert telling you about it (RequestLimitExceeded).

Until now the queue size was set to 8/s, but since most users are using new accounts to install RunsOn, it can cause issues with the low default of max 2/s.

So from now on RunsOn will default to 2/s as well, and if your account has an increased quota for the RunInstances API, you can then specify a higher number by using the new CloudFormation template parameter AppEc2QueueSize:

EC2 queue size setting
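
To give an intuition for a "2/s, with some burst allowed" limit, here is a minimal token-bucket sketch in Python. It is not RunsOn's actual queue implementation: the class name and the burst size are assumptions, and the clock is injectable only so that the behaviour can be checked deterministically.

```python
import time


class TokenBucket:
    """Allow `rate_per_sec` calls per second on average, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False when the call must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A launcher would call `try_acquire()` before each RunInstances request and keep the job queued when it returns False, instead of sending the request and hitting RequestLimitExceeded.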

Have a great day!