Changelog
Guides
Versions
v2.8.5
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.5.yaml
Summary
Fix buildkit gha exporter, better user error reporting, AppRunner VPC connector integration.
What's changed
- Report non-retryable user errors directly in GitHub: whenever a job can't be started for user reasons (e.g. bad image, bad runner definition, etc.), RunsOn will now spawn a default runner that will fail at the "Set up runner" step, with an error message explaining why. This will help surface issues. Fixes #307.
- Fix issue with type=gha buildkit exporter for docker layers. Fixes #328.
- Automatically enable the AppRunner VPC connector when Private mode is active, so that all AppRunner egress traffic (for the RunsOn orchestrator) goes through the private subnet(s) NAT gateways or equivalent. This means the AppRunner service will use the same static IP(s) as the runners, so that you can whitelist the AppRunner service on your GHES or GitHub Enterprise installation if needed. All ingress traffic is still publicly allowed and handled by AWS.
- Telemetry: send values for networking_stack (embedded or external), and extras. Will help better understand how RunsOn is set up and which extra features are most used.
v2.8.4
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.4.yaml
Summary
Integrated CPU/Memory/Disk/Network monitoring, integrated job-level cost reporting, official snapshot action release, and many QoL improvements.
Spotlight: Monitoring improvements
- Allow to send metrics to CWAgent namespace. This allows runs-on/action@v2 to send and graph metrics right within your job output. For instance:
📊 Disk Writes:
5973 ┤ ╭╮
5500 ┤ ╭╯╰╮
5028 ┼╮ ╭╯ ╰╮ ╭───╮ ╭
4555 ┤╰╮ ╭╯ ╰╮ ╭──╯ ╰─╮ ╭─╯
4083 ┤ ╰╮ ╭╯ ╰─╮ ╭──╯ ╰╮ ╭─╯
3610 ┤ ╰─╮ ╭╯ ╰╮ ╭──╯ ╰─╮ ╭─╯
3138 ┤ ╰╮ ╭╯ ╰╮ ╭───╯ ╰───╯
2665 ┤ ╰╮ ╭╯ ╰─╯
2193 ┤ ╰──╯
Disk Writes (Ops/s)
Stats: min:2040.0 avg:4180.9 max:6026.0 Ops/s
- Create resource group for EC2 instances on CloudWatch. This means you can go to the CloudWatch EC2 Automatic dashboard, select your resource group (named after your RunsOn stack) and get a high-level overview of metrics for all your runner instances.
- Allow instance role to enable detailed monitoring on demand (not used for now, but might become an option of runs-on/action).
Spotlight: Costs computation
- The runs-on/action@v2 action now automatically computes the cost of each job and displays the result right within your job logs. You can also choose to display it as a job summary.
Spotlight: Block-level snapshots
- The runs-on/snapshot@v1 action is available and can be used to save and restore entire folders between job executions, at a much faster speed (for long jobs) than other methods relying on compression and export to S3 or other.
Stack improvements
- Allow to override the max runner time limit. Fixes #320.
- Add support for permission boundary for SchedulerInvokeRole. Fixes #315.
- Add AWS account-id and region to emails. Fixes #298.
- Remove explicit PublicAccessBlockConfiguration declaration since some SCP policies can incorrectly flag the s3:PutBucketPublicAccessBlock action. This is the default for new buckets anyway.
- Add ECR Full Access managed policy to obtain higher ECR Public Rate Limits.
- Make bootstrapping work on more distros.
- Allow RunsOn to auto-create spot role if absent.
- Add health checks for email notification subscription, and ec2 spot role.
Misc
- Remove legacy (and unused) .env loading.
- Update Go to 1.24.
v2.8.3
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.3.yaml
Summary
Huge improvements to tagging, magic cache is now even faster, and bug fix for jobs tied to environments with no approval required.
What's changed
QoL improvements
- Check if the spot role exists before starting RunsOn service (preflight check 2 from the installation guide). If not, alert the user over the SNS topic.
- Clean up all dangling instances, irrespective of RunsOn version.
- Rewrote magic cache to more efficiently stream uploads when actions/cache is the client. For bigger (>1GiB) payloads, there should be a very noticeable improvement.
- If SSHAllowed is set to false at the stack level, discard any ssh=true value coming from label or repo config. Fixes #310.
Improvements to tagging
- Pass all custom tags to volumes, in addition to instances, when creating the runner. Fixes #264.
- Allow to set additional custom tags using a custom property in the GitHub settings of a repository. If a custom property with name runs-on-custom-tags exists, RunsOn will parse it in the same way as the stack-level custom tags, and apply them to the instance and volumes. Fixes #297.
  For instance: if the value for property runs-on-custom-tags is set to key1=val1,key2=val2 then instances and volumes will get 2 new tags (key1, key2) with their corresponding values.
  The same restrictions as for stack-level tags apply. Stack-level tags take precedence over tags set in the custom property, and tags set in custom properties take precedence over custom runner tags defined in the .github/runs-on.yml configuration.
- Pass custom tags and default branch to runner config, and write the config to /runs-on/config.json (Linux) or C:\runs-on\config.json (Windows). The config can then be read by actions, scripts, etc. to access all runner details easily.
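As a rough sketch of how a job might consume that file on a Linux runner, the step below simply prints it. The job name is hypothetical, the runner label is borrowed from other examples in this changelog, and nothing is assumed about the JSON schema:

```yaml
jobs:
  show-runner-config: # hypothetical job name
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64
    steps:
      # Print the runner config written by RunsOn (path taken from the note above)
      - name: Print runner config
        run: cat /runs-on/config.json
```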
Bug fixes
- Fix the race-condition that could lead to 2 instances being started when handling jobs tied to a deployment that does not require approval.
Misc
- Add goroutine to clean up dangling volumes and snapshots (prepare for block-level snapshots).
- Register waiting, in_progress, and completed webhook payloads in S3 (in addition to queued).
v2.8.2
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.2.yaml
Summary
Support for EFS, TMPFS, and ECR ephemeral registry for fast docker builds. Also some bug fixes.
What's changed
EFS
- Embedded networking stack can now create an Elastic File System (EFS), and runners will auto-mount it at /mnt/efs if the extras label includes efs. Useful to share artefacts across job runs, with classic filesystem primitives.
jobs:
  with-efs:
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64,extras=efs
    steps:
      - run: df -ah /mnt/efs
        # 127.0.0.1:/ 8.0E 35G 8.0E 1% /mnt/efs
📝 Example use case for maintaining mirrors
For instance this can be used to maintain local mirrors of very large GitHub repositories and avoid long checkout times for every job:
env:
  MIRRORS: "https://github.com/PostHog/posthog.git"
  # can be ${{ github.ref }} if same repo as the workflow
  REF: main
jobs:
  with-efs:
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64,extras=efs
    steps:
      - name: Setup / Refresh mirrors
        run: |
          for MIRROR in ${{ env.MIRRORS }}; do
            full_repo_name=$(echo $MIRROR | cut -d/ -f4-)
            MIRROR_DIR=/mnt/efs/mirrors/$full_repo_name
            mkdir -p "$(dirname $MIRROR_DIR)"
            test -d "${MIRROR_DIR}" || git clone --mirror ${MIRROR/https:\/\//https:\/\/x-access-token:${{ secrets.GITHUB_TOKEN }}@} "${MIRROR_DIR}"
            ( cd "$MIRROR_DIR" && \
              git remote set-url origin ${MIRROR/https:\/\//https:\/\/x-access-token:${{ secrets.GITHUB_TOKEN }}@} && \
              git fetch origin ${{ env.REF }} )
          done
      - name: Checkout from mirror
        run: |
          git clone file:///mnt/efs/mirrors/PostHog/posthog.git --branch ${{ env.REF }} --single-branch --depth 1 upstream
Ephemeral registry
- Support for an ephemeral ECR registry: RunsOn can now automatically create an ECR repository that acts as an ephemeral registry for pulling/pushing images and cache layers from your runners. Especially useful with the type=registry buildkit cache instruction. If the extras label includes ecr-cache, the runners will automatically set up docker credentials for that registry at the start of the job.
jobs:
  ecr-cache:
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64,extras=ecr-cache
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v4
        env:
          TAG: ${{ env.RUNS_ON_ECR_CACHE }}:my-app-latest
        with:
          context: .
          push: true
          tags: ${{ env.TAG }}
          cache-from: type=registry,ref=${{ env.TAG }}
          cache-to: type=registry,ref=${{ env.TAG }},mode=max,compression=zstd,compression-level=22
Tmpfs
Support for setting up a tmpfs volume (size: 100% of available RAM, so only to be used on high-memory instances), and binding the /tmp, /home/runner, and /var/lib/docker folders on it. /tmp and /home/runner are mounted as overlays, preserving their existing content.
Can speed up some IO-intensive workflows. Note that if tmpfs is active, instances with ephemeral disks won't have those mounted since it would conflict with the tmpfs volume.
jobs:
  with-tmpfs:
    runs-on: runs-on=${{ github.run_id }},family=r7,ram=16,extras=tmpfs
    steps:
      - run: df -ah /mnt/tmpfs
        # tmpfs 16G 724K 16G 1% /mnt/tmpfs
      - run: df -ah /home/runner
        # overlay 16G 724K 16G 1% /home/runner
      - run: df -ah /tmp
        # overlay 16G 724K 16G 1% /tmp
      - run: df -ah /var/lib/docker
        # tmpfs 16G 724K 16G 1% /var/lib/docker
You can obviously combine options, i.e. extras=efs+tmpfs+ecr-cache+s3-cache is a valid label 😄
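A minimal sketch of such a combined label, reusing the instance selection from the tmpfs example above. The job name is hypothetical and the steps only inspect what the extras are described as providing:

```yaml
jobs:
  combined-extras: # hypothetical job name
    runs-on: runs-on=${{ github.run_id }},family=r7,ram=16,extras=efs+tmpfs+ecr-cache+s3-cache
    steps:
      # EFS and tmpfs mount points, as described in the sections above
      - run: df -ah /mnt/efs /mnt/tmpfs
      # Ephemeral ECR repository exposed to the job by the ecr-cache extra
      - run: echo "$RUNS_ON_ECR_CACHE"
```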
Instance-storage mounting changes
Until now, when an instance had locally attached NVMe SSDs available, they were automatically formatted and mounted so that the /var/lib/docker and /home/runner/_work directories ended up on the local disks. Since a lot of stuff (caches etc.) seems to end up within the /home/runner folder itself, the agent now uses the same strategy as for the new tmpfs mounts above: the whole /home/runner folder and the /tmp folder are mounted as overlays on the local disk volume, while /var/lib/docker remains mounted as a normal filesystem on the local disk volume. Fixes #284.
Misc
- Move all RunsOn-specific config files into /runs-on folder on Linux. More coherent with Windows (C:\runs-on), and avoids polluting /opt folder.
- Fix app_version in logs (was previously empty string due to incorrect env variable being used in v2.8.1).
- Fix "Require any Amazon EC2 launch template not to auto-assign public IP addresses to network interfaces" from AWS Control Tower. When the Private mode is set to only, no longer enable public IP auto-assignment in the launch templates. Thanks @temap!
v2.8.1
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.1.yaml
Summary
A large release: support for an external networking stack; encryption enabled on all S3 buckets; lots of quality-of-life improvements and bug fixes; Windows boot times halved and CloudWatch agent monitoring enabled. Be sure to read the upgrade notes.
What's changed
Networking
- Can now reuse an existing networking stack by setting the NetworkingStack stack parameter to external instead of embedded. Fixes #198, fixes #265, fixes #230 (community-provided networking stack can provide this feature).
- Some not-so-useful stack outputs have been removed. Some outputs may be "-" if using an external VPC.
Caching
- Fix invalid cache key restoration for Magic Cache. Thanks @erikburt from ChainlinkLabs for the troubleshooting.
Security
Enable server-side encryption using AWS-managed KMS key on all S3 buckets. Fixes #276.
No longer expose JIT token in cloud-init-output logs. The token is no longer valid after a job is run, but still.
QoL improvements
- Add AppDebug (true or false) stack parameter, which allows disabling the auto-shutdown of runners when the bootstrap fails. Useful to investigate what is going on when the runner initializes.
- Add AppCustomPolicy stack parameter: optional managed IAM policy ARN to assign to the AppRunner service role. Can be used to e.g. allow access to KMS decryption keys for AMIs. Thanks @dsme94!
- Add AppGithubApiStrategy (normal or conservative) stack parameter to opt into minimizing GitHub API usage. If set to conservative, runners won't be automatically unregistered in GitHub internal database (GitHub will still clean them up after 24h). This helps users with a very large number (20k+) of jobs launched every day. Fixes #285.
- Now bootstraps runners using runs-on/bootstrap binary, preinstalled on official RunsOn images (faster and more extensible).
- On spot interruption, give more time to the job to possibly complete before shutdown is triggered. Shutdown is now triggered 20s before the expected time sent by AWS, instead of 15 seconds after the notification is received. Fixes #277.
Windows
- Shaved about 50s from Windows boot times: SSH is no longer automatically installed on Windows (SSM agent is available now), and no longer using Invoke-WebRequest helped a lot (TIL).
- CloudWatch agent is automatically installed on Windows AMIs, and EC2Launch logs are shipped to CloudWatch (same naming as for Linux runners: e.g. LOG_GROUP_NAME/INSTANCE_ID/cloud-init-output.log). Also added support for roc connect on Windows AMIs in the RunsOn CLI.
Bug fixes
- Fix for invalid CreateTags requests. Fixes #288.
- Fix for invalid EC2 rate-limiter being used when uploading user-data file to S3. Fixes #286.
- Adjust ownership rule for S3 bucket logging, from BucketOwnerPreferred to BucketOwnerEnforced. Fixes #291.
v2.7.2
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.7.2.yaml
Summary
GHES support is now available. Allow to specify a custom expiration for objects in the cache bucket.
What's changed
- GHES support is now available. Fixes #250.
- Add S3CacheExpirationInDays stack parameter. Fixes #179.
- Tag launch templates. Fixes #264.
- Pin launch template version to the specific version active at the time RunsOn service was deployed. Fixes #274.
- Tag instances with runs-on-is-ghes and runs-on-integrations-active.
- Tag instances with InspectorEc2Exclusion to avoid SSM inspector scans on running instances. Possibly fixes #242.
v2.7.1
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.7.1.yaml
Summary
Hotfix: fix for disk=large handling.
v2.7.0
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.7.0.yaml
Summary
A few minor breaking changes related to VPC flow logs and hdd label. Plus many fixes.
Breaking changes
This is a minor release, so this comes with the following breaking changes. Please review your CloudFormation parameters and runner configuration accordingly when updating:
- Fix for #258. VPC Flow Logs are now only enabled if the VpcFlowLogFormat is set to a non-empty value. To enable them with the default format (as it was before if you didn't specify a value), specify default.
- Remove support for the deprecated hdd job label. Ensure all your workflows and repository configuration (.github/runs-on.yml) do not use this label. If it is still set, it will have no effect and the default runner configuration will be used for disk sizing. You must now use the disk=default or disk=large label instead.
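For workflows that previously relied on hdd, the migration might look like the sketch below. The job name and the runner=2cpu-linux-x64 value are illustrative; only the disk=default and disk=large labels come from the note above:

```yaml
jobs:
  build-with-large-disk: # hypothetical job name
    # disk=large selects the larger runner volume defined in the RunsOn stack;
    # use disk=default (or omit the label) for the default size
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64,disk=large
    steps:
      - run: df -h /
```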
What's changed
- Fix for S3 server access logging. Fixes #241.
- Allows specifying the root volume name. Fixes #207.
- Expose RUNS_ON_AWS_AZ and RUNS_ON_INSTANCE_LAUNCHED_AT environment variables to jobs.
- Properly set RUNNER_TOOL_CACHE 🤦, so that some setup-* actions can properly use the hosted toolcache on the VM.
- Add policy to allow instances to describe their tags. Means we no longer need to enable InstanceMetadataTags for the instances. Cost Allocation Tag and Runner Tags can now contain slashes in their keys.
- Add runs_on_spot_circuit_breaker_active prometheus metric (1=active, 0=inactive). Fixes #271.
- Ensure we don't try to auto-retry after spot termination if the workflow run has already been manually re-attempted. Fixes #263.
- Fix typo. Fixes #248.
- Scope the minutes alarm on the stack name. Also add the StackName dimension on all metrics. Fixes #235.
- (alpha, not fully functional yet) Support for GitHub Enterprise Server (GHES) installations.
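The newly exposed environment variables can be read like any other variable in a job step. A small sketch, with a hypothetical job name and an illustrative runner label; the format of the values is not assumed:

```yaml
jobs:
  print-runner-env: # hypothetical job name
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64
    steps:
      # Both variables are set by RunsOn on the runner, per the entry above
      - run: |
          echo "Availability zone: $RUNS_ON_AWS_AZ"
          echo "Instance launched at: $RUNS_ON_INSTANCE_LAUNCHED_AT"
```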
v2.6.8
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.8.yaml
Summary
Hotfix for CreateFleet IdempotentParameterMismatch errors, as well as Magic Cache support for newer buildx versions.
What's changed
- Fixes #251: IdempotentParameterMismatch error.
- Fix Magic Cache for newer buildx versions. No longer need to set version=1 in cache-from and cache-to.
- Fixes #249: Add cancelled to the list of conclusion statuses that can trigger an auto-retry.
v2.6.7
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.7.yaml
Summary
New spot circuit breaker for snoozing spot requests if too many interruptions detected. Monitoring improvements. StepSecurity integration, and more.
What's changed
Spot circuit breaker
- Allow to switch to on-demand requests if spot interruption frequency is too high over a defined time interval. Fixes #226.
For instance, if SpotCircuitBreaker is set to 2/30/60, it means that after at least 2 interruptions in the last 30 minutes, RunsOn will switch to on-demand requests for the next 60 minutes.
Monitoring
- Add workflow job conclusion to prometheus labels. Fixes #178. Also add job_conclusion and run_attempt to all log lines.
- Support SQS queue oldest message age alarms. Helps with compliance and to detect whether RunsOn has issues dequeuing messages fast enough. Fixes #228.
- Use scheduled event to compute and send cost reports at midnight UTC. Fixes #216.
Native integration with StepSecurity
- Support StepSecurity integration with new images:
jobs:
  job-with-stepsecurity:
    runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/image=ubuntu24-stepsecurity-x64"
    steps:
      - name: External call
        run: curl https://google.com
Documentation: https://runs-on.com/integrations/stepsecurity/
Misc
- Reduce agent binary size.
- Update Go dependencies.
- Allow injection of custom runner agent (internal testing only).
- Remove magic cache ON annotation. Fixes #234.
v2.6.6
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.6.yaml
Summary
Fix VpcEndpoints stack parameter.
What's changed
With VpcEndpoints enabled, the CloudFormation template was incorrectly assigning interface endpoints to both public and private subnets, while an interface endpoint can only be defined once per AZ (and only makes sense for private subnets anyway).
Thanks again to Commonwealth Fusion Systems for their quick feedback and help!
v2.6.5
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.5.yaml
Summary
Optimized GPU images, new VpcEndpoints stack parameter, ability to specify custom instance tags for custom runners.
Note: there appear to be some issues with the new VPC endpoints. I'm on it! If you need that feature, please hold on to your current version of RunsOn.
What's Changed
- New GPU images ubuntu22-gpu-x64 and ubuntu24-gpu-x64: 1-1 compatibility with GitHub base images + NVidia GPU drivers, CUDA toolkit, and container toolkit.
- Add new VpcEndpoints stack parameter (fixes #213), and reorganize template params. Note that the EC2 VPC endpoint was previously automatically created when Private mode was enabled. This is no longer the case, so make sure you select the VPC endpoints that you need when you update your CloudFormation stack.
- Suspend versioning for cache bucket (fixes #191).
- Allow to specify instance tags for runners (fixes #205). Tag keys can't start with runs-on- prefix, and keys and values will be sanitized according to AWS rules.
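A sketch of how one of the new GPU images might be selected from a workflow. The job name and the family=g4dn value are assumptions chosen for illustration (any GPU-capable instance family available in your region would do); only the image name comes from the entry above:

```yaml
jobs:
  gpu-smoke-test: # hypothetical job name
    # family=g4dn is an assumed GPU instance family, not taken from the release notes
    runs-on: "runs-on=${{ github.run_id }}/family=g4dn/image=ubuntu24-gpu-x64"
    steps:
      # The image is described as shipping the NVIDIA drivers, so nvidia-smi should be present
      - run: nvidia-smi
```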
v2.6.4
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.4.yaml
Summary
CLI 0.0.1 released, fix for Magic Cache, fleet objects deletion.
What's changed
- CLI released: https://github.com/runs-on/cli. Makes it easy to view logs (both server logs and cloud-init logs) for a workflow job by just pasting its GitHub URL or ID. Also allows easy connection to a runner through SSM.
- Fix race-condition in Magic Cache (fixes #209).
- Delete the fleet instead of just the instance (fixes #217).
v2.6.3
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.3.yaml
Summary
Fix magic cache handling of actions/upload-artifact. Prepare for RunsOn CLI.
What's changed
- Store instance id assigned to job (once job has started) in the main S3 bucket (under /runs-on/db/jobs/JOB_ID/instance-id), as well as the payload for the workflow_job queued event. Will be used for #201.
- Fix magic cache for cache keys with slashes inside.
- Make magic cache play nice with actions/upload-artifact. For that you must add runs-on/action@v1 in your workflows. Fixes #197.
- Documentation for magic cache at https://runs-on.com/caching/magic-cache/
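A minimal sketch of a workflow combining magic cache with artifact uploads. The job name, file name, and artifact name are hypothetical, and placing runs-on/action@v1 as the first step is an assumption; the entry above only states that the action must be present in the workflow:

```yaml
jobs:
  upload-with-magic-cache: # hypothetical job name
    runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=s3-cache"
    steps:
      # Required for actions/upload-artifact to work alongside magic cache
      - uses: runs-on/action@v1
      - run: echo "hello" > artifact.txt
      - uses: actions/upload-artifact@v4
        with:
          name: my-artifact
          path: artifact.txt
```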
v2.6.2
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.2.yaml
Summary
Magic transparent cache for dependencies and docker layers. SSM support for logging into runner instances. And more.
What's changed
- BETA - Transparent S3-backed caching for ALL actions that depend on the official cache toolkit from GitHub. Fixes https://github.com/runs-on/cache/issues/23 and actually makes the runs-on/cache action redundant in the context of RunsOn runners. For now, the magic caching is only enabled with the extras=s3-cache job label.
jobs:
  look-ma-no-cache-config:
    runs-on: "runs-on=${{github.run_id}}/runner=2cpu-linux-x64/extras=s3-cache"
    steps:
      # standard action is supported, no need to use `runs-on/cache@v4`
      - uses: actions/cache@v4
        with:
          path: my-path
          key: my-key
      # third-party actions that depend on official toolkit (99%) are supported as well
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
- BETA - Transparent S3-backed caching for Docker layers when using cache-to: type=gha / cache-from: type=gha. For now, the magic caching is only enabled with the extras=s3-cache job label.
jobs:
  look-ma-no-cache-config:
    runs-on: "runs-on=${{github.run_id}}/runner=2cpu-linux-x64/extras=s3-cache"
    steps:
      # BEFORE
      - name: "Build and push image (explicit s3 config)"
        uses: docker/build-push-action@v4
        with:
          tags: test
          cache-from: type=s3,blobs_prefix=cache/docker-s3/,manifests_prefix=cache/docker-s3/,region=${{ env.RUNS_ON_AWS_REGION }},bucket=${{ env.RUNS_ON_S3_BUCKET_CACHE }}
          cache-to: type=s3,blobs_prefix=cache/docker-s3/,manifests_prefix=cache/docker-s3/,region=${{ env.RUNS_ON_AWS_REGION }},bucket=${{ env.RUNS_ON_S3_BUCKET_CACHE }},mode=max
      # AFTER
      - name: "Build and push image (type=gha, automagically switched to S3)"
        uses: docker/build-push-action@v4
        with:
          tags: test
          cache-from: type=gha
          cache-to: type=gha,mode=max
- Assign AmazonSSMManagedInstanceCore policy to EC2 instances, so that one can easily connect to the runner instance with SSM. Fixes #129.
AWS_PROFILE=YOUR_PROFILE aws ssm start-session --target INSTANCE_ID --reason "testing ssm"
- Allow to inject additional environment variables from the preinstall step, by exposing a $GITHUB_ENV variable that you can write to. The variables will automatically be made available to the job steps. Fixes #188.
runners:
  preinstall-with-env:
    image: ubuntu22-full-arm64
    family: ["c7g"]
    preinstall: |
      echo "Adding a custom env var..."
      echo "MY_CUSTOM_VAR=my_custom_value" >> $GITHUB_ENV
- Support preinstall for Windows runners.
- Expose RunsOnServiceArn as output, so that one can use it to build the CloudWatch log paths. Fixes #184.
- Do not send the cost allocation tag warning if the latest cost report was non-zero. Fixes #187.
- Add Ec2LogRetentionInDays stack parameter. Fixes #189.
- Allow to read license key from SSM. Fixes #176.
v2.6.1
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.1.yaml
Summary
New stack parameters and best practices compliance changes. No longer defaults to fetching global config when a local repo config is not found. Improve housekeeping to handle an additional AWS internal error case when launching an instance.
What's changed
- Add parameter to enable/disable IPv6: Ipv6Enabled. Default is now false, which is a change from previous versions where IPv6 was always enabled. The reason is that docker pulls appear to go through IPv6 IPs, and for some reason they are getting rate-limited much faster than on IPv4. Will have to dig a bit deeper into that. Fixes #177.
- Add parameter to disable the inbound SSH rule in the default security group for runners: SSHAllowed. Default is true. Fixes #174. Fixes #159.
- Add VpcFlowLogRetentionInDays stack parameter. Fixes #180.
- No longer defaults to fetching global config when a local repo config is not found. The previous behaviour was a bit broken with the caching mechanism, and led to confusion. Let's make the behaviour explicit by requiring a local repo config file, with an explicit _extends directive. I understand this is a bit cumbersome if you have many repositories, but I think it's also nice to be able to inspect which repositories are inheriting from the global config. I'm introducing this change as part of a patch release because the behaviour was already broken on v2.6.0.
- Housekeeping: Detect AWS server issue that sometimes leaves instances in pending state, in which case RunsOn will terminate the current instance, and reschedule.
- Enable versioning on all S3 buckets. Fixes #181.
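As an illustration of the explicit opt-in described above, a local repo config might contain little more than the directive itself. The .github-private value is an assumption standing in for whichever repository holds your global config; check the RunsOn repository configuration documentation for the exact form:

```yaml
# .github/runs-on.yml (in each repository that should inherit the global config)
# The value below is a placeholder for the repository holding the global config.
_extends: .github-private
```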
v2.6.0
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.0.yaml
Summary
Auto-retry mechanism for spot interruptions, SingleAZ or MultiAZ NAT gateways, and more!
What's changed
- Spot workflows are now retried once with an on-demand instance if interrupted. Fixes #160. Requires a permission update (write permission for Actions instead of read) for existing installations. You should receive an email with instructions after upgrading. Also add runs-on-workflow-job-interrupted=true to the instance tags if the spot instance was interrupted.
- Add new label retry, with possible values retry=when-interrupted (default for spot), and retry=false to opt out of any auto-retry (useful for non-idempotent jobs).
- Add runs-on-workflow-job-id to the instance tags once the job has started. Also add it to prometheus metric labels.
- Rename tag runs-on-job-started => runs-on-workflow-job-started
- Allow to use 1 NAT gateway per AZ instead of a single one for all. Fixes #165.
- Add optional VpcCidrSubnetBits, DefaultPermissionBoundaryArn, VpcFlowLogFormat, and VpcFlowLogS3BucketAr parameters, so that users can get more conformant stacks compared to their internal settings.
- Set RunsOn env variables on Windows.
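For a job that must not be re-run automatically, opting out might look like the sketch below. The job name, runner value, and deploy script are hypothetical; only the retry=false label comes from the entry above:

```yaml
jobs:
  deploy-once: # hypothetical non-idempotent job
    # retry=false opts out of the automatic on-demand retry after a spot interruption
    runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/retry=false"
    steps:
      - run: ./deploy.sh # placeholder for a non-idempotent script
```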
New Contributors
- @chris9979 made their first contribution in https://github.com/runs-on/runs-on/pull/167
- @colstrom made their first contribution in https://github.com/runs-on/runs-on/pull/169
v2.5.9
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.9.yaml
Summary
Fix GitHub webhook custom_properties handling when values are not strings.
v2.5.8
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.8.yaml
Summary
Revert x/time dependency to v0.6.0 since v0.7.0 introduced a breaking change for rate-limits when using a zero limit.
v2.5.7
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.7.yaml
Summary
Add Private=only mode, make EBS encryption opt-in, introduce disk label. Plus fixes and minor improvements.
Note: please use v2.5.8+ because this version embeds a dependency upgrade for the rate-limit library, which introduced a regression.
What's changed
- Update github go library to fix issue with custom properties.
- Make EBS encryption opt-in, and specify default encryption key (fixes #152).
- Add Private=only mode for the CloudFormation stack, so that runners are forbidden to launch in a public subnet. Fixes #150.
- Disable automatic public IP assignment in public subnets when Private=only is set for the stack (helps with conformance).
- Remove HousekeepingEnabled stack parameter. Housekeeping is now always enabled.
- No longer display EgressStaticIp in job logs since we don't know which one the runner will end up using.
Deprecations
- Introduce disk=default or disk=large label to simplify disk size selection based on the runner volumes defined in the RunsOn CloudFormation stack. hdd is now deprecated and will be removed in a next non-patch version.
v2.5.6
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.6.yaml
Summary
Enable IPv6 for runners. Allow to specify multiple static IPs for the managed NAT gateway. Allow filtering images based on tags. A lot of changes (again) around GitHub rate-limit handling and housekeeping mechanism.
New features
- Enable IPv6 for runners (fixes #142). An IPv6 address is attached to both public and private runners, with a (free) egress IPv6 gateway for private instances.
- Allow to specify multiple static IPs for the managed NAT gateway (fixes #139). By default up to 2 are possible, and up to 8 when a quota increase is requested. This helps if you are launching a large number of runners in private subnets, and some external service rate-limits you based on the IP.
- Allow filtering images based on a tag, in addition to the name wildcard (e.g. is-production-ready=true). Example:
# .github/runs-on.yml
images:
  custom:
    owner: "123456789"
    name: "my-org/my-image-name-*"
    arch: x64
    platform: linux
    tags:
      # filter with specific value
      is-production-ready: "true"
      # allow any value
      other-tag: "*"
- Automatically bind-mount /var/lib/docker on the ephemeral instance storage, if any. Fixes #144.
Bug fixes
- Escape shell special characters in env file values.
- If a matching AMI cannot be found, do not retry and alert on first error.
- Do not attempt to retry job if generated fleet params configuration is incorrect.
- Abort early if workflow run status cannot be checked.
Fixes to avoid GitHub rate-limit issues
- No longer attempt to reschedule jobs where a runner theft is suspected. Instead log a warning message telling users to make sure their jobs have unique enough labels. In some cases this was triggering useless reschedules due to GitHub not reflecting the job state quickly enough.
- Fix too many GitHub calls when fetching repo config from an `extends` attribute (cache it).
- No longer unregister runners from GitHub if API credit is lower than 2500. They will be removed by GitHub 24h later anyway.
- Reorganize rate-limiters, increase `DELAY_SECONDS_FOR_CHECK_BACK` to 180s instead of 120s. Enable github rate-limiter, and set burst to the current number of remaining tokens.
- Only attempt to finalize a job once at most. Instance will auto-terminate anyway so at worst we lose the job usage metrics in CloudWatch. But at least we don't eat into the GitHub / EC2 credits.
- Set housekeeping and termination queue sizes to 1 to reduce their impact on GitHub API credits.
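
Two of the rate-limit changes above lend themselves to a short sketch: skipping runner unregistration when API credit is low, and a rate limiter whose burst is capped at the number of remaining tokens. The Python below is an illustration under assumptions, not RunsOn's implementation: the class and function names are hypothetical, and the token bucket is a generic one.

```python
from dataclasses import dataclass

UNREGISTER_MIN_CREDITS = 2500  # threshold quoted in the notes above


def should_unregister_runner(remaining_credits: int) -> bool:
    """Skip the unregister call when API credit is lower than the threshold."""
    return remaining_credits >= UNREGISTER_MIN_CREDITS


@dataclass
class TokenBucket:
    rate_per_sec: float  # refill rate
    burst: float         # maximum number of tokens that can accumulate
    tokens: float = 0.0
    last: float = 0.0

    def sync_burst(self, remaining: int) -> None:
        """Cap the bucket at the number of API calls reported as remaining."""
        self.burst = float(remaining)
        self.tokens = min(self.tokens, self.burst)

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a monotonic clock.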
v2.5.5
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.5.yaml
Summary
Strengthen CF template configuration to better conform to AWS guidelines. Bug fixes.
What's changed
- Verify that generated JIT token has at least one char.
- Do not attempt to retry runner creation when we know the original request is invalid (e.g. invalid runner configuration due to mismatched labels etc.)
- Strengthen CF template configuration to better conform to AWS guidelines.
- Make sure empty admin values are ignored.
- If no repository config found, cache the result for 1 minute to avoid hammering GitHub API.
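
The last item (caching a "no repository config found" result for 1 minute) is a form of negative caching. A minimal Python sketch follows, assuming an in-memory map keyed by repository name; `RepoConfigCache` and its methods are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Dict


class RepoConfigCache:
    """Remember for a short time that a repository has no config file."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._missing: Dict[str, float] = {}  # repo -> expiry timestamp

    def mark_missing(self, repo: str) -> None:
        self._missing[repo] = self._clock() + self._ttl

    def is_known_missing(self, repo: str) -> bool:
        expiry = self._missing.get(repo)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry expired: forget it so the next lookup hits the API again.
            del self._missing[repo]
            return False
        return True
```

A caller would check `is_known_missing(repo)` before calling the GitHub API, and call `mark_missing(repo)` when the API reports that the file does not exist.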
v2.5.4
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.4.yaml
Summary
New ubuntu24 images, new housekeeping task to auto-restart instances that failed to launch, new always-on Private setting, additional runner details in logs, and more.
Notable changes (from v2.5.0 to v2.5.4)
- Add ubuntu24 official images: `ubuntu24-full-x64` and `ubuntu24-full-arm64`.
- `Private` CloudFormation parameter now accepts `always` as value, in which case the runners will always launch in the private subnets by default (unless opt-out with `private=false`).
- Display GitHub current rate-limits in logs (search for `tokens`).
- Add 'Private' dimension to cloudwatch stats.
- Add `Environment`, `IsPrivate`, and `StaticIp` (if IsPrivate) to runner details (in Setup job logs).
- Increase frequency for spot interruption polling + add logs.
- Conform to AWS spec when sanitizing custom tags (key and value). Fixes #125.
- Add housekeeping task to handle edge cases where a job is still seen as queued by GitHub after a few minutes even after an instance has been launched.
- Allow to disable new housekeeping mechanism.
- Properly tag instance volumes with cost allocation tag. Cost report email will likely go up.
- Display `app_environment` and `app_stack_name` in logs.
- Attempt to fix rare preinstall issue ending up with "text file busy".
- Unregister runner from GitHub when job is completed (i.e. do not wait for auto-expiration since it does not seem that reliable).
Experimental
- Bring back support for single string label, using `/` as the separator instead of `,`. e.g. `runs-on: runs-on/runner=2cpu-linux-x64/other=tag` will work. This simplifies passing a runs-on specification as input to dependent workflows. If you have multiple RunsOn stacks, make sure they are all upgraded to this version before using this new syntax in workflows.
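
As an illustration of the single string label syntax, the Python sketch below splits such a label into key/value pairs. It assumes that the first segment is the literal `runs-on`, that every following segment has the form `key=value`, and that values contain no `/`. `parse_single_label` is a hypothetical helper, not RunsOn's parser.

```python
from typing import Dict


def parse_single_label(label: str) -> Dict[str, str]:
    """Split 'runs-on/key=value/...' into a dict of key/value pairs."""
    parts = label.split("/")
    if parts[0] != "runs-on":
        raise ValueError("label must start with 'runs-on'")
    result: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"segment {part!r} is not of the form key=value")
        result[key] = value
    return result
```

With the label from the example, `parse_single_label("runs-on/runner=2cpu-linux-x64/other=tag")` yields `runner` mapped to `2cpu-linux-x64` and `other` mapped to `tag`.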
Internal
- Fix issue with `private` attribute not being properly loaded from the repository configuration file.
- Switch to semaphores for processing the 3 queues.
- Check workflow run status before scheduling job.
- Add termination queue.
- Update GitHub App (for new installations) to listen for workflow_run events (not used yet, but will be soon).
- Upgrade default runner version when no runner is preinstalled.
v2.5.2
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.2.yaml
Summary
Refactor rate-limits, fix housekeeping behaviour, add missing cost allocation tags, fix rare preinstall bug, unregister runner from GitHub after job termination.
What's changed
- Refactor rate-limits, add proper github rate limiter (defaults to 5000 req/h max). Might introduce a stack parameter if it's too low for some users on GitHub Enterprise plans.
- Reduce concurrency of housekeeping queue, since it's not high priority.
- Attempt to fix rare preinstall issue ending up with "text file busy".
- Upgrade default runner version when no runner is preinstalled.
- Display `app_environment` and `app_stack_name` in logs.
- Properly tag instance volumes with cost allocation tag. Cost report email will likely go up.
- Unregister runner from GitHub when job is completed (i.e. do not wait for auto-expiration since it does not seem that reliable).
Experimental
- Bring back support for single string label, using `/` as the separator instead of `,`. e.g. `runs-on: runs-on/runner=2cpu-linux-x64/other=tag` will work. This simplifies passing a runs-on specification as input to dependent workflows. If you have multiple RunsOn stacks, make sure they are all upgraded to this version before using this new syntax in workflows.
v2.5.1
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.1.yaml
Summary
New ubuntu24 images, additional runner details in logs, scheduling retry mechanism if internal AWS server error when launching, and more.
Note: DO NOT USE this release. The new housekeeping behaviour is not working as expected.
What's Changed
- Add 'Private' dimension to cloudwatch stats.
- `Private` CloudFormation parameter now accepts `always` as value, in which case the runners will always launch in the private subnets by default (unless opt-out with `private=false`).
- Fix issue with `private` attribute not being properly loaded from config file.
- Add `Environment`, `IsPrivate`, and `StaticIp` (if IsPrivate) to runner details (in Setup job logs).
- Add ubuntu24 official images.
- Increase frequency for spot interruption polling + add logs.
- Conform to AWS spec when sanitizing custom tags (key and value). Fixes #125.
- Add housekeeping task to handle edge cases where a job is still seen as queued by GitHub after a few minutes even after an instance has been launched:
// Cases:
// - instance is terminated and doesn't have the `runs-on-job-started` tag (due to spot interruption, AWS EC2 error).
// In this case, we need to launch a new instance, so we reschedule the runner.
// - instance is running and has the `runs-on-job-started` tag, which means the runner was stolen by another workflow job.
// In this case, we need to launch a new instance, so we reschedule the runner.
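
The two cases listed above can be condensed into a single predicate. The Python sketch below assumes it is only called for jobs that GitHub still reports as queued a few minutes after an instance was launched; the function name and the string values for the instance state are illustrative assumptions.

```python
def should_reschedule(instance_state: str, has_job_started_tag: bool) -> bool:
    """Decide whether a new instance is needed for a job still seen as queued."""
    # Case 1: instance died before picking up any job
    # (spot interruption, AWS EC2 error).
    if instance_state == "terminated" and not has_job_started_tag:
        return True
    # Case 2: instance is busy, so the runner was taken by another workflow job.
    if instance_state == "running" and has_job_started_tag:
        return True
    return False
```

Note that this release is flagged as not working as expected, and that v2.5.6 stops rescheduling jobs where a runner theft is suspected, so the second case only reflects the behaviour described here.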
v2.5.0
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.0.yaml
Summary
Allow assigning an environment name to each RunsOn stack. Allow specifying the VPC CIDR block, and export outputs to facilitate VPC peering connections. Allow setting custom tags on instances.
Potentially breaking changes
If you have set the `Private` parameter to `true` in the CloudFormation template, the behaviour has changed:
- The stack will now create only 1 managed NAT gateway (instead of 3) when enabling `Private` mode, to save on costs.
- Also, runners will be launched in the private subnets only if the label `private=true` is present in the `runs-on:` definition. This way, runners will launch in the public subnets by default, and you can selectively use the private subnet (to get the egress static IP) for specific workflows. This saves on NAT bandwidth costs since most workflows don't need static IP.
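
The subnet selection rule described above, combined with the `always` value documented in the v2.5.4 notes, can be sketched as a small decision function. This Python example is a sketch under assumptions: the literal `"false"` for a stack without private mode is assumed, and `use_private_subnet` is a hypothetical name.

```python
from typing import Optional


def use_private_subnet(stack_private: str, label_private: Optional[str]) -> bool:
    """Return True if the runner should launch in a private subnet.

    stack_private: value of the `Private` stack parameter
                   ("true", "always", or an assumed "false").
    label_private: value of the `private` label in `runs-on:`, or None if absent.
    """
    if stack_private == "always":
        # Private by default, unless the workflow opts out with private=false.
        return label_private != "false"
    if stack_private == "true":
        # Public by default, private only when explicitly requested.
        return label_private == "true"
    return False
```

With `Private` set to `true`, only jobs carrying `private=true` pay for NAT bandwidth; all other jobs stay in the public subnets.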
Features
- Allow assigning an environment name to a RunsOn stack (default `production`), which can then be targeted by using the `env` label in the workflow. This allows setting up multiple isolated RunsOn stacks to handle environments such as `staging` etc. with different IAM permissions or configurations. Fixes #120.
- Allow specifying a custom VPC CIDR block when creating the stack. This helps if you plan on establishing VPC peering connections with your RunsOn runners. Note that updating this parameter for existing stacks is not recommended. You should create a new stack instead, and remove the old one. Fixes #114.
- Provide a CloudFormation template to facilitate the establishment of a VPC peering connection between RunsOn's VPC, and a destination VPC.
- Allow setting custom tags on the instances launched by RunsOn (`RunnerCustomTags` CloudFormation parameter). Fixes #119.
v2.4.0
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.4.0.yaml
Summary
Beta Windows support, Prometheus metrics, disk statistics in workflow logs.
New features
- Prometheus metrics export, every minute, at `/metrics` (authenticated with Basic Auth and a new `ServerPassword` CloudFormation parameter).
  - `runs_on_ec2_instances_total`, across various labels: `image_id`, `az`, `instance_type`, `instance_lifecycle`, `instance_state`, `repo_full_name`, `runner_id`, `workflow_job_name`, `workflow_job_started`, `workflow_name`.
  - `runs_on_cloudtrail_events_total`, across the `event_name` label, for `CreateFleet`, `RunInstances`, and `BidEvictedEvent` events.

scrape_configs:
  - job_name: "runs_on"
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: admin
      password: YOUR_SERVER_PASSWORD
    scrape_interval: 60s
    static_configs:
      - targets: ["APPRUNNER_ID.APPRUNNER_ZONE.awsapprunner.com"]

- Windows support (x64 only for now), with a base image: `image=windows22-base-x64`. For now dependencies have to be installed in your workflow steps, or you need to build a custom AMI based on an official Windows 2022 AMI. Current boot time is about 2 minutes. This will get better.
- Display disk details in 'Runner Instance' log group.
Misc
- Agent rewrite, to handle multi-platform (Windows, see above).
- Windows EC2Launch logs are available at `C:\runs-on\output.log`.
- Ability to specify an alternative public ECR registry for the RunsOn docker image.
v2.3.2
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.3.2.yaml
Summary
A fix for useless creation of instances when hitting quota errors, reverting the unbounded `cpu` and `ram` change (from v2.3.0), and CloudWatch agent now streams instance logs into CloudWatch.
Features
- The change introduced in `v2.3.0` expanded the instance choice by allowing instances with more CPUs and RAM than specified to be included. This has been reverted to avoid confusion, and to avoid hitting quota limits more frequently. Instead, RunsOn will take the lowest and highest value from the `cpu` and `ram` definitions, and set that as min and max values when requesting an instance. If you want to keep the behaviour introduced in `v2.3.0`, you can now simply do e.g. `cpu=4+256` and it will evaluate all instances with CPUs from 4 to 256. You no longer need to set multiple values like `cpu=4+8+16+32...`, since only the min and max values will be used. As another example, setting `cpu=4` will only include instances with 4 CPUs, as was the case before `v2.3.0`.
- Automatically send cloud-init logs to CloudWatch. Should help a lot with knowing what happened on an instance in case it terminated early. Currently ships `/var/log/cloud-init-output.log`, `/var/log/syslog`, and `/var/log/cloudwatch-agent.log`. Retention set to 7 days. Requires the CloudWatch agent to be installed on the base AMI (`amazon-cloudwatch-agent-ctl` must be in the `PATH`).
- New CloudFormation parameter to enable/disable detailed monitoring for EC2 instances (default: `false`).
- Add `job_url` to all log messages.
- Add `runs-on-workflow-run-id` tag on instance, when job has started.
- All instances will now get a `Name` assigned when the instance starts processing a job from GitHub. Quite useful to monitor at a glance in EC2 UI which instances have started processing jobs.
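
The min/max rule for `cpu` and `ram` described in the first item of this list can be sketched in a few lines of Python. The sketch assumes the label value is a `+`-separated list of integers; `resource_bounds` is a hypothetical helper name.

```python
from typing import Tuple


def resource_bounds(spec: str) -> Tuple[int, int]:
    """Return (min, max) from a '+'-separated list such as '4+256'."""
    values = [int(v) for v in spec.split("+")]
    return min(values), max(values)
```

Under this rule, `4+256` requests instances with 4 to 256 CPUs, `4+8+16+32` requests 4 to 32, and a single value such as `4` pins both bounds to 4.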
Fixes
- `CreateFleet` API can sometimes return an instance, even if errors are present in the response. Checking this fixes an issue that was creating more instances than necessary when hitting e.g. quota errors.
- Ensure runner waits up to 10s until all tags have been set on the instance before shutting down.
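
The `CreateFleet` fix can be illustrated with a short Python sketch. It assumes a response dictionary shaped like the one returned by the EC2 `CreateFleet` API for instant fleets (an `Errors` list and an `Instances` list whose entries carry `InstanceIds`). The helper names are hypothetical, and this is not RunsOn's code.

```python
from typing import Any, Dict, List


def launched_instance_ids(response: Dict[str, Any]) -> List[str]:
    """Collect instance IDs from the response, whether or not errors are present."""
    ids: List[str] = []
    for item in response.get("Instances", []):
        ids.extend(item.get("InstanceIds", []))
    return ids


def needs_new_launch(response: Dict[str, Any]) -> bool:
    """Only retry when no instance was actually created."""
    return not launched_instance_ids(response)
```

The point of the sketch is that the presence of `Errors` alone is not treated as a failure: a retry is only warranted when the response contains no instance.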
Misc
- Always prepend preinstall script with `#!/bin/bash -e`, and make RunsOn environment variables accessible.
v2.3.1
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.3.1.yaml
Summary
Auto-mounting of ephemeral disks, improvements in dangling instance cleanup, better handling of `preinstall`.
Features
- Local NVME disks (if any) are now automatically arranged in a RAID0 array, and automatically mounted as the workspace folder for the workflow job (i.e. at `/home/runner/_work`).
- Add server-side check and cleanup of dangling instances. If an instance has not been tagged with a job name within 15 minutes of its launch, it will be force-terminated by the RunsOn server. This complements the watchdog of 10 minutes on the agent side, in case the agent cannot properly launch.
- Can now define `preinstall` within a custom `runner` definition. This will override any existing `preinstall` from the `image`.
- Abort the job if `preinstall` failed, and display its output in the log output of the "Set up runner" step in the GitHub UI.
- Automatically install the latest version of the runner agent, when using custom images not based on the official images provided by RunsOn.
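
The server-side dangling instance check described in this list can be sketched as a simple time comparison. The Python below assumes the server knows each instance's launch time and the value of its job name tag, if any; `is_dangling` is a hypothetical name and the exact tag key is not given in these notes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DANGLING_AFTER = timedelta(minutes=15)  # delay quoted in the notes above


def is_dangling(launch_time: datetime, job_name_tag: Optional[str],
                now: datetime) -> bool:
    """True if the instance has no job name tag 15 minutes after launch."""
    return not job_name_tag and now - launch_time >= DANGLING_AFTER
```

An instance launched 16 minutes ago without a job name tag would be flagged for termination, while a tagged instance of the same age would not.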
Fixes
- No longer include bare `metal` instances by default. They are now included only if one of the `family` types includes `metal` in its name.
- Reset instance creation timeout when falling back to on-demand pricing.
- Fix ephemeral disk mounts.
- Display preinstall output in `/var/log/cloud-init-output.log`, in addition to logging it in the job log output on GitHub.
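
The bare metal rule from the first fix in this list can be sketched as a filter over candidate instance types. This Python example assumes that instance type names and `family` values are plain strings and that a substring match on `metal` is enough. Both helper names are hypothetical.

```python
from typing import List


def allows_metal(families: List[str]) -> bool:
    """Metal instances are opt-in: some requested family must mention 'metal'."""
    return any("metal" in family for family in families)


def filter_instance_types(candidates: List[str], families: List[str]) -> List[str]:
    """Drop bare metal instance types unless they were explicitly requested."""
    if allows_metal(families):
        return list(candidates)
    return [name for name in candidates if "metal" not in name]
```

For example, with `family=c5`, a candidate list of `c5.large` and `c5.metal` would be reduced to `c5.large` only.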
Misc
- Internal refactoring for server and agent code.
v2.3.0
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.3.0.yaml
Summary
Allow setting custom spot allocation strategy, `cpu` and `ram` behaviour change, config file is now read from the current branch for private repositories. And 2 new regions!
Potentially breaking changes
- Spot allocation strategy now defaults to `price-capacity-optimized` instead of `capacity-optimized`, which should bring even better cost savings while still ensuring low spot interruption percentage. The downside of that strategy might be a higher likelihood of interruption, but you can now override the strategy (see next section). Also, the next change below might reduce interruption likelihood by automatically expanding the instance pools that EC2 chooses from.
- No longer specify any max for RAM or CPU when requesting an instance, so that we may get a beefier instance if the spot allocation strategy prioritises it. This could be due to a lower price, or due to a lower interruption likelihood. This means you no longer need to set `ram=2+4+8+16+...` since `ram=2` will automatically include 2+ GB instances (if you set multiple values: all values except for the first one will be ignored). Same for CPU.
Note that those two changes might be reverted if many users report increased issues with spot interruptions.
Features
- New regions: Ohio (`us-east-2`), and Singapore (`ap-southeast-1`).
- Can now override the default spot allocation strategy, using either the full strategy name (e.g. `spot=lowest-price`), or its initials (e.g. `spot=lp`). Supported allocation strategies: `price-capacity-optimized`, `lowest-price`, `capacity-optimized`.
- Automatically mount locally-attached SSD disks if any (for instance types ending with the `d` suffix). Very useful if you require large disk sizes with the fastest speed.
- Add new tags `runs-on-workflow-name` and `runs-on-workflow-job-name` to the runner instance, once the job has been scheduled on the instance (good for cost allocation, troubleshooting, etc.).
- For private repositories, the configuration file will now be read from the current branch.
Fixes
- For non-official images, setup runner user earlier, so that SSH keys can be properly added to that user.
- Set environment variable `RUNNER_TOOL_CACHE` to `/opt/hostedtoolcache`, since some third-party actions have this value hardcoded. This is the default value on official runners as well.
Misc
- Send instance timings to telemetry API. This will allow better tracking of boot times across all users and regions.