Changelog
Versions
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.9.2.yaml
Summary
- Fix S3 rate-limit initialization and restore correct values.
- Improve Slack webhook templates by @cfsnate.
- Pool `environment` field renamed to `env` (to be coherent with the naming in runner labels). `environment` is still supported but will be removed in the next minor release. Also, if no `env` is specified, it defaults to `production`.
- Fix dependabot handling.
- Fix regression for the `/var/lib/docker` bind mount, which was active even when no ephemeral disk was present (this would cause issues if you had embedded docker images on a custom AMI).
- Try to circumvent a GitHub bug where, in rare occurrences, a workflow_job webhook is received with empty `runs-on:` labels (`labels: []`). In that case, manually refresh the job details from the GitHub API before proceeding.
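The empty-labels workaround can be pictured as a small fallback around the webhook payload. This is only a sketch under assumptions, not the RunsOn implementation: `resolve_labels` and the injected `fetch_job` callable (standing in for a call to the GitHub API that re-reads the job) are hypothetical names.

```python
from typing import Callable


def resolve_labels(payload: dict, fetch_job: Callable[[int], dict]) -> list[str]:
    """Return the job labels, refreshing from the API when the webhook has none."""
    job = payload.get("workflow_job", {})
    labels = job.get("labels") or []
    if labels:
        return labels
    # Hypothetical fallback: the webhook arrived with `labels: []`,
    # so re-read the job details before deciding what to do with it.
    refreshed = fetch_job(job["id"])
    return refreshed.get("labels") or []
```

Injecting the fetcher keeps the decision logic testable without network access.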
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.9.1.yaml
Summary
Note 2025-10-21: please use v2.9.2 instead, since it includes important fixes.
- Fix pool overflow not picking the correct runner and image spec.
- Warm and hot pools are available for Windows as well!
- Fix github rate-limit refresh.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.9.0.yaml
Summary
This is a large release, with many internal and external changes. Please review the first section below carefully.
Note 2025-10-17: please use v2.9.2 instead, since it includes important fixes.
Potentially breaking changes
- Update default linux image to ubuntu24. Set `image=ubuntu22-full-x64` or `image=ubuntu22-full-arm64` if you want to keep using Ubuntu22.
- When a job is canceled due to a spot interruption, all the failed jobs from that failed workflow will get retried, instead of only the first job interrupted. Fixes #192.
- Prometheus metrics endpoint removed, along with the `ServerPassword` stack parameter. RunsOn now ships with OTEL integration. Fixes #322. The `/metrics` endpoint anyway had a long-standing issue where some prometheus scrapers were unable to reach the AppRunner endpoint due to how the Envoy proxy from AWS handles requests.
- `disk=large` and `disk=default` labels are deprecated. If present, they will be automatically translated into the new `volume` label, but once you have adopted v2.9.0, you should update your workflows for future upgrades.
- `RunnerLargeDiskDeviceName` and `RunnerDefaultDiskDeviceName` are removed (now always use the AMI root volume device name).
- Add stack parameter `RunnerConfigAutoExtendsFrom` to always force a specific value for the repository configuration `_extends` directive (even if no local config file exists). Fixes #366. Note that it defaults to `.github-private`, meaning that if you leave that default, RunsOn will always attempt to load the config file from that repo as a base configuration. Set it to `.` (only extend from current repo `_extends` directive) to keep the previous behavior. This has been a long-requested feature (and source of confusion) and is what new users expect, which is why the breaking setting is enabled by default.
- Custom tag precedence is now: stack custom runner tags < custom runner tags < repository custom runner tags. This allows setting default tags at the stack level, which can be overridden by runner-level tags, while repo-level tags always take precedence (if set) so that repo admins can control the final tag value when needed.
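The new tag precedence reads naturally as a layered dictionary merge, where later layers win. The snippet below is a minimal sketch of that idea, not RunsOn's code; `merge_tags` and the sample tag names are hypothetical.

```python
def merge_tags(stack_tags: dict, runner_tags: dict, repo_tags: dict) -> dict:
    """Merge tag layers so that stack < runner < repository."""
    merged: dict = {}
    # Later layers overwrite earlier ones, which encodes the precedence order.
    for layer in (stack_tags, runner_tags, repo_tags):
        merged.update(layer)
    return merged


# Example: the repo-level value for "team" wins, the stack default for "env" survives.
final = merge_tags(
    {"env": "prod", "team": "platform"},
    {"team": "ci"},
    {"team": "payments"},
)
```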
Deprecations (please send feedback!)
- `disk` label support is going to be removed in the next minor version (replaced by the `volume` label, which is much more flexible).
- `RunnerLargeVolumeThroughput` and `RunnerLargeDiskSize` are deprecated stack parameters.
Also, now that external networking is supported, next version will just set sane defaults for some VPC features when using embedded networking. As such, those parameters will be removed:
- `VpcFlowLogFormat`: [DEPRECATED, use external networking if you need to fine-tune this].
- `VpcFlowLogS3BucketArn`: [DEPRECATED, use external networking if you need to fine-tune this].
- `VpcFlowLogRetentionInDays`: [DEPRECATED, use external networking if you need to fine-tune this].
- `VpcCidrSubnetBits`: [DEPRECATED, use external networking if you need to fine-tune this].
Then, `DefaultAdmins` will be removed, since it's just better for admin-level people to use SSM to log into the runners if needed:
- `DefaultAdmins`: [DEPRECATED, prefer to use SSM for admin access].
Finally, I don't think `ECInstanceDetailedMonitoring` is useful since default cloudwatch metrics are useless anyway, and you're better off using the new `runs-on/action` metrics:
- `ECInstanceDetailedMonitoring`: [DEPRECATED. See https://runs-on.com/monitoring/job-metrics/#performance-metrics for better performance metrics].
Now that native Slack webhook integration is supported, I believe we can also remove the `AlertTopicSubscriptionHttpsEndpoint` parameter, which was originally introduced for that use case (but required an adapter in between). Please reach out if you think we should keep it.
Warm pools (BETA)
RunsOn can now operate pools of stopped or hot instances, which means pick-up times will be improved.
See https://github.com/runs-on/runs-on/blob/main/adrs/20250727-warm-pools.md for all details.
Volume overrides
Runners can now override the default volume settings directly within labels or through the configuration file.
Examples:
# full
runs-on: runner=2cpu-linux-x64/volume=gp3:80g:125mbps:3000iops
# partial
runs-on: runner=2cpu-linux-x64/volume=80g:250mbps
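To make the label format more concrete, here is a rough sketch of how such a `volume` value could be split into its parts. It only assumes what the two examples above show (colon-separated parts with `g`, `mbps` and `iops` suffixes, anything else treated as the volume type); `parse_volume_spec` and its output keys are hypothetical, and RunsOn's real parsing rules may differ.

```python
import re


def parse_volume_spec(spec: str) -> dict:
    """Split a volume label value such as 'gp3:80g:125mbps:3000iops' into fields."""
    result: dict = {}
    for part in spec.split(":"):
        if m := re.fullmatch(r"(\d+)g", part):
            result["size_g"] = int(m.group(1))
        elif m := re.fullmatch(r"(\d+)mbps", part):
            result["throughput_mbps"] = int(m.group(1))
        elif m := re.fullmatch(r"(\d+)iops", part):
            result["iops"] = int(m.group(1))
        else:
            # Assumption: any part without a known suffix is the volume type.
            result["type"] = part
    return result
```

Fields missing from a partial spec are simply absent from the result, so a caller could fill them from the default volume settings.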
GitHub webhook redelivery on failures
RunsOn now ships with a background job to check (every 5min) for failed webhook deliveries from the github side. If it finds some (matching the current stack labels), it will attempt to redeliver them once. This is especially useful under very high load, as AppRunner can sometimes rate-limit incoming webhooks when GitHub sends a burst of webhooks all at once.
You'll get alerted (over SNS, Slack, etc.) if failed webhooks have been redelivered. Cloudwatch dashboard also has a widget showing recent runs and redeliveries (if any).
Runner details
- Add original labels
- Add pool details (if any)
Slack integration
Can now define a Slack webhook URL (`AlertTopicSlackWebhookUrl` stack parameter), so that alerts also get sent there.
OTEL integration
Can now pass OTEL endpoint and headers (`OtelExporterOtlpEndpoint`, `OtelExporterOtlpHeader`). Only HTTP transport is enabled for now. Metrics will be shipped there. Example dashboard below using Signoz:
Details of all new metrics and logs
Job Metrics
`runs_on_jobs_total` (Counter)
Total number of jobs by status.
Attributes:
- `status`: Job status (`queued`, `scheduled`, `in_progress`, `completed`)
- `repo_full_name`: Repository full name (e.g., `owner/repo`)
- `workflow_name`: GitHub workflow name
- `instance_type`: EC2 instance type (e.g., `t3.medium`) (optional, only for `scheduled` status)
- `instance_lifecycle`: Instance lifecycle (`spot` or `on-demand`) (optional, only for `scheduled` status)
- `pool_name`: Pool name if scheduled from a pool (optional, only when scheduled via pool)
- `interrupted`: Whether the job was interrupted (bool) (optional, only when true)
- `org`: GitHub organization name
- `installation_id`: GitHub App installation ID
- `stack_name`: Stack name (optional, when provided in JobEvent)
- `region`: AWS region (optional, when provided in JobEvent)
- `conclusion`: Job conclusion for `completed` status (`success`, `failure`, `cancelled`, `skipped`)
Examples:
# Scheduled status (has instance_type and instance_lifecycle)
runs_on_jobs_total{status="scheduled",repo_full_name="acme/api",workflow_name="CI",instance_type="t3.medium",instance_lifecycle="spot",pool_name="default",org="acme",installation_id=12345,stack_name="runs-on-prod",region="us-east-1"} 42
# Completed status (no instance_type or instance_lifecycle)
runs_on_jobs_total{status="completed",repo_full_name="acme/api",workflow_name="CI",pool_name="default",conclusion="success",org="acme",installation_id=12345,stack_name="runs-on-prod",region="us-east-1"} 42
`runs_on_internal_queue_duration_seconds` (Histogram)
Time from job queued in RunsOn to scheduled (internal queue time). This measures how long the job spends in RunsOn's internal queue before an instance is scheduled.
Attributes: Same as runs_on_jobs_total
Buckets: Default OTEL histogram buckets
`runs_on_overall_queue_duration_seconds` (Histogram)
Time from job queued by GitHub to started (overall queue time). This measures the total time from when GitHub queues the job to when it actually starts running, including instance launch and runner bootstrap.
Attributes: Same as runs_on_jobs_total
Buckets: Default OTEL histogram buckets
`runs_on_job_duration_seconds` (Histogram)
Time from job started to completed.
Attributes: Same as runs_on_jobs_total
Buckets: Default OTEL histogram buckets
Pool Metrics
`runs_on_pool_instances_total` (Observable Gauge)
Number of pool instances by state. This is a pull-based metric that reports current state.
Attributes:
- `pool_name`: Pool name
- `state`: Instance state (`running`, `stopped`, `pending`, `terminating`)
- `installation_id`: GitHub App installation ID
- `org`: GitHub organization name
Example:
runs_on_pool_instances_total{pool_name="default",state="running",installation_id=12345,org="acme"} 5
runs_on_pool_instances_total{pool_name="default",state="stopped",installation_id=12345,org="acme"} 10
Rate Limiter Metrics
`runs_on_rate_limiter_tokens` (Observable Gauge)
Available tokens in rate limiter. This is a pull-based metric that reports current state.
Attributes:
- `limiter`: Rate limiter name (e.g., `github_api`, `ec2_api`)
Example:
runs_on_rate_limiter_tokens{limiter="github_api"} 4500.5
`runs_on_rate_limiter_burst` (Observable Gauge)
Burst capacity of rate limiter. This is a pull-based metric that reports current state.
Attributes:
- `limiter`: Rate limiter name
Example:
runs_on_rate_limiter_burst{limiter="github_api"} 5000
Spot Circuit Breaker Metrics
`runs_on_spot_circuit_breaker_active` (Observable Gauge)
Whether spot circuit breaker is currently active. This is a pull-based metric that reports current state.
Values:
- `1`: Circuit breaker is active (spot instances disabled)
- `0`: Circuit breaker is inactive (spot instances enabled)
Example:
runs_on_spot_circuit_breaker_active{} 0
Go Runtime Metrics
The metrics package automatically instruments Go runtime metrics via `go.opentelemetry.io/contrib/instrumentation/runtime`:
- `process.runtime.go.mem.heap_alloc`
- `process.runtime.go.mem.heap_idle`
- `process.runtime.go.mem.heap_inuse`
- `process.runtime.go.gc.count`
- `process.runtime.go.goroutines.count`
- And more...
These metrics include the standard `service.name="runs-on-server"` attribute.
Resource Attributes
All metrics include these resource attributes:
Attribute | Description | Example |
---|---|---|
`service.name` | Service name (always `runs-on-server`) | `runs-on-server` |
`app.version` | Application version (if configured) | `v2.9.0` |
`app.environment` | Environment name (if configured) | `production` |
`stack_name` | Stack name (if configured) | `runs-on-prod` |
`region` | AWS region (if configured) | `us-east-1` |
Structured Logs
The metrics package emits periodic structured logs (JSON) containing snapshots of all metrics.
Log Types
Job Summary (`metric_type=jobs_summary`)
Cumulative job counts since server start.
{
"metric_type": "jobs_summary",
"queued": 1234,
"scheduled": 1200,
"in_progress": 34,
"completed": 1150,
"interrupted": 16
}
Note: The `interrupted` counter tracks jobs that were interrupted (e.g., by spot interruptions), but jobs are recorded with their final status (e.g., `completed`) and the `interrupted` attribute set to `true`.
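Since these snapshots are plain JSON, they are easy to post-process. The sketch below picks the most recent `jobs_summary` record out of a stream of log lines; it assumes one JSON object per line, which is an assumption about how the logs are exported, and `latest_jobs_summary` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional


def latest_jobs_summary(lines: Iterable[str]) -> Optional[dict]:
    """Return the last log record whose metric_type is 'jobs_summary', if any."""
    latest = None
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines mixed into the log stream
        if isinstance(record, dict) and record.get("metric_type") == "jobs_summary":
            latest = record
    return latest
```

Because the counts are cumulative since server start, the latest record is usually the one of interest.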
Job Event (`metric_type=job_event`)
Individual job lifecycle events (emitted immediately, not periodic).
{
"metric_type": "job_event",
"status": "completed",
"conclusion": "success",
"repo_full_name": "acme/api",
"workflow_name": "CI",
"instance_type": "t3.medium",
"instance_lifecycle": "spot",
"pool_name": "default",
"interrupted": true,
"internal_queue_duration_seconds": 12.5,
"overall_queue_duration_seconds": 45.2,
"job_duration_seconds": 180.3
}
Note: `instance_type`, `instance_lifecycle`, `pool_name`, and `interrupted` fields are only included when available/applicable.
Pool Instances (`metric_type=pool_instances`)
Current pool instance counts by state.
{
"metric_type": "pool_instances",
"installation_id": 12345,
"org": "acme",
"pool_name": "default",
"running": 5,
"stopped": 10,
"pending": 2
}
Rate Limiter (`metric_type=rate_limiter`)
Current rate limiter state.
{
"metric_type": "rate_limiter",
"limiter": "github_api",
"tokens": 4500.5,
"burst": 5000
}
Spot Circuit Breaker (`metric_type=spot_circuit_breaker`)
Current circuit breaker state.
{
"metric_type": "spot_circuit_breaker",
"active": false,
"interruption_count": 42
}
Spot Interruption (`metric_type=spot_interruption`)
Individual spot interruption events (emitted immediately, not periodic).
{
"metric_type": "spot_interruption",
"interruption_time": "2025-10-10T14:30:00Z",
"trip_count": 3,
"recovery_minutes": 15,
"circuit_breaker_active": false,
"active_until": "2025-10-10T14:45:00Z",
"instance_id": "i-1234567890abcdef0",
"job_id": "987654321",
"job_name": "build",
"job_url": "https://github.com/owner/repo/actions/runs/123456789/job/987654321",
"repo_full_name": "owner/repo"
}
Note: `active_until` is only included when the circuit breaker is active. Job details (`instance_id`, `job_id`, `job_name`, `job_url`, `repo_full_name`) are only included when available.
Pre/Post custom job hooks
You can now launch custom scripts within the "Set up runner" and "Complete runner" sections of a workflow. If `/runs-on/pre.custom.sh` or `/runs-on/post.custom.sh` scripts are found, the RunsOn agent will execute them in their respective job section. They are executed after the RunsOn-specific scripts, and RunsOn will fail the step if those custom scripts fail. See https://docs.github.com/en/actions/how-tos/manage-runners/self-hosted-runners/run-scripts for more details.
Misc
- Improve failure message when invalid runner spec (missing family). Fixes #343.
- Fix permission issue with Docker and ECR login in preinstall scripts on instances with local disks. Fixes #362.
- Auto-resize Windows disks. Fixes #369.
- Properly disable all ipv4 public addresses whenever launching in private subnets. Previously this was only done when the `Private=only` stack parameter was set, leading to increased costs when running mixed networking mode (public + private runners allowed) stacks.
- When an instance receives a spot interruption warning, let AWS perform the termination so that we don't get billed if runtime was <1h. Fixes #365.
- Surface job error after all schedule attempts have been exhausted. Fixes #357.
- Fix SSH setup issues on AlmaLinux images. #330.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.9.yaml
Summary
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.8.yaml
Summary
Increase Docker ECR setup timeout to 2min (previously 20s, but could lead to authentication errors).
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.7.yaml
Summary
Small fix for the CloudFormation template, when using the `external` networking stack and not passing any public (or private) subnet IDs.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.6.yaml
Summary
Bug fixes, and first iteration on integrated CloudWatch dashboard for people managing the stack.
What's changed
- Better handling of environment variable display in the "Set up runner" step. Fixes #325.
- Allow `ExternalVpcPublicSubnetIds` to be left empty when using `Private=only` mode.
- Cleanup delete markers and aborted multipart uploads. Fixes #329.
- Lower MinValue for disk size to 10GB. Fixes #336.
- Reformat error and cost report subjects, limit to max 100 chars. Fixes #340.
- Windows: make user-data run on every reboot
- Add stack parameter `EnableDashboard` (default: false) to allow creation of CloudWatch dashboard.
- Properly override platform and arch based on retrieved image details (if ami id is provided). Previously you could have windows images getting the linux user-data script if you were just providing the AMI ID (i.e. not using an image spec definition in the config file).
- Make windows agent resilient to already existing runner user.
- Do not retry terminating a job if invalid instance id given.
- Update dependencies.
Beta: integrated CloudWatch dashboard
You can now enable the creation of a CloudWatch dashboard. This is early days, but it can already display widgets for:
- total runners scheduled for current period
- runners scheduled over time
- status of ec2 rate limiters + github api tokens left
- last 20 error messages for current period (can expand)
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.5.yaml
Summary
Fix buildkit gha exporter, better user error reporting, AppRunner VPC connector integration.
What's changed
- Report non-retryable user errors directly in GitHub: whenever a job can't be started for user reasons (e.g. bad image, bad runner definition, etc.), RunsOn will now spawn a default runner that will fail at the "Set up runner" step, with an error message explaining why. This will help surface issues. Fixes #307.
- Fix issue with `type=gha` buildkit exporter for docker layers. Fixes #328.
- Automatically enable the AppRunner VPC connector when `Private` mode is active, so that all AppRunner egress traffic (for the RunsOn orchestrator) goes through the private subnet(s) NAT gateways or equivalent. This means the AppRunner service will use the same static IP(s) as the runners, so that you can whitelist the AppRunner service on your GHES or GitHub Enterprise installation if needed. All ingress traffic is still publicly allowed and handled by AWS.
- Telemetry: send values for `networking_stack` (embedded or external), and `extras`. Will help better understand how RunsOn is set up and which extra features are most used.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.4.yaml
Summary
Integrated CPU/Memory/Disk/Network monitoring, integrated job-level cost reporting, official snapshot action release, and many QoL improvements.
Spotlight: Monitoring improvements
- Allow to send metrics to CWAgent namespace. This allows runs-on/action@v2 to send and graph metrics right within your job output. For instance, a "Disk Writes (Ops/s)" chart is rendered in the job log together with its stats (min:2040.0 avg:4180.9 max:6026.0 Ops/s).
- Create resource group for EC2 instances on CloudWatch. This means you can go to the CloudWatch EC2 Automatic dashboard, select your resource group (named after your RunsOn stack) and get a high-level overview of metrics for all your runner instances.
- Allow instance role to enable detailed monitoring on demand (not used for now, but might be an option of runs-on/action).
Spotlight: Costs computation
- The runs-on/action@v2 now automatically computes the costs associated for each job, and displays the results right within your job logs. You can also choose to display them as a job summary.
Spotlight: Block-level snapshots
- The runs-on/snapshot@v1 action is available and can be used to save and restore entire folders between job executions, at a much faster speed (for long jobs) than other methods relying on compression and export to S3 or other.
Stack improvements
- Allow to override the max runner time limit. Fixes #320.
- Add support for permission boundary for SchedulerInvokeRole. Fixes #315.
- Add AWS account-id and region to emails. Fixes #298.
- Remove explicit `PublicAccessBlockConfiguration` declaration since some SCP policies can incorrectly flag the `s3:PutBucketPublicAccessBlock` action. This is the default for new buckets anyway.
- Add ECR Full Access managed policy to obtain higher ECR Public Rate Limits.
- Make bootstrapping work on more distros.
- Allow RunsOn to auto-create spot role if absent.
- Add health checks for email notification subscription, and ec2 spot role.
Misc
- Remove legacy (and unused) `.env` loading.
- Update go to 1.24.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.3.yaml
Summary
Huge improvements to tagging, magic cache is now even faster, and bug fix for jobs tied to environments with no approval required.
What's changed
QoL improvements
- Check if the spot role exists before starting RunsOn service (preflight check 2 from the installation guide). If not, alert the user over the SNS topic.
- Cleanup all dangling instances, irrespective of RunsOn version.
- Rewrote magic cache to more efficiently stream uploads when `actions/cache` is the client. For bigger (>1GiB) payloads, there should be a very noticeable improvement.
- If `SSHAllowed` is set to `false` at the stack level, discard any `ssh=true` value coming from label or repo config. Fixes #310.
Improvements to tagging
- Pass all custom tags to volumes, in addition to instances, when creating the runner. Fixes #264.
- Allow to set additional custom tags using a custom property in the GitHub settings of a repository. If a custom property with name `runs-on-custom-tags` exists, RunsOn will parse it in the same way as the stack-level custom tags, and apply them to the instance and volumes. Fixes #297.
  For instance: if the value for property `runs-on-custom-tags` is set to `key1=val1,key2=val2` then instances and volumes will get 2 new tags (key1, key2) with their corresponding values.
  Same restrictions as for stack-level tags apply. Stack-level tags take precedence over tags set in the custom property, and tags set in custom properties take precedence over custom runner tags defined in the `.github/runs-on.yml` configuration.
- Pass custom tags and default branch to runner config. And write config in `/runs-on/config.json` (Linux), or `C:\runs-on\config.json` (Windows). Config can then be read by actions / scripts etc. to access all runner details easily.
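As an illustration of the `key1=val1,key2=val2` format, the following sketch turns such a property value into a tag dictionary. It is not the RunsOn parser: `parse_custom_tags` is a hypothetical name, and silently skipping malformed pairs is an assumption made for the example.

```python
def parse_custom_tags(value: str) -> dict[str, str]:
    """Parse a 'key1=val1,key2=val2' string into a dict of tags."""
    tags: dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue  # assumption: ignore empty or malformed entries
        key, val = pair.split("=", 1)
        tags[key.strip()] = val.strip()
    return tags
```

Splitting on the first `=` only means a value may itself contain `=` characters.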
Bug fixes
- Fix the race-condition that could lead to 2 instances being started when handling jobs tied to a deployment that does not require approval.
Misc
- Add goroutine to cleanup dangling volumes and snapshots (prepare for block-level snapshots).
- Register `waiting`, `in_progress`, and `completed` webhook payloads in S3 (in addition to `queued`).
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.2.yaml
Summary
Support for EFS, TMPFS, and ECR ephemeral registry for fast docker builds. Also some bug fixes.
What's changed
EFS
- Embedded networking stack can now create an Elastic File System (EFS), and runners will auto-mount it at `/mnt/efs` if the `extras` label includes `efs`. Useful to share artefacts across job runs, with classic filesystem primitives.
jobs:
  with-efs:
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64,extras=efs
    steps:
      - run: df -ah /mnt/efs
        # 127.0.0.1:/ 8.0E 35G 8.0E 1% /mnt/efs
Example use case for maintaining mirrors
For instance this can be used to maintain local mirrors of very large github repositories and avoid long checkout times for every job:
env:
  MIRRORS: "https://github.com/PostHog/posthog.git"
  # can be ${{ github.ref }} if same repo as the workflow
  REF: main
jobs:
  with-efs:
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64,extras=efs
    steps:
      - name: Setup / Refresh mirrors
        run: |
          for MIRROR in ${{ env.MIRRORS }}; do
            full_repo_name=$(echo $MIRROR | cut -d/ -f4-)
            MIRROR_DIR=/mnt/efs/mirrors/$full_repo_name
            mkdir -p "$(dirname $MIRROR_DIR)"
            test -d "${MIRROR_DIR}" || git clone --mirror ${MIRROR/https:\/\//https:\/\/x-access-token:${{ secrets.GITHUB_TOKEN }}@} "${MIRROR_DIR}"
            ( cd "$MIRROR_DIR" && \
              git remote set-url origin ${MIRROR/https:\/\//https:\/\/x-access-token:${{ secrets.GITHUB_TOKEN }}@} && \
              git fetch origin ${{ env.REF }} )
          done
      - name: Checkout from mirror
        run: |
          git clone file:///mnt/efs/mirrors/PostHog/posthog.git --branch ${{ env.REF }} --single-branch --depth 1 upstream
Ephemeral registry
- Support for an Ephemeral ECR registry: can now automatically create an ECR repository that can act as an ephemeral registry for pulling/pushing images and cache layers from your runners. Especially useful with the `type=registry` buildkit cache instruction. If the `extras` label includes `ecr-cache`, the runners will automatically set up docker credentials for that registry at the start of the job.
jobs:
  ecr-cache:
    runs-on: runs-on=${{ github.run_id }},runner=2cpu-linux-x64,extras=ecr-cache
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v4
        env:
          TAG: ${{ env.RUNS_ON_ECR_CACHE }}:my-app-latest
        with:
          context: .
          push: true
          tags: ${{ env.TAG }}
          cache-from: type=registry,ref=${{ env.TAG }}
          cache-to: type=registry,ref=${{ env.TAG }},mode=max,compression=zstd,compression-level=22
Tmpfs
Support for setting up a `tmpfs` volume (size: 100% of available RAM, so only to be used on high-memory instances), and binding the `/tmp`, `/home/runner`, and `/var/lib/docker` folders on it. `/tmp` and `/home/runner` are mounted as overlays, preserving their existing content.
Can speed up some IO-intensive workflows. Note that if `tmpfs` is active, instances with ephemeral disks won't have those mounted since it would conflict with the `tmpfs` volume.
jobs:
  with-tmpfs:
    runs-on: runs-on=${{ github.run_id }},family=r7,ram=16,extras=tmpfs
    steps:
      - run: df -ah /mnt/tmpfs
        # tmpfs 16G 724K 16G 1% /mnt/tmpfs
      - run: df -ah /home/runner
        # overlay 16G 724K 16G 1% /home/runner
      - run: df -ah /tmp
        # overlay 16G 724K 16G 1% /tmp
      - run: df -ah /var/lib/docker
        # tmpfs 16G 724K 16G 1% /var/lib/docker
You can obviously combine options, e.g. `extras=efs+tmpfs+ecr-cache+s3-cache` is a valid label.
Instance-storage mounting changes
Until now, when an instance had locally attached NVMe SSDs available, they would be automatically formatted and mounted so that the `/var/lib/docker` and `/home/runner/_work` directories would end up on the local disks. Since a lot of stuff (caches etc.) seems to end up within the `/home/runner` folder itself, the agent now uses the same strategy as for the new `tmpfs` mounts above: the whole `/home/runner` folder is mounted as an overlay on the local disk volume, as well as the `/tmp` folder, while `/var/lib/docker` remains mounted as a normal filesystem on the local disk volume. Fixes #284.
Misc
- Move all RunsOn-specific config files into the `/runs-on` folder on Linux. More coherent with Windows (`C:\runs-on`), and avoids polluting the `/opt` folder.
- Fix `app_version` in logs (was previously empty string due to incorrect env variable being used in v2.8.1).
- Fix "Require any Amazon EC2 launch template not to auto-assign public IP addresses to network interfaces" from AWS Control Tower. When the `Private` mode is set to `only`, no longer enable public ip auto-assignment in the launch templates. Thanks @temap!
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.8.1.yaml
Summary
A large release: can now use external networking stack ; enable encryption on all S3 buckets ; lots of quality of life improvements and bug fixes ; halve Windows boot times and enable Cloudwatch agent monitoring. Be sure to read the upgrade notes.
What's changed
Networking
- Can now reuse an existing networking stack, if the `NetworkingStack` stack parameter is set to `external` instead of `embedded`. Fixes #198, fixes #265, fixes #230 (community-provided networking stack can provide this feature).
- Some not-so-useful stack outputs have been removed. Some outputs may be `-` if using an external VPC.
Caching
- Fix invalid cache key restoration for Magic Cache. Thanks @erikburt from ChainlinkLabs for the troubleshooting.
Security
Enable server-side encryption using AWS-managed KMS key on all S3 buckets. Fixes #276.
No longer expose JIT token in cloud-init-output logs. The token is no longer valid after a job is run, but still.
QoL improvements
- Add `AppDebug` (true or false) stack parameter, which allows to disable the auto-shutdown of runners when the bootstrap fails. Useful to investigate what is going on when the runner initializes.
- Add `AppCustomPolicy` stack parameter: Optional managed IAM Policy ARN to assign to the App runner service role. Can be used to e.g. allow access to KMS decryption keys for AMIs. Thanks @dsme94!
- Add `AppGithubApiStrategy` (normal or conservative) stack parameter to opt into minimizing GitHub API usage. If set to `conservative`, runners won't be automatically unregistered in GitHub internal database (GitHub will still clean them up after 24h). This helps for users with very large number (20k+) of jobs launched every day. Fixes #285.
- Now bootstraps runners using runs-on/bootstrap binary, preinstalled on official RunsOn images (faster and more extensible).
- On spot interruption, give more time to the job to possibly complete before shutdown is triggered. Shutdown is now triggered 20s before the expected time sent by AWS, instead of 15 seconds after the notification is received. Fixes #277.
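The new shutdown timing amounts to subtracting a fixed margin from the termination time announced by AWS. The sketch below assumes the notice has the shape of the EC2 `spot/instance-action` metadata document (an `action` and a UTC `time` field); `shutdown_deadline` and `SHUTDOWN_MARGIN` are hypothetical names, and this is not the RunsOn agent code.

```python
from datetime import datetime, timedelta, timezone

# 20 seconds before the expected termination time, as described above.
SHUTDOWN_MARGIN = timedelta(seconds=20)


def shutdown_deadline(instance_action: dict) -> datetime:
    """Return when shutdown should be triggered for a spot interruption notice."""
    expected = datetime.strptime(
        instance_action["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return expected - SHUTDOWN_MARGIN


notice = {"action": "terminate", "time": "2025-10-10T14:32:00Z"}
deadline = shutdown_deadline(notice)
```

Compared with a fixed delay after the notification, anchoring on the announced time gives the job the longest possible window to finish.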
Windows
- Shaved about 50s from Windows boot times: SSH is no longer automatically installed on Windows (SSM agent is available now), and no longer using `Invoke-WebRequest` helped a lot (TIL).
- CloudWatch agent is automatically installed on Windows AMIs, and EC2Launch logs are shipped to CloudWatch (same naming as for Linux runners: e.g. `LOG_GROUP_NAME/INSTANCE_ID/cloud-init-output.log`). Also added support for `roc connect` on Windows AMIs in the RunsOn CLI.
Bug fixes
- Fix for invalid CreateTags requests. Fixes #288.
- Fix for invalid EC2 rate-limiter being used when uploading user-data file to S3. Fixes #286.
- Adjust ownership rule for S3 bucket logging, from `BucketOwnerPreferred` to `BucketOwnerEnforced`. Fixes #291.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.7.2.yaml
Summary
GHES support is now available. Allow to specify a custom expiration for objects in the cache bucket.
What's changed
- GHES support is now available. Fixes #250.
- Add `S3CacheExpirationInDays` stack parameter. Fixes #179.
- Tag launch templates. Fixes #264.
- Pin launch template version to the specific version active at the time RunsOn service was deployed. Fixes #274.
- Tag instances with `runs-on-is-ghes` and `runs-on-integrations-active`.
- Tag instances with `InspectorEc2Exclusion` to avoid SSM inspector scans on running instances. Possibly fixes #242.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.7.1.yaml
Summary
Hotfix: fix for `disk=large` handling.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.7.0.yaml
Summary
A few minor breaking changes related to VPC flow logs and the `hdd` label. Plus many fixes.
Breaking changes
This is a minor release, so this comes with the following breaking changes. Please review your CloudFormation parameters and runner configuration accordingly when updating:
- Fix for #258. VPC Flow Logs are now only enabled if `VpcFlowLogFormat` is set to a non-empty value. To enable them with the default format (as it was before if you didn't specify a value), specify `default`.
- Remove support for the deprecated `hdd` job label. Ensure all your workflows and repository configuration (`.github/runs-on.yml`) do not use this label. If it is still set, it will have no effect and the default runner configuration will be used for disk sizing. You must now use the `disk=default` or `disk=large` label instead.
What's changed
- Fix for S3 server access logging. Fixes #241.
- Allows specifying the root volume name. Fixes #207.
- Expose `RUNS_ON_AWS_AZ` and `RUNS_ON_INSTANCE_LAUNCHED_AT` environment variables to jobs.
- Properly set `RUNNER_TOOL_CACHE` 🤦, so that some `setup-*` actions can properly use the hosted toolcache on the VM.
- Add policy to allow instances to describe their tags. Means we no longer need to enable `InstanceMetadataTags` for the instances. Cost Allocation Tag and Runner Tags can now contain slashes in their keys.
- Add `runs_on_spot_circuit_breaker_active` prometheus metric (1=active, 0=inactive). Fixes #271.
- Ensure we don't try to auto-retry after spot termination if the workflow run has already been manually re-attempted. Fixes #263.
- Fixes typo - Fixes #248.
- Scope the minutes alarm on the stack name. Also add the `StackName` dimension on all metrics. Fixes #235.
- (alpha, not fully functional yet) Support for GitHub Enterprise Server (GHES) installations.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.8.yaml
Summary
Hotfix for CreateFleet IdempotentParameterMismatch errors, as well as Magic Cache support for newer buildx versions.
What's changed
- Fixes #251: `IdempotentParameterMismatch` error.
- Fix Magic Cache for newer buildx versions. No longer need to set `version=1` in `cache-from` and `cache-to`.
- Fixes #249: Add cancelled to the list of conclusion statuses that can trigger an auto-retry.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.7.yaml
Summary
New spot circuit breaker for snoozing spot requests if too many interruptions detected. Monitoring improvements. StepSecurity integration, and more.
What's changed
Spot circuit breaker
- Allow to switch to on-demand requests if spot interruption frequency is too high over a defined time interval. Fixes #226.
For instance, if `SpotCircuitBreaker` is set to `2/30/60`, it means that after at least 2 interruptions in the last 30 minutes, RunsOn will switch to on-demand requests for the next 60 minutes.
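The `count/window/cooldown` semantics described above can be sketched as follows. This is only an illustration under assumptions: the class and function names are hypothetical, and the cooldown is assumed to start at the interruption that trips the breaker, which the changelog does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class CircuitBreakerConfig:
    max_interruptions: int  # e.g. 2
    window: timedelta       # e.g. 30 minutes
    cooldown: timedelta     # e.g. 60 minutes


def parse_circuit_breaker(value: str) -> CircuitBreakerConfig:
    """Parse a value such as "2/30/60" (count / window minutes / cooldown minutes)."""
    count, window_min, cooldown_min = (int(part) for part in value.split("/"))
    return CircuitBreakerConfig(
        count, timedelta(minutes=window_min), timedelta(minutes=cooldown_min)
    )


class SpotCircuitBreaker:
    def __init__(self, config: CircuitBreakerConfig) -> None:
        self.config = config
        self.interruptions: List[datetime] = []
        self.open_until: Optional[datetime] = None

    def record_interruption(self, at: datetime) -> None:
        # Keep only the interruptions that fall inside the sliding window.
        cutoff = at - self.config.window
        self.interruptions = [t for t in self.interruptions if t >= cutoff]
        self.interruptions.append(at)
        if len(self.interruptions) >= self.config.max_interruptions:
            # Assumption: the cooldown starts when the breaker trips.
            self.open_until = at + self.config.cooldown

    def use_on_demand(self, now: datetime) -> bool:
        """True while spot requests should be replaced by on-demand requests."""
        return self.open_until is not None and now < self.open_until
```

With `2/30/60`, two interruptions ten minutes apart trip the breaker, while two interruptions forty minutes apart do not.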
Monitoring
- Add workflow job conclusion to prometheus labels. Fixes #178. Also add `job_conclusion` and `run_attempt` to all log lines.
- Support SQS queue oldest message age alarms. Helps with compliance and to detect whether RunsOn has issues dequeuing messages fast enough. Fixes #228.
- Use scheduled event to compute and send cost reports at midnight UTC. Fixes #216.
Native integration with StepSecurity
- Support StepSecurity integration with new images:
jobs:
  job-with-stepsecurity:
    runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/image=ubuntu24-stepsecurity-x64"
    steps:
      - name: External call
        run: curl https://google.com
Documentation: https://runs-on.com/integrations/stepsecurity/
Misc
- Reduce agent binary size.
- Update Go dependencies.
- Allow injection of custom runner agent (internal testing only).
- Remove magic cache ON annotation. Fixes #234.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.6.yaml
Summary
Fix `VpcEndpoints` stack parameter.
What's changed
With `VpcEndpoints` enabled, the CloudFormation template was incorrectly assigning interface endpoints to both public and private subnets, while an interface endpoint can only be defined once per AZ (and only makes sense for private subnets anyway).
Thanks again to Commonwealth Fusion Systems for their quick feedback and help!
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.5.yaml
Summary
Optimized GPU images, new `VpcEndpoints` stack parameter, ability to specify custom instance tags for custom runners.
Note: there appear to be some issues with the new VPC endpoints. I'm on it! If you need that feature, please hold on to your current version of RunsOn.
What's Changed
- New GPU images `ubuntu22-gpu-x64` and `ubuntu24-gpu-x64`: 1-1 compatibility with GitHub base images + NVidia GPU drivers, CUDA toolkit, and container toolkit.
- Add new `VpcEndpoints` stack parameter (fixes #213), and reorganize template params. Note that the EC2 VPC endpoint was previously automatically created when `Private` mode was enabled. This is no longer the case, so make sure you select the VPC endpoints that you need when you update your CloudFormation stack.
- Suspend versioning for cache bucket (fixes #191).
- Allow to specify instance tags for runners (fixes #205). Tag keys can't start with the `runs-on-` prefix, and keys and values will be sanitized according to AWS rules.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.4.yaml
Summary
CLI 0.0.1 released, fix for Magic Cache, fleet objects deletion.
What's changed
- CLI released: https://github.com/runs-on/cli. Allows to easily view logs (both server logs and cloud-init logs) for a workflow job by just pasting its GitHub URL or ID. Also allows easy connection to a runner through SSM.
- Fix race-condition in Magic Cache (fixes #209).
- Delete the fleet instead of just the instance (fixes #217).
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.3.yaml
Summary
Fix magic cache handling of actions/upload-artifact. Prepare for RunsOn CLI.
What's changed
- Store instance id assigned to job (once job has started) in the main S3 bucket (under `/runs-on/db/jobs/JOB_ID/instance-id`), as well as the payload for the workflow_job `queued` event. Will be used for #201.
- Fix magic cache for cache keys with slashes inside.
- Make magic cache play nice with `actions/upload-artifact`. For that you must add `runs-on/action@v1` in your workflows. Fixes #197.
- Documentation for magic cache at https://runs-on.com/caching/magic-cache/
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.2.yaml
Summary
Magic transparent cache for dependencies and docker layers. SSM support for logging into runner instances. And more.
What's changed
- BETA - Transparent S3-backed caching for ALL actions that depend on the official cache toolkit from GitHub. Fixes https://github.com/runs-on/cache/issues/23 and actually makes the runs-on/cache action redundant in the context of RunsOn runners. For now, the magic caching is only enabled with the `extras=s3-cache` job label.
jobs:
  look-ma-no-cache-config:
    runs-on: "runs-on=${{github.run_id}}/runner=2cpu-linux-x64/extras=s3-cache"
    steps:
      # standard action is supported, no need to use `runs-on/cache@v4`
      - uses: actions/cache@v4
        with:
          path: my-path
          key: my-key
      # third-party actions that depend on official toolkit (99%) are supported as well
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
- BETA - Transparent S3-backed caching for Docker layers when using `cache-to: type=gha` / `cache-from: type=gha`. For now, the magic caching is only enabled with the `extras=s3-cache` job label.
jobs:
  look-ma-no-cache-config:
    runs-on: "runs-on=${{github.run_id}}/runner=2cpu-linux-x64/extras=s3-cache"
    steps:
      # BEFORE
      - name: "Build and push image (explicit s3 config)"
        uses: docker/build-push-action@v4
        with:
          tags: test
          cache-from: type=s3,blobs_prefix=cache/docker-s3/,manifests_prefix=cache/docker-s3/,region=${{ env.RUNS_ON_AWS_REGION }},bucket=${{ env.RUNS_ON_S3_BUCKET_CACHE }}
          cache-to: type=s3,blobs_prefix=cache/docker-s3/,manifests_prefix=cache/docker-s3/,region=${{ env.RUNS_ON_AWS_REGION }},bucket=${{ env.RUNS_ON_S3_BUCKET_CACHE }},mode=max
      # AFTER
      - name: "Build and push image (type=gha, automagically switched to S3)"
        uses: docker/build-push-action@v4
        with:
          tags: test
          cache-from: type=gha
          cache-to: type=gha,mode=max
- Assign `AmazonSSMManagedInstanceCore` policy to EC2 instances, so that one can easily connect to the runner instance with SSM. Fixes #129.
AWS_PROFILE=YOUR_PROFILE aws ssm start-session --target INSTANCE_ID --reason "testing ssm"
- Allow to inject additional environment variables from the preinstall step, by exposing a `$GITHUB_ENV` variable that you can write to. The variables will automatically be made available to the job steps. Fixes #188.
runners:
  preinstall-with-env:
    image: ubuntu22-full-arm64
    family: ["c7g"]
    preinstall: |
      echo "Adding a custom env var..."
      echo "MY_CUSTOM_VAR=my_custom_value" >> $GITHUB_ENV
- Support `preinstall` for Windows runners.
- Expose `RunsOnServiceArn` as output, so that one can use it to build the CloudWatch log paths. Fixes #184.
- Do not send the cost allocation tag warning if the latest cost report was non-zero. Fixes #187.
- Add `Ec2LogRetentionInDays` stack parameter. Fixes #189.
- Allow to read license key from SSM. Fixes #176.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.1.yaml
Summary
New stack parameters and best practices compliance changes. No longer defaults to fetching global config when a local repo config is not found. Improve housekeeping to handle an additional AWS internal error case when launching an instance.
What's changed
- Add parameter to enable/disable IPv6: `Ipv6Enabled`. Default is now `false`, which is a change from previous versions where IPv6 was always enabled. The reason is that docker pulls appear to go through IPv6 IPs, and for some reason they are getting rate-limited much faster than on IPv4. Will have to dig a bit deeper into that. Fixes #177.
- Add parameter to disable the inbound SSH rule in the default security group for runners: `SSHAllowed`. Default is `true`. Fixes #174. Fixes #159.
- Add `VpcFlowLogRetentionInDays` stack parameter. Fixes #180.
- No longer defaults to fetching global config when a local repo config is not found. The previous behaviour was a bit broken with the caching mechanism, and led to confusion. Let's make the behaviour explicit by requiring a local repo config file, with an explicit `_extends` directive. I understand this is a bit cumbersome if you have many repositories, but I think it's also nice to be able to inspect which repositories are inheriting from the global config. I'm introducing this change as part of a patch release because the behaviour was already broken on v2.6.0.
- Housekeeping: Detect AWS server issue that sometimes leaves instances in pending state, in which case RunsOn will terminate the current instance, and reschedule.
- Enable versioning on all S3 buckets. Fixes #181.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.0.yaml
Summary
Auto-retry mechanism for spot interruptions, SingleAZ or MultiAZ NAT gateways, and more!
What's changed
- Spot workflows are now retried once with an on-demand instance if interrupted. Fixes #160. Requires a permission update (`write` permission for Actions instead of `read`) for existing installations. You should receive an email with instructions after upgrading. Also add `runs-on-workflow-job-interrupted=true` to the instance tags if the spot instance was interrupted.
- Add new label `retry`, with possible values `retry=when-interrupted` (default for spot), and `retry=false` to opt out of any auto-retry (useful for non-idempotent jobs).
- Add `runs-on-workflow-job-id` to the instance tags once the job has started. Also add it to prometheus metric labels.
- Rename tag `runs-on-job-started` => `runs-on-workflow-job-started`
- Allow to use 1 NAT gateway per AZ instead of a single one for all. Fixes #165.
- Add optional `VpcCidrSubnetBits`, `DefaultPermissionBoundaryArn`, `VpcFlowLogFormat`, and `VpcFlowLogS3BucketAr` parameters, so that users can get more conformant stacks compared to their internal settings.
- Set RunsOn env variables on Windows.
New Contributors
- @chris9979 made their first contribution in https://github.com/runs-on/runs-on/pull/167
- @colstrom made their first contribution in https://github.com/runs-on/runs-on/pull/169
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.9.yaml
Summary
Fix GitHub webhook `custom_properties` handling when values are not strings.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.8.yaml
Summary
Revert x/time dependency to v0.6.0 since v0.7.0 introduced a breaking change for rate-limits when using a zero limit.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.7.yaml
Summary
Add `Private=only` mode, make EBS encryption opt-in, introduce `disk` label. Plus fixes and minor improvements.
Note: please use v2.5.8+ because this version embeds a dependency upgrade for the rate-limit library, which introduced a regression.
What's changed
- Update github go library to fix issue with custom properties.
- Make EBS encryption opt-in, and specify default encryption key (fixes #152).
- Add `Private=only` mode for the CloudFormation stack, so that runners are forbidden to launch in a public subnet. Fixes #150.
- Disable automatic public IP assignment in public subnets when `Private=only` is set for the stack (helps with conformance).
- Remove `HousekeepingEnabled` stack parameter. Housekeeping is now always enabled.
- No longer display `EgressStaticIp` in job logs since we don't know which one the runner will end up using.
Deprecations
- Introduce `disk=default` or `disk=large` label to simplify disk size selection based on the runner volumes defined in the RunsOn CloudFormation stack. `hdd` is now deprecated and will be removed in a future non-patch version.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.6.yaml
Summary
Enable IPv6 for runners. Allow to specify multiple static IPs for the managed NAT gateway. Allow filtering images based on tags. A lot of changes (again) around GitHub rate-limit handling and housekeeping mechanism.
New features
- Enable IPv6 for runners (fixes #142). An IPv6 is attached for both public and private runners, with an egress ipv6 (free) gateway for private instances.
- Allow to specify multiple static IPs for the managed NAT gateway (fixes #139). By default up to 2 are possible, and up to 8 when a quota increase is requested. This helps if you are launching a large number of runners in private subnets, and some external service rate-limits you based on the IP.
- Allow filtering images based on a tag, in addition to the name wildcard (e.g. `is-production-ready=true`). Example:
# .github/runs-on.yml
images:
  custom:
    owner: "123456789"
    name: "my-org/my-image-name-*"
    arch: x64
    platform: linux
    tags:
      # filter with specific value
      is-production-ready: "true"
      # allow any value
      other-tag: "*"
- Automatically bind-mount `/var/lib/docker` on the ephemeral instance storage, if any. Fixes #144.
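The image tag filter described above (exact value, or `*` for any value) could be evaluated roughly as in this sketch. The function name and the plain-dict representation of AMI tags are assumptions made for illustration; this is not the RunsOn implementation.

```python
from typing import Dict


def image_matches_tags(image_tags: Dict[str, str], wanted: Dict[str, str]) -> bool:
    """Return True if the image carries every wanted tag with a matching value."""
    for key, expected in wanted.items():
        if key not in image_tags:
            # Assumption: the tag must be present, even when any value is allowed.
            return False
        if expected != "*" and image_tags[key] != expected:
            return False
    return True


# Filter taken from the example configuration above.
wanted_tags = {"is-production-ready": "true", "other-tag": "*"}
candidate = {"is-production-ready": "true", "other-tag": "anything"}
selected = image_matches_tags(candidate, wanted_tags)
```

An image tagged `is-production-ready=false`, or one missing `other-tag` entirely, would be rejected by this sketch.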
Bug fixes
- Escape shell special characters in env file values.
- If a matching AMI cannot be found, do not retry and alert on first error.
- Do not attempt to retry job if generated fleet params configuration is incorrect.
- Abort early if workflow run status cannot be checked.
Fixes to avoid GitHub rate-limit issues
- No longer attempt to reschedule jobs where a runner theft is suspected. Instead log a warning message telling users to make sure their jobs have unique enough labels. In some cases this was triggering useless reschedules due to GitHub not reflecting the job state quickly enough.
- Fix too many GitHub calls when fetching repo config from an `extends` attribute (cache it).
- No longer unregister runners from GitHub if API credit is lower than 2500. They will be removed by GitHub 24h later anyway.
- Reorganize rate-limiters, increase `DELAY_SECONDS_FOR_CHECK_BACK` to 180s instead of 120s. Enable github rate-limiter, and set burst to the current number of remaining tokens.
- Only attempt to finalize a job once at most. Instance will auto-terminate anyway so at worst we lose the job usage metrics in CloudWatch. But at least we don't eat into the GitHub / EC2 credits.
- Set housekeeping and termination queue sizes to 1 to reduce their impact on GitHub API credits.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.5.yaml
Summary
Strengthen CF template configuration to better conform to AWS guidelines. Bug fixes.
What's changed
- Verify that generated JIT token has at least one char.
- Do not attempt to retry runner creation when we know the original request is invalid (e.g. invalid runner configuration due to mismatched labels etc.)
- Strengthen CF template configuration to better conform to AWS guidelines.
- Make sure empty admin values are ignored.
- If no repository config found, cache the result for 1 minute to avoid hammering GitHub API.
Details
- Released on: .
- For more details: view release notes on GitHub.
- CloudFormation template: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.5.4.yaml
Summary
New ubuntu24 images, new housekeeping task to auto-restart instances that failed to launch, new always-on Private setting, additional runner details in logs, and more.
Notable changes (from v2.5.0 to v2.5.4)
- Add ubuntu24 official images: `ubuntu24-full-x64` and `ubuntu24-full-arm64`.
- `Private` CloudFormation parameter now accepts `always` as value, in which case the runners will always launch in the private subnets by default (unless opt-out with `private=false`).
- Display GitHub current rate-limits in logs (search for `tokens`).
- Add 'Private' dimension to cloudwatch stats.
- Add `Environment`, `IsPrivate`, and `StaticIp` (if IsPrivate) to runner details (in Setup job logs).
- Increase frequency for spot interruption polling + add logs.
- Conform to AWS spec when sanitizing custom tags (key and value). Fixes #125.
- Add housekeeping task to handle edge cases where a job is still seen as queued by GitHub after a few minutes even after an instance has been launched.
- Allow to disable new housekeeping mechanism.
- Properly tag instance volumes with cost allocation tag. Cost report email will likely go up.
- Display `app_environment` and `app_stack_name` in logs.
- Attempt to fix rare preinstall issue ending up with "text file busy".
- Unregister runner from GitHub when job is completed (i.e. do not wait for auto-expiration since it does not seem that reliable).
Experimental
- Bring back support for single string label, using `/` as the separator instead of `,`. e.g. `runs-on: runs-on/runner=2cpu-linux-x64/other=tag` will work. This simplifies passing a runs-on specification as input to dependent workflows. If you have multiple RunsOn stacks, make sure they are all upgraded to this version before using this new syntax in workflows.
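To make the single-string syntax concrete, here is a minimal sketch of how such a label might be split into its parts. The function name and the returned dictionary shape are hypothetical, and the sketch assumes values never contain `/`; the actual parser may behave differently.

```python
from typing import Dict, Union


def parse_single_string_label(label: str) -> Dict[str, Union[str, bool]]:
    """Split a label such as "runs-on/runner=2cpu-linux-x64/other=tag"."""
    parts = label.split("/")
    first_key = parts[0].partition("=")[0]
    if first_key != "runs-on":
        raise ValueError("label must start with 'runs-on'")

    spec: Dict[str, Union[str, bool]] = {}
    for part in parts:
        key, sep, value = part.partition("=")
        # A bare segment (no "=") is recorded as a flag.
        spec[key] = value if sep else True
    return spec


example = parse_single_string_label("runs-on/runner=2cpu-linux-x64/other=tag")
```

Because the whole specification is one string, it can be passed as a plain input to a reusable workflow and used directly in its `runs-on:` field.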
Internal
- Fix issue with `private` attribute not being properly loaded from the repository configuration file.
- Switch to semaphores for processing the 3 queues.
- Check workflow run status before scheduling job.
- Add termination queue.
- Update GitHub App (for new installations) to listen for workflow_run events (not used yet, but will be soon).
- Upgrade default runner version when no runner is preinstalled.