Metrics

RunsOn allows you to monitor runner usage in two ways:

  • Prometheus metrics, which you can aggregate using the classic Prometheus + Grafana setup.
  • CloudWatch metrics, which you can use to set up alarms and alerts.

For each runner, you will also find detailed metrics about the EC2 instance, RunsOn installation, and runner timings. This is available when you expand the “Set up job” section in the GitHub Actions UI:

Runner metadata right from the GitHub Actions UI

Metrics in the Prometheus format were introduced in v2.4.0, and are available at the /metrics path from your AppRunner service endpoint. This path is only exposed if the ServerPassword CloudFormation parameter was set.

HTTP Basic Authentication is used to authenticate the request, with the username being admin and the password being the value of the ServerPassword CloudFormation parameter.
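
As a quick way to check that the endpoint and credentials work before wiring up Prometheus, the request described above can be sketched in Python using only the standard library. This is a minimal sketch: the helper names `build_metrics_request` and `fetch_metrics` and the placeholder endpoint are illustrative assumptions, not part of RunsOn.

```python
import base64
import urllib.request


def build_metrics_request(endpoint: str, password: str, username: str = "admin") -> urllib.request.Request:
    """Build a GET request for /metrics carrying an HTTP Basic Authorization header.

    `endpoint` is the App Runner hostname (no scheme); `password` is the value
    of the ServerPassword CloudFormation parameter.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"https://{endpoint}/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_metrics(endpoint: str, password: str, timeout: float = 10.0) -> str:
    """Fetch the raw Prometheus text exposition from the RunsOn endpoint."""
    request = build_metrics_request(endpoint, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (placeholder hostname, replace with your own service endpoint):
#   print(fetch_metrics("APPRUNNER_ID.APPRUNNER_ZONE.awsapprunner.com", "YOUR_SERVER_PASSWORD"))
```

If the ServerPassword parameter was not set, the path is not exposed, so expect an HTTP error rather than metrics.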

Currently, RunsOn exports the following metrics in Prometheus format:

  • runs_on_ec2_instances_total
  • runs_on_cloudtrail_events_total

Metrics are exported every minute.

An example scrape configuration would be:

scrape_configs:
  - job_name: "runs_on"
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: admin
      password: YOUR_SERVER_PASSWORD
    scrape_interval: 60s
    static_configs:
      - targets: ["APPRUNNER_ID.APPRUNNER_ZONE.awsapprunner.com"]

The runs_on_ec2_instances_total metric publishes the number of EC2 instances launched by the current RunsOn installation, broken down by the following labels:

| Label | Description |
| --- | --- |
| instance_state | The state of the instance (pending, running, shutting-down, terminated, stopping, stopped) |
| instance_type | The type of the instance (e.g. m7i-flex.large) |
| instance_lifecycle | The lifecycle of the instance (spot or on-demand) |
| az | The Availability Zone of the instance (e.g. us-east-1a) |
| repo_full_name | The full name of the repository, in the format owner/repo |
| workflow_name | The name of the workflow. Only available once the workflow has started |
| workflow_job_started | Distinguishes runners that have started processing a job (true) from runners that are still starting (false) |
| workflow_job_interrupted | Whether the runner was interrupted by AWS (true). Applies to spot instances |
| workflow_job_conclusion | The conclusion of the job (success, failure, cancelled). Only available once the job has finished |
| workflow_job_name | The name of the job being executed. Only available once the job has started |
| workflow_job_id | The ID of the job being executed. Only available once the job has started |
| runner_id | The ID of the runner (e.g. 2cpu-linux-x64, or my-custom-runner) |
| image_id | The ID of the AMI used to launch the instance (e.g. ubuntu22-full-x64, or my-custom-image) |
| env | The RunsOn environment name (default: production) |
| version | The RunsOn version |
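
To illustrate how these labels can be used outside Grafana, here is a minimal sketch that sums the metric by a single label from the raw text returned by /metrics. The sample lines and values are made up for illustration, and the parser is deliberately simplified (it does not handle escaped quotes inside label values); for real use, a full Prometheus text-format parser would be a safer choice.

```python
import re
from collections import defaultdict
from typing import Dict

# Simplified matchers for the Prometheus text exposition format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def sum_by_label(text: str, metric: str, label: str) -> Dict[str, float]:
    """Sum the samples of `metric`, grouped by the value of `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


# Hypothetical scrape output, for illustration only.
SAMPLE = """
runs_on_ec2_instances_total{instance_lifecycle="spot",instance_state="running",instance_type="m7i-flex.large"} 3
runs_on_ec2_instances_total{instance_lifecycle="on-demand",instance_state="running",instance_type="m7i-flex.large"} 1
runs_on_ec2_instances_total{instance_lifecycle="spot",instance_state="terminated",instance_type="m7i-flex.large"} 2
runs_on_cloudtrail_events_total{event_name="CreateFleet"} 7
"""

by_lifecycle = sum_by_label(SAMPLE, "runs_on_ec2_instances_total", "instance_lifecycle")
print(by_lifecycle)
```

The same function can group by any other label in the table, such as instance_type or repo_full_name.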

The runs_on_cloudtrail_events_total metric publishes the number of CloudTrail events for specific actions:

| Label | Description |
| --- | --- |
| event_name | The name of the CloudTrail event (CreateFleet, RunInstances, BidEvictedEvent) |

Note that CloudTrail events can sometimes be delayed by up to 15 minutes, so you might not see the metrics you are expecting immediately after a change.

Because of this delay, the metric is most useful for watching event spikes, or counts of unwanted events, at a coarse granularity.

A community dashboard is available from https://github.com/runs-on/community/tree/main/dashboards, thanks to Matt Wise:

Grafana dashboard

RunsOn publishes consumed minutes metrics to CloudWatch, across many dimensions. You can use these metrics to set up alarms and alerts. They are available in the RunsOn CloudWatch namespace.

Note that due to the limited aggregation capabilities of CloudWatch, you might find the Prometheus metrics more useful.
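
To discover which metric names and dimensions are actually published before building an alarm, one option is to list the contents of the namespace. The sketch below assumes the namespace string is exactly "RunsOn" and uses the CloudWatch `list_metrics` paginator from boto3; the function name `list_namespace_metrics` is an illustrative choice, and the client is passed in so that the function itself has no AWS dependency.

```python
from typing import Any, Dict, List, Set


def list_namespace_metrics(cloudwatch: Any, namespace: str = "RunsOn") -> Dict[str, List[str]]:
    """Map each metric name in `namespace` to the sorted dimension names seen for it.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    found: Dict[str, Set[str]] = {}
    paginator = cloudwatch.get_paginator("list_metrics")
    for page in paginator.paginate(Namespace=namespace):
        for metric in page.get("Metrics", []):
            dimensions = found.setdefault(metric["MetricName"], set())
            dimensions.update(d["Name"] for d in metric.get("Dimensions", []))
    return {name: sorted(dimensions) for name, dimensions in found.items()}


# Usage (requires boto3 and AWS credentials for the account running RunsOn):
#   import boto3
#   for name, dims in list_namespace_metrics(boto3.client("cloudwatch")).items():
#       print(name, dims)
```

The output gives the metric and dimension names to plug into a CloudWatch alarm definition.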