Metrics

RunsOn allows you to monitor runner usage in two ways:

Prometheus metrics, which you can aggregate using the classic Prometheus + Grafana setup.
CloudWatch metrics, which you can use to set up alarms and alerts.

Runner details

For each runner, you will also find detailed metrics about the EC2 instance, RunsOn installation, and runner timings. This is available when you expand the “Set up job” section in the GitHub Actions UI:

Runner metadata right from the GitHub Actions UI

Prometheus metrics

Metrics in the Prometheus format were introduced in v2.4.0 and are available at the /metrics path of your AppRunner service endpoint. This path is only exposed if the ServerPassword CloudFormation parameter is set.

Requests are authenticated with HTTP Basic Authentication: the username is admin and the password is the value of the ServerPassword CloudFormation parameter.
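For a quick manual check outside of Prometheus, the authentication scheme can be sketched in Python using only the standard library. This is a minimal sketch, not part of RunsOn: the function names `build_metrics_request` and `fetch_metrics` are hypothetical, and the endpoint hostname is whatever your AppRunner service exposes.

```python
import base64
import urllib.request


def build_metrics_request(endpoint: str, server_password: str) -> urllib.request.Request:
    """Build a GET request for /metrics with an HTTP Basic Authentication header.

    `endpoint` is the bare hostname of the AppRunner service (no scheme).
    """
    # The username is always "admin"; the password is the ServerPassword value.
    credentials = f"admin:{server_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"https://{endpoint}/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_metrics(endpoint: str, server_password: str, timeout: float = 10.0) -> str:
    """Fetch the raw Prometheus text exposition (performs a network call)."""
    request = build_metrics_request(endpoint, server_password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics("APPRUNNER_ID.APPRUNNER_ZONE.awsapprunner.com", "YOUR_SERVER_PASSWORD")` with your own values should return the metrics as plain text. If the ServerPassword parameter was not set, the path is not exposed and the request will fail.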

Currently, RunsOn exports the following metrics in Prometheus format:

runs_on_ec2_instances_total
runs_on_cloudtrail_events_total

Metrics are exported every minute.

An example scraping configuration would be:

scrape_configs:
- job_name: "runs_on"
  metrics_path: /metrics
  scheme: https
  basic_auth:
    username: admin
    password: YOUR_SERVER_PASSWORD
  scrape_interval: 60s
  static_configs:
    - targets: ["APPRUNNER_ID.APPRUNNER_ZONE.awsapprunner.com"]

`runs_on_ec2_instances_total`

This metric publishes the number of EC2 instances launched by the current RunsOn installation, broken down by the following labels:

Label	Description
`instance_state`	The state of the instance (`pending`, `running`, `shutting-down`, `terminated`, `stopping`, `stopped`)
`instance_type`	The type of the instance (e.g. `m7i-flex.large`)
`instance_lifecycle`	The lifecycle of the instance (`spot` or `on-demand`)
`az`	The Availability Zone of the instance (e.g. `us-east-1a`)
`repo_full_name`	The full name of the repository, in the format `owner/repo`
`workflow_name`	The name of the workflow. Only available once the workflow has started
`workflow_job_started`	Distinguishes runners that have started processing a job (`true`) from runners that are still starting (`false`)
`workflow_job_interrupted`	Whether the runner was interrupted by AWS (`true`). Applies to spot instances
`workflow_job_conclusion`	The conclusion of the job (`success`, `failure`, `cancelled`). Only available once the job has finished
`workflow_job_name`	The name of the job being executed. Only available once the job has started
`workflow_job_id`	The ID of the job being executed. Only available once the job has started
`runner_id`	The ID of the runner (e.g. `2cpu-linux-x64`, or `my-custom-runner`)
`image_id`	The ID of the AMI used to launch the instance (e.g. `ubuntu22-full-x64`, or `my-custom-image`)
`env`	The RunsOn environment name (default: `production`)
`version`	The RunsOn version
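To illustrate how these labels can be used outside of Grafana, here is a minimal Python sketch that sums the samples of one metric grouped by one label, reading the Prometheus text exposition format. The helper name `sum_by_label` is hypothetical and the sample data is made up for illustration; it is not real RunsOn output. The parser is deliberately simple and is not a full implementation of the exposition format.

```python
import re
from collections import defaultdict

# Matches `metric_name{labels} value`; the label block is optional.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one `key="value"` pair, allowing backslash escapes inside the value.
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(exposition: str, metric: str, label: str) -> dict:
    """Sum all samples of `metric`, grouped by the value of `label`.

    Samples that lack the label are grouped under the empty string.
    """
    totals = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


# Made-up sample data, using a subset of the labels documented above.
SAMPLE = """\
runs_on_ec2_instances_total{instance_state="running",instance_lifecycle="spot",instance_type="m7i-flex.large"} 3
runs_on_ec2_instances_total{instance_state="running",instance_lifecycle="on-demand",instance_type="m7i-flex.large"} 1
runs_on_ec2_instances_total{instance_state="pending",instance_lifecycle="spot",instance_type="m7i-flex.large"} 2
runs_on_cloudtrail_events_total{event_name="RunInstances"} 7
"""

print(sum_by_label(SAMPLE, "runs_on_ec2_instances_total", "instance_lifecycle"))
# → {'spot': 5.0, 'on-demand': 1.0}
```

The same grouping is normally done in PromQL once the metrics are scraped; the sketch only shows how the label set maps to individual samples.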

`runs_on_cloudtrail_events_total`

This metric publishes the number of CloudTrail events for specific actions:

Label	Description
`event_name`	The name of the CloudTrail event (`CreateFleet`, `RunInstances`, `BidEvictedEvent`)

Note that CloudTrail events can sometimes be delayed by up to 15 minutes, so you might not see the metrics you are expecting immediately after a change.

This metric is most useful for watching for event spikes or unwanted event counts at a coarse granularity.

Grafana dashboard

A community dashboard is available from https://github.com/runs-on/community/tree/main/dashboards, thanks to Matt Wise.

CloudWatch metrics

RunsOn publishes consumed minutes metrics to CloudWatch, across many dimensions. You can use these metrics to set up alarms and alerts. They are available in the RunsOn CloudWatch namespace.

Note that due to the limited aggregation capabilities of CloudWatch, you might find the Prometheus metrics more useful.