Metrics

RunsOn allows you to monitor runner usage in two ways:

  • Prometheus metrics, which you can aggregate using the classic Prometheus + Grafana setup.
  • CloudWatch metrics, which you can use to set up alarms and alerts.

For each runner, you will also find detailed metrics about the EC2 instance, the RunsOn installation, and runner timings. These are available when you expand the “Set up job” section in the GitHub Actions UI:

Runner metadata right from the GitHub Actions UI

Prometheus metrics

Metrics in the Prometheus format were introduced in v2.4.0, and are available at the /metrics path from your AppRunner service endpoint. This path is only exposed if the ServerPassword CloudFormation parameter was set.

HTTP Basic Authentication is used to authenticate the request, with the username being admin and the password being the value of the ServerPassword CloudFormation parameter.
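
As a quick way to check that the endpoint and credentials work before wiring up Prometheus, the request can be sketched in Python with the standard library only. The helper names `build_metrics_request` and `fetch_metrics` are hypothetical and not part of RunsOn; the host is whatever your AppRunner service endpoint is.

```python
import base64
import urllib.request


def build_metrics_request(host: str, password: str) -> urllib.request.Request:
    """Build a GET request for the /metrics path with HTTP Basic Authentication."""
    # The username is "admin"; the password is the ServerPassword parameter value.
    token = base64.b64encode(f"admin:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"https://{host}/metrics",
        headers={"Authorization": f"Basic {token}"},
    )


def fetch_metrics(host: str, password: str, timeout: float = 10.0) -> str:
    """Return the raw Prometheus text payload served by the endpoint."""
    request = build_metrics_request(host, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics("APPRUNNER_ID.APPRUNNER_ZONE.awsapprunner.com", "YOUR_SERVER_PASSWORD")` with your own values should return the metrics as plain text, provided the ServerPassword parameter was set.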

Currently, RunsOn exports the following metrics in Prometheus format:

  • runs_on_ec2_instances_total
  • runs_on_cloudtrail_events_total

Metrics are exported every minute.

An example scraping configuration would be:

scrape_configs:
  - job_name: "runs_on"
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: admin
      password: YOUR_SERVER_PASSWORD
    scrape_interval: 60s
    static_configs:
      - targets: ["APPRUNNER_ID.APPRUNNER_ZONE.awsapprunner.com"]

runs_on_ec2_instances_total

This metric publishes the number of EC2 instances launched by the current RunsOn installation, across various labels:

  • env: the RunsOn environment name (default: production).
  • version: the RunsOn version.
  • workflow_job_started: distinguishes runners that have successfully started processing a job (true) from runners that are still starting (false).
  • repo_full_name: the full name of the repository, in the format owner/repo.
  • workflow_name: the name of the workflow. Only available once the workflow has started.
  • workflow_job_name: the name of the job being executed. Only available once the job has started.
  • runner_id: the ID of the runner (e.g. 2cpu-linux-x64, or my-custom-runner).
  • image_id: the ID of the AMI used to launch the instance (e.g. ubuntu22-full-x64, or my-custom-image).
  • az: the Availability Zone of the instance (e.g. us-east-1a).
  • instance_type: the type of the instance (e.g. m7i-flex.large).
  • instance_lifecycle: the lifecycle of the instance (spot or on-demand).
  • instance_state: the state of the instance (pending, running, shutting-down, terminated, stopping, stopped).
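
To illustrate how these labels can be used outside of Grafana, here is a minimal sketch that sums the samples of one metric by a chosen label, working directly on the text returned by the /metrics path. The function name `sum_by_label` and the sample payload are hypothetical; the sample values are invented for illustration and only use label names listed above. The parser is deliberately simple and is not a full implementation of the Prometheus text format.

```python
import re
from collections import Counter

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces, allowing escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hypothetical payload, invented for illustration only.
SAMPLE_PAYLOAD = """\
# hypothetical sample data
runs_on_ec2_instances_total{env="production",instance_lifecycle="spot",instance_type="m7i-flex.large"} 3
runs_on_ec2_instances_total{env="production",instance_lifecycle="on-demand",instance_type="m7i-flex.large"} 1
runs_on_ec2_instances_total{env="production",instance_lifecycle="spot",instance_type="c7a.large"} 2
runs_on_cloudtrail_events_total{event_name="RunInstances"} 7
"""


def sum_by_label(payload: str, metric: str, label: str) -> Counter:
    """Sum the samples of `metric` in a Prometheus text payload, grouped by `label`."""
    totals = Counter()
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/HELP/TYPE lines
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        # Samples missing the label are grouped under the empty string.
        totals[labels.get(label, "")] += float(match.group("value"))
    return totals
```

For example, `sum_by_label(SAMPLE_PAYLOAD, "runs_on_ec2_instances_total", "instance_lifecycle")` groups the instance counts into spot and on-demand. In a real setup the same grouping would more naturally be done with a query in Prometheus or Grafana.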

runs_on_cloudtrail_events_total

This metric publishes the number of CloudTrail events for the following event types (label event_name):

  • CreateFleet
  • RunInstances
  • BidEvictedEvent

Note that CloudTrail events can sometimes be delayed by up to 15 minutes, so you might not see the metrics you are expecting immediately after a change.

This metric is therefore best suited to keeping an eye on event spikes, or on counts of unwanted events, at a coarse granularity.

CloudWatch metrics

RunsOn publishes consumed minutes metrics to CloudWatch, across many dimensions. You can use these metrics to set up alarms and alerts. They are available in the RunsOn CloudWatch namespace.

Note that due to the limited aggregation capabilities of CloudWatch, you might find the Prometheus metrics more useful.
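
Since the metric names and dimensions are not listed here, one way to discover them is to enumerate the RunsOn namespace. The sketch below assumes you pass in a boto3 CloudWatch client (for example `boto3.client("cloudwatch")`) and relies on the standard `list_metrics` paginator; the helper name `list_runs_on_metrics` is hypothetical.

```python
from typing import Any, Dict, List


def list_runs_on_metrics(
    cloudwatch_client: Any, namespace: str = "RunsOn"
) -> List[Dict[str, Any]]:
    """Return the name and dimensions of every metric found in the namespace."""
    metrics = []
    # list_metrics is paginated, so walk every page of results.
    paginator = cloudwatch_client.get_paginator("list_metrics")
    for page in paginator.paginate(Namespace=namespace):
        for metric in page.get("Metrics", []):
            metrics.append({
                "name": metric["MetricName"],
                "dimensions": {
                    d["Name"]: d["Value"] for d in metric.get("Dimensions", [])
                },
            })
    return metrics
```

The client is passed in as an argument so that the function can be exercised with a stub and so that region and credentials stay under your control. The returned names and dimensions can then be used when defining CloudWatch alarms.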