Metrics
RunsOn allows you to monitor runner usage in two ways:
- Prometheus metrics, which you can aggregate using the classic Prometheus + Grafana setup.
- CloudWatch metrics, which you can use to set up alarms and alerts.
For each runner, you will also find detailed metrics about the EC2 instance, RunsOn installation, and runner timings. These are available when you expand the “Set up job” section in the GitHub Actions UI.
Prometheus metrics
Metrics in the Prometheus format were introduced in v2.4.0 and are available at the /metrics path of your AppRunner service endpoint. This path is only exposed if the ServerPassword CloudFormation parameter is set.
HTTP Basic Authentication is used to authenticate the request, with the username being admin and the password being the value of the ServerPassword CloudFormation parameter.
Currently, RunsOn exports the following metrics in Prometheus format:
- runs_on_ec2_instances_total
- runs_on_cloudtrail_events_total
Metrics are exported every minute.
An example scraping configuration would be:
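A minimal sketch of a Prometheus scrape job for this endpoint, based only on the details above (the /metrics path, the admin username, the ServerPassword value, and the one-minute export interval). The job name and the target hostname are placeholders, and the HTTPS scheme is an assumption about how your AppRunner endpoint is served:

```yaml
scrape_configs:
  - job_name: runs-on                  # hypothetical job name
    scheme: https                      # assumption: the AppRunner endpoint is served over HTTPS
    metrics_path: /metrics
    scrape_interval: 1m                # metrics are exported every minute
    basic_auth:
      username: admin
      password: REPLACE_WITH_SERVER_PASSWORD   # value of the ServerPassword CloudFormation parameter
    static_configs:
      - targets:
          - your-service.example.awsapprunner.com:443   # placeholder: your AppRunner service hostname
```

Scraping more often than once a minute is unlikely to yield fresher data, since the values are only refreshed at that interval.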
runs_on_ec2_instances_total
This metric publishes the number of EC2 instances launched by the current RunsOn installation, broken down by the following labels:
- env: the RunsOn environment name (default: production).
- version: the RunsOn version.
- workflow_job_started: distinguishes runners that have successfully started processing a job (true) from runners that are still in the process of starting (false).
- repo_full_name: the full name of the repository, in the format owner/repo.
- workflow_name: the name of the workflow. Only available once the workflow has started.
- workflow_job_name: the name of the job being executed. Only available once the job has started.
- runner_id: the ID of the runner (e.g. 2cpu-linux-x64, or my-custom-runner).
- image_id: the ID of the AMI used to launch the instance (e.g. ubuntu22-full-x64, or my-custom-image).
- az: the Availability Zone of the instance (e.g. us-east-1a).
- instance_type: the type of the instance (e.g. m7i-flex.large).
- instance_lifecycle: the lifecycle of the instance (spot or on-demand).
- instance_state: the state of the instance (pending, running, shutting-down, terminated, stopping, stopped).
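Because the metric carries many labels, it is usually aggregated before being graphed. The following Prometheus rules file is a rough sketch of how that might look; the group and rule names are hypothetical, and the expressions assume the exported values can be summed directly. If the series behaves as an ever-increasing counter in your installation, wrapping the selector in increase() over a time window may be more appropriate.

```yaml
groups:
  - name: runs-on-ec2                  # hypothetical group name
    rules:
      # Instances per state and lifecycle, summed over all other labels.
      - record: runs_on:ec2_instances:by_state_lifecycle
        expr: sum by (instance_state, instance_lifecycle) (runs_on_ec2_instances_total)
      # Instances per repository that have not yet started processing a job.
      - record: runs_on:ec2_instances:not_started
        expr: sum by (repo_full_name) (runs_on_ec2_instances_total{workflow_job_started="false"})
```

The same pattern applies to any of the other labels, for example grouping by instance_type or az.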
runs_on_cloudtrail_events_total
This metric publishes the number of CloudTrail events, broken down by the event_name label, for the following events:
- CreateFleet
- RunInstances
- BidEvictedEvent
Note that CloudTrail events can sometimes be delayed by up to 15 minutes, so you might not see the metrics you are expecting immediately after a change.
This metric is therefore most useful for keeping an eye on event spikes, or on counts of unwanted events, at a coarse granularity.
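As an illustration of coarse-grained monitoring, here is a sketch of a Prometheus alerting rule on BidEvictedEvent. The alert name, the one-hour window, and the threshold of 10 are arbitrary assumptions, and the expression assumes the metric behaves as a counter, as its _total suffix suggests:

```yaml
groups:
  - name: runs-on-cloudtrail           # hypothetical group name
    rules:
      - alert: RunsOnBidEvictedSpike   # hypothetical alert name
        # A wide window absorbs the CloudTrail delivery delay of up to 15 minutes.
        expr: sum(increase(runs_on_cloudtrail_events_total{event_name="BidEvictedEvent"}[1h])) > 10
        labels:
          severity: warning
        annotations:
          summary: More than 10 BidEvictedEvent events recorded in the last hour
```

A window much shorter than the possible delay would make the alert fire late or flap, so an hour or more is a reasonable starting point.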
CloudWatch metrics
RunsOn publishes consumed minutes metrics to CloudWatch, across many dimensions. You can use these metrics to set up alarms and alerts. They are available in the RunsOn CloudWatch namespace.
Note that due to the limited aggregation capabilities of CloudWatch, you might find the Prometheus metrics more useful.
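A CloudWatch alarm on one of these metrics could be declared in CloudFormation roughly as follows. This is only a sketch: the RunsOn namespace comes from the text above, but the metric name, the dimension, the threshold, and the logical ID are placeholders to replace with the values you see in the CloudWatch console.

```yaml
Resources:
  RunsOnMinutesAlarm:                          # hypothetical logical ID
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Consumed runner minutes above the expected hourly budget
      Namespace: RunsOn
      MetricName: REPLACE_WITH_METRIC_NAME     # placeholder: not specified in this document
      Dimensions:
        - Name: REPLACE_WITH_DIMENSION_NAME    # placeholder
          Value: REPLACE_WITH_DIMENSION_VALUE  # placeholder
      Statistic: Sum
      Period: 3600                             # evaluate over one hour
      EvaluationPeriods: 1
      Threshold: 1000                          # hypothetical threshold
      ComparisonOperator: GreaterThanThreshold
      TreatMissingData: notBreaching           # no data means no runners ran
```

CloudWatch alarms only match a metric whose dimension set is exactly the one listed, which is part of why aggregating across dimensions is easier on the Prometheus side.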