CloudWatch
AWS-native observability for RunsOn — the Flex CloudWatch dashboard for control-plane health, plus per-job CloudWatch metrics via runs-on/action@v2.
RunsOn integrates with CloudWatch in two places: a built-in dashboard for control-plane health, and per-job runner metrics collected through runs-on/action@v2. Both keep observability inside your AWS account, with no extra backend to run.
Flex CloudWatch dashboard
RunsOn creates a CloudWatch dashboard for the deployed stack. Use it for a quick, AWS-native view of whether GitHub webhooks are arriving, jobs are being scheduled, queues are draining, and runner launches are behaving normally.
The dashboard answers operational questions such as:
- Is the internal webhook/job queue growing or draining?
- How many runners have been scheduled, and how does that trend over time?
- What are the internal vs overall queue-duration percentiles (P50/P90)?
- Are there recent error messages in the worker logs?
- Is the spot circuit breaker active, and are there recent spot interruptions?
It is not meant to replace a full observability backend. The dashboard mostly uses structured CloudWatch logs and AWS service metrics, so it is less flexible than querying OTLP metrics and traces in Grafana, Datadog, SigNoz, New Relic, or another OTLP-compatible backend.
The dashboard
RunsOn always creates a CloudWatch dashboard for the deployed stack (there is no enable/disable parameter). It is created in the AWS account and region where the stack is deployed. Open CloudWatch → Dashboards and look for the dashboard created for your RunsOn stack.
Runner CloudWatch metrics via runs-on/action@v2
For per-job resource metrics (CPU, memory, disk, network, and I/O), runs-on/action@v2 can collect them through CloudWatch and render ASCII charts in the Post Run runs-on/action@v2 step:
- Add runs-on/action@v2 to the job.
- Request one or more metric groups with the metrics input.
- The action configures CloudWatch metrics collection.
- In Post Run runs-on/action@v2, the action queries CloudWatch and renders ASCII charts in that post step.
This path does not generate or upload metrics.jsonl, and does not work with container-based jobs (unlike the built-in runner metrics).
Supported metric groups
| Metric group | Available metrics | What it helps you answer |
|---|---|---|
| CPU | usage_user, usage_system | Is the runner CPU-bound? |
| Network | bytes_recv, bytes_sent | Is the job moving a lot of data? |
| Memory | used_percent | Is the runner memory-constrained? |
| Disk | used_percent, inodes_used | Is the workspace or filesystem filling up? |
| I/O | io_time, reads, writes | Is the job bottlenecked on disk activity? |
Configure the action
    jobs:
      build:
        runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64
        steps:
          - uses: runs-on/action@v2
            with:
              metrics: cpu,network,memory,disk,io
          - uses: actions/checkout@v6
          - name: Build application
            run: npm run build

You can also request a smaller subset (metrics: cpu,memory). Example output from the post step:
    📈 Metrics (since 2025-06-30T14:18:56Z):
    📊 CPU User:
    100.0 ┤
     87.5 ┤                                        ╭─╮╭───────────╮
     75.0 ┤                                       ╭╯ ╰╯           │
     62.5 ┤                                      ╭╯               ╰╮
     50.0 ┤                                      │                 │
     37.5 ┤                                      │                 ╰╮
     25.0 ┤                                     ╭╯                  │
     12.5 ┤                    ╭─────────╮╭─────╯                   ╰╮
      0.0 ┼────────────────────╯         ╰╯                          ╰
                              CPU User (Percent)
    Stats: min:0.0 avg:29.0 max:93.4 Percent

For the other per-job metric paths (built-in inline charts and runner OpenTelemetry export), see OpenTelemetry.
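As a minimal sketch of the smaller subset mentioned in this section, the job below requests only the CPU and memory groups. The job name `test` and the `npm test` step are hypothetical placeholders; the runner label, action version, and `metrics` values are taken from the example above.

```yaml
jobs:
  test:  # hypothetical job name
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64
    steps:
      # Request only the CPU and memory metric groups.
      - uses: runs-on/action@v2
        with:
          metrics: cpu,memory
      - uses: actions/checkout@v6
      - name: Run tests  # hypothetical step, replace with your own
        run: npm test
```

With this configuration, the post step would be expected to chart only the requested groups.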
Configure OpenTelemetry export
The control plane can also push its own logs, metrics, and traces to any OTLP-compatible backend. For Flex, set the OTLP endpoint (and optional headers) on the stack:
- OtelExporterEndpoint: the OTLP endpoint to export to.
- OtelExporterHeaders: authentication headers for the backend.
Once an endpoint is configured, server-side export is automatic. For the full list of server metrics, attributes, and resource attributes — plus runner-side export — see the OpenTelemetry reference.
What to use for each problem
| Problem | Use |
|---|---|
| Check whether the control plane is healthy | The Flex CloudWatch dashboard |
| Get per-job CPU/memory/disk/network charts | The runs-on/action@v2 metrics above |
| Debug one GitHub Actions job | Runner metrics and the job log |
| Send server logs, metrics, and traces to an observability backend | OpenTelemetry |
| Get failure notifications by email or Slack | Alerts |
| Track daily spend and budget alarms | Cost control |
How it relates to runner metrics
The stack dashboard is about the RunsOn control plane: incoming GitHub events, scheduling, queues, API pressure, and AWS service behavior.
Runner metrics are about a single runner while it executes a job: CPU, memory, disk, network, I/O, and runner metadata.
If a job is slow or oversized, start with runner metrics. If many jobs are delayed, failing to launch, or stuck behind queue/API pressure, start with the dashboard.