RunsOn

v2.10.0


Summary

Re-architects job handling around a DynamoDB-backed workflow job store, new jobs and GitHub queues, and a reconciler; improves GitHub, spec, pool, and label logic; and restores the /metrics endpoint (Prometheus format) alongside OTEL export.
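The reconciler's job can be pictured with a small in-memory sketch. It assumes each stored job record carries a status and a next-check timestamp, mirroring the "next-check indexing" listed under Server/Architecture. The `JobRecord` fields, the `reconcile` function, the status strings, and the requeue callback are hypothetical names chosen for illustration, not RunsOn's implementation, which is backed by DynamoDB and SQS.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of a reconciliation pass; not RunsOn's actual code.

@dataclass
class JobRecord:
    job_id: int
    status: str           # assumed values: "queued", "in_progress", "completed"
    next_check_at: float  # epoch seconds when the reconciler should look again


def reconcile(records: List[JobRecord], now: float,
              requeue: Callable[[int], None],
              recheck_after: float = 60.0) -> List[int]:
    """Requeue due jobs that are still queued; return the requeued job IDs."""
    requeued: List[int] = []
    for rec in records:
        if rec.status == "completed":
            continue  # terminal state: nothing left to reconcile
        if rec.next_check_at > now:
            continue  # not due yet
        if rec.status == "queued":
            requeue(rec.job_id)  # e.g. push the job ID back onto the jobs queue
            requeued.append(rec.job_id)
        # Push the next check forward so a record is not requeued on every pass.
        rec.next_check_at = now + recheck_after
    return requeued


sent: List[int] = []
jobs = [JobRecord(1, "queued", 0.0), JobRecord(2, "in_progress", 0.0),
        JobRecord(3, "queued", 500.0), JobRecord(4, "completed", 0.0)]
print(reconcile(jobs, now=100.0, requeue=sent.append))  # [1]
```

This sketch scans every record; in a real store, an index on the next-check timestamp would let the reconciler query only the records that are due.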

Spotlight

  • Fix numerous bugs in pools (still in beta).
  • A new reconciliation loop should ensure that no queued job is left pending.
  • FIFO semantics for SQS queues are now enforced for job scheduling, even under high load.
  • Repository config (runs-on.yml) is now validated, and warnings are emitted when an error is found. The validator is also available from the CLI.
  • GitHub API token usage should be lower: some low-priority features (e.g. unregistering runners) are automatically disabled in favor of high-impact ones (e.g. registering runners). New metrics are also available to track GitHub API operations.
  • Re-enable the /metrics endpoint (Prometheus format) if you prefer polling for metrics instead of sending them to an OTEL collector.
[Screenshot: RunsOn OTEL Dashboard]
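The GitHub API prioritization described in the Spotlight can be sketched as a simple budget gate: when the remaining rate-limit quota drops below a per-priority threshold, low-priority calls are skipped while high-impact calls still go through, and every decision is counted like a metric. The `Priority` levels, the threshold values, and the `ApiBudget` class are illustrative assumptions, not RunsOn's actual policy.

```python
from collections import Counter
from enum import IntEnum

# Hypothetical sketch of priority-based API budgeting; thresholds are made up.

class Priority(IntEnum):
    LOW = 0   # e.g. unregistering runners
    HIGH = 1  # e.g. registering runners

# Minimum remaining quota required before a call of each priority may run.
MIN_REMAINING = {Priority.LOW: 1000, Priority.HIGH: 50}


class ApiBudget:
    def __init__(self, remaining: int):
        # Last known remaining quota, e.g. read from GitHub's
        # X-RateLimit-Remaining response header.
        self.remaining = remaining
        # (operation, outcome) -> count, similar to an operations counter metric.
        self.ops = Counter()

    def try_call(self, operation: str, priority: Priority) -> bool:
        """Return True if the call may proceed, recording the outcome."""
        if self.remaining < MIN_REMAINING[priority]:
            self.ops[(operation, "skipped")] += 1
            return False
        self.remaining -= 1  # the call consumes one request from the quota
        self.ops[(operation, "ok")] += 1
        return True


budget = ApiBudget(remaining=500)
print(budget.try_call("unregister_runner", Priority.LOW))  # False: below the LOW threshold
print(budget.try_call("register_runner", Priority.HIGH))   # True
```

Giving low-priority work a much higher threshold reserves the last part of the quota for the operations that keep jobs running.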

Details

  • Infrastructure (CloudFormation):
    • Queues: Add RunsOnQueueJobs and RunsOnQueueGithub FIFO queues (with DLQs) and wire them into env vars, IAM, and outputs.
    • DynamoDB: Add RunsOnWorkflowJobsTable. Usage should stay well below the DynamoDB free tier for 90% of users.
    • App Runner/IAM: Add RUNS_ON_QUEUE_JOBS, RUNS_ON_QUEUE_GITHUB, RUNS_ON_WORKFLOW_JOBS_TABLE, RUNS_ON_SERVER_PASSWORD, RUNS_ON_LOGGER_LEVEL (dev/v2.10.0) env vars; grant s3:DeleteObject on S3Bucket/runs-on/db/*.
    • Parameters: Add Environment to main group; add ServerPassword; add LoggerLevel (dev/v2.10.0); refine OtelExporterEndpoint and RunnerConfigAutoExtendsFrom descriptions.
    • Outputs: Export new queues and RunsOnWorkflowJobsTable.
  • Server/Architecture:
    • Replace legacy queue processors with new jobs and github queues (processJobsQueue, processGitHubQueue); add reconciliation loop processWorkflowJobsReconciliation.
    • Introduce DynamoDB-backed WorkflowJobsStore (WorkflowJobRecord), instance attachment, next-check indexing, and recent repo discovery.
    • Refactor webhook handling to context-aware flow; persist job payloads; schedule via job IDs.
  • GitHub Integration:
    • Add stale-while-revalidate caches, repo metadata/collaborators retrieval, global/local/composite config loaders, and repo listing per installation.
  • Runner/Specs:
    • SpecResolver now validates config, resolves from labels/repo config, and normalizes private/ssh.
  • Pools:
    • Validate pool names; include name in SHA; improved schedule evaluator; parse/detach timestamps; explicit excess instance termination.
  • Labels:
    • Robust parsing (trims whitespace and control characters), support for runs-on=<run id> labels, and automatic assignment of the dependabot pool.
  • Metrics/Endpoints:
    • Add Prometheus exporter and GitHub operation counter; improve OTEL config; /metrics endpoint with basic auth.
  • Housekeeping/Cleanup:
    • Batch terminate instances/fleets; honor runs-on-terminate tag.
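The stale-while-revalidate caches listed under GitHub Integration follow a well-known pattern: serve a cached value immediately, and once it is past its freshness window, keep serving the stale copy while a refresh happens in the background. A minimal single-threaded sketch follows. The `SWRCache` name, the `fresh_for`/`stale_for` windows, and the `run_pending` method (standing in for a background worker) are simplifications assumed for illustration, not RunsOn's code.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Set, Tuple, TypeVar

V = TypeVar("V")

# Hypothetical stale-while-revalidate cache; refreshes are queued, then
# applied by run_pending() instead of a real background goroutine/thread.

class SWRCache(Generic[V]):
    def __init__(self, fetch: Callable[[Hashable], V], fresh_for: float,
                 stale_for: float, clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.fresh_for = fresh_for  # seconds a value is served as-is
        self.stale_for = stale_for  # extra seconds a stale value may still be served
        self.clock = clock
        self.entries: Dict[Hashable, Tuple[V, float]] = {}  # key -> (value, fetched_at)
        self.pending: Set[Hashable] = set()                 # keys awaiting refresh

    def get(self, key: Hashable) -> V:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None:
            value, fetched_at = entry
            age = now - fetched_at
            if age <= self.fresh_for:
                return value  # fresh hit
            if age <= self.fresh_for + self.stale_for:
                self.pending.add(key)  # stale hit: schedule a refresh, serve old value
                return value
        # Miss, or too stale to serve: fetch synchronously.
        value = self.fetch(key)
        self.entries[key] = (value, now)
        self.pending.discard(key)
        return value

    def run_pending(self) -> None:
        """Stand-in for the background worker: refresh every scheduled key."""
        for key in list(self.pending):
            self.entries[key] = (self.fetch(key), self.clock())
        self.pending.clear()
```

For repository metadata or config that changes rarely, this trades a short window of staleness for far fewer blocking API calls.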

Fixes #400, fixes #402, fixes #395, fixes #386.