
OpenTelemetry

Exact OTEL signal behavior for RunsOn

This page is the authoritative OpenTelemetry reference for RunsOn. It explains exactly what RunsOn emits, when it emits it, and where the automatic behavior stops.

Server-side behavior

When OtelExporterEndpoint is configured, the RunsOn server exports OTLP metrics and traces from the server process. That includes the RunsOn-defined server metrics inventory below plus the Go runtime appendix.

Prometheus remains a separate legacy /metrics compatibility surface. It is useful when you already have scrape-based monitoring, but it is not the same transport as OTLP and it does not cover runner-side signals.
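As a sketch of how server-side export gets switched on, the OtelExporterEndpoint parameter can be set on the RunsOn CloudFormation stack. The endpoint URL below is a placeholder, not a value from this page:

```yaml
# Sketch only: parameter override for the RunsOn CloudFormation stack.
# OtelExporterEndpoint is the parameter named on this page; the URL is
# an illustrative placeholder for your OTLP collector.
- ParameterKey: OtelExporterEndpoint
  ParameterValue: "https://otel-collector.example.com:4318"
```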

Runner-side behavior

Current RunsOn runners already use the local collector for the built-in job-metrics flow. Add extras=otel to a job label, or to the runner spec used by a pool, when you also want runner-side OTLP export.

  • The collector always writes local metrics.jsonl for the built-in job-metrics flow.
  • Remote OTLP export only happens when the runner has OTEL enabled and the stack OTLP endpoint is configured.
  • RunsOn exports the same default runner host-metrics set on Linux and Windows.
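A minimal sketch of opting a single job into runner-side OTLP export via extras=otel. The runner size label and step are illustrative; only the extras=otel segment is the switch described above:

```yaml
# Hypothetical workflow snippet: extras=otel added to the RunsOn job label.
# The runner spec (2cpu-linux-x64) is an illustrative placeholder.
jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/extras=otel
    steps:
      - uses: actions/checkout@v4
```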

Logs

RunsOn forwards the runner bootstrap log file (output.log) through the local collector when runner OTEL is enabled.

This is the RunsOn bootstrap and agent log path. It is not the full GitHub workflow job log stream.

If the EC2 instance log group is configured, the same bootstrap log file also remains available through the instance log group path in CloudWatch.

Traces

RunsOn emits server traces plus agent and instance lifecycle traces.

Per-step job spans are also automatically emitted to your OTLP backend after a job completes. Those spans are emitted from the RunsOn server and you don’t need to set extras=otel on your jobs to get them.


Inline job metrics vs OTLP export

The inline job-summary flow described on the Job metrics page is separate from remote OTLP export. A job can have local summary charts without sending runner metrics to your observability backend.

Generated metric inventory

RunsOn server metrics

All metrics below are exported through OTLP when the stack OTLP endpoint is configured, and exposed on the server Prometheus /metrics surface when the Prometheus endpoint is enabled.

| Metric | Kind | Unit | Description | Attributes |
|---|---|---|---|---|
| runs_on_jobs_total | counter | {job} | Total number of jobs by status. | conclusion, instance_family, instance_lifecycle, instance_type, interrupted, org, pool_name, pool_type, repo_full_name, status, workflow_name |
| runs_on_internal_queue_duration_seconds | histogram | s | Time from job queued in RunsOn to scheduled. | conclusion, instance_family, instance_lifecycle, interrupted, pool_name, status |
| runs_on_overall_queue_duration_seconds | histogram | s | Time from job queued by GitHub to started. | conclusion, instance_family, instance_lifecycle, interrupted, pool_name, status |
| runs_on_job_duration_seconds | histogram | s | Time from job started to completed. | conclusion, instance_family, instance_lifecycle, interrupted, pool_name, status |
| runs_on_pool_instances_total | observable_gauge | {instance} | Current number of pool instances by state. | installation_id, org, pool_name, state |
| runs_on_rate_limiter_tokens | observable_gauge | {token} | Available tokens in a rate limiter. | limiter |
| runs_on_rate_limiter_burst | observable_gauge | {token} | Burst capacity of a rate limiter. | limiter |
| runs_on_spot_circuit_breaker_active | observable_gauge | 1 | Whether the spot circuit breaker is active; 1 means active and 0 means inactive. | - |
| runs_on_github_operations_total | counter | {operation} | Total number of GitHub API operations by operation name. | github_operation |
| runs_on_aws_operations_total | counter | {operation} | Total number of AWS API operations by operation name. | aws_operation |
| runs_on_jobs_queue_fetched | histogram | {message} | Number of job messages fetched from the jobs queue per poll iteration. | - |
| runs_on_reconciler_backlog | gauge | {job} | Current number of jobs waiting to be reconciled. | - |
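The queue and job duration metrics above are histograms, so backends consume them as cumulative bucket counts rather than raw values. As a sketch of that consumption pattern, here is a quantile estimate by linear interpolation over cumulative buckets; the bucket bounds and counts are made-up sample data, not RunsOn defaults:

```python
# Sketch: estimating a quantile from cumulative histogram buckets, the way
# a backend typically consumes a metric such as
# runs_on_internal_queue_duration_seconds. Sample data is illustrative.
from bisect import bisect_left

def histogram_quantile(q, bounds, cumulative_counts):
    """Linear-interpolation quantile estimate from cumulative bucket counts."""
    total = cumulative_counts[-1]
    if total == 0:
        return 0.0
    rank = q * total
    i = bisect_left(cumulative_counts, rank)  # first bucket reaching the rank
    lower = bounds[i - 1] if i > 0 else 0.0
    upper = bounds[i]
    prev = cumulative_counts[i - 1] if i > 0 else 0
    width = cumulative_counts[i] - prev
    if width == 0:
        return upper
    return lower + (upper - lower) * (rank - prev) / width

# Example: buckets with upper bounds of 1s, 5s, 30s, 120s.
bounds = [1.0, 5.0, 30.0, 120.0]
counts = [40, 90, 98, 100]  # cumulative counts per bucket
p95 = histogram_quantile(0.95, bounds, counts)  # 20.625 seconds
```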

Runner host metrics exported to OTLP by default

Notes:

  • Remote OTLP export requires extras=otel, a configured OTLP endpoint, and JobEnabled=true.
  • Applies to Linux and Windows.
  • Disk metrics are limited to the detected primary/root disk, network metrics to the detected primary network interface, and filesystem metrics are not exported by default.
| Metric | Kind | Unit | Description | Attributes |
|---|---|---|---|---|
| system.cpu.load_average.1m | gauge | {thread} | Average CPU load over 1 minute. | - |
| system.cpu.load_average.5m | gauge | {thread} | Average CPU load over 5 minutes. | - |
| system.cpu.load_average.15m | gauge | {thread} | Average CPU load over 15 minutes. | - |
| system.cpu.utilization | gauge | 1 | Difference in system.cpu.time since the last measurement per logical CPU, divided by the elapsed time (value in interval [0,1]). | cpu, state |
| system.disk.io | sum | By | Disk bytes transferred. | device, direction |
| system.disk.operations | sum | {operations} | Disk operations count. | device, direction |
| system.memory.utilization | gauge | 1 | Percentage of memory bytes in use. | state |
| system.network.io | sum | By | The number of bytes transmitted and received. | device, direction |
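The system.cpu.utilization definition above is a delta-over-elapsed-time calculation. A minimal sketch of that arithmetic for one logical CPU, with illustrative sample numbers:

```python
# Sketch of how a gauge like system.cpu.utilization is derived: the change
# in per-state CPU time between two samples, divided by the elapsed
# wall-clock time, clamped to the interval [0, 1]. Numbers are illustrative.
def cpu_utilization(prev_cpu_seconds, curr_cpu_seconds, elapsed_seconds):
    delta = curr_cpu_seconds - prev_cpu_seconds
    return max(0.0, min(1.0, delta / elapsed_seconds))

# One logical CPU accumulated 6 more seconds of busy time over a
# 10-second sampling interval:
util = cpu_utilization(100.0, 106.0, 10.0)  # 0.6
```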

Go runtime appendix

All Go runtime metrics share the same transport as the server metrics above: OTLP when the stack OTLP endpoint is configured, and the server Prometheus /metrics surface when the endpoint is enabled.

| Metric | Kind | Unit | Description | Attributes |
|---|---|---|---|---|
| go.config.gogc | observable_up_down_counter | % | Heap size target percentage configured by the user, otherwise 100. | - |
| go.goroutine.count | observable_up_down_counter | {goroutine} | Count of live goroutines. | - |
| go.memory.allocated | observable_counter | By | Memory allocated to the heap by the application. | - |
| go.memory.allocations | observable_counter | {allocation} | Count of allocations to the heap by the application. | - |
| go.memory.gc.goal | observable_up_down_counter | By | Heap size target for the end of the GC cycle. | - |
| go.memory.limit | observable_up_down_counter | By | Go runtime memory limit configured by the user, if a limit exists. | - |
| go.memory.used | observable_up_down_counter | By | Memory used by the Go runtime. | go.memory.type |
| go.processor.limit | observable_up_down_counter | {thread} | The number of OS threads that can execute user-level Go code simultaneously. | - |
| go.schedule.duration | histogram | s | The time goroutines have spent in the scheduler in a runnable state before actually running. | - |

Generated attribute inventory

RunsOn server metric attributes

| Attribute | Description |
|---|---|
| aws_operation | AWS API operation name. |
| conclusion | GitHub job conclusion for completed jobs; empty for other states. |
| github_operation | GitHub API operation name. |
| installation_id | GitHub App installation identifier. |
| instance_family | EC2 instance family derived from the resolved runner instance type. |
| instance_lifecycle | Runner pricing lifecycle such as spot or on-demand. |
| instance_type | EC2 instance type when a job has been assigned to a runner. |
| interrupted | Whether the job was interrupted before completion. |
| limiter | Rate limiter identifier such as github_api or ec2_run. |
| org | GitHub organization name. |
| pool_name | Pool name when the job is served from a pool-backed runner. |
| pool_type | Pool standby type when applicable. |
| repo_full_name | Repository full name in owner/repo form. |
| state | Current pool instance state. |
| status | Job lifecycle status such as queued, scheduled, in_progress, or completed. |
| workflow_name | GitHub workflow display name. |

RunsOn server resource attributes

| Attribute | Description |
|---|---|
| cloud.region | AWS region when configured. |
| deployment.environment | Stack environment name when configured. |
| service.instance.id | Server host identifier when the hostname can be resolved. |
| service.name | Always runs-on-server. |
| service.namespace | Stack name when configured. |
| service.version | RunsOn app version when configured. |

Runner metric attributes exported to OTLP by default

| Context | Attribute | Description |
|---|---|---|
| cpu | cpu | Logical CPU number starting at 0. |
| cpu | state | Breakdown of CPU usage by type. |
| disk | device | Name of the disk. |
| disk | direction | Direction of flow of bytes/operations (read or write). |
| memory | state | Breakdown of memory usage by type. |
| network | device | Name of the network interface. |
| network | direction | Direction of flow of bytes/operations (receive or transmit). |

Runner resource attributes

| Attribute | Action | Description |
|---|---|---|
| service.name | insert | Inserted as runs-on-agent when the incoming telemetry does not already define a service name. |
| service.namespace | insert | Inserted as runs-on for runner-generated telemetry when the incoming telemetry does not already define a service namespace. |
| service.instance.id | insert | EC2 instance identifier. |
| deployment.environment | upsert | Configured RunsOn environment name. |
| deployment.environment.name | upsert | Configured RunsOn environment name. |
| stack_name | upsert | RunsOn stack name. |
| region | upsert | AWS region. |
| org | upsert | GitHub organization name. |
| instance_type | upsert | EC2 instance type. |
| instance_lifecycle | upsert | EC2 pricing lifecycle such as spot or on-demand. |
| availability_zone | upsert | EC2 availability zone. |
| ami_id | upsert | AMI identifier used by the runner instance. |
| pool_name | upsert | Pool name when the runner comes from a pool. |
| pool_type | upsert | Pool standby type when the runner comes from a pool. |
| repo_full_name | upsert | Repository full name in owner/repo form. |
| workflow_path | upsert | GitHub workflow file path. |
| job_name | upsert | GitHub job name. |
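The Action column distinguishes two behaviors: insert sets an attribute only when the incoming telemetry does not already define it, while upsert always overwrites. A minimal sketch of those semantics on a plain dictionary, with placeholder attribute values:

```python
# Sketch of insert vs upsert resource-attribute semantics:
# "insert" keeps a value the telemetry already carries; "upsert" always wins.
# Attribute values are illustrative placeholders.
def apply_insert(attrs, key, value):
    attrs.setdefault(key, value)  # set only when absent

def apply_upsert(attrs, key, value):
    attrs[key] = value  # always overwrite

attrs = {"service.name": "my-custom-agent"}
apply_insert(attrs, "service.name", "runs-on-agent")  # kept: my-custom-agent
apply_insert(attrs, "service.namespace", "runs-on")   # added: was absent
apply_upsert(attrs, "region", "us-east-1")            # always set
```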

Go runtime metric attributes

| Context | Attribute | Description |
|---|---|---|
| go-runtime | go.memory.type | Type of Go runtime memory, currently stack or other. |