Skip to content

Troubleshooting

Runners can fail to start for a variety of reasons. IF an error is raised while attempting to start a workflow, RunsOn will alert you by email (assuming you have confirmed the SNS Topic notification when you setup the stack).

Quick checks

  • https://www.githubstatus.com - see if GitHub Action is having issues
  • you only have one RunsOn stack installed in your AWS account
  • you are using the latest version

Long queuing time for some workflows

RunsOn runners consistently start in ~45s for x64, and ~30s for arm64. If you are seeing abnormal queuing times, it may be the case that the runner started for your workflow job has been stolen by another worklow job.

For instance, if two workflow jobs A and B with the same runs-on labels are queued at the same time, the runner started for job A may actually start processing job B (since runner A labels matches those for job B), while job A has to wait for runner B to come up online.

To avoid this and help with debugging, it is best practice to ensure that each workflow job gets a more unique label. This can be achieved by assigning the current workflow run id as an additional label:

runs-on: runs-on,runner=2cpu-linux,run-id=${{ github.run_id }}

You can also inspect the RunsOn logs, and search for a specific run ID value to see if something went wrong. All log messages from RunsOn are tagged with the run ID (since v1.7.2).

View the application logs

It can be useful to access the logs of RunsOn to see more details about the issues. This can either be done from CloudWatch UI, or with awslogs command:

pip install awslogs

Now replace the log group (/aws/apprunner/...) with yours, and you can do:

AWS_PROFILE=your-aws-profile awslogs get --aws-region eu-west-1 \
/aws/apprunner/RunsOnService-6Gwxsz1vjfMD/356d75069c2c4ec89b0e452c51778ce8/application \
-wGS -s 30m --timestamp

View Cloudtrail logs

If you’re getting errors about request limit exceeded or quota issues, have a look at the Cloudtrail events, and especially for the RunInstances API event, to see if you are getting rate limited.

For instance in eu-west-1, the Cloudtrail events can be accessed at:

https://eu-west-1.console.aws.amazon.com/cloudtrailv2/home?region=eu-west-1#/events?ReadOnly=false

Failed to create instance

This error can happen due to multiple reasons:

PendingVerification

⚠️ Failed to create instance with type c7a.4xlarge: PendingVerification: Your request for accessing resources in this region is being validated, and you will not be able to launch additional resources in this region until the validation is complete. We will notify you by email once your request has been validated. While normally resolved within minutes, please allow up to 4 hours for this process to complete. If the issue still persists, then open a support case. [https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/create?issueType=customer-service&serviceCode=account-management&categoryCode=account-verification]

This is usually resolved within a few minutes (automatically). So just retry the workflow a few minutes later and it should work. Otherwise open a support case.

RequestLimitExceeded

This usually happens if you are launching instances too quickly compared to the allowed rate limit for your account.

The rate limit mechanism is detailed in https://docs.aws.amazon.com/AWSEC2/latest/APIReference/throttling.html, but this should not longer happen since v1.6.2.

RunsOn now defaults to the lowest rate limit (2 RunInstances API call/s max).

If your account has a higher quota for those API calls, you can modify the queue size in the CloudFormation stack parameters to take advantage of it.