Best Practices

Spot instances can be interrupted when AWS needs the capacity back, which can disrupt your workflows. Here are strategies to minimize spot interruptions:

The most effective way to reduce spot interruptions is to diversify your instance type selection:

.github/runs-on.yml
runners:
  my-custom-runner:
    # Include multiple families instead of restricting to one.
    family: ["m7", "c7", "r7"]
    # Use ram and cpu to narrow down the instance types within these families.
    ram: [8, 16] # Specify a range instead of a single value
    cpu: [2, 8] # Allows more flexibility

Or using job label syntax:

runs-on: runs-on=${{ github.run_id }}/family=m7+c7+r7/ram=8+16/cpu=2+8

Use Capacity-Optimized Allocation Strategy

Switch from the default price-focused strategy to capacity-optimized for better stability:

.github/runs-on.yml
runners:
  my-custom-runner:
    spot: capacity-optimized # or spot: co

Or per job:

runs-on: runs-on=${{ github.run_id }}/spot=capacity-optimized

To learn more about the allocation strategies, see Spot Allocation Strategies.

The spot circuit breaker automatically switches to on-demand instances when the number of spot interruptions within a given time window reaches a threshold. Its sensitivity is set with a value in the format COUNT/WINDOW_MINUTES/RECOVERY_MINUTES:

  • COUNT: Number of interruptions before triggering
  • WINDOW_MINUTES: Time window in minutes to count interruptions
  • RECOVERY_MINUTES: Time before trying spot again

Increasing the COUNT value makes the circuit breaker less sensitive, allowing more interruptions before switching to on-demand.
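
As a reading aid, here is how values in this format would be interpreted. The numbers are illustrative, not recommendations, and the key name `spot_circuit_breaker` is a hypothetical placeholder: this section does not say where the setting lives, so check your RunsOn stack configuration for the actual parameter name.

```yaml
# Hypothetical placeholder key; only the value format comes from the text above.
# Format: COUNT/WINDOW_MINUTES/RECOVERY_MINUTES
#
# "2/15/30": once 2 interruptions are counted within a 15-minute window,
# switch to on-demand, then try spot again after 30 minutes.
spot_circuit_breaker: "2/15/30"

# Less sensitive variant: a higher COUNT tolerates more interruptions
# in the same window before switching to on-demand.
# spot_circuit_breaker: "5/15/30"
```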

Spot availability varies by region. If you consistently face interruptions:

  1. Try deploying RunsOn in a different AWS region with better spot capacity
  2. Monitor spot interruption rates across regions
  3. Use multi-region deployments for critical workloads (one RunsOn stack per region, and use the region job label to select the region)
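
As a rough sketch of the multi-region setup in step 3, a workflow could route jobs to a given region's stack through the region job label. The region names, job names, and commands below are illustrative assumptions, and each region used is assumed to already have its own RunsOn stack deployed.

```yaml
# Illustrative workflow fragment; regions and job names are assumptions.
jobs:
  build:
    # Handled by the RunsOn stack assumed to be deployed in us-east-1.
    runs-on: runs-on=${{ github.run_id }}/region=us-east-1/spot=capacity-optimized
    steps:
      - run: echo "building in us-east-1" # placeholder command
  integration:
    # Handled by a second stack, assumed to be deployed in eu-west-1.
    runs-on: runs-on=${{ github.run_id }}/region=eu-west-1/spot=capacity-optimized
    steps:
      - run: echo "testing in eu-west-1" # placeholder command
```

If one region shows persistently higher interruption rates, changing the region label on the affected jobs moves them to the other stack without altering the rest of the workflow.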

For workloads where interruptions are particularly disruptive:

  • Consider using a wider range of instance types
  • Accept slightly higher costs in exchange for better stability by using the capacity-optimized allocation strategy
  • For critical jobs, use on-demand instances instead of spot (spot=false)
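
One way to combine these options in a single workflow is to keep interruption-tolerant jobs on spot and move only the critical job to on-demand. This is a minimal sketch: the job names and commands are placeholders, and only the spot label values come from the text above.

```yaml
# Illustrative workflow fragment; job names and commands are placeholders.
jobs:
  test:
    # Interruption-tolerant job: stays on spot, favoring stability over price.
    runs-on: runs-on=${{ github.run_id }}/spot=capacity-optimized
    steps:
      - run: echo "run tests" # placeholder command
  release:
    needs: test
    # Critical job: on-demand only, so it is not subject to spot interruptions.
    runs-on: runs-on=${{ github.run_id }}/spot=false
    steps:
      - run: echo "publish release" # placeholder command
```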

By implementing these strategies, you can significantly reduce spot interruptions while still maintaining cost efficiency.

How do I maximize savings when using spot?

If your workflows do not require high-performance runners, use the spot=lowest-price allocation strategy in your configuration. This prioritizes cost over performance by selecting the cheapest available spot instance that meets your requirements.

# In your runs-on.yml file
runners:
  my-custom-runner:
    spot: lowest-price

Or per job:

jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/spot=lowest-price

What instance families should I include for better cost efficiency?

Include a wide range of instance families, especially the more common ones like m7 variants:

family: ["m7a", "m7i", "r7i", "c7i", "c7a", "r7a", "t3", "t3a"]

Or using the job label syntax:

runs-on: runs-on=${{ github.run_id }}/family=m7a+m7i+r7i+c7i+c7a+r7a+t3+t3a

How should I specify RAM and CPU requirements?

Use ranges instead of listing every possible value:

# Instead of listing every value
ram: [4, 512] # This specifies a range from 4 GB to 512 GB
cpu: [2, 128] # This specifies a range from 2 to 128 cores

Or using the job label syntax:

runs-on: runs-on=${{ github.run_id }}/family=m7a+m7i+m7i-flex+r7i+c7i+c7a+r7a+t3+t3a/ram=4+512/cpu=2+128

Can I set default spot configurations for all jobs?

Yes, you can specify spot configuration in your runs-on.yml file rather than in each individual job:

.github/runs-on.yml
runners:
  my-custom-runner:
    spot: lowest-price
    family: ["m7a", "m7i", "r7i", "c7i", "c7a", "r7a", "t3", "t3a"]
    ram: [4, 512]
    cpu: [2, 128]

This can be further simplified using wildcard syntax:

.github/runs-on.yml
runners:
  my-custom-runner:
    spot: lowest-price
    family: ["m7*", "c7*", "r7*", "t3*"]
    ram: [4, 512]
    cpu: [2, 128]

At launch time, AWS automatically selects the cheapest instance that meets your requirements.

What results can I expect from these optimizations?

Users have reported significant cost reductions, in some cases cutting daily charges by 60%, after adopting the lowest-price strategy and including more instance families such as the m7 series.