Best Practices

Optimizing for Fewer Spot Interruptions

Spot instances can be interrupted when AWS needs the capacity back, which can disrupt your workflows. Here are strategies to minimize spot interruptions:

Diversify Instance Types

The most effective way to reduce spot interruptions is to diversify your instance type selection:

runners:
  my-custom-runner:
    # Include multiple families instead of restricting to one.
    family: ["m7", "c7", "r7"]
    # Use the ram and cpu options to narrow down the instance types within those families.
    ram: [8, 16]  # Specify a range instead of a single value
    cpu: [2, 8]   # Allows more flexibility

Or using the job label syntax:

runs-on: runs-on=${{ github.run_id }}/family=m7+c7+r7/ram=8+16/cpu=2+8
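
The two syntaxes above carry the same information: list values are joined with "+" and options are separated by "/". The following Python sketch builds the label from a set of runner options, purely to illustrate that mapping. The helper name to_job_label is hypothetical and is not part of RunsOn.

```python
def to_job_label(options: dict) -> str:
    """Build a runs-on job label from runner options, joining list values with '+'."""
    # The label always starts with the run id prefix used throughout this page.
    segments = ["runs-on=${{ github.run_id }}"]
    for key, value in options.items():
        values = value if isinstance(value, list) else [value]
        segments.append(f"{key}=" + "+".join(str(v) for v in values))
    return "/".join(segments)


print(to_job_label({"family": ["m7", "c7", "r7"], "ram": [8, 16], "cpu": [2, 8]}))
```

With the options from the configuration example, this prints the same label as the one shown above.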

Use Capacity-Optimized Allocation Strategy

Switch from the default price-focused strategy to capacity-optimized for better stability:

runners:
  my-custom-runner:
    spot: capacity-optimized  # or spot: co

Or per job:

runs-on: runs-on=${{ github.run_id }}/spot=capacity-optimized

To learn more about the allocation strategies, see Spot Allocation Strategies.

Configure the Spot Circuit Breaker

The spot circuit breaker automatically switches to on-demand instances when spot interruptions exceed a threshold within a given time window. You can adjust its sensitivity with a value of the form COUNT/WINDOW_MINUTES/RECOVERY_MINUTES:

COUNT: Number of interruptions before triggering
WINDOW_MINUTES: Time window in minutes to count interruptions
RECOVERY_MINUTES: Time before trying spot again

Increasing the COUNT value makes the circuit breaker less sensitive, allowing more interruptions before switching to on-demand.
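
To make the COUNT/WINDOW_MINUTES/RECOVERY_MINUTES semantics concrete, here is a minimal Python sketch of a sliding-window breaker. It is an illustration only, not RunsOn's implementation: the class and method names (SpotCircuitBreaker, record_interruption, use_spot), the example value 2/15/30, and the choice to trip as soon as COUNT interruptions are inside the window are all assumptions made for this sketch.

```python
from collections import deque


class SpotCircuitBreaker:
    """Illustrative breaker for a COUNT/WINDOW_MINUTES/RECOVERY_MINUTES setting."""

    def __init__(self, spec: str):
        count, window, recovery = (int(part) for part in spec.split("/"))
        self.count = count
        self.window = window
        self.recovery = recovery
        self.interruptions = deque()  # timestamps (in minutes) of recent interruptions
        self.open_until = None        # while set, spot is avoided until this minute

    def record_interruption(self, now: float) -> None:
        self.interruptions.append(now)
        # Drop interruptions that fell out of the counting window.
        while self.interruptions and self.interruptions[0] <= now - self.window:
            self.interruptions.popleft()
        # Assumption: the breaker trips once COUNT interruptions are in the window.
        if len(self.interruptions) >= self.count:
            self.open_until = now + self.recovery
            self.interruptions.clear()

    def use_spot(self, now: float) -> bool:
        if self.open_until is not None and now < self.open_until:
            return False  # breaker is open: fall back to on-demand
        self.open_until = None
        return True


breaker = SpotCircuitBreaker("2/15/30")
breaker.record_interruption(now=0)
print(breaker.use_spot(now=1))    # one interruption so far: still on spot
breaker.record_interruption(now=10)
print(breaker.use_spot(now=11))   # two interruptions within 15 minutes: on-demand
print(breaker.use_spot(now=45))   # recovery period has elapsed: spot again
```

In this model, raising COUNT from 2 to 3 would have kept the second call on spot, which is the "less sensitive" behavior described above.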

Consider Regional Availability

Spot availability varies by region. If you consistently face interruptions:

Try deploying RunsOn in a different AWS region with better spot capacity
Monitor spot interruption rates across regions
Use multi-region deployments for critical workloads (deploy one RunsOn stack per region, then select the region with the region job label)

Balance Cost vs. Stability

For workloads where interruptions are particularly disruptive:

Consider using a wider range of instance types
Accept slightly higher costs in exchange for better stability with the capacity-optimized allocation strategy
For critical jobs, use on-demand instances instead of spot (spot=false)

By implementing these strategies, you can significantly reduce spot interruptions while still maintaining cost efficiency.

Cost reduction

How do I maximize savings when using spot?

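If your workflows do not require high-performance runners, use the spot=lowest-price allocation strategy in your configuration. This prioritizes cost over performance by selecting the cheapest available spot instance that meets your requirements. Because it optimizes for price rather than capacity, expect it to be less stable than the capacity-optimized strategy described above.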

# In your runs-on.yml file
runners:
  my-custom-runner:
    spot: lowest-price

Or per job:

jobs:
  build:
    runs-on: runs-on=${{ github.run_id }}/spot=lowest-price

What instance families should I include for better cost efficiency?

Include a wide range of instance families, especially the more common ones like m7 variants:

family: ["m7a", "m7i", "r7i", "c7i", "c7a", "r7a", "t3", "t3a"]

Or using the job label syntax:

runs-on: runs-on=${{ github.run_id }}/family=m7a+m7i+r7i+c7i+c7a+r7a+t3+t3a

How should I specify RAM and CPU requirements?

Use ranges instead of listing every possible value:

# Give the lower and upper bounds instead of listing every value
ram: [4, 512]  # Range from 4 GB to 512 GB
cpu: [2, 128]  # Range from 2 to 128 vCPUs

Or using the job label syntax:

runs-on: runs-on=${{ github.run_id }}/family=m7a+m7i+r7i+c7i+c7a+r7a+t3+t3a/ram=4+512/cpu=2+128

Can I set default spot configurations for all jobs?

Yes, you can specify spot configuration in your runs-on.yml file rather than in each individual job:

runners:
  my-custom-runner:
    spot: lowest-price
    family: ["m7a", "m7i", "r7i", "c7i", "c7a", "r7a", "t3", "t3a"]
    ram: [4, 512]
    cpu: [2, 128]

This can be further simplified using wildcard syntax:

runners:
  my-custom-runner:
    spot: lowest-price
    family: ["m7*", "c7*", "r7*", "t3*"]
    ram: [4, 512]
    cpu: [2, 128]

At launch time, AWS automatically selects the cheapest instance that meets your requirements.
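
As a rough mental model of how family patterns combine with the ram and cpu ranges, the following Python sketch filters a small catalog of instance types. It is not how RunsOn or AWS resolve instance types internally: the helper name matching_instances is hypothetical, the catalog is a hand-picked sample, and treating each family value as a shell-style pattern matched against the part of the name before the dot is an assumption.

```python
from fnmatch import fnmatch

# Sample catalog: instance type -> (vCPUs, memory in GiB).
CATALOG = {
    "t3.medium": (2, 4),
    "c7i.large": (2, 4),
    "m7a.large": (2, 8),
    "m7i.xlarge": (4, 16),
    "r7a.large": (2, 16),
    "c6i.large": (2, 4),
    "m7i.48xlarge": (192, 768),
}


def matching_instances(families, ram, cpu):
    """Return catalog entries whose family, RAM and CPU all fall within the request."""
    ram_min, ram_max = min(ram), max(ram)
    cpu_min, cpu_max = min(cpu), max(cpu)
    matches = []
    for name, (vcpus, memory) in CATALOG.items():
        family = name.split(".")[0]
        # Keep the instance only if its family matches at least one pattern.
        if not any(fnmatch(family, pattern) for pattern in families):
            continue
        if ram_min <= memory <= ram_max and cpu_min <= vcpus <= cpu_max:
            matches.append(name)
    return sorted(matches)


print(matching_instances(["m7*", "c7*", "r7*", "t3*"], ram=[4, 512], cpu=[2, 128]))
print(matching_instances(["m7*"], ram=[8, 8], cpu=[2, 2]))
```

The wide request keeps five of the seven sample types as candidates, while the narrow one leaves a single candidate. The more candidates there are, the more room AWS has to find a cheap instance with available capacity.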

What results can I expect from these optimizations?

Users have reported significant cost reductions, in some cases cutting daily charges by 60%, after switching to the lowest-price strategy and including more instance families such as the m7 series.