Best Practices
Optimizing for Fewer Spot Interruptions
Spot instances can be interrupted when AWS needs the capacity back, which can disrupt your workflows. Here are strategies to minimize spot interruptions:
Diversify Instance Types
The most effective way to reduce spot interruptions is to diversify your instance type selection:
runners:
  my-custom-runner:
    # Include multiple families instead of restricting to one.
    family: ["m7", "c7", "r7"]
    # Use ram and cpu options to restrict the instance type among the wide family range.
    ram: [8, 16] # Specify a range instead of a single value
    cpu: [2, 8] # Allows more flexibility
Or using job label syntax:
runs-on:
  - runs-on=${{ github.run_id }}
  - family=m7+c7+r7
  - ram=8+16
  - cpu=2+8
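The two syntaxes carry the same information: a list in runs-on.yml becomes a single `key=a+b+c` label on the job. As a rough illustration of that mapping, here is a minimal Python sketch. The helper name `to_job_labels` is hypothetical and not part of RunsOn; it only mirrors the label format shown in the examples on this page.

```python
def to_job_labels(options: dict) -> list[str]:
    """Turn runner options into job labels (illustrative helper, not a RunsOn API)."""
    # Every job in the examples starts with this label.
    labels = ["runs-on=${{ github.run_id }}"]
    for key, value in options.items():
        if isinstance(value, bool):
            # Labels use lowercase booleans, e.g. spot=false.
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Lists are joined with "+", e.g. family=m7+c7+r7.
            value = "+".join(str(item) for item in value)
        labels.append(f"{key}={value}")
    return labels


if __name__ == "__main__":
    for label in to_job_labels({"family": ["m7", "c7", "r7"], "ram": [8, 16], "cpu": [2, 8]}):
        print(f"- {label}")
```

Running the script prints the label list in the same shape as the job label example above.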
Use Capacity-Optimized Allocation Strategy
Switch from the default price-focused strategy to capacity-optimized for better stability:
runners:
  my-custom-runner:
    spot: capacity-optimized # or spot: co
Or per job:
runs-on:
  - runs-on=${{ github.run_id }}
  - spot=capacity-optimized
To learn more about the allocation strategies, see Spot Allocation Strategies.
Configure the Spot Circuit Breaker
The spot circuit breaker automatically switches to on-demand instances when spot interruptions exceed a threshold over a period of time. The format is COUNT/WINDOW_MINUTES/RECOVERY_MINUTES. You can adjust its sensitivity:
- COUNT: Number of interruptions before triggering
- WINDOW_MINUTES: Time window in minutes to count interruptions
- RECOVERY_MINUTES: Time before trying spot again
Increasing the COUNT value makes the circuit breaker less sensitive, allowing more interruptions before switching to on-demand.
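To make the three fields concrete, the following Python sketch models how a breaker with this format could behave. It is not RunsOn's implementation: the class name, the rule that the breaker trips once COUNT interruptions fall inside the window, and the example value `2/15/30` are all assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class SpotCircuitBreaker:
    """Toy model of a COUNT/WINDOW_MINUTES/RECOVERY_MINUTES circuit breaker."""

    def __init__(self, count: int, window_minutes: int, recovery_minutes: int):
        self.count = count
        self.window_minutes = window_minutes
        self.recovery_minutes = recovery_minutes
        self._events = deque()  # timestamps (in minutes) of recent interruptions
        self._opened_at: Optional[float] = None  # when the breaker last tripped

    @classmethod
    def parse(cls, spec: str) -> "SpotCircuitBreaker":
        count, window, recovery = (int(part) for part in spec.split("/"))
        return cls(count, window, recovery)

    def record_interruption(self, now: float) -> None:
        self._events.append(now)
        # Forget interruptions that are older than the counting window.
        while self._events and now - self._events[0] > self.window_minutes:
            self._events.popleft()
        # Assumption: the breaker trips once COUNT interruptions are in the window.
        if len(self._events) >= self.count:
            self._opened_at = now

    def use_spot(self, now: float) -> bool:
        if self._opened_at is None:
            return True
        if now - self._opened_at >= self.recovery_minutes:
            # Recovery period is over: try spot again with a clean history.
            self._opened_at = None
            self._events.clear()
            return True
        return False  # still in recovery: fall back to on-demand


if __name__ == "__main__":
    breaker = SpotCircuitBreaker.parse("2/15/30")  # hypothetical example value
    breaker.record_interruption(now=1)
    breaker.record_interruption(now=10)
    print(breaker.use_spot(now=11))
    print(breaker.use_spot(now=40))
```

In this model, raising COUNT or shrinking WINDOW_MINUTES makes the breaker harder to trip, while a larger RECOVERY_MINUTES keeps jobs on on-demand instances for longer after it trips.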
Consider Regional Availability
Spot availability varies by region. If you consistently face interruptions:
- Try deploying RunsOn in a different AWS region with better spot capacity
- Monitor spot interruption rates across regions
- Use multi-region deployments for critical workloads (one RunsOn stack per region, and use the region job label to select the region)
Balance Cost vs. Stability
For workloads where interruptions are particularly disruptive:
- Consider using a wider range of instance types
- Accept slightly higher costs for better stability with the capacity-optimized allocation strategy
- For critical jobs, use on-demand instances instead of spot (spot=false).
By implementing these strategies, you can significantly reduce spot interruptions while still maintaining cost efficiency.
Cost reduction
How do I maximize savings when using spot?
If your workflows do not require high-performance runners, use the spot=lowest-price allocation strategy in your configuration. This prioritizes cost over performance by selecting the cheapest available spot instance that meets your requirements.
# In your runs-on.yml file
runners:
  my-custom-runner:
    spot: lowest-price
Or per job:
jobs:
  build:
    runs-on:
      - runs-on=${{ github.run_id }}
      - spot=lowest-price
What instance families should I include for better cost efficiency?
Include a wide range of instance families, especially the more common ones like the m7 variants:
family: ["m7a", "m7i", "r7i", "c7i", "c7a", "r7a", "t3", "t3a"]
Or using the job label syntax:
runs-on:
  - runs-on=${{ github.run_id }}
  - family=m7a+m7i+r7i+c7i+c7a+r7a+t3+t3a
How should I specify RAM and CPU requirements?
Use ranges instead of listing every possible value:
# Instead of listing every value
ram: [4, 512] # This specifies a range from 4GB to 512GB
cpu: [2, 128] # This specifies a range from 2 to 128 cores
Or using the job label syntax:
runs-on:
  - runs-on=${{ github.run_id }}
  - family=m7a+m7i+m7i-flex+r7i+c7i+c7a+r7a+t3+t3a
  - ram=4+512
  - cpu=2+128
Can I set default spot configurations for all jobs?
Yes, you can specify spot configuration in your runs-on.yml file rather than in each individual job:
runners:
  my-custom-runner:
    spot: lowest-price
    family: ["m7a", "m7i", "r7i", "c7i", "c7a", "r7a", "t3", "t3a"]
    ram: [4, 512]
    cpu: [2, 128]
This can be further simplified using wildcard syntax:
runners:
  my-custom-runner:
    spot: lowest-price
    family: ["m7*", "c7*", "r7*", "t3*"]
    ram: [4, 512]
    cpu: [2, 128]
AWS will automatically select the cheapest instance at the time of launch that meets your requirements.
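As a mental model for how wildcards, ranges, and lowest-price selection combine, here is a small Python sketch. It assumes the wildcards behave like shell-style globs on the family name, and it uses a hypothetical catalog whose prices are made up for illustration; the actual selection is done by AWS at launch time, not by code like this.

```python
from fnmatch import fnmatch

# Hypothetical catalog: (instance type, vCPUs, RAM in GB, made-up hourly price).
CATALOG = [
    ("m7a.large", 2, 8, 0.040),
    ("m7i.large", 2, 8, 0.035),
    ("c7i.xlarge", 4, 8, 0.060),
    ("t3.medium", 2, 4, 0.015),
    ("m6i.large", 2, 8, 0.030),
]


def cheapest_match(families, ram, cpu, catalog=CATALOG):
    """Return the cheapest catalog entry matching the family patterns and ranges."""
    candidates = [
        entry
        for entry in catalog
        # Compare the family part ("m7a" in "m7a.large") against each pattern.
        if any(fnmatch(entry[0].split(".")[0], pattern) for pattern in families)
        and cpu[0] <= entry[1] <= cpu[1]
        and ram[0] <= entry[2] <= ram[1]
    ]
    # None when nothing satisfies the constraints.
    return min(candidates, key=lambda entry: entry[3], default=None)


if __name__ == "__main__":
    print(cheapest_match(["m7*", "c7*", "r7*", "t3*"], ram=[4, 512], cpu=[2, 128]))
    print(cheapest_match(["m7*", "c7*", "r7*", "t3*"], ram=[8, 16], cpu=[2, 8]))
```

The wider the family patterns and ranges, the larger the candidate pool, which is why broad configurations tend to land on cheaper instances.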
What results can I expect from these optimizations?
Users have reported significant cost reductions, in some cases cutting daily charges by 60%, by implementing the lowest-price strategy and including more instance families such as the m7 series.