Skip to content

Blog

Cloud-init tips and tricks for EC2 instances

Working extensively with RunsOn, I’ve spent considerable time with cloud-init, the industry-standard tool that automatically configures cloud instances on startup. Present on Ubuntu distributions on AWS, cloud-init fetches metadata from the underlying cloud provider and applies initial configurations. Here are some useful commands and techniques I’ve discovered for troubleshooting and inspecting EC2 instances.

When debugging instance startup issues, you often need to check what user-data was passed to your instance. Here are three ways to retrieve it:

The most reliable method is using the built-in cloud-init query command:

Terminal window
# View user data
sudo cloud-init query userdata
# View all instance data including user data
sudo cloud-init query --all

Cloud-init stores user-data locally after fetching it:

Terminal window
# User data is stored in
sudo cat /var/lib/cloud/instance/user-data.txt

3. Using the instance metadata service (IMDS)

Section titled “3. Using the instance metadata service (IMDS)”

You can also query the EC2 metadata service directly:

Terminal window
# IMDSv2 (recommended)
TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/user-data
# IMDSv1 (legacy)
curl http://169.254.169.254/latest/user-data

Accessing all EC2 metadata without additional API calls

Section titled “Accessing all EC2 metadata without additional API calls”

Here’s a powerful tip: cloud-init automatically fetches all relevant data from the EC2 metadata API at startup and caches it locally. Instead of making multiple API calls, you can read everything from a single JSON file:

Terminal window
sudo cat /run/cloud-init/instance-data.json

This file contains a wealth of information about your instance. Let’s explore some useful queries:

Get comprehensive instance details:

Terminal window
sudo cat /run/cloud-init/instance-data.json | jq '.ds.dynamic["instance-identity"].document'

Output:

{
"accountId": "135269210855",
"architecture": "x86_64",
"availabilityZone": "us-east-1b",
"billingProducts": null,
"devpayProductCodes": null,
"imageId": "ami-0db4eca8382e7fc27",
"instanceId": "i-00a3d21a80694c44b",
"instanceType": "m7a.large",
"kernelId": null,
"marketplaceProductCodes": null,
"pendingTime": "2025-04-25T12:00:18Z",
"privateIp": "10.0.1.93",
"ramdiskId": null,
"region": "us-east-1",
"version": "2017-09-30"
}

Access all EC2 metadata fields:

Terminal window
sudo cat /run/cloud-init/instance-data.json | jq '.ds["meta-data"]'

This reveals extensive information including:

  • Network configuration (VPC, subnet, security groups)
  • IAM instance profile details
  • Block device mappings
  • Hostname and IP addresses
  • Maintenance events

Need just the public IP? Or the instance type? Use jq to extract specific fields:

Terminal window
# Get public IPv4 address
sudo jq -r '.ds["meta-data"]["public-ipv4"]' /run/cloud-init/instance-data.json
# Output: 3.93.38.69
# Get instance type
sudo jq -r '.ds["meta-data"]["instance-type"]' /run/cloud-init/instance-data.json
# Output: m7a.large
# Get availability zone
sudo jq -r '.ds["meta-data"]["placement"]["availability-zone"]' /run/cloud-init/instance-data.json
# Output: us-east-1b
# Get VPC ID
sudo jq -r '.ds["meta-data"]["network"]["interfaces"]["macs"][.ds["meta-data"]["mac"]]["vpc-id"]' /run/cloud-init/instance-data.json
# Output: vpc-03460bc2910d2b4e6

Understanding cloud-init and how to query instance metadata is crucial for:

  1. Troubleshooting: When instances fail to start correctly, checking user-data and metadata helps identify configuration issues
  2. Automation: Scripts can use this cached data instead of making API calls, reducing latency and API throttling
  3. Security: Accessing cached data avoids exposing credentials to the metadata service repeatedly
  4. Performance: Reading from local files is faster than HTTP requests to the metadata service

Cloud-init does more than just run your user-data script. It provides a comprehensive interface to instance metadata that’s invaluable for debugging and automation. Next time you’re troubleshooting an EC2 instance or writing automation scripts, remember these commands - they’ll save you time and API calls.

RunsOn is now handling more than 600k jobs per day

RunsOn stats showing 606,674 total runners

Another milestone reached: RunsOn is now processing over 600,000 jobs per day across all users! 🚀

Less than two months after hitting the 400k mark, we’ve grown by 50% to reach 606,674 daily jobs. This rapid growth demonstrates the increasing demand for reliable, cost-effective GitHub Actions runners. Organizations are discovering that self-hosted runners don’t have to be complex or expensive when using the right solution.

  1. Cost savings: Teams are saving up to 10x on their CI costs compared to GitHub-hosted runners
  2. Performance: Faster builds with dedicated resources and optimized configurations
  3. Flexibility: Choose your exact instance types and configurations
  4. Simplicity: Deploy in minutes, not days, with our streamlined setup

As we continue to scale, we’re focused on:

  • Improving performance monitoring and insights
  • Faster boot times for runners

Thank you to all our users who trust RunsOn for their critical CI/CD workflows. Here’s to the next milestone! 🎯

Want to join the thousands of developers already using RunsOn? Get started today and see why teams are switching to RunsOn for their GitHub Actions needs.

Why smart developers choose ephemeral runners (and you should too)

Here’s a question that separates senior engineers from the rest: Should your GitHub Actions runners live forever or die after each job?

If you answered “live forever,” you’re probably still debugging why your CI randomly fails on Tuesdays.

Long-lived runners feel intuitive. Spin up a VM, register it with GitHub, let it churn through jobs. No startup overhead, no provisioning delays. It’s the CI equivalent of keeping your laptop running 24/7 because “booting takes too long.”

But here’s what actually happens:

  • Week 1: “Our runners are blazing fast!”
  • Week 3: “Why do tests pass locally but fail in CI?”
  • Week 6: “Let’s restart all runners and see if that fixes it.”
  • Week 12: “We need a dedicated person to babysit our CI infrastructure.”

Sound familiar?

The four horsemen of long-lived runner apocalypse

Section titled “The four horsemen of long-lived runner apocalypse”

Your runner accumulates garbage like a browser with 847 open tabs. Docker layers, npm caches, environment variables, temp files—each job leaves traces. Eventually, Job #847 fails because Job #23 left some Node modules lying around.

Memory fragments. Disk fills up. CPU gets pinned by zombie processes. What started as a c5.large performing like a c5.large slowly becomes a c5.large performing like a t2.micro having an existential crisis.

That environment variable someone set for debugging last Tuesday? Still exported. That SSH key generated for a one-off deployment? Still in ~/.ssh. Your “clean” runner is basically a museum of security vulnerabilities.

Bugs that only appear after the runner has processed exactly 47 jobs involving TypeScript compilation. Good luck reproducing that locally.

Ephemeral runners are the Marie Kondo approach to CI: if it doesn’t spark joy (i.e., it’s not your current job), thank it and throw it away.

Every job gets:

  • A pristine environment identical to your base image
  • Zero state from previous executions
  • Consistent resource allocation
  • Perfect isolation from other workloads

The math is simple:

  • Long-lived: Pay for 24/7 × N runners × mysterious overhead
  • Ephemeral: Pay for actual job runtime × spot pricing discount

The classic objection: “Ephemeral runners are slow because of boot time!”

This is 2025 thinking with 2015 assumptions. Modern ephemeral runners boot in under 30 seconds. Your Docker build probably takes longer to download base images.

Plus, what’s worse: 30 seconds of predictable startup time, or 3 hours debugging why your integration tests only fail on runner-07 when Mars is in retrograde?

We’ve processed millions of jobs with this approach. Here’s how we make it work:

  • 30-second boot times with optimized AMIs and provisioned network throughput
  • Spot instance compatibility for 75% cost savings
  • One runner per job ensures perfect isolation
  • Zero operational overhead because there’s no state to manage

When your job finishes, the runner gets terminated. No cleanup scripts, no monitoring dashboards, no 3 AM alerts about runner-14 being “unhealthy.”

The architecture your future self will thank you for

Section titled “The architecture your future self will thank you for”

Long-lived runners are like global variables in code—they seem convenient until they’re not, and by then you’re too deep to refactor easily.

Ephemeral runners are like pure functions: predictable inputs, predictable outputs, no side effects. The kind of architecture that lets you sleep soundly knowing your CI isn’t a ticking time bomb.

Your security team gets perfect isolation. Your finance team gets usage-based costs. Your developers get consistent, reproducible builds. Everyone wins except the person who has to maintain the old system (which is no longer you).

If you’re still running long-lived CI infrastructure in 2025, you’re optimizing for the wrong metrics. You’re choosing theoretical performance over actual reliability, imaginary cost savings over real operational simplicity.

Smart money is on ephemeral. Smart developers choose tools that scale without accumulating technical debt.

Make the smart choice. Try RunsOn today.

The true cost of self-hosted GitHub Actions - Separating fact from fiction

In recent discussions about GitHub Actions runners, there’s been some debate around the true cost and complexity of self-hosted solutions. With blog posts like “Self-hosted GitHub Actions runners aren’t free” and various companies raising millions to build high-performance CI clouds, it’s important to separate fact from fiction.

It’s true that traditional self-hosted GitHub Actions runner approaches come with challenges:

  • Operational overhead: Maintaining AMIs, monitoring infrastructure, and debugging API issues
  • Hidden costs: Infrastructure expenses, egress charges, and wasted capacity
  • Human costs: Engineering time spent on maintenance rather than product development

However, these challenges aren’t inherent to self-hosted runners themselves. They’re symptoms of inadequate tooling for deploying and managing them.

At RunsOn, we’ve specifically designed our solution to deliver the benefits of self-hosted GitHub Actions runners without the traditional downsides:

While some providers claim to eliminate maintenance, they’re actually just moving your workloads to their infrastructure—creating new dependencies and security concerns. RunsOn takes a fundamentally different approach:

  • Battle-tested CloudFormation stack: Deploy in 10 minutes with a simple template URL.
  • Zero Kubernetes complexity: Unlike Actions Runner Controller (ARC), no complex cluster management.
  • Scales to zero: No jobs in queue? No cost. When a job comes up, RunsOn spins up a new runner and starts the job in less than 30s.
  • Automatic updates: Easy, non-disruptive upgrade process.
  • No manual AMI maintenance: Regularly updated runner images.

When third-party providers advertise “2x cheaper” services, they’re comparing themselves to GitHub-hosted runners—not to true self-hosted solutions. With RunsOn:

  • Up to 90% cost reduction compared to GitHub-hosted runners.
  • AWS Spot instances provide maximum savings (up to 75% cheaper than on-demand).
  • Use your existing AWS credits and committed spend.
  • No middleman markup on compute resources.
  • Transparent licensing model, with a low fee irrespective of the number of runners or job minutes you use.

Many third-party solutions gloss over a critical fact: your code and secrets are processing on their infrastructure. RunsOn:

  • 100% self-hosted in your AWS account—no code or secrets leave your infrastructure.
  • Ephemeral VM isolation with one clean runner per job.
  • Full audit capabilities through your AWS account.
  • No attack vectors from persistent runners.

High-performance CI doesn’t require VC-funded cloud platforms:

  • 30% faster builds than GitHub-hosted runners.
  • Flexible instance selection with x64, ARM64, GPUs, and Windows support.
  • Unlimited concurrency (only limited by your AWS quotas).
  • Supercharged caching with VPC-local S3 cache backend (5x faster transfers).

The often-cited “human cost” of self-hosted runners assumes significant ongoing maintenance. With RunsOn:

  • 10-minute setup with close to zero AWS knowledge required.
  • No ongoing maintenance burden for your DevOps team. Upgrades are one click away, and can be performed at your own pace.
  • No infrastructure to babysit or weekend emergency calls.
  • No complex debugging of runner API issues.

Let’s address some specific claims from recent competitor blog posts:

Claim: “Maintaining AMIs is time-consuming and error-prone”

Section titled “Claim: “Maintaining AMIs is time-consuming and error-prone””

Reality: RunsOn handles all AMI maintenance for you, with regularly updated images that are 100% compatible with GitHub’s official runners. If you want full control, we also provide templates for building custom images.

Claim: “Self-hosting means babysitting infrastructure”

Section titled “Claim: “Self-hosting means babysitting infrastructure””

Reality: RunsOn uses fully managed AWS services and ephemeral runners that are automatically recycled after each job. There’s no infrastructure to babysit.

Claim: “You’ll need to become an expert in GitHub Actions”

Section titled “Claim: “You’ll need to become an expert in GitHub Actions””

Reality: With RunsOn, you only need to change one line in your workflow files—replacing runs-on: ubuntu-latest with your custom labels. No GitHub Actions expertise required.

Claim: “High-performance CI requires third-party infrastructure”

Section titled “Claim: “High-performance CI requires third-party infrastructure””

Reality: RunsOn provides high-performance CI within your own AWS account, with benchmarks showing 30% faster builds for x64 workloads than GitHub-hosted runners and full compatibility with the latest instance types and architectures.

For arm64 workfloads, AWS is currently the leader in CPU performance.

Self-hosted GitHub Actions runners can be complex and costly, if you’re using the wrong approach. But with RunsOn, you get all the benefits of self-hosting (cost savings, performance, security) without the traditional drawbacks.

Before making assumptions about the “true cost” of self-hosted runners, evaluate solutions like RunsOn that have specifically solved these challenges. Your developers, security team, and finance department will all thank you.

Get started with RunsOn today!