Skip to content

Blog

How to setup docker with NVIDIA GPU support on Ubuntu 22

This took a bit of a search for me, so here it is in case it’s useful:

Terminal window
cd /tmp/
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-drivers-545 nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Then you should be able to run the following docker container with gpu enabled:

Terminal window
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

And get the following output:

Fri Dec 8 14:49:32 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 29C P0 25W / 70W | 5MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

Automatically cleanup outdated AMIs in all AWS regions

Here is a script you can use to automatically cleanup AMIs older than 60 days (configurable), while keeping the 2 most recent AMIs in each region. This helps to remove outdated images, as well as reducing storage costs for your AMIs.

Particularly useful in the case of runs-on.com, where we regularly rebuild base images whenever GitHub releases a new version of the image runner.

The bin/cleanup script (simply adjust the filters as needed):

#!/bin/bash
# Deregisters old AMIs and deletes associated snapshots, in all regions
set -e
set -o pipefail
APPLICATION="RunsOn"
REGIONS="$(aws ec2 describe-regions --query "Regions[].RegionName" --output text)"
# Number of days to keep AMIs
DAYS_TO_KEEP=${DAYS_TO_KEEP:=60}
# Define the age threshold in seconds (60 days)
AGE_THRESHOLD=$((DAYS_TO_KEEP*24*3600))
# Get the current timestamp in seconds since epoch
CURRENT_TIMESTAMP=$(date +%s)
for region in ${REGIONS[@]}; do
echo "---- Region: ${region} ---"
# List all your AMIs and extract relevant information using the AWS CLI
image_count=$(aws ec2 describe-images --owner self --filters "Name=tag:application, Values=${APPLICATION}" --query 'length(Images)' --region "$region" --output text)
echo " Total AMIs in this region: ${image_count}"
if [ "$image_count" -lt 2 ]; then
echo " Less than 2 AMIs found, skipping"
continue
fi
aws ec2 describe-images --owner self --region "${region}" --filters "Name=tag:application, Values=${APPLICATION}" --query 'Images[*].[Name,ImageId,CreationDate]' --output text | \
while read -r name image_id creation_date; do
# Parse the creation date into seconds since epoch
image_timestamp=$(date -d "$creation_date" +%s)
# Calculate the age of the AMI in seconds
age=$((CURRENT_TIMESTAMP - image_timestamp))
# Check if the AMI is older than the threshold
if [ $age -gt $AGE_THRESHOLD ]; then
echo " ! Deregistering AMI: ${image_id} (${name}) created on $creation_date"
snapshot_id=$(aws ec2 describe-images --image-ids "$image_id" --query "Images[].BlockDeviceMappings[].Ebs.SnapshotId" --region "${region}" --output text)
if [ "$DRY_RUN" = "true" ]; then
echo " DRY_RUN is set to true, skipping deregistering AMI ${image_id} and deleting snapshot ${snapshot_id}"
continue
fi
aws ec2 deregister-image --image-id "$image_id" --region "${region}"
echo " ! Deleting snapshot ${snapshot_id} for AMI ${image_id}"
aws ec2 delete-snapshot --snapshot-id "${snapshot_id}" --region "${region}"
fi
done
done

Example output:

---- Region: ap-southeast-2 ---
Total AMIs in this region: 0
---- Region: eu-central-1 ---
Total AMIs in this region: 5
Deregistering AMI: ami-0576e83d0a0f89fbe (runner-ubuntu2204-1699888130) created on 2023-11-13T16:53:48.000Z
Deleting snapshot snap-07db95a7f230d3f76 for AMI ami-0576e83d0a0f89fbe
Deregistering AMI: ami-004d4d18e6db2f812 (runner-ubuntu-22-1699873337) created on 2023-11-13T12:40:48.000Z
Deleting snapshot snap-0500b0e3fb95ab36a for AMI ami-004d4d18e6db2f812
Deregistering AMI: ami-0e6239eae649effcd (runner-ubuntu22-20231115.7-1700233930) created on 2023-11-17T17:01:58.000Z
Deleting snapshot snap-05e795e4c6fe9e66f for AMI ami-0e6239eae649effcd
Deregistering AMI: ami-0dd7f6b263a3ce28c (runner-ubuntu22-20231115-1700156105) created on 2023-11-16T19:24:38.000Z
Deleting snapshot snap-02c1aef800c429b76 for AMI ami-0dd7f6b263a3ce28c
---- Region: us-east-1 ---
Total AMIs in this region: 4
Deregistering AMI: ami-0b56f2d6af0d58ce0 (runner-ubuntu2204-1699888130) created on 2023-11-13T15:54:22.000Z
Deleting snapshot snap-0f2e8759bea8f3937 for AMI ami-0b56f2d6af0d58ce0
Deregistering AMI: ami-04266841492472a95 (runner-ubuntu22-20231115.7-1700233930) created on 2023-11-17T16:02:34.000Z
Deleting snapshot snap-0f0fcf9c6406c3ad9 for AMI ami-04266841492472a95
Deregistering AMI: ami-0738c7108915044fe (runner-ubuntu22-20231115-1700156105) created on 2023-11-16T18:21:40.000Z
Deleting snapshot snap-03f16588f59ed7cea for AMI ami-0738c7108915044fe
...

Example GitHub Action workflow file to schedule a cleanup every night:

name: Cleanup
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
on:
workflow_dispatch:
schedule:
- cron: '0 2 * * *'
jobs:
check:
timeout-minutes: 30
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- run: bin/cleanup

How to setup GitHub hosted runner with a simple cloud-init script

Edit: RunsOn is now available as a modern way to setup self-hosted GitHub Actions runners!

While waiting for the RunsOn ephemeral self-hosted runners service, here is a short bash script, that can also be used as cloud-init script for launching GitHub hosted runners non-interactively:

#!/bin/bash -ex
set -o pipefail
RUNNER_ORG=YOUR_ORG
RUNNER_TOKEN=YOUR_TOKEN
RUNNER_LABELS=self-hosted,x64,docker
# update this to the latest version
RUNNER_VERSION=2.311.0
# Install docker, then additional packages and stuff:
apt update -qq
apt install -y ruby awscli
curl https://nodejs.org/dist/v20.9.0/node-v20.9.0-linux-x64.tar.xz | tar -xJf - --strip=1 -C /usr/local/
curl https://get.docker.com | sh
cat > /etc/cron.daily/docker-prune <<EOF
docker image prune -a --filter="until=96h" --force
docker volume prune --force
EOF
chmod a+x /etc/cron.daily/docker-prune
# Create dedicated user
useradd -m -d /home/runner -s /bin/bash runner
usermod -G docker runner
# Download and install runner script
cd /home/runner
mkdir -p actions-runner
cd actions-runner
curl -o actions-runner-linux-x64-$RUNNER_VERSION.tar.gz -L https://github.com/actions/runner/releases/download/v$RUNNER_VERSION/actions-runner-linux-x64-$RUNNER_VERSION.tar.gz
tar xzf ./actions-runner-linux-x64-$RUNNER_VERSION.tar.gz
# Configure runner
su - runner -c "
/home/runner/actions-runner/config.sh --url https://github.com/$RUNNER_ORG --token $RUNNER_TOKEN --labels $RUNNER_LABELS --unattended
"
# Setup systemd scripts
cd /home/runner/actions-runner/
./svc.sh install runner
./svc.sh start
./svc.sh status

You should be good to go!

Note that those runners won’t be ephemeral, so usual caveats apply regarding security of those runners.