GPU runners with GitHub Actions

RunsOn gives you access to the full range of EC2 instances, including specific GPU instance types for your workflows, at a much lower cost than the official GitHub Actions GPU runners. There are also no plan restrictions.

Combined with the ability to bring your own or third-party images, you can make use of the official Deep Learning AMIs (DLAMI) provided by AWS to get your Machine Learning and AI workflows running in no time on GitHub Actions.

To get started with GPU runners, we recommend that you define a custom image configuration referencing the latest Deep Learning AMI, and then define a custom runner configuration referencing that image and the GPU instance type that you want to use.

Cost

GitHub provides GPU runners (gpu-t4-4-core) with 4 vCPUs, 28GB RAM and a Tesla T4 GPU with 16GB VRAM, for $0.07/min.

By comparison, even with on-demand pricing, running a GPU runner with the same Tesla T4 GPU, 4 vCPUs, and 16GB RAM (g4dn.xlarge) on AWS with RunsOn costs $0.009/min, i.e. around 85% cheaper. With spot pricing, the cost is even lower, at $0.004/min, i.e. more than 10x cheaper.
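
Which pricing model applies is controlled from the workflow job itself. The snippet below is only a minimal sketch, assuming the spot job label option available in recent RunsOn versions and the gpu-nvidia runner defined later on this page: spot capacity is used by default, and a job that cannot tolerate spot interruptions can force on-demand capacity instead.

jobs:
  train:
    # spot instances are used by default; spot=false forces on-demand capacity
    runs-on: runs-on,runner=gpu-nvidia,spot=false
    steps:
      - name: Confirm the job is running
        run: echo "Running on an on-demand GPU instance"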

Configuration file

.github/runs-on.yml
images:
  dlami-x64:
    platform: "linux"
    arch: "x64"
    owner: "898082745236" # AWS
    name: "Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04)*"
runners:
  gpu-nvidia:
    family: ["g4dn.xlarge"]
    image: dlami-x64

Workflow job definition

.github/workflows/machine-learning-job.yml
name: Machine Learning Job
on:
  workflow_dispatch:
  push:
    paths:
      - .github/workflows/machine-learning-job.yml
jobs:
  default:
    runs-on: runs-on,runner=gpu-nvidia
    steps:
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Display environment details
        run: npx envinfo
      - name: Display block storage
        run: sudo lsblk -l
      - name: Display NVIDIA SMI details
        run: |
          nvidia-smi
          nvidia-smi -L
          nvidia-smi -q -d Memory
      - name: Ensure Docker is available
        run: docker run hello-world
      - name: Execute your machine learning script
        run: echo "Running ML script..."

Note that runners will take a bit longer than usual to start, since the base image is very large (multiple CUDA versions, etc.). If you know exactly what you require, you can create a more streamlined custom image with only what you need, using the Building custom AMI with Packer guide.
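
For example, once you have baked such an image, the image configuration can point directly at your own AMI instead of the AWS-owned DLAMI. This is only a sketch: the AMI ID below is a placeholder, and it assumes your RunsOn version supports pinning an image to an explicit AMI ID (otherwise, keep using an owner/name filter as shown above).

images:
  slim-gpu-x64:
    platform: "linux"
    arch: "x64"
    ami: "ami-0123456789abcdef0" # placeholder: ID of your pre-baked, streamlined AMI
runners:
  gpu-nvidia-slim:
    family: ["g4dn.xlarge"]
    image: slim-gpu-x64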

Example output for the Display NVIDIA SMI details step:

Run nvidia-smi
Fri Jun 14 07:11:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   31C    P8               12W / 70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
|  No running processes found                                                            |
+---------------------------------------------------------------------------------------+
GPU 0: Tesla T4 (UUID: GPU-f01f22cf-eff9-5d76-ee8a-755625b41fa2)
==============NVSMI LOG==============

Timestamp : Fri Jun 14 07:11:07 2024
Driver Version : 535.183.01
CUDA Version : 12.2

Attached GPUs : 1
GPU 00000000:00:1E.0
    FB Memory Usage
        Total : 15360 MiB
        Reserved : 429 MiB
        Used : 2 MiB
        Free : 14928 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 2 MiB
        Free : 254 MiB
    Conf Compute Protected Memory Usage
        Total : 0 MiB
        Used : 0 MiB
        Free : 0 MiB

Using locally-attached NVMe disk(s)

GPU instances come with locally-attached NVMe disk(s) of various sizes, which can be used to speed up your workflows. They are included with the instance at no extra charge, so you don't have to worry about the cost of that storage.

In our example with g4dn.xlarge, the NVMe disk is automatically mounted at /opt/dlami/nvme. You can run sudo lsblk -l to list the available block devices and their mount points.

Example output for the Display block storage step:

NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0                  7:0  0  63.9M  1 loop /snap/core20/2318
loop1                  7:1  0  55.7M  1 loop /snap/core18/2823
loop2                  7:2  0    87M  1 loop /snap/lxd/28373
loop3                  7:3  0  38.8M  1 loop /snap/snapd/21759
loop4                  7:4  0  25.2M  1 loop /snap/amazon-ssm-agent/7993
vg.01-lv_ephemeral   252:0  0 116.4G  0 lvm  /opt/dlami/nvme
nvme0n1              259:0  0   120G  0 disk
nvme1n1              259:1  0 116.4G  0 disk
nvme0n1p1            259:2  0 119.9G  0 part /
nvme0n1p14           259:3  0     4M  0 part
nvme0n1p15           259:4  0   106M  0 part /boot/efi
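
To take advantage of this disk in a job, you can simply point your caches and scratch directories at the mount path. The job below is a rough sketch: SCRATCH_DIR and HF_HOME are illustrative choices, not required settings.

jobs:
  train:
    runs-on: runs-on,runner=gpu-nvidia
    env:
      # /opt/dlami/nvme is where the DLAMI mounts the ephemeral NVMe volume
      SCRATCH_DIR: /opt/dlami/nvme/scratch
      HF_HOME: /opt/dlami/nvme/hf-cache # e.g. keep large model downloads on fast local storage
    steps:
      - uses: actions/checkout@v4
      - name: Prepare scratch space on the NVMe disk
        run: mkdir -p "$SCRATCH_DIR" "$HF_HOME"
      - name: Run training with data staged on local NVMe storage
        run: echo "e.g. copy datasets to $SCRATCH_DIR before training"

Keep in mind that this storage is ephemeral: anything written there disappears when the runner terminates, so upload any results you want to keep (for example with actions/upload-artifact) before the job ends.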

Enjoy!