Troubleshooting

When something doesn’t work, the events page is usually your first stop — most failures surface there as a warning. Below are the issues operators hit most, each with its cause and fix.

permission denied connecting to the Docker daemon socket

Cause. The agent’s group ID doesn’t match /var/run/docker.sock on the host, so it can’t talk to the Docker daemon.

Fix. On the host, either add the agent’s user to the host’s docker group, or loosen the socket’s group permissions:

sudo chmod g+rw /var/run/docker.sock
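For the other option, adding the agent's user to the docker group, a minimal sketch follows. AGENT_USER and its default value `agent` are placeholders rather than names defined by this product, and `stat -c` assumes GNU coreutils. The usermod command is printed instead of executed so you can review it first.

```shell
# AGENT_USER is a hypothetical account name; use whichever user runs your agent.
AGENT_USER="${AGENT_USER:-agent}"
SOCK=/var/run/docker.sock

# Print the command that would add a user to the host's docker group.
docker_group_cmd() {
  printf 'sudo usermod -aG docker %s\n' "$1"
}

# Show the socket's owning group and mode, if the socket exists (GNU stat).
if [ -S "$SOCK" ]; then
  stat -c '%n group=%G gid=%g mode=%A' "$SOCK"
fi

docker_group_cmd "$AGENT_USER"
```

Group membership is picked up by new sessions only, so restart the agent after running the printed command.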

Sign-in succeeds but bounces back to /login

Cause. BETTER_AUTH_URL doesn’t match the URL in your browser. Session cookies are scoped to the auth URL’s origin, so a mismatch means the cookie is never sent back.

Fix. Set BETTER_AUTH_URL to the exact user-facing https:// URL and restart the hub. See TLS and reverse proxy.
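To spot a mismatch quickly, you can compare the origin of BETTER_AUTH_URL with the origin shown in the browser's address bar. This is a sketch only: BROWSER_URL, the `origin_of` helper, and the example.com values are illustrative assumptions, not part of the hub.

```shell
# Hypothetical values; substitute your configured URL and the one in your address bar.
BETTER_AUTH_URL="${BETTER_AUTH_URL:-https://ci.example.com}"
BROWSER_URL="${BROWSER_URL:-https://ci.example.com/login}"

# Reduce a URL to its origin: scheme://host[:port].
origin_of() {
  printf '%s\n' "$1" | sed -E 's|^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]+).*|\1|'
}

if [ "$(origin_of "$BETTER_AUTH_URL")" = "$(origin_of "$BROWSER_URL")" ]; then
  echo "origins match"
else
  echo "origin mismatch: $(origin_of "$BETTER_AUTH_URL") vs $(origin_of "$BROWSER_URL")"
fi
```

A difference in scheme (http vs https), host, or port counts as a mismatch here.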

Cause. Usually either a GitHub problem (the PAT is missing a scope, or you’re rate-limited) or a scale set that has backed off after repeated failures.

Fix. Check the events page for github warnings and open the scale set to see its backoff state. A backed-off scale set retries on its own; fix the underlying GitHub error (see PAT issues below) and it recovers.

Jobs fail with Cannot find module 'node:path' (or similar Node-stdlib errors)

Cause. The cached runner image is stale — an old actions/runner that predates working Node externals.

Fix. Open the scale set’s settings and set Image pull to Always or TtlHours. The next reconcile pass refreshes the layer, and every subsequent spawn runs on a current image. See Scale sets and Registry credentials.

Private-registry pull fails with unauthorized

Cause. The runner image lives in a private registry and the hub has no credential for it.

Fix. Add an image credential, then choose it in the scale set’s image-pull credential select. See Registry credentials.

docker build / docker run fail inside a new scale set

Cause. New scale sets don’t mount the Docker socket into runners by default, so jobs that build or run containers have no daemon to talk to.

Fix. Edit the scale set and tick Mount Docker socket inside runners. Existing scale sets keep their previous behavior and aren’t affected.
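To confirm from inside a job whether the socket is mounted, a step can test for it before calling docker. This sketch assumes the socket appears at /var/run/docker.sock inside the runner, which this page does not state; `docker_socket_status` is a hypothetical helper name.

```shell
# Report whether a path is a Unix socket; defaults to the usual Docker socket path.
docker_socket_status() {
  if [ -S "${1:-/var/run/docker.sock}" ]; then
    echo "present"
  else
    echo "missing"
  fi
}

echo "docker socket: $(docker_socket_status)"
```

A result of "missing" on a new scale set points at the unticked option rather than at the job itself.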

Container exits immediately with a lock message

Cause. Another container is holding the data-volume lock. SQLite is single-writer.

Fix. Run exactly one container per data volume. If you started a second one, stop it.
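To see which running containers share a volume, `docker ps` can filter by volume name. The sketch below assumes a named volume; `hub-data` is a placeholder and `check_single_writer` is a hypothetical helper, neither defined by this product.

```shell
# VOLUME is a hypothetical volume name; replace it with your data volume.
VOLUME="${VOLUME:-hub-data}"

# Read container names (one per line) on stdin and report whether more than one is listed.
check_single_writer() {
  count=$(grep -c . || true)
  if [ "$count" -le 1 ]; then
    echo "ok: $count container(s)"
  else
    echo "conflict: $count containers share the volume"
  fi
}

# List running containers that mount the volume, if the docker CLI is available.
if command -v docker >/dev/null 2>&1; then
  docker ps --filter "volume=$VOLUME" --format '{{.Names}}' | check_single_writer
fi
```

Only running containers are listed, so the container that exited with the lock message will not appear; the one shown is the current lock holder.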

Cause. The agent isn’t connected — it’s stopped, the machine is down, or its token was revoked.

Fix. Restart the agent on the host and confirm it can reach the hub. If you revoked its token, mint a fresh installer URL and re-enroll; re-enrollment leaves the data volume intact. See Adding hosts.

PAT issues

Cause. GitHub tokens expire or get edited, which surfaces as github warnings on the events page and stalls the affected org’s runners.

Fix. Re-enter the PAT for that org. See GitHub setup for the required scopes.
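Before re-entering a token, you can check what GitHub reports for it. The sketch below reads the `x-oauth-scopes` and `x-ratelimit-remaining` response headers from the GitHub REST API's rate_limit endpoint. GITHUB_PAT is an assumed environment variable name and `header_value` is a hypothetical helper; GitHub returns the scopes header for classic PATs, not fine-grained ones.

```shell
# Print the value of one response header (case-insensitive) from headers on stdin.
header_value() {
  tr -d '\r' | awk -F': ' -v name="$1" 'tolower($1) == tolower(name) { print $2 }'
}

# GITHUB_PAT is an assumed variable name; the request is skipped when it is unset.
if [ -n "${GITHUB_PAT:-}" ]; then
  headers=$(curl -sS -D - -o /dev/null \
    -H "Authorization: Bearer $GITHUB_PAT" \
    https://api.github.com/rate_limit)
  echo "scopes:    $(printf '%s\n' "$headers" | header_value x-oauth-scopes)"
  echo "remaining: $(printf '%s\n' "$headers" | header_value x-ratelimit-remaining)"
fi
```

Compare the reported scopes with the list in GitHub setup; a remaining count of 0 indicates rate limiting rather than a scope problem.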