bug: unique random name for networks not working when network is set in config #912

Closed
opened 2025-08-24 11:22:39 +00:00 by NickP · 25 comments

Can you reproduce the bug on the Forgejo test instance?

No

Description

When the network option is set in the container section of config.yml, the runner is unable to create/connect a random network for a service (#850).

If I understand correctly, the network option for containers specifies which network the runner and the containers it creates should use. If a new network is created for a service, it is, from my point of view, correct that the workflow can't connect, because doing so would break the network settings from the config.

Forgejo Version

12.0.1

Runner Version

9.1.1

How are you running Forgejo?

I'm running forgejo with docker-compose and the official docker-image.

How are you running the Runner?

I'm using the official docker-image

Logs

runner(version:v9.1.1) received task 324490 of job simple-no-container, be triggered by event: push
workflow prepared
🚀 Start image=alpine:3.18
🐳 docker pull image=code.forgejo.org/oci/postgres:15 platform= username= forcePull=false
🐳 docker pull image=alpine:3.18 platform= username= forcePull=false
Cleaning up services for job simple-no-container
Cleaning up network for job simple-no-container, and network name is: WORKFLOW-71d562108ef73a30612c686458001689
🐳 docker pull image=code.forgejo.org/oci/postgres:15 platform= username= forcePull=false
🐳 docker create image=code.forgejo.org/oci/postgres:15 platform= entrypoint=[] cmd=[] network="WORKFLOW-71d562108ef73a30612c686458001689"
🐳 docker run image=code.forgejo.org/oci/postgres:15 platform= entrypoint=[] cmd=[] network="WORKFLOW-71d562108ef73a30612c686458001689"
failed to start container: Error response from daemon: network WORKFLOW-71d562108ef73a30612c686458001689 not found

Workflow file

on: [push]

jobs:
  simple-no-container:
    runs-on: docker
    services:
      pgsql:
        image: code.forgejo.org/oci/postgres:15
        env:
          POSTGRES_DB: test
          POSTGRES_PASSWORD: postgres
    steps:
    - run: |
       echo "Hello"
Contributor

Could you please share your runner configuration file? Redacted for secrets.

Contributor

@NickP gentle ping?


I'm experiencing a similar issue at the very least, though the error is slightly different...

Running Forgejo v12.0.1 and Runner v9.1.1, with both run via docker compose (albeit separately).

  🐳  docker pull image=mariadb:lts platform= username= forcePull=false
  🐳  docker create image=mariadb:lts platform= entrypoint=[] cmd=[] network="WORKFLOW-11411f1c212733d391f9755f64d95071"
  🐳  docker run image=mariadb:lts platform= entrypoint=[] cmd=[] network="WORKFLOW-11411f1c212733d391f9755f64d95071"
failed to start container: Error response from daemon: failed to set up container networking: network WORKFLOW-11411f1c212733d391f9755f64d95071 not found

Config file:

log:
  level: debug
  job_level: info

runner:
  file: .runner
  capacity: 2
  envs:
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: /certs/client
    A_TEST_ENV_NAME_1: a_test_env_value_1
    A_TEST_ENV_NAME_2: a_test_env_value_2
  env_file: .env
  shutdown_timeout: 3h
  insecure: false
  fetch_timeout: 5s
  fetch_interval: 2s
  report_interval: 1s
  labels: []

cache:
  dir: ""
  host: ""
  port: 0
  proxy_port: 0
  external_server: ""
  secret: ""
  actions_cache_url_override: ""

container:
  network: host
  enable_ipv6: false
  privileged: true
  options: -v /certs/client:/certs/client
  workdir_parent:
  valid_volumes:
    - /certs/client
  docker_host: "-"
  force_pull: false
  force_rebuild: false

host:
  workdir_parent:
Contributor

services require [container].network to be empty so that a new network can be temporarily created to connect the container running the workflow with the services.

It would be useful to add a note, since this is not intuitive. I sent a pull request, https://codeberg.org/forgejo/docs/pulls/1411/files, and it would be great if you could review and approve it so it can be merged.

Does that help?
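A quick way to check a runner config for this condition is sketched below. It assumes the file follows the layout printed by forgejo-runner generate-config, with a two-space-indented `network:` key under `container:`. The helper name `container_network` is hypothetical and not part of the runner.

```shell
#!/bin/sh
# Hypothetical helper (not part of forgejo-runner): print the value of the
# `network:` key found inside the top-level `container:` section.
container_network() {
  # $1: path to the runner config file
  # The sed range runs from `container:` to the next top-level key.
  sed -n '/^container:/,/^[^ #]/{s/^  network:[[:space:]]*//p;}' "$1" | tr -d '"'
}

# Warn when a fixed network is configured, since jobs with `services:`
# need the runner to create a temporary network itself.
if [ -f config.yml ] && [ -n "$(container_network config.yml)" ]; then
  echo "warning: container.network is set; jobs using services will fail" >&2
fi
```

An empty result means the runner is free to create the temporary WORKFLOW-... network for services.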


I discovered as much through my own experimentation, but agree that it's not obvious, especially as the existing examples for operating the runner via docker compose (or otherwise) set [container].network. I think adding a note is a good idea regardless, so I've approved the changes.

Contributor

@NickP is it the same problem you ran into by any chance?

Author

@earl-warren wrote in #912 (comment):

Could you please share your runner configuration file? Redacted for secrets.

Sorry for my late reply.

I create the configuration using the command option for a service in my docker-compose-file when starting the container.
It is created as follows:

forgejo-runner generate-config > config.yml ;
sed -i -e "s|network: .*|network: forgejo-runner-net|" config.yml ;
sed -i -e "s|labels: \[\]|labels: \[\"docker:docker://alpine:3.18\", \"basic:docker://alpine:3.18\"\]|" config.yml ;
sed -i -e "s|insecure: .*|insecure: true|" config.yml ;
sed -i -e "s|capacity: .*|capacity: 3|" config.yml ;
Author

@earl-warren wrote in #912 (comment):

@NickP is it the same problem you ran into by any chance?

To me, it looks like it's exactly the same issue that I have.

Contributor

And do you confirm it solves the issue if you do not force a fixed network name with the following?

sed -i -e "s|network: .*|network: forgejo-runner-net|" config.yml ;
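If a config file already carries a fixed network name, one possible way to put it back to the empty value is sketched here. It assumes the two-space-indented `network:` key that generate-config writes. The helper name `reset_container_network` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set `network:` back to an empty string so the runner
# can create its own temporary network for services.
reset_container_network() {
  # $1: path to the runner config file
  sed -i -e 's|^  network: .*|  network: ""|' "$1"
}

# Only touch the file when it exists in the current directory.
if [ -f config.yml ]; then
  reset_container_network config.yml
fi
```

Simply dropping the `network:` substitution from the generation script should have the same effect, since the helper only restores the empty value.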

I use the kubernetes example (which is more or less the same as the docker compose version) and get the same problem:

forgejo-runner-5d785bfd6-ptnmz(version:v10.0.1) received task 5656 of job maven-build, be triggered by event: push
workflow prepared
🚀  Start image=catthehacker/ubuntu:act-latest
The network alias is build_my_app (sanitized version of Build My App)
  🐳  docker pull image=registry.myhost.com/postgresql:17.5 platform= username= forcePull=false
  🐳  docker pull image=catthehacker/ubuntu:act-latest platform= username= forcePull=false
Cleaning up services for job Build My App
Cleaning up network for job Build My App, and network name is: WORKFLOW-5eea5ead7d5651334f613cd142c03981
  🐳  docker pull image=registry.myhost.com/postgresql:17.5 platform= username= forcePull=false
  🐳  docker create image=registry.myhost.com/postgresql:17.5 platform= entrypoint=[] cmd=[] network="WORKFLOW-5eea5ead7d5651334f613cd142c03981"
  🐳  docker run image=registry.myhost.com/postgresql:17.5 platform= entrypoint=[] cmd=[] network="WORKFLOW-5eea5ead7d5651334f613cd142c03981"
failed to start container: Error response from daemon: failed to set up container networking: network WORKFLOW-5eea5ead7d5651334f613cd142c03981 not found

failed to start container: Error response from daemon: failed to set up container networking: network WORKFLOW-f838f1238182705858f1b42e79db1bd3 not found

This error has occurred since version 9.1.0 of the runner; before that, every version since 4.0.1 (I update quite often) worked without an issue. My config looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: forgejo-runner
  labels:
    app.kubernetes.io/name: forgejo-runner
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: forgejo-runner
  template:
    metadata:
      labels:
        app.kubernetes.io/name: forgejo-runner
    spec:
      automountServiceAccountToken: false
      restartPolicy: Always
      volumes:
        - name: docker-certs
          emptyDir: {}
        - name: runner-data
          emptyDir: { }
        - name: tmp
          emptyDir: {}
      initContainers:
        - name: runner-register
          image: code.forgejo.org/forgejo/runner:10.0.1
          command:
            - /bin/bash
            - -ec
            - |
              while : ; do
                forgejo-runner register --no-interactive --token $(RUNNER_SECRET) --name $(RUNNER_NAME) --instance $(FORGEJO_INSTANCE_URL) --labels $(RUNNER_LABELS) && break ;
                sleep 1 ;
              done ;
              forgejo-runner generate-config > config.yml ;
              sed -i -e "s|network: .*|network: host|" config.yml ;
              sed -i -e "s|^  envs:$$|  envs:\n    DOCKER_HOST: tcp://127.0.0.1:2376\n    DOCKER_TLS_VERIFY: 1\n    DOCKER_CERT_PATH: /certs/client|" config.yml ;
              sed -i -e "s|^  options:|  options: -v /certs/client:/certs/client|" config.yml ;
              sed -i -e "s|  valid_volumes: \[\]$$|  valid_volumes:\n    - /certs/client|" config.yml ;
          env:
            - name: RUNNER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: RUNNER_SECRET
              valueFrom:
                secretKeyRef:
                  name: forgejo
                  key: runner-token
            - name: RUNNER_LABELS
              value: "docker:docker://node:20-bullseye,ubuntu-latest:docker://catthehacker/ubuntu:act-latest,ubuntu-22.04:docker://catthehacker/ubuntu:act-22.04"
            - name: FORGEJO_INSTANCE_URL
              value: https://git.myhost.com
          resources:
            limits:
              cpu: "0.50"
              memory: "64Mi"
          volumeMounts:
            - name: runner-data
              mountPath: /data
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            seccompProfile:
              type: RuntimeDefault

      containers:
        - name: runner
          image: code.forgejo.org/forgejo/runner:10.0.1
          command: ["sh", "-c", "while ! nc -z localhost 2376 </dev/null; do echo 'waiting for docker daemon...'; sleep 5; done; forgejo-runner --config config.yml daemon"]
          env:
            - name: DOCKER_HOST
              value: tcp://127.0.0.1:2376
            - name: DOCKER_CERT_PATH
              value: /certs/client
            - name: DOCKER_TLS_VERIFY
              value: "1"
          resources:
            limits:
              cpu: '4'
              ephemeral-storage: 3Gi
              memory: 4Gi
            requests:
              cpu: 100m
              ephemeral-storage: '0'
              memory: 64Mi
          volumeMounts:
            - name: docker-certs
              mountPath: /certs
            - name: runner-data
              mountPath: /data
            - name: tmp
              mountPath: /tmp
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            seccompProfile:
              type: RuntimeDefault

        - name: daemon
          image: docker:28.3.3-dind
          env:
            - name: DOCKER_HOST
              value: tcp://127.0.0.1:2376
            - name: DOCKER_TLS_VERIFY
              value: "1"
            - name: DOCKER_TLS_CERTDIR
              value: /certs
          resources:
            limits:
              cpu: '4'
              ephemeral-storage: 3Gi
              memory: 4Gi
            requests:
              cpu: 100m
              ephemeral-storage: '0'
              memory: 64Mi
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-certs
              mountPath: /certs

I don't know whether it has something to do with #850.

Owner

drop sed -i -e "s|network: .*|network: host|" config.yml ; and it should work


@viceice Yes, that helps with the problem but also creates a new one for me. The advantage of using the host network in the workflow step was that I could also use the dind container (daemon) to build docker images with the docker/build-push-action@v6 action in a workflow. Since the steps are now executed in an ephemeral network, that doesn't work anymore.

Owner

@metawave wrote in #912 (comment):

@viceice Yes, that helps with the problem but also creates a new one for me. The advantage of using the host network in the workflow step was that I could also use the dind container (daemon) to build docker images with the docker/build-push-action@v6 action in a workflow. Since the steps are now executed in an ephemeral network, that doesn't work anymore.

It's the DOCKER_HOST variable: set the hostname to RUNNER_NAME and it should work.


Found a solution using Statefulset and Service and then using the FQDN of the runner pod as DOCKER_HOST, a way more complicated setup than before. If someone has run into the same problem as I have, please contact me, maybe I can help now.

Contributor

@metawave could you please describe your solution here? 🙏 It will be very helpful to others: this is not easy to figure out 😓


Problem Statement

When running Forgejo runners in Kubernetes with Docker-in-Docker (DinD), workflow step containers need to access the Docker daemon from their parent pod. However, standard Kubernetes deployments don't provide stable, predictable DNS names for individual pods, making it impossible for step containers to reliably connect to the Docker daemon.

The Challenge

The networking challenge stems from how DNS resolution works in this setup:

  • Kubernetes containers inherit the cluster's DNS service configuration
  • Docker-in-Docker daemons pass their DNS configuration to child containers
  • Workflow steps (running as containers) need to resolve the Docker daemon's address
  • Standard deployments don't create per-pod DNS records

Solution: StatefulSets with Headless Services

The solution involves converting the runner deployment to a StatefulSet and creating a headless service. This combination provides each pod with a stable, unique DNS name following the pattern:

<pod-name>.<service-name>.<namespace>.svc.cluster.local

Implementation

Step 1: Create a Headless Service

apiVersion: v1
kind: Service
metadata:
  name: runners
  labels:
    app.kubernetes.io/name: forgejo-runner
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/name: forgejo-runner
  ports:
    - name: docker
      port: 2376
      targetPort: 2376
      protocol: TCP

Step 2: Convert to StatefulSet

Change your Deployment to a StatefulSet and add the serviceName:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: forgejo-runner
spec:
  replicas: 2
  serviceName: runners  # Links to the headless service
  selector:
    matchLabels:
      app.kubernetes.io/name: forgejo-runner

Step 3: Update Runner Registration

Modify the init container to inject the DOCKER_HOST with the pod's FQDN:

initContainers:
  - name: runner-register
    command:
      - /bin/bash
      - -ec
      - |
        # ... registration logic ...
        forgejo-runner generate-config > config.yml
        sed -i -e "s|^  envs:$|  envs:\n    DOCKER_HOST: tcp://${RUNNER_NAME}.runners.forgejo.svc.cluster.local:2376\n    DOCKER_TLS_VERIFY: 1\n    DOCKER_CERT_PATH: /certs/client|" config.yml
        sed -i -e "s|^  options:|  options: -v /certs/client:/certs/client|" config.yml
        sed -i -e "s|  valid_volumes: \[\]$$|  valid_volumes:\n    - /certs/client|" config.yml
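To see what the DOCKER_HOST substitution produces, the sketch below applies the same sed expression to a small sample file. The helper name `inject_docker_host` and the sample content are illustrative assumptions. It relies on sed treating `\n` in the replacement as a newline, which GNU sed does.

```shell
#!/bin/sh
# Hypothetical helper: insert the Docker client settings under `envs:`.
inject_docker_host() {
  # $1: path to the runner config file, $2: host name of the Docker daemon
  sed -i -e "s|^  envs:\$|  envs:\n    DOCKER_HOST: tcp://$2:2376\n    DOCKER_TLS_VERIFY: 1\n    DOCKER_CERT_PATH: /certs/client|" "$1"
}

# Demonstrate on a throwaway sample instead of a real config.yml.
sample=$(mktemp)
printf 'runner:\n  envs:\n    A_TEST_ENV_NAME_1: a_test_env_value_1\n' > "$sample"
inject_docker_host "$sample" forgejo-runner-0.runners.forgejo.svc.cluster.local
cat "$sample"
rm -f "$sample"
```

The existing entries under `envs:` stay in place; the three Docker settings are added directly after the `envs:` line.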

Step 4: Configure TLS SANs

Add the pod's FQDN to the Docker daemon's TLS certificate:

- name: daemon
  image: docker:28.3.3-dind
  env:
    - name: RUNNER_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: DOCKER_TLS_SAN
      value: DNS:$(RUNNER_NAME).runners.forgejo.svc.cluster.local,DNS:localhost,IP:127.0.0.1

Key Points

  1. StatefulSet with serviceName: Creates predictable DNS names like forgejo-runner-0.runners.forgejo.svc.cluster.local

  2. DOCKER_HOST injection: The sed command injects the pod's FQDN into the runner config, ensuring workflow containers know where to find the Docker daemon

  3. TLS certificate SANs: The DOCKER_TLS_SAN environment variable ensures the certificate is valid for both localhost and the Kubernetes DNS name

Namespace Considerations

If deploying outside the forgejo namespace, adjust the FQDN pattern accordingly:

tcp://${RUNNER_NAME}.runners.<your-namespace>.svc.cluster.local:2376
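To avoid hard-coding the namespace, the URL can be assembled from variables, as in the sketch below. It assumes that RUNNER_NAME and NAMESPACE are exposed to the container (for example through the Downward API fields metadata.name and metadata.namespace) and that the cluster uses the default cluster.local domain. The helper name `docker_host_for_pod` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the DOCKER_HOST URL for one StatefulSet pod.
# The cluster.local suffix is the default cluster domain, assumed here.
docker_host_for_pod() {
  # $1: pod name, $2: headless service name, $3: namespace
  printf 'tcp://%s.%s.%s.svc.cluster.local:2376\n' "$1" "$2" "$3"
}

# The defaults below are placeholders for running the sketch outside a pod.
RUNNER_NAME="${RUNNER_NAME:-forgejo-runner-0}"
NAMESPACE="${NAMESPACE:-forgejo}"
docker_host_for_pod "$RUNNER_NAME" runners "$NAMESPACE"
```

The same namespace value would also need to appear in the DOCKER_TLS_SAN entry, otherwise the certificate will not match the host name.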
Owner

You can also pass the pod IP to the container (see https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/) and use that for dynamic configuration without using DNS. It will work with a Deployment too.

Using a StatefulSet is a better solution anyway, because it gives you static runner names, so you don't need to re-register on every restart.
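A rough sketch of the pod IP variant follows. It assumes a POD_IP environment variable populated through the Downward API (fieldPath: status.podIP) and an IPv4 address. The helper names are hypothetical. Adding an `IP:` entry to DOCKER_TLS_SAN is an assumption made here so that the certificate matches the address, and has not been tested against this setup.

```shell
#!/bin/sh
# Hypothetical helpers: derive the Docker endpoint and a matching TLS SAN
# list from the pod IP instead of a DNS name.
docker_endpoint() {
  # $1: pod IP (IPv4 assumed)
  printf 'tcp://%s:2376\n' "$1"
}

tls_san() {
  # $1: pod IP; localhost entries are kept for the runner container itself
  printf 'IP:%s,DNS:localhost,IP:127.0.0.1\n' "$1"
}

# 10.0.0.12 is a placeholder used when POD_IP is not set.
POD_IP="${POD_IP:-10.0.0.12}"
echo "DOCKER_HOST=$(docker_endpoint "$POD_IP")"
echo "DOCKER_TLS_SAN=$(tls_san "$POD_IP")"
```

Because the pod IP changes whenever the pod is recreated, both values would have to be computed at pod start rather than stored in a persistent config.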
Owner

this is my stateful set:

---
apiVersion: v1
kind: Service
metadata:
  name: forgejo-runner
  labels:
    app: forgejo-runner
spec:
  clusterIP: None
  selector:
    app: forgejo-runner

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: forgejo-runner
  labels:
    app: forgejo-runner
spec:
  replicas: 1
  serviceName: forgejo-runner
  selector:
    matchLabels:
      app: forgejo-runner
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: forgejo-runner
    spec:
      restartPolicy: Always
      # Initialise our configuration file using offline registration
      # https://forgejo.org/docs/v1.21/admin/actions/#offline-registration
      initContainers:
        - name: pvc-prepare
          imagePullPolicy: IfNotPresent
          image: ghcr.io/visualon/forgejo-runner
          command:
            - sh
            - -c
            - mkdir -p /data/runner /data/docuum

          volumeMounts:
            - name: data
              mountPath: /data

        - name: runner-register
          imagePullPolicy: IfNotPresent
          image: ghcr.io/visualon/forgejo-runner
          command:
            - sh
            - -c
            - |
              if [ -f /data/.runner ]
              then
                echo 'Runner already registered'
              else
                echo 'Registering new runner ...'
                forgejo-runner register \
                  --no-interactive \
                  --token $(RUNNER_SECRET) \
                  --name $(RUNNER_NAME) \
                  --instance $(FORGEJO_INSTANCE_URL) \
                  --config /data/config.yaml \
                  ;
              fi
          env:
            - name: RUNNER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          envFrom:
            - configMapRef:
                name: runner
            - secretRef:
                name: runner
          resources:
            limits:
              cpu: '0.50'
              memory: '64Mi'
          volumeMounts:
            - name: data
              mountPath: /data
              subPath: runner
            - name: forgejo-runner-files
              mountPath: /data/config.yaml
              readOnly: true
              subPath: runner-config.yaml
      containers:
        - name: forgejo-runner
          imagePullPolicy: IfNotPresent
          image: ghcr.io/visualon/forgejo-runner
          command:
            - sh
            - -c
            - while ! nc -z localhost 2376 </dev/null; do echo 'waiting for docker daemon...'; sleep 5; done; forgejo-runner daemon -c /data/config.yaml
          resources:
            requests:
              memory: 64Mi
              cpu: 100m
            limits:
              memory: 1Gi
              cpu: 1000m
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2376
            - name: DOCKER_CERT_PATH
              value: '/certs/client'
            - name: DOCKER_TLS_VERIFY
              value: 'true'
          volumeMounts:
            - name: data
              mountPath: /data
              subPath: runner
            - name: forgejo-runner-docker-certs
              mountPath: /certs
            - name: forgejo-runner-files
              mountPath: /data/config.yaml
              readOnly: true
              subPath: runner-config.yaml

        - name: docker-dind
          imagePullPolicy: IfNotPresent
          image: ghcr.io/visualon/docker-dind
          args:
            - --mtu=1450
            - --default-network-opt=bridge=com.docker.network.driver.mtu=1450
          resources:
            requests:
              memory: 256Mi
              cpu: 100m
            limits:
              memory: 4Gi
              cpu: 2000m
          env:
            - name: DOCKER_TLS_CERTDIR
              value: '/certs'
          securityContext:
            privileged: true
          volumeMounts:
            - name: forgejo-runner-docker-certs
              mountPath: /certs

        - name: docuum
          imagePullPolicy: IfNotPresent
          image: ghcr.io/visualon/docuum
          args:
            - --threshold
            - 10 GB
          resources:
            requests:
              memory: 64Mi
              cpu: 10m
            limits:
              memory: 256Mi
              cpu: 100m
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2376
            - name: DOCKER_CERT_PATH
              value: '/certs/client'
            - name: DOCKER_TLS_VERIFY
              value: 'true'
          volumeMounts:
            - name: data
              mountPath: /root/.local/docuum
              subPath: docuum
            - name: forgejo-runner-docker-certs
              mountPath: /certs

      volumes:
        - name: forgejo-runner-docker-certs
          emptyDir: {}
        - name: forgejo-runner-files
          configMap:
            name: runner-files

  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

I'm using mirrored / custom images because I need internal certificates 😉

I ran into the same issue creating a new network for a service, removing network: "host" fixed that.

Now I'm also unable to connect to the docker-in-docker container for Docker workflows. I'm using Docker Compose and am not familiar with Kubernetes. I tried translating the configs @metawave posted, but wasn't able to get it working again. Is anyone able to help out? 😄

Here's my docker-compose.yml

services:
  server:
    image: codeberg.org/forgejo/forgejo:12.0.3-rootless
    container_name: forgejo
    user: ${PUID}:${PGID}
    environment:
      USER_UID: ${PUID}
      USER_GID: ${PGID}
      # Forgejo configuration
      FORGEJO__SERVER__LFS_JWT_SECRET: ${LFS_JWT_SECRET}
      FORGEJO__SECURITY__INTERNAL_TOKEN: ${INTERNAL_TOKEN}
      FORGEJO__MAILER__SMTP_ADDR: ${SMTP_HOST}
      FORGEJO__MAILER__USER: ${SMTP_USER}
      FORGEJO__MAILER__PASSWD: ${SMTP_PASSWORD}

    volumes:
      - ${DOCKER_DATA_DIR}/forgejo/data:/var/lib/gitea
      - ${DOCKER_CONFIG_DIR}/forgejo/app.ini:/var/lib/gitea/custom/conf/app.ini
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    networks:
      - caddy
      - forgejo
    restart: unless-stopped

  runner:
    image: data.forgejo.org/forgejo/runner:9.1.1
    container_name: forgejo-runner
    user: ${PUID}:${PGID}
    links:
      - docker-in-docker
    environment:
      DOCKER_HOST: tcp://docker:2376
      DOCKER_CERT_PATH: /certs/client
      DOCKER_TLS_VERIFY: "1"
    volumes:
      - ${DOCKER_DATA_DIR}/forgejo/runner/data:/data
      - ${DOCKER_CONFIG_DIR}/forgejo/runner/config/config.yaml:/data/config.yaml
      - docker-certs-client:/certs/client:ro
    networks:
      - forgejo
    command: '/bin/sh -c "sleep 5; forgejo-runner -c /data/config.yaml daemon"'
    restart: unless-stopped
    depends_on:
      docker-in-docker:
        condition: service_started


  docker-in-docker:
    image: docker:28.4.0-dind
    container_name: forgejo-dind
    privileged: true
    hostname: docker
    environment:
      DOCKER_TLS_CERTDIR: /certs
    volumes:
      - docker-certs-server:/certs/server
      - docker-certs-client:/certs/client
    networks:
      - forgejo
    restart: unless-stopped

volumes:
  docker-certs-server:
    name: forgejo-docker-certs-server
  docker-certs-client:
    name: forgejo-docker-certs-client

networks:
  forgejo:
    name: forgejo
  caddy:
    external: true

Here's my runner config.yml

# You don't have to copy this file to your instance,
# just run `forgejo-runner generate-config > config.yaml` to generate a config file.

log:
  # The level of logging, can be trace, debug, info, warn, error, fatal
  level: info
  # The level of logging for jobs, can be trace, debug, info, warn, error, fatal
  job_level: info

runner:
  # Where to store the registration result.
  file: .runner
  # Execute how many tasks concurrently at the same time.
  capacity: 1
  envs:
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_VERIFY: "1"
    DOCKER_CERT_PATH: /certs/client
  # The timeout for a job to be finished.
  # Please note that the Forgejo instance also has a timeout (3h by default) for the job.
  # So the job could be stopped by the Forgejo instance if its timeout is shorter than this.
  timeout: 3h
  # The timeout for the runner to wait for running jobs to finish when
  # shutting down because a TERM or INT signal has been received.  Any
  # running jobs that haven't finished after this timeout will be
  # cancelled.
  # If unset or zero the jobs will be cancelled immediately.
  shutdown_timeout: 3h
  # Whether skip verifying the TLS certificate of the instance.
  insecure: true
  # The timeout for fetching the job from the Forgejo instance.
  fetch_timeout: 5s
  # The interval for fetching the job from the Forgejo instance.
  fetch_interval: 2s
  # The interval for reporting the job status and logs to the Forgejo instance.
  report_interval: 1s
  # The labels of a runner are used to determine which jobs the runner can run, and how to run them.
  # Like: ["macos-arm64:host", "ubuntu-latest:docker://node:20-bookworm", "ubuntu-22.04:docker://node:20-bookworm"]
  # If it's empty when registering, it will ask for inputting labels.
  # If it's empty when executing the `daemon`, it will use labels in the `.runner` file.
  labels: ["docker:docker://data.forgejo.org/oci/node:22-bookworm"]

cache:
  # Enable cache server to use actions/cache.
  enabled: true
  # The directory to store the cache data.
  # If it's empty, the cache data will be stored in $HOME/.cache/actcache.
  dir: ""
  # The host of the cache server.
  # It's not for the address to listen, but the address to connect from job containers.
  # So 0.0.0.0 is a bad choice, leave it empty to detect automatically.
  host: ""
  # The port of the cache server.
  # 0 means to use a random available port.
  port: 0
  # The port of the cache proxy.
  # 0 means to use a random available port.
  proxy_port: 0
  # The external cache server URL. Valid only when enable is true.
  # If it's specified, it will be used to set the ACTIONS_CACHE_URL environment variable. The URL should generally end with "/".
  # Otherwise it will be set to the URL of the internal cache server.
  external_server: ""
  # The shared cache secret. When communicating with a cache server, the runner uses this secret to verify the authenticity of the cache requests.
  # When using an external cache server it is required to set the same secret for the runner and the cache server.
  secret: ""
  # Overrides the ACTIONS_CACHE_URL passed to workflow containers. This should only be used if the runner host is not reachable from the
  # workflow containers, and requires further setup.
  actions_cache_url_override: ""

container:
  # Specifies the network to which the container will connect.
  # Could be host, bridge or the name of a custom network.
  # If it's empty, create a network automatically.
  network: ""
  # Whether to create networks with IPv6 enabled. Requires the Docker daemon to be set up accordingly.
  # Only takes effect if "network" is set to "".
  enable_ipv6: false
  # Whether to use privileged mode or not when launching task containers (privileged mode is required for Docker-in-Docker).
  privileged: true
  # And other options to be used when the container is started (eg, --volume /etc/ssl/certs:/etc/ssl/certs:ro).
  options: -v /certs/client:/certs/client:ro
  # The parent directory of a job's working directory.
  # If it's empty, /workspace will be used.
  workdir_parent:
  # Volumes (including bind mounts) can be mounted to containers. Glob syntax is supported, see https://github.com/gobwas/glob
  # You can specify multiple volumes. If the sequence is empty, no volumes can be mounted.
  # For example, if you only allow containers to mount the `data` volume and all the json files in `/src`, you should change the config to:
  # valid_volumes:
  #   - data
  #   - /etc/ssl/certs
  # If you want to allow any volume, please use the following configuration:
  # valid_volumes:
  #   - '**'
  valid_volumes:
    - /certs/client
  # overrides the docker client host with the specified one.
  # If "-" or "", an available docker host will automatically be found.
  # If "automount", an available docker host will automatically be found and mounted in the job container (e.g. /var/run/docker.sock).
  # Otherwise the specified docker host will be used and an error will be returned if it doesn't work.
  docker_host: "-"
  # Pull docker image(s) even if already present
  force_pull: false
  # Rebuild local docker image(s) even if already present
  force_rebuild: false

host:
  # The parent directory of a job's working directory.
  # If it's empty, $HOME/.cache/act/ will be used.
  workdir_parent:

and then my workflow

on:
  workflow_dispatch:

jobs:
  testing:
    runs-on: docker
    container:
      image: docker:cli
    services:
      pgsql:
        image: postgres:15
        env:
          POSTGRES_DB: test
          POSTGRES_PASSWORD: postgres

    steps:
      - name: Test docker
        run: |
          echo | docker ps

I've tried using both "tcp://docker:2376" and "tcp://forgejo-runner:2376" for DOCKER_HOST, but each time I get a similar lookup error

error during connect: Get "https://docker:2376/v1.51/containers/json": dial tcp: lookup docker on 127.0.0.11:53: no such host

Which makes sense because the job container is no longer on the host (forgejo) network. But I'm not sure how to fix it.

Owner

@zonrek I think your problem is going to be this: containers that run within the docker-in-docker container can't resolve the DNS name docker. That name only resolves for containers running directly on the host.

You should be able to add an --add-host option to the containers that are spawned to force the name to resolve in the context of their host:

container:
  # DNS resolve `docker` in the context of the container's
  # host (the DIND container) to allow tcp://... to find it.
  options: '--add-host=docker:host-gateway'

There's a newly released page in the Forgejo documentation about configurations for using Docker from Actions, which contains a detailed and tested configuration. I'm pulling that missing config piece from it: https://forgejo.org/docs/latest/admin/actions/docker-access/

Obviously your config isn't exactly the same, but even if this piece doesn't help, perhaps the guide will help you with any other details.
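
For the runner config posted above, which already uses `options` to mount the client certificates, the two flags would presumably have to go on the same line. A sketch, assuming the rest of the `container` section stays as it is:

```yaml
container:
  # Keep the existing client-certificate mount and add the host entry.
  # `host-gateway` maps the name `docker` to the host of the Docker daemon
  # that starts the job container, which in this setup is the DIND container.
  options: '-v /certs/client:/certs/client:ro --add-host=docker:host-gateway'
  valid_volumes:
    - /certs/client
```

With that in place, `DOCKER_HOST: tcp://docker:2376` in `runner.envs` can stay unchanged.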


@mfenniak Yep that worked perfectly. Thank you!

Contributor

@zonrek can this issue be closed?

Member

@earl-warren wrote in #912 (comment) (https://code.forgejo.org/forgejo/runner/issues/912#issuecomment-60015):

> can this issue be closed?

Shouldn't that question be directed at the person that opened the issue?
Contributor

I stand corrected 😊

@NickP what do you think?

Contributor

@NickP I'm closing this issue. Please re-open if you feel something still needs to be addressed.
