Runner not respecting provided labels #174

Closed
opened 2024-04-06 16:40:25 +00:00 by zwanto · 8 comments
Contributor

As discuss in #153 (comment), runner is not respecting provided labels.

you can see it here: https://code.forgejo.org/forgejo/runner/actions/runs/870#jobstep-4-277
Where the actions use node:16-bullseye instead of alpine as specified inside the compose file here

sed -i -e "s|labels: \[\]|labels: \[\"docker:docker://alpine:3.18\"\]|" config.yml ;

This issue is tied to this existing issue : #149 but the problem is larger than just kubernetes as I faced it using the docker-compose examples.

I try to investigate and it seems that the sed command in eb89a98c6a/examples/docker-compose/compose-forgejo-and-runner.yml are all good.

It seems that for offline registration, specifying labels in the config.yml don't work.

The only way I achieved to make it work is by putting label in the .runner file after it's generation and before starting the runner:

while : ; do
  forgejo-runner create-runner-file --connect --instance http://forgejo:3000 --name runner --secret cee528fb6dc181b65dcd91d9dc2073a0591b30fb && break ;
  sleep 1 ;
done ;
sed -i -e "s|\"labels\": null|\"labels\": \[ \"docker:docker://alpine:3.18\"\]|" .runner ;

Starting the runner (using forgejo-runner --config config.yml daemon) with labels in the config.yml and without labels in the .runner seems to populate the .runner but the labels are not taken into account (maybe a restart of the runner would work ?).

Btw using online registration, it work like a charm using this (without having to specify the labels elsewhere):

forgejo-runner register --no-interactive --token XXX --name forest --instance XXX --labels docker:docker://alpine:3.18

⚠️ Actually, using alpine:3.18 in the example will make the CI fail because this image do not have node inside and it is needed for the actions. It would be better using node:16-bullseye. It were not failling until now because the labels were not taken into accout by the runner ...

As discuss in https://code.forgejo.org/forgejo/runner/issues/153#issuecomment-6928, runner is not respecting provided labels. you can see it here: https://code.forgejo.org/forgejo/runner/actions/runs/870#jobstep-4-277 Where the actions use `node:16-bullseye` instead of `alpine` as specified inside the compose file here https://code.forgejo.org/forgejo/runner/src/commit/eb89a98c6a401bc0afc6f9a897a82de0dfcb9d8d/examples/docker-compose/compose-forgejo-and-runner.yml#L65 This issue is tied to this existing issue : #149 but the problem is larger than just kubernetes as I faced it using the docker-compose examples. I try to investigate and it seems that the `sed` command in https://code.forgejo.org/forgejo/runner/src/commit/eb89a98c6a401bc0afc6f9a897a82de0dfcb9d8d/examples/docker-compose/compose-forgejo-and-runner.yml are all good. It seems that for offline registration, specifying labels in the `config.yml` don't work. The only way I achieved to make it work is by putting label in the `.runner` file after it's generation and before starting the runner: ```bash while : ; do forgejo-runner create-runner-file --connect --instance http://forgejo:3000 --name runner --secret cee528fb6dc181b65dcd91d9dc2073a0591b30fb && break ; sleep 1 ; done ; sed -i -e "s|\"labels\": null|\"labels\": \[ \"docker:docker://alpine:3.18\"\]|" .runner ; ``` Starting the runner (using `forgejo-runner --config config.yml daemon`) with labels in the `config.yml` and without labels in the `.runner` seems to populate the `.runner` but the labels are not taken into account (maybe a restart of the runner would work ?). Btw using online registration, it work like a charm using this (without having to specify the labels elsewhere): ``` forgejo-runner register --no-interactive --token XXX --name forest --instance XXX --labels docker:docker://alpine:3.18 ``` ⚠️ Actually, using `alpine:3.18` in the example will make the CI fail because this image do not have node inside and it is needed for the actions. It would be better using `node:16-bullseye`. It were not failling until now because the labels were not taken into accout by the runner ...
Author
Contributor

Here some working example in my work for the #153:

sed -i -e "s|\"labels\": null|\"labels\": [\"docker:docker://node:16-bullseye\", \"ubuntu-22.04:docker://catthehacker/ubuntu:act-22.04\"]|" .runner ;

Here some working example in my work for the #153: https://code.forgejo.org/zwanto/runner/src/commit/f8287eba2cf2b6635af585a4e7827e8b124f2a2c/examples/docker-compose/compose-forgejo-and-runner.yml#L69
Member

Thanks for the samples! I was able to reproduce it.

maybe a restart of the runner would work ?

It does in fact fix it.

rundown

Runner loads registration labels and then checks the config ones, claiming to use all of them. Issue is that the declared labels aren't the same as the ones passes to NewRunner.

NewRunner gets config and registration but only uses labels from registration.

Possible quick fix

After runner loads all labels replace original ones with the declared ones: abc9bf653a

This doubles labels parsing, but at least runner still stop "lying" to the instance about it's labels.

Thanks for the samples! I was able to reproduce it. > maybe a restart of the runner would work ? It does in fact fix it. ## rundown [Runner loads registration labels](https://code.forgejo.org/forgejo/runner/src/branch/main/internal/app/cmd/daemon.go#L55) and then checks the config ones, [claiming to use all of them](https://code.forgejo.org/forgejo/runner/src/branch/main/internal/app/cmd/daemon.go#L103). Issue is that the declared labels aren't the same as the ones passes to `NewRunner`. [NewRunner](https://code.forgejo.org/forgejo/runner/src/branch/main/internal/app/cmd/daemon.go#L103) gets config and registration but only uses labels from registration. ## Possible quick fix After runner loads all labels replace original ones with the declared ones: https://code.forgejo.org/thefox/runner/commit/abc9bf653a5cc2fc8f8fa76b3da79ea6236b51c2 This doubles labels parsing, but at least runner still stop "lying" to the instance about it's labels.
earl-warren added the
Kind/Bug
label 2024-04-07 06:37:20 +00:00
Owner

I ran the example and confirm the problem. Although the config.yml has the proper label set, the runner is not registered with a label when looking at http://0.0.0.0:8080/root/test/settings/actions/runners

image

$ cat /srv/runner-data/config.yml

log:
level: info

runner:
file: .runner
capacity: 1
envs:
A_TEST_ENV_NAME_1: a_test_env_value_1
A_TEST_ENV_NAME_2: a_test_env_value_2
env_file: .env
timeout: 3h
insecure: false
fetch_timeout: 5s
fetch_interval: 2s
labels: ["docker:docker://alpine:3.18"]

cache:
enabled: true
dir: ""
host: ""
port: 0
external_server: ""

container:
network: host
enable_ipv6: false
privileged: false
options:
workdir_parent:
valid_volumes: []
docker_host: ""
force_pull: false

host:
workdir_parent:

I ran the example and confirm the problem. Although the config.yml has the proper label set, the runner is not registered with a label when looking at http://0.0.0.0:8080/root/test/settings/actions/runners ![image](/attachments/bf3690ab-d4e4-436b-aca7-3baf02e9834b) <details> $ cat /srv/runner-data/config.yml log: level: info runner: file: .runner capacity: 1 envs: A_TEST_ENV_NAME_1: a_test_env_value_1 A_TEST_ENV_NAME_2: a_test_env_value_2 env_file: .env timeout: 3h insecure: false fetch_timeout: 5s fetch_interval: 2s labels: ["docker:docker://alpine:3.18"] cache: enabled: true dir: "" host: "" port: 0 external_server: "" container: network: host enable_ipv6: false privileged: false options: workdir_parent: valid_volumes: [] docker_host: "" force_pull: false host: workdir_parent: </details>
Owner

The problem seems to be rooted in the first invocation of the runner daemon. When it is restarted, the "docker" label is correctly associated to the image specified in the configuration file.

  • The config.yml & .runner files are exactly identical in both cases
  • With debug information both show the same first lines.
    info msg="log level changed to debug" func="[initLogging]" file="[daemon.go:160]"
    info msg="Starting runner daemon" func="[func6]" file="[daemon.go:38]"
    debug msg="gc: 2024-04-07 09:51:37.029511743 +0000 UTC m=+0.019732106" func="[gcCache]" file="[handler.go:445]" module=artifactcache
    info msg="runner: runner, with version: v3.4.1, with labels: [docker], declare successfully" func="[func6]" file="[daemon.go:111]"
    info msg="task 1 repo is root/test https://code.forgejo.org http://forgejo:3000" func="[run]" file="[runner.go:146]"
    debug msg="Plan Stages: [0xc00012ced0]" func="[NewPlanExecutor]" file="[runner.go:137]"
    debug msg="Stages Runs: [test]" func="[func1]" file="[runner.go:144]"
    

Since Forgejo knows nothing about how the labels are mapped to images by default, the bug has to be in the runner.

The problem seems to be rooted in the first invocation of the runner daemon. When it is restarted, the "docker" label is correctly associated to the image specified in the configuration file. * The config.yml & .runner files are exactly identical in both cases * With debug information both show the same first lines. ``` info msg="log level changed to debug" func="[initLogging]" file="[daemon.go:160]" info msg="Starting runner daemon" func="[func6]" file="[daemon.go:38]" debug msg="gc: 2024-04-07 09:51:37.029511743 +0000 UTC m=+0.019732106" func="[gcCache]" file="[handler.go:445]" module=artifactcache info msg="runner: runner, with version: v3.4.1, with labels: [docker], declare successfully" func="[func6]" file="[daemon.go:111]" info msg="task 1 repo is root/test https://code.forgejo.org http://forgejo:3000" func="[run]" file="[runner.go:146]" debug msg="Plan Stages: [0xc00012ced0]" func="[NewPlanExecutor]" file="[runner.go:137]" debug msg="Stages Runs: [test]" func="[func1]" file="[runner.go:144]" ``` Since Forgejo knows nothing about how the labels are mapped to images by default, the bug has to be in the runner.
Owner

The bug is that when the labels in .runner & config.yml are different, the .runner labels take precedence and then the labels found in config.yml are written to .runner. In the docker compose example the .runner file has no labels the first time around, hence defaults to bullseye. So the second time around the labels from config.yml will be used. I'll file a bug with a reproducer and add a workaround in the example in the meantime to avoid confusing until a release fixing this bug is published.

The bug is that when the labels in .runner & config.yml are different, the .runner labels take precedence and **then** the labels found in config.yml are written to .runner. In the docker compose example the .runner file has no labels the first time around, hence defaults to bullseye. So the second time around the labels from config.yml will be used. I'll file a bug with a reproducer and add a workaround in the example in the meantime to avoid confusing until a release fixing this bug is published.
Owner

On top of that there is also a race in the order of operations in the docker compose file. I think it is wrong to rely on depends_on to ensure the order in which forgejo & registration & daemon start as I've observed it does not behave the way it should in some cases. Making assertions on the existence of a file in the shell script is more straightforward to understand and reliable.

On top of that there is also a race in the order of operations in the docker compose file. I think it is wrong to rely on `depends_on` to ensure the order in which forgejo & registration & daemon start as I've observed it does not behave the way it should in some cases. Making assertions on the existence of a file in the shell script is more straightforward to understand and reliable.
Owner

Steps to reproduce:

  • Set config.yml with labels: ["docker:docker://node:20-bookworm"]
  • Set .runner with labels: null
  • runner --config config.yml daemon
  • Submit a workflow with runs-on: docker
  • See it uses node:16-bullseye(the default associated with the docker label when no labels are set) instead of node:20-bookworm
  • See that .runner is now set with "labels": ["docker:docker://node:20-bookworm"]
  • Stop the runner daemon
  • runner --config config.yml daemon
  • Submit a workflow with runs-on: docker
  • See it uses node:20-bookworm
Steps to reproduce: * Set config.yml with `labels: ["docker:docker://node:20-bookworm"]` * Set .runner with `labels: null` * `runner --config config.yml daemon` * Submit a workflow with `runs-on: docker` * See it uses `node:16-bullseye`(the default associated with the docker label when no labels are set) instead of `node:20-bookworm` * See that .runner is now set with `"labels": ["docker:docker://node:20-bookworm"]` * Stop the runner daemon * `runner --config config.yml daemon` * Submit a workflow with `runs-on: docker` * See it uses `node:20-bookworm`
earl-warren self-assigned this 2024-04-07 15:06:42 +00:00
Owner

Fixed by #176

Fixed by https://code.forgejo.org/forgejo/runner/pulls/176
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: forgejo/runner#174
No description provided.