bug: actions do not have access to their own git repositories while executing #1369

Open
opened 2026-02-11 18:38:26 +00:00 by srd424 · 17 comments

Can you reproduce the bug on the Forgejo test instance?

Yes

Description

https://v15.next.forgejo.org/srd424/nix-actions-bug/actions/runs/1/jobs/0/attempt/1

Arguably not a bug but a new incompatibility introduced by #1162, but if nothing else a place to discuss workarounds etc would be good.

This PR causes problems with actions written with nix, such as https://github.com/ajamtli/update-flake-inputs-forgejo - the checkout copied into /var/run/act in the runner environment/namespace has a .git file pointing to the original clone in the runner's cache directory (/var/lib/runner/name/.cache on my system), but the latter directory is not actually accessible.

I feel for consistency either the .git file containing the link should be deleted from the temporary copy, or the original clone directory should be bind-mounted into the runner environment/namespace.

Forgejo Version

11.0.10

Runner Version

12.6.3

How are you running Forgejo?

As part of nixpkgs/nixos 25.11 using the built-in module - no container, started by systemd.

How are you running the Runner?

Using the nixos 25.11 package started by the gitea-actions-runner module. Again not in any explicitly added container, started by systemd.

Logs

Relevant snippet from action output - I can grab more debug if it would be helpful? (I think the cause is obvious though given above explanation.)

      … while fetching the input 'git+file:///var/run/act/actions/7f/884094afe030e6214810c7f881f557654c6b767f978ea3f09fddb95b766cab'
       error: opening Git repository "/var/run/act/actions/7f/884094afe030e6214810c7f881f557654c6b767f978ea3f09fddb95b766cab": failed to resolve path '/var/lib/private/gitea-runner/nuc1/.cache/act/7d/72d8347b1f2b4aa960b234d8d519feb598c5a73b61aede3813cafb6d3d2bce/worktrees/884094afe030e6214810c7f881f557654c6b767f978ea3f09fddb95b766cab': No such file or directory
   

Workflow file

No response

Author

I'll try to remember to start a discussion on the NixOS forum about this as well, to get a perspective from both 'sides'!

mfenniak changed title from bug: to bug: actions do not have access to their own git repositories while executing 2026-02-11 19:29:32 +00:00
Owner

I see; so:

  • uses: https://github.com/ajamtli/update-flake-inputs-forgejo@af10f69549f9e0d216b45e34157035ac300f60ff causes the runner to clone/fetch this repository into ~/.cache/act/
  • In order to read the contents of af10f69549f9e0d216b45e34157035ac300f60ff specifically, a worktree is created for that rev.
  • When executing the action, the work tree is copied into the container (
    return sar.RunContext.JobContainer.CopyDir(copyToPath, sar.RunContext.Config.Workdir+string(filepath.Separator)+".", sar.RunContext.Config.UseGitIgnore)(ctx)
    )
  • The work tree that is copied into the container doesn't have a git repo, but instead has a .git file that is...
    • gitdir: /home/mfenniak/.cache/act/7d/72d8347b1f2b4aa960b234d8d519feb598c5a73b61aede3813cafb6d3d2bce/worktrees/ea260a09568717ead28147ad01b557a879f2d7f247b1de33d01646dd6966ff
    • (reference to the runner's home directory cache)
  • Therefore git operations that the action performs on its own repo fail (I've updated the issue title to reflect this).

And as a nix flake is a weird and lovely thing that is integrated with the git repo it lives in, a nix run of a flake within the work tree fails because .git references a repository that doesn't exist. Makes sense that it's a problem.
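For anyone wanting to confirm this failure mode from inside a job, here is a minimal Go sketch. It is not code from the runner: `parseGitdir` and `checkWorktree` are hypothetical helper names. It reads a worktree's `.git` pointer file and reports whether the `gitdir:` target it names is reachable from the current environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseGitdir extracts the target path from the contents of a worktree's
// ".git" file, which has the form "gitdir: <path>".
// (Hypothetical helper, for illustration only.)
func parseGitdir(content string) (string, error) {
	line := strings.TrimSpace(content)
	if !strings.HasPrefix(line, "gitdir:") {
		return "", errors.New("not a gitdir pointer file")
	}
	return strings.TrimSpace(strings.TrimPrefix(line, "gitdir:")), nil
}

// checkWorktree reads dir/.git and reports the gitdir it points at and
// whether that path exists in the current environment.
func checkWorktree(dir string) (target string, reachable bool, err error) {
	data, err := os.ReadFile(filepath.Join(dir, ".git"))
	if err != nil {
		// Also reached when .git is a real directory rather than a pointer file.
		return "", false, err
	}
	target, err = parseGitdir(string(data))
	if err != nil {
		return "", false, err
	}
	if !filepath.IsAbs(target) {
		target = filepath.Join(dir, target)
	}
	_, statErr := os.Stat(target)
	return target, statErr == nil, nil
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	target, ok, err := checkWorktree(dir)
	if err != nil {
		fmt.Println("no worktree pointer found:", err)
		return
	}
	fmt.Printf("gitdir %s reachable: %v\n", target, ok)
}
```

Run against the action directory under `/var/run/act/actions/`, this would be expected to report the gitdir as unreachable in the situation described above, since the runner's cache directory is not present in the job container.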

Well, brainstorming some solutions:

  • Mount the bare repo into the job container, as a read-only volume, at its expected location as defined by the .git file.
    • Specific to container executors, but I think host-based executors likely won't have quite the same problem.
    • Would allow access to the repo and its history, but being RO would mean that it can't be messed up for other cache users.
    • This would be a mess from a software architecture perspective as it would require each step that is invoked to be able to modify the job container before it is started, but, I think it's possible.
    • Exposes a bit of the host environment into the job container (eg. cat .git and df would show the runner's home directory; the cat .git part is already true).
  • Copy the bare repo into the job container, either at the same original location, or rewriting the .git worktree reference.
    • Similar benefits to mounting, but allows write access that is then thrown away at the end of a job.
    • Requires more disk space and time.
  • Remove the .git file and initialize a new repo in that location with all files added
    • Solves the specific nix flake problem, but otherwise seems like a hack with no logical reason behind it.
  • Stop using git worktree, and instead do something like a copy of the entire repo in the cache when preparing an action
    • Seems like a step backwards, and isn't any better than the container-specific solutions to get the repo in-place -- the runner's cache isn't where we need this data anyway, so we'd just be populating it to copy it later.

I'll try to hack a quick workaround for the moment, but I think experimenting with the first option is the most appealing to me... until the experimentation shows it's a bad idea for some reason.
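To make the first option concrete, here is a rough Go sketch of how a read-only bind could be derived from the worktree's gitdir. It assumes the layout seen in the paths above, where the gitdir is `<bare repo>/worktrees/<name>`, and the common `source:destination:options` bind string form. The function names are hypothetical and this is not the runner's actual implementation.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// repoRootFromWorktreeGitdir returns the bare repository that owns a worktree
// gitdir of the form <repo>/worktrees/<name>.
// (Hypothetical helper, for illustration only.)
func repoRootFromWorktreeGitdir(gitdir string) (string, error) {
	parent := filepath.Dir(filepath.Clean(gitdir))
	if filepath.Base(parent) != "worktrees" {
		return "", fmt.Errorf("%q does not look like a worktree gitdir", gitdir)
	}
	return filepath.Dir(parent), nil
}

// readOnlyBind builds a bind string that exposes the bare repository inside
// the container at the same path it has on the host, read-only, so that the
// absolute path stored in the .git file resolves.
func readOnlyBind(gitdir string) (string, error) {
	root, err := repoRootFromWorktreeGitdir(gitdir)
	if err != nil {
		return "", err
	}
	return root + ":" + root + ":ro", nil
}

func main() {
	// Example path with made-up components, mirroring the cache layout above.
	bind, err := readOnlyBind("/home/user/.cache/act/7d/abc/worktrees/def")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bind)
}
```

Mounting at the identical path avoids rewriting the `.git` file, at the cost of exposing a host path inside the container, which is the trade-off listed under the first option.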

Owner

It's a total hack job, but https://github.com/mfenniak/update-flake-inputs-forgejo@b5614c72bdce961524fe3ca2daad7de2d2d8f46f seems to function as a workaround until the issue is resolved. https://github.com/ajamtli/update-flake-inputs-forgejo/compare/main...mfenniak:update-flake-inputs-forgejo:main

Author

I've not tried it in this context yet, but someone nix-adjacent suggested nix run path:<path> doesn't expect a git repo, and from testing in other contexts that seems to be true. I don't know however how 'brittle' that might be - the whole of the nix flakes concept and its associated commands is technically still experimental, and as you say mostly tied to using actual repos, so I have a worry that depending on path: might be unwise long-term.

My current work-around has been to create a non-flake default.nix and invoke using nix-build to get a path to run: https://codeberg.org/srd424/update-flake-inputs-forgejo

Author
Nixos discourse topic: https://discourse.nixos.org/t/forgejo-actions-runner-problem-with-nix-flake-based-actions/75232
Owner

what about using git checkout instead of worktree? 🤔

Owner

I hacked up a temporary branch where the git bare repo is mounted into the job container, such that I am able to run git commands within the /var/run/act/actions/... directory successfully. However, it seems that the nix commands don't support worktrees correctly, even when the repositories are available. For a feature that was released in 2015, that's a little unfortunate.

https://github.com/NixOS/nix/issues/6073

This limits options quite a bit. I think that for the runner to support update-flake-inputs-forgejo, it would need a configuration option to roll back the worktree support and use full repositories. That solution doesn't sound great to me.

Author

Oh, that's aggravating .. but somehow not surprising. That issue mentions that lix (a fork partly focused on improving the code quality) works better - I'm happy to try that if you can point me to your test branch? Alternatively https://github.com/joschi/forgejo-runner-nix-containers has a lix based image as well.

If lix works I'd happily consider that fully fixed, and it might incentivize the cppnix people to fix their implementation. It's bed time here in the UK but I will try to pick this up tomorrow!

Owner

Here's the POC branch: https://code.forgejo.org/forgejo/runner/compare/main...mfenniak:poc-worktree-bind

(not production safe, doesn't sanitize binding to the job container anymore)

Author

Am I overtired or is there a chunk missing - I can't see where supplementalContent is actually used?

Owner

Whoops, missed committing that section. Added now in run_context.go @ prepareJobContainer.

Author

Finally got this built (flake at https://codeberg.org/srd424/update-flake-inputs-forgejo for anyone following along; binary should be in the garnix.io cache) - it looks very promising with lix!

The nixos module by default tries to harden the runner by using systemd's StateDirectory and detached namespaces - this caused a couple of problems:

  • the real home directory for the runner is /var/lib/private/gitea-runner, with /var/lib/gitea-runner being a symlink to it; only the latter gets bind-mounted in, but the .git file refers to the former - I wonder if it's possible to do the golang equivalent of realpath somewhere?
  • the bind mounted directory doesn't get idmapped, so git whinges about the ownership mismatch. Adding idmap to the bind options by default may not be a good idea though, because I'm not sure all filesystems support it yet. Maybe an environment variable for extra options would be the quick and dirty solution, given this is quite a niche case? Would save adding stuff to the config file and passing it down into the guts of the code.

Anyway, I fixed those issues up manually using sshx to break into the workflow, and nix run (using lix) seemed to be quite happy, so I think you're on the right track - thank you very much! I have no golang skills but I might try to bodge in fixes for the above two issues myself, unless you beat me to it. Definitely a tomorrow job though.

Author

Looks like docker still doesn't actually support idmapped binds? But modern podman appears to (not yet tested), and given this is quite niche, requiring podman doesn't seem terrible.

Author

Gah, I bodged in an ",idmap" to the appropriate binds, but github.com/docker/cli@7b93d61673/internal/volumespec/volumespec.go (L86) seems to filter it out :( Given my lack of golang skill, I may be hitting my limits here :(

Author

Red herring I think; looks like that code only gets used for validation, and doesn't error on unknown bind options.

On NixOS, looks like we need to specify a full idmap option with an explicit user mapping, to match what systemd is doing with DynamicUser + StateDirectory + idmapping: idmap=uids=65534-0-1;gids=65534-0-1 - this seems to work bodged directly into the json in the crun bundle in /run/crun. (Edit: it also works when passed through from the actions runner code.) Makes me more convinced that just using an environment variable to set this is probably sane - I can see no reason why the actions runner needs to be complicated by logic that's dictated by the calling environment; it just needs to provide a simple mechanism to pass the option value through?
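As an illustration of the environment-variable idea, here is a small Go sketch that appends operator-supplied options to a read-only bind string. The variable name `RUNNER_EXTRA_BIND_OPTIONS` and the `bindSpec` helper are made up for this example and are not part of the runner. Whether the container engine accepts the resulting options (such as idmap) depends on the engine and filesystem, as discussed above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// extraBindOptsEnv is a hypothetical environment variable name, used here
// only to illustrate passing bind options through without interpreting them.
const extraBindOptsEnv = "RUNNER_EXTRA_BIND_OPTIONS"

// bindSpec builds a "source:destination:options" bind string. The mount is
// always read-only; extraOpts, if non-empty, is appended verbatim.
func bindSpec(src, dst, extraOpts string) string {
	opts := []string{"ro"}
	if extra := strings.Trim(strings.TrimSpace(extraOpts), ","); extra != "" {
		opts = append(opts, extra)
	}
	return src + ":" + dst + ":" + strings.Join(opts, ",")
}

func main() {
	// Example path only; the real cache location depends on the deployment.
	cache := "/var/lib/private/gitea-runner/nuc1/.cache/act"
	fmt.Println(bindSpec(cache, cache, os.Getenv(extraBindOptsEnv)))
}
```

With the variable set to `idmap=uids=65534-0-1;gids=65534-0-1`, the options portion of the bind becomes `ro,idmap=uids=65534-0-1;gids=65534-0-1`. The runner itself stays unaware of what the options mean, which is the point of the suggestion.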

Author

mfenniak/forgejo-runner#1 seems to do the job here. What changes did you think needed to be made to sanitizeConfig()?

Owner

@srd424 I've turned it into a PR so I can comment on it -- #1379/files (comment). I'd be able to follow-up and try to finish this in a few days, but if you're interested in pushing it forward and can fix that remaining functional issue, please create a PR into my branch again. Then all that will be left is a little cleanup and some test automation (which I can take care of as well, time permitting).

Reference
forgejo/runner#1369