bug: actions do not have access to their own git repositories while executing #1369
Reference
forgejo/runner#1369
Can you reproduce the bug on the Forgejo test instance?
Yes
Description
https://v15.next.forgejo.org/srd424/nix-actions-bug/actions/runs/1/jobs/0/attempt/1
Arguably not a bug but a new incompatibility introduced by #1162, but if nothing else a place to discuss workarounds etc would be good.
This PR causes problems with actions written with nix, such as https://github.com/ajamtli/update-flake-inputs-forgejo - the checkout copied into /var/run/act in the runner environment/namespace has a .git file pointing to the original clone in the runner's cache directory (/var/lib/runner/name/.cache on my system), but the latter directory is not actually accessible.
I feel for consistency either the .git file containing the link should be deleted from the temporary copy, or the original clone directory should be bind-mounted into the runner environment/namespace.
Forgejo Version
11.0.10
Runner Version
12.6.3
How are you running Forgejo?
As part of nixpkgs/nixos 25.11 using the built-in module - no container, started by systemd.
How are you running the Runner?
Using the nixos 25.11 package started by the gitea-actions-runner module. Again not in any explicitly added container, started by systemd.
Logs
Relevant snippet from action output - I can grab more debug if it would be helpful? (I think the cause is obvious though given above explanation.)
Workflow file
No response
I'll try to remember to start a discussion on the NixOS forum about this as well, to get a perspective from both 'sides'!
I see; so:
- uses: https://github.com/ajamtli/update-flake-inputs-forgejo@af10f69549f9e0d216b45e34157035ac300f60ff causes the runner to clone/fetch this repository into ~/.cache/act/af10f69549f9e0d216b45e34157035ac300f60ff
- specifically, a worktree is created for that rev.
- return sar.RunContext.JobContainer.CopyDir(copyToPath, sar.RunContext.Config.Workdir+string(filepath.Separator)+".", sar.RunContext.Config.UseGitIgnore)(ctx)
- [...] .git file that is...
  gitdir: /home/mfenniak/.cache/act/7d/72d8347b1f2b4aa960b234d8d519feb598c5a73b61aede3813cafb6d3d2bce/worktrees/ea260a09568717ead28147ad01b557a879f2d7f247b1de33d01646dd6966ff
And as a nix flake is a weird and lovely thing that is integrated with the git repo it lives in, a nix run of a flake within the work tree fails because .git references a repository that doesn't exist. Makes sense that it's a problem.
Well, brainstorming some solutions:
- [...] .git folder.
- [...] (cat .git and df would show the runner's home directory; the cat .git part is already true).
- [...] .git worktree reference.
- [...] .git file and initialize a new repo in that location with all files added
I'll try to hack a quick workaround for the moment, but I think experimenting with the first option is the most appealing to me... until the experimentation shows it's a bad idea for some reason.
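The failure mode described above (a copied .git file whose gitdir: target is not visible from inside the job environment) can be checked for with a few lines of Go. This is only a minimal sketch for discussion, not code from the runner: parseGitdir and danglingGitLink are hypothetical helper names, and the only assumption about git is that a linked worktree's .git file holds a single "gitdir: <path>" line, as in the example quoted above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseGitdir extracts the target of a "gitdir: <path>" line, the format
// used by the .git file of a linked worktree. (Hypothetical helper.)
func parseGitdir(content string) (string, bool) {
	line := strings.TrimSpace(content)
	if !strings.HasPrefix(line, "gitdir:") {
		return "", false
	}
	target := strings.TrimSpace(strings.TrimPrefix(line, "gitdir:"))
	return target, target != ""
}

// danglingGitLink reports whether dir contains a .git *file* whose gitdir
// target cannot be stat'ed from the current environment. (Hypothetical helper.)
func danglingGitLink(dir string) (string, bool, error) {
	gitPath := filepath.Join(dir, ".git")
	info, err := os.Lstat(gitPath)
	if os.IsNotExist(err) {
		return "", false, nil // no .git entry at all
	}
	if err != nil {
		return "", false, err
	}
	if info.IsDir() {
		return "", false, nil // a full repository, not a worktree link
	}
	data, err := os.ReadFile(gitPath)
	if err != nil {
		return "", false, err
	}
	target, ok := parseGitdir(string(data))
	if !ok {
		return "", false, nil
	}
	if !filepath.IsAbs(target) {
		target = filepath.Join(dir, target)
	}
	// Any stat failure (missing path, no permission) means git cannot use it.
	_, statErr := os.Stat(target)
	return target, statErr != nil, nil
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	target, dangling, err := danglingGitLink(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	if dangling {
		fmt.Printf(".git in %s points at %s, which is not reachable here\n", dir, target)
		return
	}
	fmt.Println("no dangling .git link found in", dir)
}
```

Run inside the job environment against the copied action directory, a check like this would distinguish "the link is broken" from "the link is fine but nix still rejects the worktree".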
It's a total hack job, but https://github.com/mfenniak/update-flake-inputs-forgejo@b5614c72bdce961524fe3ca2daad7de2d2d8f46f seems to function as a workaround until the issue is resolved. https://github.com/ajamtli/update-flake-inputs-forgejo/compare/main...mfenniak:update-flake-inputs-forgejo:main
I've not tried it in this context yet, but someone nix-adjacent suggested nix run path:<path> doesn't expect a git repo, and from testing in other contexts that seems to be true. However, I don't know how 'brittle' that might be - the whole of the nix flakes concept and its associated commands is technically still experimental, and as you say mostly tied to using actual repos, so I worry that depending on path: might be unwise long-term.
My current work-around has been to create a non-flake default.nix and invoke it using nix-build to get a path to run: https://codeberg.org/srd424/update-flake-inputs-forgejo
NixOS discourse topic: https://discourse.nixos.org/t/forgejo-actions-runner-problem-with-nix-flake-based-actions/75232
what about using git checkout instead of worktree? 🤔
I hacked up a temporary branch where the git bare repo is mounted into the job container, such that I am able to run git commands within the /var/run/act/actions/... directory successfully. However, it seems that the nix commands don't support worktrees correctly, even when the repositories are available. For a feature that was released in 2015, that's a little unfortunate.
https://github.com/NixOS/nix/issues/6073
This limits options quite a bit. I think that supporting update-flake-inputs-forgejo would require the runner to have a configuration option to roll back the worktree support and use full repositories. That solution doesn't sound great to me.
Oh, that's aggravating .. but somehow not surprising. That issue mentions that lix (a fork partly focused on improving the code quality) works better - I'm happy to try that if you can point me to your test branch? Alternatively https://github.com/joschi/forgejo-runner-nix-containers has a lix based image as well.
If lix works I'd happily consider that fully fixed, and it might incentivize the cppnix people to fix their implementation. It's bed time here in the UK but I will try to pick this up tomorrow!
Here's the POC branch: https://code.forgejo.org/forgejo/runner/compare/main...mfenniak:poc-worktree-bind
(not production safe, doesn't sanitize binding to the job container anymore)
Am I overtired or is there a chunk missing - I can't see where supplementalContent is actually used?
Whoops, missed committing that section. Added now in run_context.go @ prepareJobContainer.
Finally got this built (flake here for anyone following along; binary should be in the garnix.io cache) - it looks very promising with lix!
The nixos module by default tries to harden the runner by using systemd's StateDirectory + and detached namespaces - this caused a couple of problems:
- [...] /var/lib/private/gitea-runner, symlinked to /var/lib/gitea-runner; only the latter gets bind-mounted in, but the .git file refers to the former - I wonder if it's possible to do the golang equivalent of realpath somewhere?
Anyway, I fixed those issues up manually using sshx to break into the workflow, and nix run (using lix) seemed to be quite happy, so I think you're on the right track - thank you very much! I have no golang skills but I might try to bodge in fixes for the above two issues myself, unless you beat me to it. Definitely a tomorrow job though.
Looks like docker still doesn't actually support idmapped binds? But modern podman appears to (not yet tested), and given this is quite niche, requiring podman doesn't seem terrible.
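On the realpath question above: the closest standard-library equivalent in Go is filepath.EvalSymlinks, usually combined with filepath.Abs. The sketch below is not taken from the runner; resolveForBind is a hypothetical helper name, and the demo recreates the private/gitea-runner symlink layout in a temporary directory purely for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveForBind returns an absolute path with every symlink resolved,
// roughly what realpath(1) prints. (Hypothetical helper.)
func resolveForBind(path string) (string, error) {
	abs, err := filepath.Abs(path)
	if err != nil {
		return "", err
	}
	// EvalSymlinks fails if the path does not exist.
	return filepath.EvalSymlinks(abs)
}

func main() {
	tmp, err := os.MkdirTemp("", "realpath-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(tmp)

	// Mimic the layout from the thread: a real directory under private/
	// and a symlink next to it that points at the real directory.
	realDir := filepath.Join(tmp, "private", "gitea-runner")
	if err := os.MkdirAll(realDir, 0o755); err != nil {
		panic(err)
	}
	link := filepath.Join(tmp, "gitea-runner")
	if err := os.Symlink(realDir, link); err != nil {
		panic(err)
	}

	resolved, err := resolveForBind(link)
	if err != nil {
		panic(err)
	}
	fmt.Println("symlink: ", link)
	fmt.Println("resolved:", resolved)
}
```

Resolving the cache directory this way before it is written into the .git file or used as a bind source would make both sides agree on one path, assuming the resolved path is the one that ends up visible in the job environment.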
Gah, I bodged in an ",idmap" to the appropriate binds, but github.com/docker/cli@7b93d61673/internal/volumespec/volumespec.go (L86) seems to filter it out :( Given my lack of golang skill, I may be hitting my limits here :(
Red herring I think; looks like that code only gets used for validation, and doesn't error on unknown bind options.
On NixOS, looks like we need to specify a full idmap option with an explicit user mapping, to match what systemd is doing with DynamicUser + StateDirectory + idmapping: idmap=uids=65534-0-1;gids=65534-0-1 - this seems to work bodged directly into the json in the crun bundle in /run/crun; I've not tried passing it through from the actions runner yet (Edit: this does work done from code, too.) Makes me more convinced that just using an environment variable to set this is probably sane - I can see no reason why the actions runner needs to be complicated by logic that's dictated by the calling environment, it just needs to provide a simple mechanism to pass the option value through?
mfenniak/forgejo-runner#1 seems to do the job here. What changes did you think needed to be made to sanitizeConfig()?
@srd424 I've turned it into a PR so I can comment on it -- #1379/files (comment). I'd be able to follow-up and try to finish this in a few days, but if you're interested in pushing it forward and can fix that remaining functional issue, please create a PR into my branch again. Then all that will be left is a little cleanup and some test automation (which I can take care of as well, time permitting).
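The "simple mechanism to pass the option value through" idea could look something like the following. This is a sketch under stated assumptions, not what the linked PR does: the environment variable name RUNNER_BIND_EXTRA_OPTIONS and the helper appendBindOption are hypothetical, the bind is assumed to be a docker-style src:dst[:options] string with Unix paths (no colons inside the paths), and the paths in main are placeholders. Whether the container engine accepts the resulting option is a separate question.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// bindOptionsEnv is a HYPOTHETICAL variable name, chosen for illustration.
const bindOptionsEnv = "RUNNER_BIND_EXTRA_OPTIONS"

// appendBindOption adds extra to a "src:dst" or "src:dst:opts" bind string.
// An empty extra leaves the bind untouched. (Hypothetical helper.)
func appendBindOption(bind, extra string) string {
	if extra == "" {
		return bind
	}
	// Two or more colons means an options field already exists.
	if strings.Count(bind, ":") >= 2 {
		return bind + "," + extra
	}
	return bind + ":" + extra
}

func main() {
	// e.g. RUNNER_BIND_EXTRA_OPTIONS='idmap=uids=65534-0-1;gids=65534-0-1'
	extra := os.Getenv(bindOptionsEnv)
	bind := "/var/lib/gitea-runner/.cache/act:/var/lib/gitea-runner/.cache/act:ro"
	fmt.Println(appendBindOption(bind, extra))
}
```

The point of this shape is that the runner stays ignorant of DynamicUser and idmapping details; the calling environment supplies the whole option string.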