Cannot connect to the Docker daemon at unix:///var/run/docker.sock. #153
Reference: forgejo/runner#153
I'm at my wits' end here and hopeful someone can assist.

I'm attempting to set up a runner using a k3s cluster that is installed with the `--docker` option on ALL the nodes. If my understanding of `dind` is correct though, that doesn't matter at all. But just pointing it out.

I can achieve the following:
The `forgejo-runner` pod shows that it connects just fine to the `daemon` (docker:dind) pod and stays running.

My issue:
To note: I'm only trying to test building a multiarch Docker container from a Dockerfile in my repo.
When I run an action on my repo, I can see the output in both the `forgejo-runner` pod and the web console on Codeberg. I can see it pulling all the required container images for the action, and when it gets to the `docker-setup-qemu-action@v3` step it fails with the error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Full output
There was what seemed like a promising solution to this on a Gitea issue, but in the end it didn't work for me. I have also read countless other issues and articles, with no luck.
My action file
My deployment config
https://code.forgejo.org/forgejo/runner/src/branch/main/examples/docker-compose
There is a tested example which you could use for inspiration, if you have not already. It is very easy to get confused when thinking about docker in docker.

You also note that `container` is not effective / does nothing when trying to use another image. This may simply be an indentation problem and you could verify that from this tested example: https://code.forgejo.org/forgejo/end-to-end/src/branch/main/actions/example-container/.forgejo/workflows/test.yml

Does that help?
I have read through this example as well. I think the biggest difference I can see here is that if I were to use a compose file to build outside of my kubernetes cluster, I would make the containers and link them, then use the `DOCKER_HOST` environment variable to point at the dind container that is exposing that port. I am also doing that in the k3s deployment, but since it's the same deployment the variable is `localhost` and not a different pod/container. I could try to deploy each container as a separate deployment and expose a service. This raises some security concerns for me though, as currently the pods are self-contained, and while the `dind` pod is exposing its docker API port, it's not doing so outside the cluster addresses. I might try the `--tls=false` option with this proposed setup so I don't have to worry about sharing a volume between the pods as well.

This is correct. `container` is not effective, and with or without it I can see in the log output that I'm using `catthehacker/ubuntu:act-latest`, as I have that defined in the `--label` option of the forgejo-runner command. I have run a different echo test and that was successful, so I do not believe it's an indentation or image issue. Also, if I don't set the catthehacker flag in my forgejo-runner command I get a different error that the docker executable doesn't exist. So I'm confident I got the right image.

I should note: all the comments in my code blocks were added at the time of posting this issue to try and add some color to the code.
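On the `DOCKER_HOST` point above, one detail may be worth checking. The docker CLI falls back to `unix:///var/run/docker.sock` when `DOCKER_HOST` is unset, which is the address shown in the reported error. A minimal sketch, assuming a POSIX shell inside the job container; the helper name `docker_endpoint` is made up for illustration:

```shell
# Hypothetical helper: print the endpoint the docker CLI would fall back to,
# based only on DOCKER_HOST (docker contexts are ignored in this sketch).
docker_endpoint() {
  echo "${DOCKER_HOST:-unix:///var/run/docker.sock}"
}

docker_endpoint
```

If this prints the unix socket path inside a job step while the runner pod itself has `DOCKER_HOST` pointing at the dind container, the variable is probably not reaching the job container.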
After looking at it in more detail I don't see what's wrong. To debug it I would start by running `docker info` from shell. It will fail and then I would add commands to investigate more: `ps`, `env`, `ls -l`. IIRC `catthehacker/ubuntu:act-latest` should provide everything you need to use docker without worrying about anything.

I would remove that maybe. I don't know about k3s and `--docker` but if that has the effect of mounting `/var/run/docker.sock` in all containers, it may interfere with what you're trying to do.

Haha, yeah, same. I can pull docker images on the `forgejo-runner` pod, so I know it has the docker executable, and the checkout step works just fine in the workflow, which I would assume is using a container that then runs a git clone.... Sooo

Just to clarify, you are referring to doing this in my workflow file? I did do some tweaking to the workflow file that checks whether Docker is running and sleeps for 300s. That always fails too.
This is a supported method of installation. Basically, instead of using `containerd` on the cluster, it's using Docker as the container runtime. I don't think this has any bearing, and I can see `/var/run/docker.sock` on all 7 k3s nodes. I get no error mounting it as `hostPath` either. I have been wrong a million times before though.

I'd be curious for your input on what I have gathered in my troubleshooting today:

`initContainer`. When I do so, it errors saying the `--secret` needs to be in hexadecimal form. If I convert my token to hex, it's too long and I then get an error that it needs to be 40 characters and not 80. I am assuming here that this is because I am not self-hosting Forgejo and Codeberg doesn't provide me with a SHA-1 hex value to use as a `--secret` instead of the `--token` that the `register` command uses. Even Codeberg mentions using offline registration in the README, but their kubernetes example doesn't use offline registration. Probably just didn't edit the README when they imported from this repo.

`forgejo-runner` docs config section about `docker-in-docker`. Particularly in the `container:` section there is: (See code snippet below) And because I cannot do offline registration, the `forgejo-runner daemon` command in my k3s pod (not the initContainer) is using what I assume to be a default `config.yml`, which has this value as `false`.

This won't work because the steps run in a container and the service named "docker" runs in another container.

Is what I had in mind.
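The debugging approach suggested earlier in the thread (run `docker info`, then look at `ps`, `env` and `ls -l`) could be collected into a single step. A rough sketch, assuming a POSIX shell in the job container; the function name `diagnose_docker` is hypothetical:

```shell
# Hypothetical helper: run `docker info`, and on failure print context that
# may help explain why the daemon is unreachable.
diagnose_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker info: docker CLI not found"
  elif docker info >/dev/null 2>&1; then
    echo "docker info: ok"
    return 0
  else
    echo "docker info: failed"
  fi
  # Follow-up checks named in the thread: env, ls -l, ps.
  env | grep '^DOCKER_' || echo "no DOCKER_* variables set"
  ls -l /var/run/docker.sock 2>&1 || true
  ps 2>/dev/null || echo "ps not available"
}

diagnose_docker
```

The first output line always starts with `docker info:`, so the result is easy to spot in a job log.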
Alright, I think I've narrowed it down.
I don't think it has anything to do with the action file or Kubernetes per se. I actually think the runner isn't passing `/var/run/docker.sock` to the `catthehacker/ubuntu:act-latest` image. That image by itself does need to have `/var/run/docker.sock` bind mounted as a volume or it returns said error from my first post (the `docker info` output). And when the runner spins up using that image it does not have that volume passed to it. I see volume mounts as an option in the config and am currently trying to write a `sed` command to mount that volume and test.

Did it work out?
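Whether the socket is actually present inside the job container can be checked directly. A small sketch, assuming a POSIX shell; `check_docker_socket` is a hypothetical helper name. It separates three cases, because a bind mount of a host path that does not exist can show up as an empty directory rather than a socket:

```shell
# Hypothetical helper: classify the path where the docker socket is expected.
check_docker_socket() {
  sock="${1:-/var/run/docker.sock}"
  if [ -S "$sock" ]; then
    echo "socket present: $sock"
  elif [ -e "$sock" ]; then
    echo "exists but is not a socket: $sock"
  else
    echo "missing: $sock"
  fi
}

check_docker_socket
```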
Update:
I think I have confirmed that it's the runner image and it not passing `/var/run/docker.sock` to the nested container. Which may be a limitation of kubernetes or something else entirely.

`container.volumes:` to `- /var/run/docker.sock:/var/run/docker.sock` and I am given a message in the logs. So, obviously that didn't help. I tried to make more adjustments, but always ended up the same. Open to suggestions here.

`host` instead of containers. So my thinking here was "Well, it's already in a pod, I don't need nested containers." So I made some adjustments to my action yaml and my `forgejo-runner` command in kubernetes. They are as follows:

dockerBuildMultiarch.yaml
Kubernetes deployment
Note: I trimmed a lot out of this for brevity. But the key takeaways are:

I had to install `docker` and `nodejs` via apk on the forgejo-runner pod and create a new docker context. Easy enough to do in my container command. After that I am successfully building multiarch containers in a k3s cluster. I suppose I could just make a Dockerfile and build my own forgejo-runner image with all this already done.

`docker.sock` should still be passing to the `forgejo-runner` pod via the `daemon` pod and the `DOCKER_HOST` var.

This setup might not be ideal for most people, and I am too unfamiliar with how the runner works and what implications doing it this way might have. I am open to any thoughts, adjustments or testing to get proper `dind` working with the runner and kubernetes. But I think this shows that the `forgejo-runner` pod is not passing `docker.sock` to the containers it spins up when running the steps. Again, please check my work though.

That's a very detailed post-mortem ❤️
Bottom line is ... you got it working?
Yeah, that post ended up longer than I thought.
Yes, it's working, but not with the default settings per the kubernetes-example.
@rpoovey great to hear. If you have enough fuel left, would you consider a PR to fix the example so that other people do not fall in the same trap as you?
@earl-warren I could for sure write up how to do this with my workaround, but it is just that, a workaround. Docker-In-Docker still doesn't work. I would love to discuss how to address that but I am not sure where that issue stems from. My guess is in the `forgejo-runner` container and its Dockerfile.

Ok, thanks for the clarification. Let's keep this issue open until it can be investigated further. It will require some quality brain time 😄
I've been stumbling a lot on this and I think I finally made this work.
forgejo-runner generate-config > config.yaml
forgejo-runner -c config.yaml daemon
Here it is.
I tested this using this pipeline:
I also tested those actions without any problems:
Hope it may help!
Edit: My kubernetes deployment for the runner:
The deployment does not implement the fix explained above, as I manually create and edit the config.yaml directly inside the container after it has been deployed.
It is very helpful. What would be fantastic is to run this in the CI. In a way similar to:
It makes a huge difference for the user to know that works for real and there is no tiny roadblock (such as the one you encountered yourself) that will make their experience difficult. And when it fails the developer who introduced a regression knows it right away, it blocks the CI.
Thanks for the intel, I'll try to work on a PR in the coming days!

I'm actually facing another issue causing the runner to not respect the provided labels :/
You can see it here: https://code.forgejo.org/forgejo/runner/actions/runs/870#jobstep-4-277
It uses `node:16-bullseye` instead of `alpine` as specified inside the compose file here:
sed -i -e "s|labels: \[\]|labels: \[\"docker:docker://alpine:3.18\"\]|" config.yml ;
It's probably tied to this existing issue: #149
As I need an image that contains docker, I'm stuck with this. I'll try to dig into this because I do not have this problem on my kubernetes-deployed forgejo instance.
Good catch. Could you please create a separate issue for this? I think it simply is an issue that the sed does not do what it is supposed to do. I'll work on resolving that this week-end. 👍
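One way to check whether the `sed` substitution takes effect is to run it against a scratch copy and look for the expected line afterwards. A minimal sketch, assuming GNU or BusyBox `sed` and assuming the generated config contains a literal `labels: []` line; the scratch file below is a stand-in, not the real config:

```shell
#!/bin/sh
# Hypothetical scratch file standing in for the generated runner config.
cfg="$(mktemp)"
printf 'runner:\n  labels: []\n' > "$cfg"

# Same substitution as in the thread, written to a new file instead of in place.
sed -e "s|labels: \[\]|labels: \[\"docker:docker://alpine:3.18\"\]|" "$cfg" > "$cfg.new"

# Report whether the pattern matched and the label was written.
if grep -qF 'labels: ["docker:docker://alpine:3.18"]' "$cfg.new"; then
  result="labels applied"
else
  result="labels NOT applied"
fi
echo "$result"

rm -f "$cfg" "$cfg.new"
```

If the same check reports that the labels were not applied against the real file, the line in that file probably differs from the pattern, or the command is editing a different file than the one the runner reads.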