bug: job timeouts issued from the Forgejo instance are not reliable #980

Closed
opened 2025-09-10 22:57:31 +00:00 by alexrp · 2 comments
Contributor

Can you reproduce the bug on the Forgejo test instance?

No

Description

We have observed that when a job is timed out by the Forgejo instance (in this case Codeberg) due to its global job timeout, the job is marked as a failure rather than as a timeout. But more problematically, the job actually keeps running on the machine in spite of this (which could potentially be forever due to #979).

Some context here: https://codeberg.org/actions/meta/issues/46

Relevant run: https://codeberg.org/ziglang/zig/actions/runs/5

Forgejo Version

Codeberg current (v12.x?)

Runner Version

v11.0.0

How are you running Forgejo?

Codeberg

How are you running the Runner?

Built from source with make build but otherwise following the installation guide.

Logs

Debug logging was not enabled on the run linked above; I've kicked off another run with debug logging and will attach logs once it completes.

Workflow file

https://codeberg.org/ziglang/zig/src/commit/a16afaf76fec0d6897c07afcc0dd7f0eb1fbf1c7/.forgejo/workflows/ci.yaml

Contributor

When a step of a job runs, it will upload logs to the Forgejo instance:

resp, err := r.client.UpdateLog(r.ctx, connect.NewRequest(&runnerv1.UpdateLogRequest{
	TaskId: r.state.Id,
	Index:  int64(r.logOffset),
	Rows:   rows,
	NoMore: noMore,
}))
if err != nil {
	return err
}
ack := int(resp.Msg.GetAckIndex())
if ack < r.logOffset {
	return fmt.Errorf("submitted logs are lost %d < %d", ack, r.logOffset)
}

and the Forgejo server answers with

message UpdateLogResponse {
  int64 ack_index = 1; // If all lines are received, should be index + length(lines).
}

which has no indication that the job was stopped by Forgejo.

When the step is completed by the runner, it will upload the state to the Forgejo instance and be notified if the job was canceled.

if resp.Msg.GetState().GetResult() == runnerv1.Result_RESULT_CANCELLED {
	r.cancel()
}
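
To make the gap concrete, here is a minimal, self-contained simulation of the behavior described above. It is only a sketch: `fakeServer`, `Result`, and `runStep` are hypothetical stand-ins, not the runner's or Forgejo's real types. It assumes, as described in this comment, that log uploads return nothing but an ack index and that the cancelled result is only seen when the state is reported at step completion.

```go
package main

import "fmt"

// Result is a hypothetical stand-in for the job result the server reports.
type Result int

const (
	ResultUnspecified Result = iota
	ResultCancelled
)

// fakeServer is a hypothetical in-memory stand-in for the Forgejo instance.
type fakeServer struct {
	stopped bool // set once the instance decides the job has timed out
	rows    int  // number of log rows received so far
}

// UpdateLog stores the rows and returns only the ack index, so the reply
// cannot tell the runner that the job was stopped.
func (s *fakeServer) UpdateLog(index int, rows []string) int {
	s.rows = index + len(rows)
	return s.rows
}

// UpdateTask returns the job result; in this sketch it is the only reply
// that can reveal a cancellation.
func (s *fakeServer) UpdateTask() Result {
	if s.stopped {
		return ResultCancelled
	}
	return ResultUnspecified
}

// runStep simulates one step that uploads totalRows log rows one at a time
// and reports its state only once, at completion. The server stops the job
// just before row number stopAfter is uploaded. It returns how many rows were
// uploaded after the stop and whether the cancellation was finally observed.
func runStep(s *fakeServer, totalRows, stopAfter int) (rowsAfterStop int, cancelled bool) {
	offset := 0
	for i := 0; i < totalRows; i++ {
		if i == stopAfter {
			s.stopped = true // the instance times the job out mid-step
		}
		offset = s.UpdateLog(offset, []string{fmt.Sprintf("row %d", i)})
		if s.stopped {
			rowsAfterStop++ // work still happening after the stop
		}
	}
	// Only at step completion does the runner learn about the cancellation.
	cancelled = s.UpdateTask() == ResultCancelled
	return rowsAfterStop, cancelled
}

func main() {
	late, cancelled := runStep(&fakeServer{}, 10, 3)
	fmt.Printf("rows uploaded after stop: %d, cancellation observed: %v\n", late, cancelled)
}
```

In this run the step keeps uploading 7 more rows after the server has stopped the job, and the cancellation is noticed only once the step finishes. A step that never finishes would never notice it.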

Contributor

On the Forgejo side, a cron task runs every 30 minutes, looking for jobs that exceed the timeout defined in app.ini by ENDLESS_TASK_TIMEOUT:

func registerStopEndlessTasks() {
	RegisterTaskFatal("stop_endless_tasks", &BaseConfig{
		Enabled:    true,
		RunAtStart: true,
		Schedule:   "@every 30m",
	}, func(ctx context.Context, _ *user_model.User, cfg Config) error {
		return actions_service.StopEndlessTasks(ctx)
	})
}
