fix: implement idempotent FetchTask API calls to reduce risk of lost tasks #1393

Merged
mfenniak merged 1 commit from mfenniak/forgejo-runner:fetchtask-idempotent into main 2026-02-22 22:24:44 +00:00
Owner

When the FetchTask() API is invoked to create a task, unpreventable environmental errors may occur, such as network disconnects and timeouts. If such an error occurs after the server has assigned a task to the runner during the API call, the task is lost between the two systems: the server believes the task is assigned to the runner, but the runner never received it. This can cause jobs to appear stuck at "Set up job" (#1391).

The solution implemented here is idempotency in the FetchTask() API call, meaning that the "same" FetchTask() call is expected to return the same values. Specifically, the runner creates a unique identifier, requestKey, which is transmitted to the server with each FetchTask() invocation and defines the sameness of the call. The runner retains the requestKey value until the API call receives a successful response. If the server implements idempotency, it can use this key to identify repeated invocations of FetchTask() and provide the same response whenever the same request is received.
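To illustrate how a server could use the key, here is a minimal sketch in Go. It is not the Forgejo server implementation, which is handled in a separate PR; the `Task` type, `IdempotentAssigner`, and its in-memory map are all hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical stand-in for the payload returned by FetchTask.
type Task struct {
	ID int64
}

// IdempotentAssigner is a hypothetical server-side helper. It remembers the
// response produced for each request key so a repeated call can replay it.
type IdempotentAssigner struct {
	mu     sync.Mutex
	nextID int64
	seen   map[string]*Task // request key -> task already handed out
}

func NewIdempotentAssigner() *IdempotentAssigner {
	return &IdempotentAssigner{seen: make(map[string]*Task)}
}

// FetchTask assigns a new task the first time a key is seen and returns the
// previously assigned task for every later call with the same key.
func (a *IdempotentAssigner) FetchTask(requestKey string) *Task {
	a.mu.Lock()
	defer a.mu.Unlock()
	if t, ok := a.seen[requestKey]; ok {
		return t // retry of an earlier call: replay, do not assign again
	}
	a.nextID++
	t := &Task{ID: a.nextID}
	a.seen[requestKey] = t
	return t
}

func main() {
	a := NewIdempotentAssigner()
	first := a.FetchTask("key-1")
	retry := a.FetchTask("key-1") // e.g. the first response was lost in transit
	next := a.FetchTask("key-2")
	fmt.Println(first.ID, retry.ID, next.ID) // prints "1 1 2"
}
```

In this sketch the retried call with `key-1` receives the task that was already assigned instead of a second one, which is the property that prevents a task from being lost. A real server would presumably also need to persist and eventually expire stored keys; that is left out here.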

The runner's responsibility is to keep sending the same request key after any error, and to switch to a new key once a call succeeds.

A separate Forgejo PR will implement the server-side portion of this. I expect to hold this PR until the server-side change is reviewed, so that the system design doesn't change during that review and invalidate any work here.

  • bug fixes
    • PR: fix: implement idempotent FetchTask API calls to reduce risk of lost tasks
mfenniak force-pushed fetchtask-idempotent from cf9dba7ba5
Some checks failed
issue-labels / release-notes (pull_request_target) Successful in 5s
checks / Build Forgejo Runner (pull_request) Failing after 45s
checks / Run integration tests with Docker (docker-latest) (pull_request) Has been skipped
checks / Build unsupported platforms (pull_request) Has been skipped
checks / Run integration tests with Docker (docker-stable) (pull_request) Has been skipped
checks / Run integration tests with Podman (pull_request) Has been skipped
checks / runner exec tests (pull_request) Has been skipped
checks / validate pre-commit-hooks file (pull_request) Successful in 57s
checks / validate mocks (pull_request) Successful in 1m19s
to 7b3facef73
Some checks failed
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 7s
checks / validate pre-commit-hooks file (pull_request) Successful in 1m14s
checks / Build Forgejo Runner (pull_request) Successful in 1m28s
checks / validate mocks (pull_request) Successful in 1m31s
checks / Build unsupported platforms (pull_request) Successful in 35s
checks / runner exec tests (pull_request) Successful in 56s
checks / Run integration tests with Docker (docker-latest) (pull_request) Failing after 11m3s
checks / Run integration tests with Docker (docker-stable) (pull_request) Failing after 13m58s
checks / Run integration tests with Podman (pull_request) Failing after 18m16s
2026-02-20 17:33:28 +00:00
mfenniak force-pushed fetchtask-idempotent from 7b3facef73
to a57a4664e3
All checks were successful
checks / validate pre-commit-hooks file (pull_request) Successful in 45s
checks / Build Forgejo Runner (pull_request) Successful in 54s
checks / validate mocks (pull_request) Successful in 58s
checks / Build unsupported platforms (pull_request) Successful in 20s
checks / runner exec tests (pull_request) Successful in 40s
checks / Run integration tests with Docker (docker-latest) (pull_request) Successful in 11m31s
checks / Run integration tests with Docker (docker-stable) (pull_request) Successful in 14m40s
checks / Run integration tests with Podman (pull_request) Successful in 18m4s
issue-labels / release-notes (pull_request_target) Successful in 5s
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Successful in 5s
cascade / forgejo (pull_request_target) Successful in 1m43s
2026-02-20 18:28:10 +00:00
aahlenst approved these changes 2026-02-20 20:51:51 +00:00
Contributor

cascading-pr updated at https://code.forgejo.org/actions/setup-forgejo/pulls/903
mfenniak deleted branch fetchtask-idempotent 2026-02-22 22:24:44 +00:00
Reference
forgejo/runner!1393