Say you were to develop a REST API that provides access to a set of complex, long-running operations.

The typical paradigm for an API like this (as I understand it) is that the client makes a request asking the server to perform a given long-running operation. The server responds with a 202 Accepted message indicating that the request has been received and, with it, provides the location where the result will eventually become available. From then on, the client polls this location until the result of the long-running task is ready.
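
To make that flow concrete, here is a rough client-side sketch of what I mean. It is only an illustration: the /operations endpoint, the polling interval, and the "200 means the result is ready" convention are placeholders, not any real API.

import time
import requests

# Minimal sketch of the poll-until-done flow. The endpoint, the polling
# interval, and the "200 means done" convention are assumptions.
BASE = "https://example.com"

def start_operation(payload):
    resp = requests.post(f"{BASE}/operations", json=payload)
    assert resp.status_code == 202
    return resp.headers["Location"]      # where the result will eventually appear

def wait_for_result(result_url, interval=5):
    while True:
        resp = requests.get(result_url)
        if resp.status_code == 200:      # result is ready
            return resp.json()
        time.sleep(interval)             # not ready yet (e.g. 202/404): poll again

# result = wait_for_result(start_operation({"input": "..."}))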

This much makes sense. However, imagine now that these long-running tasks are more complex. Imagine that, during the execution of a task, a specific resource (a file, a network service, etc.) becomes unavailable and, in order to proceed, the API must "ask" the client whether the job should continue anyway or end there.

How would this requirement change the original paradigm? Instead of only ever finding a result at the given location, would you optionally return some notion of a "question" whose answer needs to be posted back to the server in order to continue?

Assume for the purposes of this question that you can't encode some kind of blanket "continue if error" parameter in the original request and that these questions must be addressed on a case-by-case basis, as they arise, if they arise.

Maybe I'm thinking about this problem the wrong way? I'd be curious to hear how a requirement like this is usually handled, or if it's as simple as, "yeah, just respond with the prompt, post the answer back to the server, and continue to query the original location."

I would really appreciate any help I could get.

  • For something much more complex than start-job/monitor-job-status I wouldn't go with REST; I'd use websockets. Without websockets, I'd implement a long-polling (comet) endpoint so I can get real-time (within milliseconds) updates from the server.
    – slebetman
    Commented Aug 2, 2020 at 0:30
  • That is an excellent point and it's a direction I'm definitely considering. I think a websocket approach makes a lot of sense given the complexity. However, I wanted to compare it with a REST approach first to better understand how something like this might be implemented. I really like @hans-martin-mosner's approach of modeling jobs as resources. So, I may have the client API initialize the request and return a "handle" to the job via some kind of proxy job object. The job object could then directly communicate with the server over websockets and raise "server-prompt" events when they occur.
    – meci
    Commented Aug 2, 2020 at 3:02
  • Unfortunately I don't have the rep to answer, but... You could also register a callback URL: include it in the initial request, and when the processing is done, the service running the process can hit that URL with the result of the operation. This keeps everything as a REST API, without any setup required for websockets (a rough sketch of this variant follows these comments). Not sure how 'pure' REST this is, but it's good enough for Microsoft docs.microsoft.com/en-us/partner-center/develop/…
    – J Lewis
    Commented Aug 3, 2020 at 9:19
  • Yet another possibility would be the use of push notifications. This would even allow the user to close the window and still receive notifications of completion/interaction required.
    – jcaron
    Commented Aug 4, 2020 at 10:52
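
A rough sketch of the callback-URL (webhook) variant mentioned in the comments above. The /job-events route, the callbackUrl field, and the event payload shapes are hypothetical, not taken from any particular API.

import requests
from flask import Flask, request

# Sketch of the callback-URL variant: the client registers a webhook in the
# initial request, and the server POSTs job events to it later instead of
# being polled. All URLs, field names, and payloads here are assumptions.
app = Flask(__name__)

@app.route("/job-events", methods=["POST"])
def job_events():
    event = request.get_json()
    if event.get("status") == "interaction-required":
        print("Server asks:", event["prompt"])       # surface the question to a user somehow
    elif event.get("status") == "finished":
        print("Result available at:", event["result"])
    return "", 204

def start_job(payload):
    # Tell the server where to deliver events for this job.
    payload["callbackUrl"] = "https://client.example.net/job-events"
    return requests.post("https://example.com/jobs", json=payload)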

1 Answer

For long-running operations, it often helps to model the active job as a REST resource with its own structure and/or sub-resources.

For example, starting a job may return a response such as

202 Accepted
Location: https://example.com/jobs/123

At that URL, the client will get a structure such as

{
  "status":"running"
}

as long as the job is running,

{
  "status":"finished",
  "result":"https://example.com/jobs/123/result"
}

when it is completed and a result is available, or

{
  "status":"interaction-required",
  "prompt":"xyz service not available, please restart it or cancel job.",
  "continue":"https://example.com/jobs/123/continue/<token>",
  "cancel":"https://example.com/jobs/123/cancel"
}

to interact with the user. The job would continue (retrying xyz access) after the client posts something to the continue URL (which would include an idempotency token as suggested by @NPSF3000 to prevent accidentally continuing the next interaction), or would be cancelled by posting something to the cancel URL. Another option for cancellation would be a DELETE verb on the job URL. The cancel link could also be made part of the initial job structure to communicate that the job can be cancelled at any time if the application supports that.
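
As a rough illustration of how the continue endpoint could consume that token on the server side; the in-memory job store and the Flask routing here are assumptions made for the sketch, not part of the design above.

from flask import Flask, abort

# Rough server-side sketch of the continue endpoint. A real system would
# persist jobs and tokens instead of keeping them in memory.
app = Flask(__name__)
jobs = {}   # job_id -> {"status": ..., "continue_token": ...}

@app.route("/jobs/<job_id>/continue/<token>", methods=["POST"])
def continue_job(job_id, token):
    job = jobs.get(job_id)
    if job is None:
        abort(404)
    if job["status"] != "interaction-required" or token != job["continue_token"]:
        abort(409)                   # stale or reused token: this prompt was already answered
    job["continue_token"] = None     # token is single-use
    job["status"] = "running"        # resume the job, e.g. retry the xyz access
    return "", 202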

The details about which kinds of interaction are possible and how they are presented in the client would need to be designed based on the specific needs of these jobs, but the main point is that starting the operation does not just return the location of the eventual result: it returns the location of a reified job object that can be queried and manipulated.
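
Pulling the pieces together, a client built against such an API could be a simple loop over the job representation. This is only a sketch based on the JSON shapes above; the polling interval and the ask_user callback are assumptions about how a particular client would surface the prompt.

import time
import requests

# Client-side sketch driving the job resource described above. The 5-second
# interval and the ask_user callback are assumptions, not part of the API.
def run_job(job_url, ask_user):
    while True:
        job = requests.get(job_url).json()

        if job["status"] == "finished":
            return requests.get(job["result"]).json()

        if job["status"] == "interaction-required":
            if ask_user(job["prompt"]):
                requests.post(job["continue"])   # URL already embeds the idempotency token
            else:
                requests.post(job["cancel"])
                return None

        time.sleep(5)                            # "running": poll again

# Example: answer every prompt interactively on the console.
# result = run_job("https://example.com/jobs/123",
#                  ask_user=lambda prompt: input(prompt + " [y/N] ").lower() == "y")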

  • Excellent answer, which should get more focus on the opening paragraph's core: modeling your API might require a different model than your business process. You might consider the entities "operation" and "result" as your business concepts, but your REST API needs to model "job" as an entity. Commented Aug 1, 2020 at 15:31
  • More information in How to manage state in REST
    – HenryM
    Commented Aug 1, 2020 at 17:56
  • @BlueRaja-DannyPflughoeft that might make sense when you want to clean up jobs but keep results available separately. You could also reuse a previously computed result in some applications. You might also want to serve job statistics, or use different security policies for the result and the job itself. Commented Aug 2, 2020 at 0:23
  • Adding some sort of idempotency token to the continue could be a good idea for a production system.
    – NPSF3000
    Commented Aug 2, 2020 at 0:33
  • This is such a clean, easy and elegant solution to implement. I was expecting something much more complicated. Just one small nitpick: where's the cancel link? One solution I see could be to implement an "actions": { ... } object or similar, e.g. "actions": {"continue": "...", "cancel": "..."}. In the future, if more actions are required, you can just add them there, and the status could be changed to "action-required". An example of an extra action is asking for a file upload for a task that failed to fetch a remote file, or an option to cancel or retry. Commented Aug 3, 2020 at 10:57
