1

I have a Node.js back-end built on a microservice architecture. Services communicate with each other via PubSub. For example, upon a request to service A, service A messages either service B or service C via PubSub. B and C can message other services, and so on. Exactly what happens is up to the user to decide. The problem is that, in rare but unavoidable edge cases, the setup could result in an endless loop of PubSub messages, e.g. A->B->C->A and so on. Infinite loops are, of course, bad: they bloat the system and could eventually make it crash.

Now, one way to prevent that would be to pass a counter in the PubSub payload, increase it by one each time a service messages another service, and abort any further processing as soon as a certain threshold is reached. However, that would mean that the user would not get the result they expect.
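To make the counter idea concrete, here is a minimal sketch. The envelope shape (`hops`, `body`), the `MAX_HOPS` value, and the `nextEnvelope` helper are all assumptions for illustration and not part of any PubSub client API.

```javascript
// Assumed limit; the right value depends on the longest legitimate chain.
const MAX_HOPS = 20;

// Hypothetical envelope shape: { hops: number, body: any }.
// Builds the envelope for the next publish, or throws once the limit is passed.
function nextEnvelope(incoming, body) {
  // A message coming straight from a user request has no counter yet.
  const hops = (incoming?.hops ?? 0) + 1;
  if (hops > MAX_HOPS) {
    throw new Error(`Hop limit of ${MAX_HOPS} exceeded; dropping message`);
  }
  return { hops, body };
}
```

Each service would call `nextEnvelope(receivedEnvelope, payload)` right before publishing, so the counter travels with the request rather than living in any one service.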

So I was thinking about adding delays based on the counter value instead: the more often a request is passed around between services, the longer the delay before proceeding, until a fairly long maximum delay of e.g. 5 minutes is reached.
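A rough sketch of what I have in mind for the delay, assuming exponential growth with a cap. `BASE_DELAY_MS`, `delayForHop`, and `sleep` are names made up for this example; only the 5-minute cap comes from the description above.

```javascript
const BASE_DELAY_MS = 100;           // assumed starting delay
const MAX_DELAY_MS = 5 * 60 * 1000;  // the 5-minute cap

// Delay doubles with every hop and is clamped to the cap.
function delayForHop(hops) {
  // Clamp the exponent so very large counters cannot overflow to Infinity.
  const exponent = Math.min(Math.max(hops, 0), 30);
  return Math.min(BASE_DELAY_MS * 2 ** exponent, MAX_DELAY_MS);
}

// Promise-based wait, so a handler can `await sleep(delayForHop(hops))`.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

With these assumed numbers the cap is reached at the twelfth hop. After that, a looping request would still go around once every 5 minutes indefinitely, so the delay slows a loop down but never ends it.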

However, I'm not really sure if that would be a good solution.

  • What would be the risks of that approach?
  • How would I measure whether the delay is long enough, and how many looping messages could be passed around without the system breaking down?
1
  • Surely if the loop is infinite, the user won't get the result they want anyway? They'll never get any result because there's an infinite loop! Commented Sep 15, 2021 at 15:12

2 Answers

4

I'd challenge the word "unavoidable" - if your messages cause circular activity, then it is quite likely that your model has some fundamental flaw. If you use counters, you need to define a limit above which you consider a long loop infinite, and if it is possible that the system returns a good result after 1001 iterations then obviously a generous limit of 1000 is not enough. Inserting delays would only hide the problem.

So what are the actual semantics of these messages, and why are there circular dependencies?

If you really need dependency loops, you should try to perform some fixpoint analysis to ensure that the loops terminate.

2

First off, 'infinite loop' is not really the proper term for this. It is called a 'cycle'. Typically the solution is to error out anything that is executing in an endless cycle, and the process for determining whether that is happening is called 'cycle detection'. This can become a lot worse than you describe: all you need is for one of the steps in the cycle to produce multiple messages and it will swamp your system.

There's a pretty straightforward way to find a cycle: given a list of all the 'places' the message has been, if the next location is found in the list of previous locations, you've got a cycle. This assumes there is no valid reason a transaction would ever enter the same location twice. For this reason (among others) I would avoid any design where a transaction re-enters the same location.

If we know that a message should never re-enter a node, you can add metadata to the payload that keeps track of the locations a transaction has been to. Ideally this is something your pub-sub platform would do for you, but you might have to build it yourself by adding a routine, executed whenever a message is read from a topic/queue, that does the following:

  • Check for a metadata header with locations visited. If the current location is found in that list, error.
  • Add the current location to the header

How you handle the error can vary. Often an error topic/queue is used for this.
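The two steps above, plus routing to an error topic, could be sketched roughly as follows. The message shape (`metadata.visited`), the `CycleError` class, and the `publishError` callback are hypothetical names for this example; how you actually publish depends on your pub-sub client.

```javascript
// Raised when a message arrives at a location it has already visited.
class CycleError extends Error {
  constructor(location, visited) {
    super(`Cycle detected: ${[...visited, location].join(" -> ")}`);
    this.name = "CycleError";
    this.visited = visited;
  }
}

// Assumed message shape: { metadata: { visited: string[] }, body: any }.
// Returns a copy of the message with the current location recorded.
function recordVisit(message, location) {
  const visited = message.metadata?.visited ?? [];
  if (visited.includes(location)) {
    throw new CycleError(location, visited);
  }
  return {
    ...message,
    metadata: { ...message.metadata, visited: [...visited, location] },
  };
}

// Wraps a service's handler so the check runs on every message it reads.
// publishError is a hypothetical callback that sends to an error topic/queue.
function withCycleGuard(location, handler, publishError) {
  return async (message) => {
    let tracked;
    try {
      tracked = recordVisit(message, location);
    } catch (err) {
      if (err instanceof CycleError) {
        await publishError({ reason: err.message, message });
        return undefined;
      }
      throw err;
    }
    // The handler must publish `tracked` (not the original) to keep the trail.
    return handler(tracked);
  };
}
```

A message that has passed through A and B and is then delivered to A again would be rejected with "Cycle detected: A -> B -> A" and sent to the error callback instead of the handler.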
