
I was reading Google's architectural overview of Pub/Sub and was curious how publisher and subscription servers are connected.

From what I understand:

  • When a message is published, it is stored, and once all subscriptions have acknowledged it, the message is deleted.
  • Subscription servers can use a pull or a push model.
  • If a client was not connected and "wakes up" hours after a message was published, it can "pull" the messages it missed from the publisher server.

I imagine that subscription servers are connected to publisher servers via WebSockets (or some sort of long polling), but if EVERY subscriber server is connected to EVERY publisher for EVERY topic, that's a LOT of connections.

Two questions:

  1. Does anyone have insight into how this is done internally in Google's Pub/Sub?
  2. How do subscription servers know which publisher servers to connect to? There could be a map of topic <-> publishers, but if I publish a message now and that publisher is killed a few minutes later, the publisher will no longer be in the map.

Thanks in advance



I think your confusion is that you're thinking of Pub/Sub only as a routing system, connecting publishers directly to subscribers; but it is actually a storage system, persisting the messages until they are read.

See for instance this description in the doc you linked (the key phrase is the last sentence):

First, a publisher sends a message on a topic to Pub/Sub. It is encrypted by the proxy layer and sent to a publishing forwarder, a forwarder to which the publisher is connected. In order to ensure delivery, the message is immediately written to storage.
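The "store first, then deliver" step in that quote can be sketched as follows. This is a minimal illustration under stated assumptions: a local append-only file stands in for Pub/Sub's real storage layer, and `DurableTopicLog` is a hypothetical name, not anything from Google's implementation. The point is only the ordering: the publisher gets its message ID back after the write has been flushed to disk, so a forwarder crash after that point does not lose the message.

```python
from __future__ import annotations

import json
import os
import tempfile
import uuid


class DurableTopicLog:
    """Hypothetical append-only log for one topic (a stand-in for real storage)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def publish(self, data: str) -> str:
        message_id = uuid.uuid4().hex
        record = json.dumps({"id": message_id, "data": data})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to persist it to disk
        # Only now is the publish acknowledged to the publisher.
        return message_id

    def read_all(self) -> list[dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "topic.log")
    mid = DurableTopicLog(log_path).publish("hello")
    # A fresh instance (e.g. after the forwarder restarts) still sees the message.
    print(DurableTopicLog(log_path).read_all()[0]["id"] == mid)
```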

At no point does the publisher talk directly to the subscriber, or vice versa; both talk only to the Pub/Sub service itself. Google calls this intermediary a "forwarder"; in some other messaging systems it is called a "broker"; you could also consider it a specialised form of data store.
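To make the broker idea concrete, here is a toy in-memory broker. It is a sketch, not Google's internals or client API: `Broker`, `create_subscription`, `publish`, `pull` and `ack` are hypothetical names loosely modelled on Pub/Sub concepts. Publishers only name a topic, subscribers only name a subscription, and each subscription keeps its own stored copy of a message until it acknowledges it, so a subscriber that was offline for hours still finds its backlog waiting in the broker rather than on any publisher.

```python
from __future__ import annotations

from itertools import count


class Broker:
    """Toy in-memory broker; neither publishers nor subscribers reference each other."""

    def __init__(self) -> None:
        self._next_id = count(1)
        # topic -> subscription name -> {message id: data} still awaiting an ack
        self._pending: dict[str, dict[str, dict[int, str]]] = {}

    def create_subscription(self, topic: str, subscription: str) -> None:
        self._pending.setdefault(topic, {}).setdefault(subscription, {})

    def publish(self, topic: str, data: str) -> int:
        # The publisher names only a topic; it never sees any subscriber.
        message_id = next(self._next_id)
        for backlog in self._pending.get(topic, {}).values():
            backlog[message_id] = data  # one stored copy per subscription
        return message_id

    def pull(self, topic: str, subscription: str, max_messages: int = 10) -> list[tuple[int, str]]:
        # Unacked messages stay stored, so an offline subscriber can catch up later.
        backlog = self._pending[topic][subscription]
        return list(backlog.items())[:max_messages]

    def ack(self, topic: str, subscription: str, message_id: int) -> None:
        # Once every subscription has acked, no copy of the message remains.
        self._pending[topic][subscription].pop(message_id, None)


if __name__ == "__main__":
    broker = Broker()
    broker.create_subscription("orders", "billing")
    broker.create_subscription("orders", "shipping")
    mid = broker.publish("orders", "order #1")
    print(broker.pull("orders", "billing"))   # [(1, 'order #1')]
    broker.ack("orders", "billing", mid)
    print(broker.pull("orders", "billing"))   # []
    print(broker.pull("orders", "shipping"))  # [(1, 'order #1')]
```

Note that if the process that published `order #1` disappeared right after `publish` returned, nothing above would change: the message lives in the broker, which is why subscribers never need a map of publishers.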

So the answers to your questions are:

  1. The number of connections (for a particular topic) is not publishers × subscribers (which would be required if every subscriber needed to poll every publisher) but only publishers + subscribers: each publisher connects to the service once to publish messages, and each subscriber connects once to receive the messages it subscribed to.
  2. Subscribers do not need to know which publishers exist, and that is one of the primary reasons to use such an architecture: it makes the system loosely coupled.
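The arithmetic in point 1 can be made concrete with a simplified count. This assumes one connection per client-to-endpoint pair and ignores the fact that the service itself is many servers with its own internal links; the function names are illustrative only.

```python
def direct_connections(publishers: int, subscribers: int) -> int:
    # Every subscriber holds a connection to every publisher.
    return publishers * subscribers


def brokered_connections(publishers: int, subscribers: int) -> int:
    # Each client holds a single connection to the broker service.
    return publishers + subscribers


if __name__ == "__main__":
    for p, s in [(10, 10), (100, 1000), (1000, 10000)]:
        print(f"{p} publishers, {s} subscribers: "
              f"direct={direct_connections(p, s)}, brokered={brokered_connections(p, s)}")
```

For 100 publishers and 1,000 subscribers, that is 100,000 direct connections versus 1,100 brokered ones, and the gap grows multiplicatively as either side scales.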
  • For more general information (independent of a specific service provider), you may look up the term "message broker" and read a bit about Kafka, RabbitMQ, Mosquitto and the different levels of functionality they provide. I suppose the Google service fits somewhere in that list, but I haven't had a look at it. Commented Jan 29 at 9:14
