
I'm developing a real-time chat application with an Angular frontend and a Java backend. I've found a couple of examples that resemble what I am trying to achieve.

It seems common to include Kafka as a message broker, but I am trying to understand why we would want it. In my messaging application I don't foresee the server publishing anything towards the users. It will always be an end user on the client side publishing something to a websocket endpoint, the server storing that information in the database and then delegating the message to the correct recipient(s), like so:

[Architecture diagram]

Now let's remove the Kafka section from that diagram and persist the message directly in the database. What do I lose, other than that the message gets stored in the database synchronously instead of asynchronously? I can simply put the message on /queue/message before persisting it, so that there are no latency issues between the users. In the examples I've seen, persistence was not really part of the flow, so I figured there may be another reason for using Kafka.
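To make the broker-less flow concrete, here is a minimal sketch of "deliver first, then persist". Everything in it is an assumption made for illustration: `ChatMessage`, `MessageStore` and `DirectChatHandler` are hypothetical names, the `Consumer` stands in for whatever puts the message on /queue/message, and no real websocket or database API is used.

```java
import java.util.function.Consumer;

/** Illustrative message type; the fields are assumptions, not taken from the diagram. */
final class ChatMessage {
    final String sender;
    final String recipient;
    final String text;

    ChatMessage(String sender, String recipient, String text) {
        this.sender = sender;
        this.recipient = recipient;
        this.text = text;
    }
}

/** Hypothetical stand-in for the database layer. */
interface MessageStore {
    void save(ChatMessage message);
}

/** Handles an incoming message without a broker: deliver first, then persist. */
final class DirectChatHandler {
    private final Consumer<ChatMessage> deliverToQueue; // stand-in for sending to /queue/message
    private final MessageStore store;

    DirectChatHandler(Consumer<ChatMessage> deliverToQueue, MessageStore store) {
        this.deliverToQueue = deliverToQueue;
        this.store = store;
    }

    /** Returns true only if the message was both delivered and stored. */
    boolean handle(ChatMessage message) {
        deliverToQueue.accept(message); // the recipient sees the message immediately
        try {
            store.save(message);        // synchronous write, after delivery
            return true;
        } catch (RuntimeException e) {
            // At this point the recipient has already seen a message
            // that is not in the database.
            return false;
        }
    }
}
```

With this ordering the users see no added latency, but there is a window in which a message has been delivered and not yet (or never) stored.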

  • (A) "I don't foresee the server publishing anything towards the users" (B) "the server [..] delegating the message to the correct recipient(s)". How are you going to do B, if not by doing A?
    – Flater
    Commented Feb 12, 2021 at 14:21
  • Perhaps I worded it badly, but I meant that the server won't be publishing events of its own. Clients will publish events, which pass through the server in the form of subscribe and republish.
    – Babyburger
    Commented Feb 12, 2021 at 14:39
  • What happens when your websocket to B breaks and then reconnects? Kafka will help with replaying all the messages from A that were sent while the socket was down.
    – Neil
    Commented Feb 12, 2021 at 15:12
  • @Neil That's a good point. In that scenario we wouldn't put the message directly on the websocket to user B, but have the Kafka listener put it on /queue/message then? And in the case of failure, put it back in the Kafka queue.
    – Babyburger
    Commented Feb 12, 2021 at 15:19
  • It's been a few years since I did any Kafka, but I don't think you would put anything directly on a websocket. You push the message to Kafka and let it decide if, how and when to push it to the client(s). The thing about sockets is that you don't know when they have failed or how much got through.
    – Neil
    Commented Feb 12, 2021 at 15:22

2 Answers

8

People think of Kafka as a message broker, but it's also a kind of database that stores and retrieves messages in order and tracks each consumer's position in that log. This is an extremely nice fit for storing something like chat messages, especially if the websocket box in your diagram is actually multiple websocket servers on multiple machines, in which case the bottom putMessage arrow in that box goes through Kafka as well.

In a typical Kafka application, the DB in your diagram would be used more for aggregate purposes like search or analytics, not for queries like "get me the new messages since the last time I polled."
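The "ordered log plus a remembered position per consumer" idea can be sketched with a toy in-memory model. This is not the Kafka client API: `ChatLog`, `poll` and `commit` are hypothetical names chosen for illustration, and real Kafka tracks offsets per consumer group and partition rather than per user.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of an append-only log; NOT the Kafka client API. */
class ChatLog {
    private final List<String> messages = new ArrayList<>();
    private final Map<String, Integer> committedOffsets = new HashMap<>();

    /** Appends a message and returns its offset (its position in the log). */
    int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns, in order, everything the consumer has not yet committed. */
    List<String> poll(String consumerId) {
        int from = committedOffsets.getOrDefault(consumerId, 0);
        return new ArrayList<>(messages.subList(from, messages.size()));
    }

    /** Records that the consumer has processed everything before nextOffset. */
    void commit(String consumerId, int nextOffset) {
        committedOffsets.put(consumerId, nextOffset);
    }
}

class ReplayDemo {
    public static void main(String[] args) {
        ChatLog log = new ChatLog();
        log.append("A: hello");
        System.out.println(log.poll("userB")); // [A: hello]
        log.commit("userB", 1);

        // userB's socket drops here; messages keep arriving in the log.
        log.append("A: are you there?");
        log.append("A: ping");

        // On reconnect, userB resumes from its committed offset.
        System.out.println(log.poll("userB")); // [A: are you there?, A: ping]
    }
}
```

The "new messages since the last time I polled" query is answered by the log itself, which is why the database can be left to search and analytics.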

12

If you have one server doing everything, there's really no reason to use Kafka at all.

If you distribute, which you probably have to if you want to scale to millions of users, other aspects come into play. The sender may not be connected to the node where the recipient is. In that case, you'll have to route messages between nodes and/or poll the database on each node to look for new messages.

So in this case, Kafka solves three problems for you:

  • It can route messages well and easily (if you know what you're doing).
  • It handles nodes crashing or coming online well.
  • You can poll Kafka directly, which is how it is intended to be used, instead of polling a database, which may or may not work well.
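The routing point can be illustrated with a toy model in which every node reads the same topic and delivers only to the users connected to it. This is a sketch under assumptions, not Kafka code: `SharedTopic` and `ChatNode` are hypothetical names, and the in-memory fan-out stands in for something like each node consuming the topic in its own consumer group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a shared topic: every subscribed node sees every message. */
class SharedTopic {
    private final List<ChatNode> subscribers = new ArrayList<>();

    void subscribe(ChatNode node) {
        subscribers.add(node);
    }

    void publish(String recipient, String text) {
        for (ChatNode node : subscribers) {
            node.onMessage(recipient, text);
        }
    }
}

/** One websocket server; it only knows about users connected to itself. */
class ChatNode {
    private final Map<String, List<String>> inboxes = new HashMap<>();
    private final SharedTopic topic;

    ChatNode(SharedTopic topic) {
        this.topic = topic;
        topic.subscribe(this);
    }

    void connect(String user) {
        inboxes.put(user, new ArrayList<String>());
    }

    /** A local client sends a message; the node need not know where the recipient is. */
    void send(String recipient, String text) {
        topic.publish(recipient, text);
    }

    /** Called for every message on the topic; delivers only to local users. */
    void onMessage(String recipient, String text) {
        List<String> inbox = inboxes.get(recipient);
        if (inbox != null) {
            inbox.add(text);
        }
    }

    List<String> inboxOf(String user) {
        return inboxes.getOrDefault(user, new ArrayList<String>());
    }
}

class RoutingDemo {
    public static void main(String[] args) {
        SharedTopic topic = new SharedTopic();
        ChatNode node1 = new ChatNode(topic);
        ChatNode node2 = new ChatNode(topic);
        node1.connect("alice");
        node2.connect("bob");

        node1.send("bob", "hi bob"); // alice is on node1, bob is on node2
        System.out.println(node2.inboxOf("bob"));   // [hi bob]
        System.out.println(node1.inboxOf("alice")); // []
    }
}
```

No node needs a table of which user is connected where; a node that restarts would simply resume reading the topic from its last position.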

Additionally, if you're really scaling:

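  • Kafka doesn't wait for a disk sync on every message; it appends to a sequential log and batches writes. Conventional databases are usually limited by IOPS (I/O operations per second) because of transaction boundaries, which can be really slow: think hundreds of messages per second (per disk) versus millions in Kafka.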

Those are just a few of the benefits; there could be more.
