Why split up data retrieved from a database into multiple endpoints, if we need ALL the data anyway?

Question

I have a "Games" API which retrieves video game data from a large database. The /games endpoint returns some very basic information about the game, such as the title, description, etc.

More detailed information, such as the developer, is actually not given by this endpoint. Instead, a reference ID, to be used for another endpoint, is returned.

Example:

{
    "id": 1,
    "name": "Pokemon Red",
    "developer(s)": [
        100131
    ]
}

Now, if we want to know the developer(s) of this game, we need to call another endpoint, such as "/developers", with the relevant ID. Many other pieces of information are split off like this in this API.
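A minimal sketch of that two-step lookup, with in-memory dictionaries standing in for the two endpoints. The function names, the response shape of "/developers", and the mapping of ID 100131 to a developer name are assumptions made for illustration, not details of the actual API.

```python
# In-memory stand-ins for the two endpoints; shapes and names are hypothetical.
GAMES = {1: {"id": 1, "name": "Pokemon Red", "developer(s)": [100131]}}
DEVELOPERS = {100131: {"id": 100131, "name": "Game Freak"}}


def get_game(game_id):
    """Stand-in for GET /games/{id}: basic fields plus developer IDs."""
    return GAMES[game_id]


def get_developer(developer_id):
    """Stand-in for GET /developers/{id}."""
    return DEVELOPERS[developer_id]


def developer_names(game_id):
    """Resolve a game's developer IDs into names with one extra lookup per ID."""
    game = get_game(game_id)
    return [get_developer(dev_id)["name"] for dev_id in game["developer(s)"]]


names = developer_names(1)
print(names)
```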

What is the point of splitting off the data into many different endpoints if we eventually need to display all the relevant information to the end customer anyway? Wouldn't it be simpler to just have the /games endpoint return all the necessary information upfront?

GraphQL is designed to solve this very problem. Wouldn't it be simpler to just have the /games endpoint return all the necessary info upfront? Not really. What if you don't need all the info? What if you want to see other games by the same developers? Should the endpoint provide these as well? At some point you'll find yourself just returning your whole database. — tkausl, Commented Mar 31 at 17:21
Could you please elaborate on how GraphQL solves this problem? — Yeager, Commented Mar 31 at 18:13
@Yeager In GraphQL, the client submits a query for all the data it wants (and only the data it wants), and then the server goes and rounds it up and sends it all at once. So it prevents several round trips across the internet between client and server. It's a pretty nifty tech, and I recommend trying it out. GitHub has a free explorer you can play with. — PunctualEmoticon, Commented Apr 1 at 22:54
Graphql is one solution to the problem. You can use traditional api methods to do the same thing, people seem to just refuse to in the name of their personalized interpretation of "rest" — aaaaaa, Commented Apr 1 at 22:56
Graphql comes with its own problems. It is complicated. And often hard to implement efficiently. And it partially shifts responsibility for performance to the caller: what if the caller requests too much data? Should we blindly trust callers? Of course not. To handle this you need to complicate the endpoint even more. I'm not a fan of graphql. — freakish, Commented Apr 2 at 6:21

Peter Mortensen · Accepted Answer · 2024-04-02 21:06:05Z

Your assumption that you always need all the data is typically false. At least at the given time. You may have such a need with your game example, but it seems that even these API developers think differently, judging from how it was designed.

Typically, at time X, you will only need some data, say DX, and then at time Y some other data, say DY. You could load everything up front, but maybe you will never reach time Y.

So there's a trade-off. How big is DY actually? Is it worth loading it, even if I won't ever use it? Because loading data takes time. And memory. And sometimes even CPU (parsing, and some transformations). And there's a security risk involved as well: should we really serve all the data to the user, including sensitive data?

Think about a bank serving your account information. Why would it serve your account history up front if you may actually never want to see it? And that might be lots of data. Lots of sensitive data by the way.

Another example: consider Gmail, or any email service. Do you really think it should serve all emails, with all contents, including attachments up front? For those emails from ten years ago as well? With my account, I would probably wait ten years for it to finish loading. In fact, those email services will try to stop you from doing such a thing. Because it would kill their side as well.

And your example fits this pattern quite well, I think. Are you sure that information about a game developer is always relevant to everyone? It sounds unlikely.

Another reason is that if the data dynamically changes you may want to actually call the server and retrieve the most recent piece of information. So back to the bank account. I log in and the server sends me the entire history for some reason. Ok. But in the meantime someone sends me money. Of course I need to call the server again to retrieve that information. Or maybe the server pushes it to me. Either way, some additional communication is needed.

But you are also somewhat correct. Some people will combine lots of various data into single calls. That is especially useful on startup. But requires careful consideration and design. Because you may want to load as much as possible, but not more. How can you find this balance? It is a hard question, for another time maybe.

But eventually you will need to load some data again anyway. And you rarely want to load everything each time.
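A minimal sketch of the "load DX now, DY only if needed" idea, using the bank account example. The `Account` class and the two fetch functions are hypothetical stand-ins for real server calls; the `calls` list only exists to show which requests were actually made.

```python
class Account:
    """Loads the cheap summary up front and the costly history only on demand."""

    def __init__(self, fetch_summary, fetch_history):
        self._fetch_history = fetch_history
        self._history = None            # DY: not loaded until someone asks
        self.summary = fetch_summary()  # DX: needed right away

    def history(self):
        """Fetch the history on first use and keep it for later calls."""
        if self._history is None:
            self._history = self._fetch_history()
        return self._history


calls = []  # records which (simulated) server calls were made


def fetch_summary():
    calls.append("summary")
    return {"balance": 100}


def fetch_history():
    calls.append("history")
    return ["deposit 100"]


account = Account(fetch_summary, fetch_history)
calls_after_login = list(calls)   # only the summary has been requested so far
first = account.history()         # triggers the history request
second = account.history()        # served from the already loaded copy
```

Note that this sketch keeps the history forever once it is loaded. As described above, data that changes on the server would still need to be refreshed or pushed.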

This particular case doesn't really illustrate this well--other than a bit of load serving up the parent with the child has minimal downside. Your examples involve serving up the children with the parent--something that should rarely be done and only after due consideration for the amount of data that will get thrown around. — Loren Pechtel, Commented Apr 8 at 4:24

JimmyJames · Accepted Answer · 2024-04-01 21:36:58Z

I'm working under the assumption that you are looking at an API that you did not develop and do not have access to the source or design details. If that's incorrect, please let me know.

First off, it's usually difficult or impossible to look at an external-facing API and know what the design is behind it and/or the decisions behind it. It's quite possible that this is over-designed. This answer is really more about the general reasons you might do this. Don't read it as a specific endorsement or critique of this specific API's design.

You might assume that there is a single database that stores all this information but that's not necessarily the case. And that's one of the goals behind an API like this: to decouple the user interface from the implementation details. Even if all this data is stored in a single DB, by breaking down the API in this way, the implementation can be changed easily without disrupting clients. That's one good reason to do this.

Another reason for this kind of thing is performance. One thing I've encountered many times in my career is an assumption that the way to make things fast and efficient is to do as much joining, filtering, and preparation of data as possible on the database and server. There is a lot to that idea, but taken to its logical extreme, it can cause some really terrible performance from the client perspective. This happens because when the client makes a request to a synchronous service, they generally will have to wait until the last byte of the response is received and parsed. I say 'generally' here because there are some ways to get around that and start doing things sooner, but in my experience, these can be tricky to put in place and come with some pretty major downsides such as managing errors that occur mid-stream.

Typically: 1) The client sends a request such as a GET. 2) The server does its DB queries or other tasks to get that data. 3) It creates a response document. 4) It transmits the document. 5) The client receives the entire response. 6) Once the response is fully received, it is parsed. 7) The data is displayed and/or otherwise used. The time from request to display is then the total of all 7 of these steps.

So how does decomposing the API as in your example help with this? First off, the more data you are pulling and joining together, the longer steps 2 & 3 will take, all things being equal. More content in the response document means steps 4 & 5 will take longer for obvious reasons. Steps 6 & 7 take a little longer as well but typically aren't a major concern. While all of this is happening, the user sees no outcome. Maybe a spinning wheel.

If the API is decomposed, it doesn't make the total time faster (it might even be slower) but the intervals during which there's no visible progress are shorter. And what if users don't usually care about the developer information? Why should they have to wait for that to be retrieved? Why take on the cost of collecting things that many people don't look at?
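To make the trade-off concrete, here is a small worked calculation. All timings are invented purely for illustration, and the send, transmit, and receive steps are folded into a single transfer figure; only the shape of the result matters.

```python
def time_to_display_ms(query, build, transfer, parse, render):
    """Total wait, in milliseconds, before a response shows up on screen."""
    return query + build + transfer + parse + render


# Hypothetical timings (ms), chosen only to illustrate the trade-off.
monolithic = time_to_display_ms(query=2000, build=500, transfer=4000, parse=400, render=100)

# Decomposed: a small basic-info call, followed by a separate details call.
basic_info = time_to_display_ms(query=200, build=100, transfer=300, parse=50, render=100)
details = time_to_display_ms(query=1900, build=500, transfer=3800, parse=400, render=100)
decomposed_total = basic_info + details

print(f"monolithic: first display after {monolithic} ms")
print(f"decomposed: first display after {basic_info} ms, everything after {decomposed_total} ms")
```

With these made-up numbers the decomposed design is slower overall, but the user sees something after well under a second instead of staring at a spinner for the whole duration.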

I learned this lesson the hard way early in my career. We were tasked with providing a broad and deep tree structure. Because "it was more efficient" the more senior developers on the project decided it would be pulled down as one large document. For broadband users, this meant it took about 30 seconds from the time they requested the document until the time it was displayed in the UI. We had some users that were on an island with 56K modems, though. They could wait upwards of 15 minutes for many of these documents.

One of the silly things about this was that many of the items in this content tree were generally applicable to many or all of the requests. The primary hack I used to improve the performance was to cache those details locally instead of pulling them down on each request. The more general and effective fix was to decompose the API so that each node of the tree was retrieved individually and displayed as it was received. This not only avoided the user having to wait for hundreds of items they weren't even going to look at or use, but also allowed for horizontal scaling (which could be a reason in the case you are looking at, especially if a CDN or similar is involved).

Our users were very unhappy with the first design. They loved the second approach. It looked magically instantaneous to them because the time it took them to scroll through the top layers and find what they were looking for was usually more than enough to fill in the details of those items behind the scenes.
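The local caching hack mentioned above can be sketched roughly as follows. This is not the original code: the node shape, `fetch_shared_detail`, and `load_node` are hypothetical, and `functools.lru_cache` stands in for whatever cache the real client used.

```python
from functools import lru_cache

fetch_count = 0  # counts simulated network calls


@lru_cache(maxsize=None)
def fetch_shared_detail(detail_id):
    """Stand-in for a network call; the body runs once per distinct detail_id."""
    global fetch_count
    fetch_count += 1
    return {"id": detail_id, "label": f"detail-{detail_id}"}


def load_node(node):
    """Resolve a node's shared detail IDs through the local cache."""
    return [fetch_shared_detail(detail_id) for detail_id in node["detail_ids"]]


# Three nodes that share details: six lookups, but only three distinct IDs.
nodes = [{"detail_ids": [1, 2]}, {"detail_ids": [2, 3]}, {"detail_ids": [1, 3]}]
loaded = [load_node(node) for node in nodes]
print(fetch_count)
```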

VoiceOfUnreason · Accepted Answer · 2024-04-01 14:35:00Z

What is the point of splitting off the data into many different endpoints if we eventually need to display all the relevant info to the end customer anyways?

The REST answer is: caching.

Separating information into distinct resources allows you a finer grained control over cached lifetimes than you would get by returning the entire representation of a single resource.

Think web pages: we separate out images and scripts from the main body of the HTML, which allows us to more readily reuse cached copies of the resources that are used in multiple pages.
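A rough sketch of per-resource cache lifetimes. In HTTP this is normally expressed with `Cache-Control: max-age` on each response and handled by standard caches; the `TtlCache` class, the URLs, and the lifetimes below are assumptions used only to show the effect.

```python
class TtlCache:
    """Tiny cache in which every key carries its own lifetime in seconds."""

    def __init__(self, clock):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, ttl_s, fetch):
        """Return a fresh cached value, or call fetch() and cache the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + ttl_s, value)
        return value


clock = {"now": 0}   # fake clock so the example is deterministic
fetches = []         # records which resources were actually re-fetched


def fetch(name):
    fetches.append(name)
    return name


cache = TtlCache(clock=lambda: clock["now"])
for t in (0, 30, 90):
    clock["now"] = t
    cache.get("/games/1", 60, lambda: fetch("game"))                    # changes often
    cache.get("/developers/100131", 3600, lambda: fetch("developer"))   # rarely changes
```

Because the developer is a separate resource, it is fetched once and reused, while the game representation is refreshed on its own shorter schedule.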

What is the benefit of fine-grained client-side caching if it leads to more network traffic overall? — JacquesB, Commented Apr 6 at 13:00

Ewan · Accepted Answer · 2024-03-31 18:18:40Z

The idea is to have multiple generic endpoints which can be combined to produce a variety of different results.

So for example, say you include the developer list in the Game object as you suggest, but have a page which lists games with just the title. You would be retrieving lots of duplicate information that you wouldn't use.

Similarly, if you wanted to list all the developers, you would have to get all the games and then work out a distinct list of developers across them. Very inefficient.

By splitting the endpoints and only including the IDs, you allow different apps to retrieve the information they need and assemble it as required. This decouples the presentation layer from your API layer.
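As an illustrative sketch, three different pages can be assembled from the same two generic endpoints. The data, field names, and functions here are hypothetical stand-ins for the real API calls.

```python
# Hypothetical data behind two generic endpoints.
GAMES = [
    {"id": 1, "title": "Game A", "developerIds": [10, 11]},
    {"id": 2, "title": "Game B", "developerIds": [10]},
]
DEVELOPERS = [{"id": 10, "name": "Studio X"}, {"id": 11, "name": "Studio Y"}]


def list_games():
    """Stand-in for GET /games."""
    return GAMES


def list_developers():
    """Stand-in for GET /developers."""
    return DEVELOPERS


# Page 1: a title list needs only /games, with no developer payload attached.
titles = [game["title"] for game in list_games()]

# Page 2: a developer list needs only /developers, with no scan over every game.
developer_names = [dev["name"] for dev in list_developers()]

# Page 3: a detail page combines both by ID.
developers_by_id = {dev["id"]: dev for dev in list_developers()}
game = list_games()[0]
detail = {
    "title": game["title"],
    "developers": [developers_by_id[i]["name"] for i in game["developerIds"]],
}
```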

There is also a more academic argument to be made about your choice of objects, aggregate roots, domains, etc.

Say we have the Game object:

Game
{
  id
  title
  developerIds[]
  releaseDate
  price
}

Do we want to add a second version of this object which fits our page better? We might have

GameForLists
{
   id
   title
}


GameWithReviews
{
  id
  title
  reviews[]
}

GameWithDevelopers
{
  id
  title
  developers[]
}

Soon we have dozens of such objects, all representing a Game, and the endpoints which fetch data about games get duplicates which return different game ViewModels. We find it hard to pass data from one control to another: say I have my list of GameForLists, but now I want to click on one and show the release date. I have to make another call and get a Game. If I have a method CalculateGameOffer, does it take a Game as a parameter or a GameWithReviews?

It's better to decide on what your objects are at the start and stick with them. That way you build up a consistent model of your underlying data which is flexible and can be reused. You can build logic and functionality around this model rather than going back to the beginning each time.

This makes for a faster turn around for features at the presentation layer. You don't have to go back to the database, add a new view, new objects, new api endpoints each time you have a new idea you want to try out.
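One way to picture this, as a sketch only: a single `Game` model that logic is written against, with page-specific shapes derived from it on the client rather than served as separate objects. The field types, the discount rule in `calculate_game_offer`, and `to_list_row` are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Game:
    """The one agreed-upon Game model; related data is referenced by ID."""
    id: int
    title: str
    developer_ids: tuple
    release_date: date
    price_cents: int


def calculate_game_offer(game, percent_off):
    """Business logic takes the one Game type, so there is no ambiguity."""
    return game.price_cents * (100 - percent_off) // 100


def to_list_row(game):
    """A list page projects the fields it needs instead of needing a new type."""
    return {"id": game.id, "title": game.title}


game = Game(1, "Game A", (10, 11), date(2020, 1, 1), 3000)
offer = calculate_game_offer(game, 25)
row = to_list_row(game)
```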

You are saying that if I were to have an endpoint that returned EVERYTHING, I would end up with the 2nd example you provided, which is several different objects with different data in each for different uses? — Yeager, Commented Mar 31 at 18:36
No... if you return everything, then you have the first problem I describe, i.e. your drop-downs take forever to load because you are getting EVERYTHING. The second issue is what you get when you try to solve that by adding views, i.e. "everything" (for this use case). — Ewan, Commented Mar 31 at 21:44
@Ewan, yeah, I've been fixing a lot of that sort of bug myself lately... performance problems caused by the fact that loading a list of code/descriptions for a dropdown box is loading and throwing away a vast amount of other data because the existing APIs are optimised for a different use case. Sometimes that's the right thing to do... and sometimes not. — Simon Geard, Commented Apr 1 at 23:16

JacquesB · Accepted Answer · 2024-04-04 06:38:47Z

If you know clients usually need all the detailed data then it is perfectly fine to include it all in a single endpoint.

But there may be requirements which cannot be solved with a single endpoint. If the client also needs to be able to look up a particular developer to see which games they have been involved in, then you need a separate developer endpoint.

Even if you have a separate developer endpoint, you could still include related information like developers in the games endpoint. But you have to be careful with circular references then: if the games endpoint contains developer information, this developer information cannot in turn contain games information for each developer, since this would lead to an infinite structure.
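A small sketch of one way to break that cycle: when a developer is embedded inside a game, it carries only game IDs, never full game objects. The field names and helper functions are hypothetical.

```python
import json

GAMES = {1: {"id": 1, "title": "Game A", "developerIds": [10]}}
DEVELOPERS = {10: {"id": 10, "name": "Studio X", "gameIds": [1]}}


def developer_summary(dev):
    """Developer as embedded in a game: game IDs only, never nested games."""
    return {"id": dev["id"], "name": dev["name"], "gameIds": dev["gameIds"]}


def game_detail(game):
    """Game with developers embedded one level deep, so the structure is finite."""
    return {
        "id": game["id"],
        "title": game["title"],
        "developers": [developer_summary(DEVELOPERS[i]) for i in game["developerIds"]],
    }


detail = game_detail(GAMES[1])
payload = json.dumps(detail)  # serializes fine because nesting stops at the IDs
print(payload)
```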
