
Let us say I have some data in a JSON format, along these lines:

{ title,
  resources[] }

The resources array contains information about a graph. But it should be iterated upon and converted to a different format.
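
For concreteness, a per-item conversion of that kind might look like the sketch below. All type names and fields are invented for illustration, since the question does not specify what the resources entries or the target graph format actually contain.

```typescript
// Hypothetical input shape of one entry in the `resources` array.
interface ResourceDto {
  id: string;
  label: string;
  value: number;
}

// Hypothetical shape a graphing component might expect.
interface ChartPoint {
  x: string;
  y: number;
}

// The kind of conversion the question describes: iterate over
// `resources` and map each entry into the shape the graph needs.
function toChartPoints(resources: ResourceDto[]): ChartPoint[] {
  return resources.map((r) => ({ x: r.label, y: r.value }));
}
```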

The problem

Given that the resources array's size is 100, it should be fine and reasonable for the conversion to happen on the front end. What if the resources array's size is much bigger? Could there be a threshold where iterating and converting the data from one format to another should happen on the back end?

I do have substantial reasoning for passing the conversion responsibility to the front end, but I wonder whether this will scale?

Supplementary question

What are the main axes that my decision about where this conversion should reside relies upon? I found a few answers here and here, which were good answers, but I could use a few more solid arguments.

  • What is your "front-end" and "back-end"? How are they connected? How much data arrives in your system over time? Is it sensitive w.r.t. disclosure? Or integrity? What is your pre-existing reasoning? What operations does the conversion require, and how many? What are the non-functional requirements on your application? Those are just some of the questions that we'd need answered in order to help you. Commented Nov 5, 2019 at 14:52
  • A premise to keep in mind is that servers must serve. One thread working on transformations is a thread not attending to other requests, hence not serving. Performing hundreds or thousands of transformations on the server side is more expensive than performing those same hundreds or thousands on the client side. Not only in terms of resources, but in terms of opportunity costs too.
    – Laiv
    Commented Nov 5, 2019 at 15:24
  • @Laiv: I agree, but there is a footnote here where proprietary calculations or security-related logic should not be exposed to clients, as it gives them access to things you want the server to specifically not leak. It's contextual whether offloading to the client is appropriate or not. But your comment is spot on in cases where there is no issue with leaking the logic to the consumer.
    – Flater
    Commented Nov 5, 2019 at 16:35
  • While I somewhat agree with the close vote that this question is too broad for its own good, I don't think it warrants closing. I think the correct answer here (narrowing down the development requirements to find the appropriate answer) is a valid answer to the question, and the question is bound to come up again for other users, so an explicit answer (on how to approach this) is better than a closure with no answer.
    – Flater
    Commented Nov 5, 2019 at 16:40
  • Can someone please provide me with a resource or a link on the security aspect of this? A term to search for, maybe, or a scenario's name, where "proprietary calculations or security-related logic" are exposed to clients? Commented Nov 5, 2019 at 17:52

4 Answers


The decision to do the transformation on the front end or the back end should depend much more on the kind of transformation that you want to do than on the number of items involved in the transformation.

As mentioned in the answer by @Ewan, unless you know that your user will use low-power devices to access your application, your users will have enough processing power to do almost any transformation you throw at them.

On the other hand, if the transformation involves deleting sensitive data, then that should really be done on the backend.

Also, if the transformation significantly reduces the amount of data, then it would be better to do it on the backend and reduce the network bandwidth you need.
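
As a hypothetical illustration of that last point (the Reading/Summary shapes are invented, not from the question): collapsing raw rows into per-group summaries on the server means the client downloads a handful of small objects instead of every raw row.

```typescript
// Invented shapes for the example.
interface Reading {
  sensorId: string;
  value: number;
}

interface Summary {
  sensorId: string;
  count: number;
  mean: number;
}

// Aggregating server-side: one summary per sensor crosses the wire
// instead of every individual reading.
function summarize(readings: Reading[]): Summary[] {
  const bySensor = new Map<string, { count: number; total: number }>();
  for (const r of readings) {
    const acc = bySensor.get(r.sensorId) ?? { count: 0, total: 0 };
    acc.count += 1;
    acc.total += r.value;
    bySensor.set(r.sensorId, acc);
  }
  const summaries: Summary[] = [];
  bySensor.forEach((acc, sensorId) => {
    summaries.push({ sensorId, count: acc.count, mean: acc.total / acc.count });
  });
  return summaries;
}
```

With, say, thousands of readings per sensor, the payload shrinks from thousands of objects to one per sensor.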

  • Good points. I would also suggest the case of caching. It might be tempting to push a very expensive calculation to the client, but if the same calculation would be performed many times by different users, it may make more sense to do it once on the server and then serve the cached result subsequently, especially if each request would have an element of overhead for the server in gathering the basic ingredients, which could be reduced if a prepared and stored result were served.
    – Steve
    Commented Nov 5, 2019 at 16:02
  • I apologize for asking this years later, but I found this a simple and easy-to-understand explanation and wonder if it applies to my case, too. I fetch data (calendar records), and my weekly view needs the data in a different way than my month view. Do you think I should transform one data set into the other (quite possible, but it requires a series of array transformations), or should I fetch it from the server? I'd go for the latter, but if my application is supposed to work offline, the former would be better, right?
    – Igor
    Commented Dec 15, 2022 at 2:11
  • @Igor, if you need to support offline operation, there is only one way to get it to work. Otherwise, my answer would apply to you as well. Commented Dec 15, 2022 at 10:41

Where to transform data?

Should data transformation be on the front or on the back end?

The question is slightly nonsensical, or at least glosses over certain necessary considerations. Conversion happens wherever conversion is the most appropriate. That's the short answer, but there is of course a lot of information hidden behind how to consider something "appropriate".

The frontend is bound by the API that the backend exposes. Therefore, if the frontend holds its data in a different structure than the backend expects, it's inevitable that the frontend must transform its data.

As the developer of both the frontend and the backend, you're liable to muddy the lines between front- and backend and start building the backend to suit the frontend instead of the (usual) other way around.

I don't feel comfortable making a blanket statement here, but IMHO it's a generally better approach to work from back to front, that is to say that you let the backend define its API based on its own considerations, and then you create the frontend.

Whether the frontend directly depends on the backend API's data structure or you implement an additional frontend DTO is a very contextual consideration. For example, an SPA would generally have its own abstraction layer, whereas a simplistic reporting tool (based on predefined backend queries) generally won't.

Do note that if the frontend was already developed and the frontend data structure is actually usable in the backend, there's nothing wrong with the backend implementing the same structure.
However, if you find yourself constantly rewriting the backend API because the frontend changed its data structure, then you're doing it wrong.

Should you transform the data?

But it should be iterated upon and converted to a different format.

What are the main axes that my decision about where this conversion should reside relies upon?

Before you get your answer, investigate your problem. Why does it need to be converted? Depending on the reason, the location for the conversion will be clearer. Some examples to prove the point:

  • "Because the frontend data structure is just different" -> Why is it different? Could it not just use the same structure as the backend? Because if it can, then you don't need a transformation at all.
  • "Because the server stores the data in a different format" -> Your frontend does not need to know about the storage method your backend has decided to use, and therefore your frontend should not be responsible for transforming the data appropriately
  • "Because the data structure used by the frontend's JS plugin is different from the backend API structure" -> Your backend does not need to know about the plugin your frontend has decided to use, and therefore your frontend should be responsible for transforming the data appropriately

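A sketch of that last bullet, with invented names: keeping the plugin-specific shape confined to a single frontend adapter means the rest of the frontend (and the backend) only ever sees the API's structure.

```typescript
// Hypothetical shape the backend API returns.
interface ApiSeries {
  name: string;
  points: { timestamp: number; value: number }[];
}

// Hypothetical shape a charting plugin wants: [x, y] tuples.
interface PluginSeries {
  label: string;
  data: [number, number][];
}

// The one place in the frontend that knows about the plugin's format.
function toPluginSeries(series: ApiSeries): PluginSeries {
  return {
    label: series.name,
    data: series.points.map((p) => [p.timestamp, p.value]),
  };
}
```

If the plugin is ever swapped out, only this adapter changes; the backend API is untouched.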
I can't definitively answer this for you because I don't know the real world considerations that caused you to use these different data structures that you are trying to transform between.

DTO transformations

Theoretically, every layer of your application should have its own DTOs which it maps to. This means that if the backend has multiple layers (web, BLL, DAL), then you should have DAL DTOs, BLL DTOs, and web DTOs.

In reality, this kind of abstraction, while theoretically elegant, often leads to copy/pasted DTOs with little technical benefit and a lot of copy/pasting when a DAL entity changes its structure (e.g. new properties).
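
A minimal sketch of that duplication, with invented names: three near-identical shapes plus a mapper per boundary, each of which has to be touched whenever the DAL entity gains a property. (As a side effect, the mapping chain is also where a sensitive field like a password hash gets stripped before it can reach the web layer.)

```typescript
// DAL entity: mirrors the database row, including sensitive columns.
interface UserEntity {
  id: number;
  email: string;
  passwordHash: string;
}

// BLL model: what business logic operates on.
interface UserModel {
  id: number;
  email: string;
}

// Web DTO: what the API actually returns.
interface UserResponse {
  id: number;
  email: string;
}

// One mapper per layer boundary; the password hash never crosses the
// DAL boundary.
const toModel = (e: UserEntity): UserModel => ({ id: e.id, email: e.email });
const toResponse = (m: UserModel): UserResponse => ({ id: m.id, email: m.email });
```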

There is a careful balance to strike here, and you strike it by observing your situation. If your backend is a pure REST API which exposes your actual data entities, i.e. your backend is purely a database-accessing service (where the backend API will inherently always expose the exact data entities without business logic in between), then a lot of complexity can be cut out. But if your backend performs its own business logic, then the necessity of separating your layers using DTOs becomes more important.

The main takeaway here is that I think you need to study your layers and their responsibilities more deeply, as your question requires many more considerations than what you have explained in your post.

Your scalability concerns

Given that the resources array's size is 100, it should be fine and reasonable for the conversion to happen on the front end. What if the resources array's size is much bigger? Could there be a threshold where iterating and converting the data from one format to another should happen on the back end?

I do have substantial reasoning for passing the conversion responsibility to the front end, but I wonder whether this will scale?

If you think that it doesn't belong in the frontend for 1000 elements, then it also doesn't belong in the frontend for 2 elements. It's as simple as that. Theoretically, the appropriate location (of any code, really) is decided by its functional purpose, not by the sheer amount of data passing through it.

To that end, "what if there are more items in the array?" is not a valid question. If you change your mind based on the number of items in the array, then you were doing it wrong even before the array became larger.

Optimizing for performance

That being said, real-world performance bottlenecks sometimes require us to be less-than-theoretically-perfect about how to implement something. But in such a case, you are currently putting the cart before the horse with your question.

Performance should not be prematurely optimized. You would only need to tackle this problem once it has become a problem. The question you're currently asking is about what could possibly happen in the future in certain circumstances that may or may not happen. It's inefficient to preemptively protect against any and all possible future issues. It's much more efficient to deal with problems as they arise.

In the end, as far as the runtime is concerned, you are much more interested in the backend performance than the frontend performance (as you are paying for the backend server, but not for the user's computer). This is part of the reason why I suggest developing the backend API (at least the interface) before the frontend.

If this leads to frontend conversions, then that's okay.

If these frontend conversions become too cumbersome for the frontend, they are notably detracting from the user experience, and you are willing to have the backend server take on the computational effort of all (!) of your users' transformations, then you can re-evaluate/extend your backend API to accommodate the frontend's data structure and take on the conversion.

If you properly separate your layers and use good practice from the get go, these future changes will be minimal and you can ensure that you only take the steps you need to take, not the ones you think you might need to take in the future. This prevents a lot of wasted effort on preventing things that never would have become an issue anyway.

  • I am open to feedback in relation to the downvotes.
    – Flater
    Commented Nov 5, 2019 at 16:26
  • Fantastic answer! Commented Mar 9, 2020 at 10:34
  • Incredible answer. Commented Apr 13 at 15:12

Actually it's the reverse. You want to push as much processing to the client as possible.

Assuming they are on a modern desktop PC, they may well have more resources than your server, and as a group, the clients have vast amounts of processing power equivalent to a super computer.

Plus, things running in your datacenter cost money; the client's PC is free!

  • There is much more consideration to this than a blanket "always push it to the client". Even if this is the correct answer to OP's particular scenario, I suggest rephrasing the answer in a way that it doesn't suggest that the solution to the current scenario is a blanket solution for all similar scenarios without any further considerations.
    – Flater
    Commented Nov 5, 2019 at 15:30
  • always push to the client. (for scalability)
    – Ewan
    Commented Nov 5, 2019 at 15:35
  • "always push to the client" You've already contradicted yourself by adding the "assuming they are on a modern desktop PC" modifier. "always" includes mobile APIs as well. Additionally, what about proprietary calculations? Exposing them on the frontend leads to exposing your proprietary information. Taking your "push it to the client" approach at heart and with no restrictions (always, right?), you'd end up effectively gutting the backend and making it a hollow passthrough layer in every situation, in every circumstance.
    – Flater
    Commented Nov 5, 2019 at 15:39
  • This answer really should be recast to better grapple with the factors a programmer might want to take into account in deciding where to perform processing.
    – Steve
    Commented Nov 5, 2019 at 15:58
  • I never contradict myself! The "assuming" clause refers to the subsequent words in that sentence, not the previous sentence.
    – Ewan
    Commented Nov 5, 2019 at 16:25

Transforming data is a full-blown feature in itself. In fact, it could be argued that all any software does is transform data. I presume what we're talking about is transforming what the front-end gives us into what the back-end needs.

You set a performance point saying that the front-end can handle an array of size 100. This is good to know but we're still lacking enough information to make the call. We need another performance point that says the back-end can handle an array of size x (with y simultaneous users).

With that information we could make decisions that would increase the overall capacity of the system. Without it we're just holding to our dogmas and hoping.

There are good dogmas of course that give you good goals to steer towards but since your question indicates performance is your primary concern I advise you to stop thinking about what and focus on how much.

With that answered look at what pursuit of performance has cost your design. We have these dogmas for a reason. Ewan has a good reason to put the workload on the user. When y is large it's a really good reason. But costs come in many forms. Loss of readability and flexibility cost developer time which can easily trump data-center costs. Loss of security can cost you the company.

Every piece of successful code ever written is a compromise. I've never seen anything live up to ideals. I've seen nightmares that desperately need ideals. Don't use performance as an excuse to hide from these ideals.
