We have a distributed application in which a client syncs certain portions of data to its offline storage and modifies them there. The data is structured into container entities ("Container") and entities ("Entity"). Each "Container" contains many "Entity"s, and each "Entity" has to belong to a "Container".

When the client updates, it iterates through its local Containers and uploads the changes to each Entity in that Container. Afterwards it asks for all Entities of the current Container that changed since the last synchronisation and modifies/adds them locally.

The system has a very granular access rights system, so the following situation can arise: a specific Entity is no longer available to the client. How should the server propagate that change to the client? Currently, when the client requests the Entities of a Container that changed since the last sync time, it won't receive that Entity, because it no longer has access rights. If the Entity has local changes, the server can inform the client when it tries to upload them. But what if there are no local changes to sync?

I already explored the following solution paths, but always found disadvantages:

  1. Go through each local "Entity" and check if it's still available -> too many requests.
  2. List all "Entities" upon synchronisation and check which local Entities don't appear anymore -> increases data load on sync requests.
  3. Make a separate call at the end, to check which Entities are not visible to the client anymore, since last sync -> Query too complex.

The only somewhat feasible solution I could come up with:

In the request to get all Entities that changed since last sync, also include the ones that are not visible, but give them a "deleted" state, like:

{
  "id": "...",
  "state": "Deleted"
}

Then the client can react and delete them.

This solution is doable but introduces a lot of extra logic into the request.

Does anyone know a standard approach/design pattern or method to this? Any input appreciated.

3 Answers

Reinventing the versioning wheel

Does anyone know a standard approach/design pattern or method to this?

This sounds a lot like a versioning system. Are you sure you're not trying to reinvent the wheel?

Even if this isn't a case of reinventing the versioning wheel, it's still interesting to take some inspiration from versioning systems (Git, SVN, ...) on how the interaction between client and server works (you don't necessarily need to look at the specific implementation, the contract between them is relevant enough as-is).

It's no coincidence that I've used "push", "pull" and "check out" in this answer - all git commands. The similarities are quite accurate.


Top-level API design

You're trying to perform two jobs at once. It's beneficial to think of these as separate operations and analyze them separately.

  • Pushing the client-side changes to the server
  • Pulling server-side changes to the client

This means that both parties need to send information to each other.

  • If only the client gives information to the server, you miss out on server-side changes.
  • If only the server provides server-side changes, your client can't push its updates.

Now, you could stack these into a single web request (client pushes changes, receives server-side changes), but I would strongly advise against it as it's coupled too tightly.

So in conclusion, the better approach is to have two requests; one to push changes to the server, and one to sync changes from the server.


Fetching server-side changes

Currently, the focus is only on entities whose access has been revoked for the current user, so I'll only focus on that content.

A simple first approach to this is to have the client send its lastSyncDate (i.e. when it last fetched server-side changes) to ensure your server doesn't send things the client already knows.
The server then looks for all revoked access since that date (which implies the server tracks timestamps of changes to access privileges), and returns the entities that were affected by these changes.
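
To make the lookup concrete, here is a minimal sketch of what the server could do. It assumes the server keeps a log of access changes with timestamps; the `AccessChange` record and the `revoked_since` function are hypothetical names for illustration, not part of any existing system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessChange:
    """One row of a hypothetical access-change log kept by the server."""
    user_id: str
    entity_id: str
    granted: bool        # True = access granted, False = access revoked
    changed_at: datetime


def revoked_since(log, user_id, last_sync):
    """Return IDs of entities whose latest change after last_sync is a revocation."""
    latest = {}
    # Walk the log in chronological order so later changes overwrite earlier ones.
    for change in sorted(log, key=lambda c: c.changed_at):
        if change.user_id == user_id and change.changed_at > last_sync:
            latest[change.entity_id] = change.granted
    return sorted(eid for eid, granted in latest.items() if not granted)


log = [
    AccessChange("alice", "e1", False, datetime(2019, 12, 2)),
    AccessChange("alice", "e2", False, datetime(2019, 12, 3)),
    AccessChange("alice", "e2", True, datetime(2019, 12, 4)),   # access restored
    AccessChange("bob", "e3", False, datetime(2019, 12, 3)),
    AccessChange("alice", "e4", False, datetime(2019, 11, 1)),  # before last sync
]
print(revoked_since(log, "alice", datetime(2019, 12, 1)))  # ['e1']
```

Only `e1` is reported: `e2` was revoked and then restored, `e3` belongs to another user, and `e4` changed before the last sync.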

If this can lead to a massive list of entities and you really want to cut it down to only the entities that you know the client has copied locally, then there are two ways to do so:

  • The client specifically sends you a list of entity IDs that it has stored locally
  • The server already kept its own record of which entities were checked out by which client and thus already knows this information

I suggest doing the former, as relying on the server's information can be an issue in case the client's local storage has changed without the server having been made aware of it. E.g. if the client's hard drive crashes and the application is reinstalled from scratch, then the server thinks that the client has local copies when in fact it does not.

Either way, you could significantly cut down on both query runtime and response data size by trimming the list to only contain entities that the client cares about.
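
The first option (the client sends its local IDs) boils down to a set difference on the server. A rough sketch, assuming the server can produce the set of entity IDs the user can currently access; `no_longer_accessible` is a hypothetical helper name:

```python
def no_longer_accessible(local_ids, accessible_ids):
    """Return the client-held IDs that the user may no longer see, sorted."""
    return sorted(set(local_ids) - set(accessible_ids))


local_ids = ["e1", "e2", "e3"]          # what the client reports having stored
accessible_ids = {"e2", "e3", "e9"}     # what the permission check allows right now
print(no_longer_accessible(local_ids, accessible_ids))  # ['e1']
```

The response only ever contains IDs the client already knows about, so it reveals nothing about entities the client never had.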

I don't know your underlying data store so which solution is the most applicable is not something I can decide.

The client now knows to simply delete these entities from its local storage.

Note that if access can be revoked temporarily, you wouldn't want to automatically delete the local changes without the user's consent, as they may want to keep these changes so they can push them when access has been restored.


Pushing client-side changes

The only real issue here is how you deal with changes to entities on which access has been revoked since the last push. I can't answer that for you.

  • You could allow the changes if the checkout happened before the access revoking.
  • You could allow only the changes that were made before the access revoking, provided your local client stored (trustworthy) timestamps of all local changes.
  • You could outright refuse to accept any changes after the access has been revoked.

In all cases, this is a decision that needs to be made by the server. The client pushes its content (or tries to), and the server decides what to do with it. If the server rejects it, you return the error to the client.

While it is tempting to do so, you may want to hold off on having the client immediately remove the local entity when it encounters an error, because in the future you may have other reasons for a push to fail (e.g. temporarily locked entity, database unavailable, ...), at which point you don't want your client to immediately remove its local changes.

I suggest for the client to rely on the fetch method to actually decide what to delete, because at that point the client has explicit confirmation from the server that access on these entities has in fact been revoked.
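
As a sketch of that split of responsibilities on the client: a rejected push leaves everything in place, and only the fetch response removes entities. The `LocalStore` class and its method names are hypothetical. Unsynced changes to a revoked entity are set aside rather than discarded, which is one possible way to honour the earlier note about temporary revocation.

```python
class LocalStore:
    """Minimal client-side store (hypothetical) for illustrating the flow."""

    def __init__(self):
        self.entities = {}    # id -> last value confirmed by the server
        self.pending = {}     # id -> local change not yet accepted by the server
        self.set_aside = {}   # id -> local change kept after access was revoked

    def on_push_result(self, entity_id, accepted):
        """An accepted push promotes the pending change; a rejection removes nothing."""
        if accepted and entity_id in self.pending:
            self.entities[entity_id] = self.pending.pop(entity_id)

    def on_fetch_revocations(self, revoked_ids):
        """Only an explicit revocation from the fetch removes local entities."""
        for entity_id in revoked_ids:
            self.entities.pop(entity_id, None)
            if entity_id in self.pending:
                # Keep the unsynced change so the user can decide what to do with it.
                self.set_aside[entity_id] = self.pending.pop(entity_id)


store = LocalStore()
store.entities = {"e1": "old", "e2": "old"}
store.pending = {"e1": "new"}

store.on_push_result("e1", accepted=False)   # push rejected: nothing is deleted
store.on_fetch_revocations(["e1"])           # fetch confirms the revocation
print(store.entities, store.pending, store.set_aside)
```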

  • Thanks for your really detailed answer. Yes, we aren't trying to reinvent the wheel and are heavily inspired by existing version control architectures. The thing is: we have a really complex permission schema. It's not only that clients can be granted and revoked access to entities; it also depends on the entities' state. We are now considering the approach of sending the local IDs to the server for verification. THANKS TO YOU! :-) We already use the lastSyncDate, so maybe we'll do a hybrid: first sync what the user sees and then verify the remaining local IDs.
    – ampramp
    Commented Dec 5, 2019 at 11:54

Your "somewhat feasible" solution is actually the most sensible one. The client needs to know which entities changed relative to the last known state, and changes of entity contents/value should be signalled along with changes of availability (whether an entity is available/unavailable due to actual insertion or deletion or due to changed rights is probably not the client's business).
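
A minimal sketch of such a combined change feed, reusing the `"state": "Deleted"` shape from the question. It assumes each entity carries a `changed_at` timestamp that is also bumped when its permissions change; the field names and the `"Changed"` state are made up for illustration.

```python
from datetime import datetime


def changes_since(entities, visible_ids, last_sync):
    """Build one feed: visible entities as updates, everything else as "Deleted"."""
    feed = []
    for entity in entities:
        if entity["changed_at"] <= last_sync:
            continue  # nothing new for the client
        if entity["id"] in visible_ids and not entity["deleted"]:
            feed.append({"id": entity["id"], "state": "Changed", "value": entity["value"]})
        else:
            # Real deletion and lost access look identical to the client.
            feed.append({"id": entity["id"], "state": "Deleted"})
    return feed


entities = [
    {"id": "e1", "value": "a", "deleted": False, "changed_at": datetime(2019, 12, 2)},
    {"id": "e2", "value": "b", "deleted": False, "changed_at": datetime(2019, 12, 3)},
    {"id": "e3", "value": "c", "deleted": False, "changed_at": datetime(2019, 11, 1)},
    {"id": "e4", "value": "d", "deleted": True, "changed_at": datetime(2019, 12, 4)},
]
feed = changes_since(entities, visible_ids={"e1", "e3", "e4"}, last_sync=datetime(2019, 12, 1))
print(feed)
```

Here `e2` (access lost) and `e4` (actually deleted) both come back as `"Deleted"`, and `e3` is skipped because it has not changed since the last sync.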


General Case

A standard way of handling permission checks is to treat them like deletions/non-existent records, for example returning a 404 (not found) rather than a 403 (forbidden).

Some would say that it is advantageous from a security standpoint, i.e. not revealing the existence of a record at all. However, it can also be a lot easier from an API design standpoint, as you can assume that any "search" query will always return data that the user can access, hence you don't have to return a bunch of records and then drop some of them on the client side.

As a rule of thumb you will probably find that if you filter the records in your data access layer ** you probably won't have a huge amount to do in your business logic, because in most cases business logic only processes active (non-deleted) records.

** - A typical filter would be something like a SQL WHERE clause - which would be also added to update and deletes to ensure that records that the user doesn't have access to don't get modified.
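
As an illustration of that filter, here is a small sketch using Python's built-in sqlite3 module. The `entity` and `permission` tables are a made-up schema in which a row in `permission` means the user may access the entity; your real permission model is likely more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id TEXT PRIMARY KEY, value TEXT);
    CREATE TABLE permission (user_id TEXT, entity_id TEXT);
    INSERT INTO entity VALUES ('e1', 'a'), ('e2', 'b');
    INSERT INTO permission VALUES ('alice', 'e1');
""")

# Hypothetical access filter, shared by every read and write statement.
ACCESS_FILTER = "id IN (SELECT entity_id FROM permission WHERE user_id = ?)"


def find_entities(user_id):
    """Read path: rows the user cannot access are simply never returned."""
    rows = conn.execute(
        f"SELECT id, value FROM entity WHERE {ACCESS_FILTER} ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()


def update_entity(user_id, entity_id, value):
    """Write path: the same filter turns an update on a hidden row into a no-op."""
    cur = conn.execute(
        f"UPDATE entity SET value = ? WHERE id = ? AND {ACCESS_FILTER}",
        (value, entity_id, user_id),
    )
    return cur.rowcount  # 0 rows touched can be reported as "not found" (404)


print(find_entities("alice"))               # [('e1', 'a')]
print(update_entity("alice", "e2", "x"))    # 0
```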

Your Use Case

Two common scenarios when the above strategy fails are:

  1. When the user actually needs to know that they don't have permission
  2. When the deletion date is important.

I suspect that you are running into issues here because you are not actually deleting records from your data store; instead you are simply marking them as deleted. As a result, you typically use the deletion date in your data store to signal to the client that the record has been deleted. Hence, if the record is simply missing/filtered from the response, you have no way of knowing that it is now present only on the client.

The best solution probably depends on what you want the user experience to be:

  1. Leave the record on client, but get no updates (easy to implement).
  2. Silently drop the record as if it had been deleted.
  3. Notify the user that they no longer have permission and then delete.

Solution

Implementing the last two options probably requires you to look at your current logic that handles deletions - you probably already have a query that finds all "real" deletions that occurred after a given date.

What you need is an additional query that finds all records that:

  • Are not actually deleted.
  • Are filtered out by the permission check.
  • Where the permissions have changed since the given date.

You may be able to tighten up that last check, for example only considering cases where permissions were revoked from the given user, however assuming that permissions don't change that often, sending a few extra "false" deletes to the client shouldn't be too much of a problem.
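
The three conditions above could translate into a query along these lines. This is only a sketch against a made-up schema: soft deletion via `entity.deleted_at`, and one `permission` row per user and entity carrying a `granted` flag and a `changed_at` timestamp stored as ISO-8601 text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entity (id TEXT PRIMARY KEY, deleted_at TEXT);
    CREATE TABLE permission (
        user_id TEXT, entity_id TEXT, granted INTEGER, changed_at TEXT
    );
    INSERT INTO entity VALUES
        ('e1', NULL), ('e2', NULL), ('e3', '2019-12-02'), ('e4', NULL);
    INSERT INTO permission VALUES
        ('alice', 'e1', 0, '2019-12-03'),  -- revoked after last sync
        ('alice', 'e2', 1, '2019-12-03'),  -- still granted
        ('alice', 'e3', 0, '2019-12-03'),  -- really deleted, handled elsewhere
        ('alice', 'e4', 0, '2019-11-01');  -- revoked before last sync
""")


def permission_deletes(user_id, since):
    """IDs to report as "deleted" because of permissions, not real deletion."""
    rows = db.execute(
        """
        SELECT e.id
        FROM entity e
        JOIN permission p ON p.entity_id = e.id AND p.user_id = ?
        WHERE e.deleted_at IS NULL      -- not actually deleted
          AND p.granted = 0             -- filtered out by the permission check
          AND p.changed_at > ?          -- permissions changed since the given date
        ORDER BY e.id
        """,
        (user_id, since),
    )
    return [row[0] for row in rows]


print(permission_deletes("alice", "2019-12-01"))  # ['e1']
```

The result can be merged with the output of your existing "real deletions since date" query before it is sent to the client.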
