
I'm making an iOS app; well, two actually. App A will be able to write TO-DOs to a remote MySQL database, and App B will be able to read what App A has written or modified. App B always needs to be up to date, and more than one instance of App B can be running on different iDevices. App B stores an almost-exact copy of what App A has written to the remote MySQL DB in a local SQLite DB. MySQL operations are handled by a simple PHP script.

The problem is how to update App B's local DB when something changes in the remote MySQL database (that is, when App A writes something new or modifies something existing). I came up with two possible solutions. The first is to send the PHP script the last_row_id (the ID of the last row App B has received from the server) so that the script can ask MySQL for all rows with a greater ID (ID is an auto-increment column in the MySQL table).

But... this won't detect items that have been modified.
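A minimal sketch of the first approach, using Python's built-in sqlite3 as a stand-in for the remote MySQL table (the `todos` schema and the `fetch_new_rows` helper are hypothetical, standing in for what the PHP script would run). It shows why an ID cursor only catches inserts: editing an existing row leaves its ID unchanged, so it never exceeds `last_row_id`.

```python
import sqlite3

# In-memory SQLite stands in for the remote MySQL table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")

def fetch_new_rows(conn, last_row_id):
    """Return every row whose id is greater than the client's last_row_id."""
    return conn.execute(
        "SELECT id, title FROM todos WHERE id > ? ORDER BY id", (last_row_id,)
    ).fetchall()

# App A inserts two TO-DOs; App B syncs and remembers the highest id it saw.
db.execute("INSERT INTO todos (title) VALUES ('buy milk')")
db.execute("INSERT INTO todos (title) VALUES ('call Bob')")
first_sync = fetch_new_rows(db, 0)
last_row_id = first_sync[-1][0]

# App A edits row 1: its id does not change, so the next sync returns nothing.
db.execute("UPDATE todos SET title = 'buy oat milk' WHERE id = 1")
second_sync = fetch_new_rows(db, last_row_id)
print(first_sync)   # [(1, 'buy milk'), (2, 'call Bob')]
print(second_sync)  # []
```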

The second solution I thought of was to add a "last_updated" timestamp column to the MySQL table and update it every time the row changes. App B would store the timestamp of its last sync and send that value to the PHP script, so that the script can ask MySQL for all rows with a timestamp greater than the one it receives from App B.
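A comparable sketch of the timestamp approach, again with sqlite3 standing in for MySQL and with hypothetical `save` and `changed_since` helpers. The writer passes in its own clock reading as `now`, which makes the failure mode easy to see: a server whose clock lags stamps a brand-new row with a time earlier than App B's `last_sync`, so that row is never returned.

```python
import sqlite3

# Hypothetical schema: last_updated holds a UNIX timestamp set on every write.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT, last_updated INTEGER)")

def save(conn, todo_id, title, now):
    """Insert or update a row, stamping it with the writing server's clock."""
    conn.execute(
        "INSERT OR REPLACE INTO todos (id, title, last_updated) VALUES (?, ?, ?)",
        (todo_id, title, now),
    )

def changed_since(conn, last_sync):
    """Rows stamped after the client's last sync."""
    return conn.execute(
        "SELECT id, title, last_updated FROM todos WHERE last_updated > ? ORDER BY id",
        (last_sync,),
    ).fetchall()

save(db, 1, "buy milk", now=100)
save(db, 2, "call Bob", now=105)
last_sync = 110                          # App B syncs at t=110 and stores that value

save(db, 1, "buy oat milk", now=120)     # an edit on a correct clock is detected
after_edit = changed_since(db, last_sync)

save(db, 3, "pay rent", now=108)         # new row from a server whose clock lags
after_skewed_write = changed_since(db, last_sync)
print(after_edit)          # [(1, 'buy oat milk', 120)]
print(after_skewed_write)  # [(1, 'buy oat milk', 120)]: row 3 is missed
```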

But... both solutions fail when MySQL is sharded and clustered (and this is the case I'm most interested in). The auto-increment ID approach fails because IDs are no longer progressive across servers (see auto_increment_offset and auto_increment_increment), and the timestamp approach would be unreliable because the clocks on the individual MySQL servers can be out of sync.
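To illustrate the auto-increment problem, here is a toy simulation (not MySQL itself) of two hypothetical nodes configured with `auto_increment_increment = 2` and offsets 1 and 2, so one node hands out 1, 3, 5, ... and the other 2, 4, 6, .... Each node's IDs increase, but rows do not arrive in global ID order, so a client that remembers the highest ID it has seen skips a lower ID committed later on the other node.

```python
from itertools import count

def node_ids(offset, increment):
    """Mimic one node's id sequence: offset, offset + increment, ..."""
    return count(offset, increment)

node_a = node_ids(offset=1, increment=2)   # 1, 3, 5, ...
node_b = node_ids(offset=2, increment=2)   # 2, 4, 6, ...

table = []   # merged view of the clustered table: (id, title)

def insert(node, title):
    table.append((next(node), title))

insert(node_a, "buy milk")       # id 1
insert(node_a, "call Bob")       # id 3
insert(node_a, "pay rent")       # id 5
last_row_id = max(row_id for row_id, _ in table)   # App B syncs, remembers 5

insert(node_b, "water plants")   # id 2, committed on the quieter node after the sync
missed = [row for row in table if row[0] > last_row_id]
print(missed)   # []: row 2 is never fetched
```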

How would you solve this problem in a reliable way?

  • How far out of sync can the clocks be? You shouldn't have too much spread if NTP is running on all devices.
    – cxw
    Commented Aug 3, 2015 at 9:52
  • NTP is running on all devices. But with timestamps I feel that something can go wrong, and if it does there is no way to recover: the client(s) will be out of sync forever. I'm wondering how apps like Telegram do this kind of synchronisation...
    – Alex
    Commented Aug 3, 2015 at 11:22
  • You may want to start with how MySQL manages records in your cluster. Whatever it uses should be something your app can rely on (timestamps, etc.).
    – JeffO
    Commented Aug 3, 2015 at 13:42
  • The auto-increment problem can be avoided completely if you use random values (a BIGINT is safe enough) instead of a progressive number.
    Commented Sep 11, 2017 at 8:59
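A minimal sketch of the suggestion in the last comment, assuming a signed BIGINT primary key (maximum 2**63 - 1); `random_bigint_id` is a hypothetical helper. Collisions are unlikely but not impossible, so the column should still carry a PRIMARY KEY or UNIQUE constraint and the insert should be retried with a fresh ID on a duplicate-key error. Random IDs remove the need for coordinated offsets, but they carry no ordering, so on their own they don't tell App B what changed.

```python
import secrets

BIGINT_MAX = 2**63 - 1   # largest value a signed MySQL BIGINT can hold

def random_bigint_id():
    """Uniformly random positive id in [1, BIGINT_MAX], no node coordination needed."""
    return secrets.randbelow(BIGINT_MAX) + 1

ids = [random_bigint_id() for _ in range(1000)]
```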

3 Answers


Instead of relying on an Updated Date/Time field, use a Version Number field. It is 1 when the record is created and is incremented with each update. This is not hard to do at the database level.

The monitoring app just tracks the Primary Key and Version Number of each record and periodically compares them with the server's.
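A minimal sketch of the version-number approach, with sqlite3 standing in for MySQL; the `todos` table and the `create`, `update` and `diff` helpers are hypothetical. The version is bumped by the UPDATE statement itself, so no server clock is involved, and App B's `{id: version}` map is compared with the server's to find which rows to fetch and which to delete locally.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT, "
    "version INTEGER NOT NULL DEFAULT 1)"
)

def create(conn, todo_id, title):
    conn.execute("INSERT INTO todos (id, title) VALUES (?, ?)", (todo_id, title))

def update(conn, todo_id, title):
    # The database increments the version; no clocks are compared anywhere.
    conn.execute(
        "UPDATE todos SET title = ?, version = version + 1 WHERE id = ?", (title, todo_id)
    )

def diff(conn, local_versions):
    """Return (ids to fetch, ids to delete) for a client's {id: version} map."""
    server = dict(conn.execute("SELECT id, version FROM todos"))
    to_fetch = sorted(i for i, v in server.items() if local_versions.get(i) != v)
    to_delete = sorted(set(local_versions) - set(server))
    return to_fetch, to_delete

create(db, 1, "buy milk")
create(db, 2, "call Bob")
local = {1: 1, 2: 1}                       # what App B stored at its last sync
update(db, 2, "call Bob at 5")             # row 2 becomes version 2
create(db, 3, "pay rent")                  # new row App B has never seen
db.execute("DELETE FROM todos WHERE id = 1")
print(diff(db, local))                     # ([2, 3], [1])
```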

The version number can also serve as a check for when one user changes the data while another user is working with stale data: an update is rejected if its version number doesn't match the record's current version. This check happens before the update increments the value.
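This check is commonly called optimistic locking. A sketch, assuming a hypothetical `todos` table with a `version` column and a hypothetical `guarded_update` helper: the version test sits in the UPDATE's WHERE clause, so the comparison and the increment happen in one statement, and the cursor's `rowcount` tells the caller whether the edit was applied.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT, version INTEGER NOT NULL)")
db.execute("INSERT INTO todos VALUES (1, 'buy milk', 1)")

def guarded_update(conn, todo_id, title, expected_version):
    """Apply the edit only if the row still has the version the writer last read."""
    cur = conn.execute(
        "UPDATE todos SET title = ?, version = version + 1 WHERE id = ? AND version = ?",
        (title, todo_id, expected_version),
    )
    return cur.rowcount == 1   # False: someone else changed the row first

first = guarded_update(db, 1, "buy oat milk", expected_version=1)   # applied, now version 2
second = guarded_update(db, 1, "buy soy milk", expected_version=1)  # rejected, stale version
```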


Invertible Bloom filters, e.g., per this paper.

The GNU Name System video from 30c3 has an intro. The server adds the primary keys of the records it has to a filter. The client sends the server the primary keys of the records it has, and the server removes those from the filter. What remains are the records to send to the client.
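A minimal sketch of an invertible Bloom lookup table (IBLT) over 64-bit primary keys, following the steps above; the class name, table size, and SHA-256-based hashing are assumptions for illustration, not taken from the paper. Keys present on both sides cancel out exactly, and peeling the remaining "pure" cells recovers both the records to send and any keys the client holds that the server no longer has. Decoding only succeeds with high probability while the difference is small relative to the table; `complete` reports whether it did.

```python
import hashlib

def _hash(key: int, salt: int) -> int:
    """64-bit hash of a 64-bit key; the salt selects an independent hash function."""
    data = salt.to_bytes(2, "big") + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

CHECK_SALT = 0xFFFF   # salt reserved for the per-key checksum

class IBLT:
    def __init__(self, cells_per_hash: int = 40, num_hashes: int = 3):
        self.size, self.k = cells_per_hash, num_hashes
        m = cells_per_hash * num_hashes
        self.count = [0] * m
        self.key_sum = [0] * m
        self.hash_sum = [0] * m

    def _cells(self, key):
        # One cell per sub-table, so a key never lands in the same cell twice.
        return [i * self.size + _hash(key, i) % self.size for i in range(self.k)]

    def _apply(self, key, delta):
        check = _hash(key, CHECK_SALT)
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.hash_sum[c] ^= check

    def insert(self, key):
        self._apply(key, +1)

    def delete(self, key):
        self._apply(key, -1)

    def decode(self):
        """Peel pure cells; return (only inserted, only deleted, complete)."""
        inserted, deleted = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                key = self.key_sum[c]
                if self.count[c] in (1, -1) and self.hash_sum[c] == _hash(key, CHECK_SALT):
                    sign = self.count[c]
                    (inserted if sign == 1 else deleted).add(key)
                    self._apply(key, -sign)   # remove the key from all its cells
                    progress = True
        complete = not any(self.count) and not any(self.key_sum)
        return inserted, deleted, complete

server_ids = set(range(1, 101))            # primary keys on the server
client_ids = set(range(1, 98)) | {500}     # keys App B reports having
table = IBLT()
for key in server_ids:
    table.insert(key)
for key in client_ids:
    table.delete(key)
to_send, to_drop, ok = table.decode()
```

In the usual IBLT protocol the client sends its own filter instead of its full key list and the server subtracts the two cell by cell, which keeps the transfer proportional to the difference rather than to the table size.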

To cut down the bandwidth used, you can store coarser timestamps, e.g., only send records starting from the week before the last sync. That way it still works even if your clocks are off by several days.
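A sketch of the widened window, assuming each row carries a `last_updated` datetime (hypothetical field name) and a one-week slack; `sync_window_start` and `candidates` are hypothetical helpers. Rows stamped by a moderately skewed clock still fall inside the window, and rows App B already has would then be weeded out by the filter-based comparison described above.

```python
from datetime import datetime, timedelta

def sync_window_start(last_sync: datetime, slack: timedelta = timedelta(weeks=1)) -> datetime:
    """Rewind the cutoff so writes stamped by a skewed clock still fall inside it."""
    return last_sync - slack

def candidates(rows, last_sync):
    """Rows whose last_updated is inside the widened window."""
    start = sync_window_start(last_sync)
    return [r for r in rows if r["last_updated"] >= start]

last_sync = datetime(2015, 8, 3, 12, 0)
rows = [
    {"id": 1, "last_updated": datetime(2015, 7, 1)},          # older than the window
    {"id": 2, "last_updated": datetime(2015, 8, 2, 9, 0)},    # skewed clock, still sent
    {"id": 3, "last_updated": datetime(2015, 8, 3, 13, 0)},   # ordinary new write
]
print([r["id"] for r in candidates(rows, last_sync)])  # [2, 3]
```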


You can create a transaction log table keyed by GUIDs that stores the changes made, by whom, and when (and on which table/entity, if you choose to break that out).

ex:
logid        | trantype | oldvalue   | newvalue    | datetime       | user
{guidxx1234} | update   | {x=1, y=5} | {x=1, y=7}  | yyyymmdd hh:mm | username or id
{guidxx1235} | insert   | {}         | {x=45, y=7} | yyyymmdd hh:mm | username or id
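A minimal sketch of the log table from the example, using sqlite3 and Python's `uuid4` for the GUIDs; the `txlog` table name, the JSON encoding of old/new values, and the `log_change` helper are assumptions. Because each entry gets a random GUID rather than a counter, any node can append to the log without coordinating with the others.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical log table with the columns from the example above.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE txlog (logid TEXT PRIMARY KEY, trantype TEXT, "
    "oldvalue TEXT, newvalue TEXT, datetime TEXT, user TEXT)"
)

def log_change(conn, trantype, old, new, user):
    """Append one change entry and return its GUID."""
    logid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO txlog VALUES (?, ?, ?, ?, ?, ?)",
        (logid, trantype, json.dumps(old), json.dumps(new),
         datetime.now(timezone.utc).isoformat(), user),
    )
    return logid

a = log_change(db, "update", {"x": 1, "y": 5}, {"x": 1, "y": 7}, "alice")
b = log_change(db, "insert", {}, {"x": 45, "y": 7}, "alice")
rows = db.execute("SELECT trantype, newvalue FROM txlog ORDER BY rowid").fetchall()
print(rows)  # [('update', '{"x": 1, "y": 7}'), ('insert', '{"x": 45, "y": 7}')]
```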

The sync process would then replay the changes made in the local database onto the server database by working out which transactions are missing from this log.
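A sketch of the comparison step, shown in the pull direction App B needs (pushing local changes to the server is the same comparison with the roles swapped). Log entries are plain dicts here, and the `record` field identifying which row changed is a hypothetical addition along the lines of the optional table/entity column; `missing_entries` and `apply_entry` are hypothetical helpers. The sketch assumes the log has a well-defined replay order (list order here); in a sharded setup, establishing that order is itself part of the problem.

```python
def missing_entries(server_log, applied_ids):
    """Entries the client has not applied yet, kept in server log order."""
    return [e for e in server_log if e["logid"] not in applied_ids]

def apply_entry(local, entry):
    """Replay one log entry against a local {record_id: value} store."""
    if entry["trantype"] == "delete":
        local.pop(entry["record"], None)
    else:   # insert and update both store the new value
        local[entry["record"]] = entry["newvalue"]

server_log = [
    {"logid": "g1", "trantype": "insert", "record": 1, "newvalue": {"x": 1, "y": 5}},
    {"logid": "g2", "trantype": "update", "record": 1, "newvalue": {"x": 1, "y": 7}},
    {"logid": "g3", "trantype": "insert", "record": 2, "newvalue": {"x": 45, "y": 7}},
]
local = {1: {"x": 1, "y": 5}}
applied = {"g1"}
for entry in missing_entries(server_log, applied):
    apply_entry(local, entry)
    applied.add(entry["logid"])
print(local)  # {1: {'x': 1, 'y': 7}, 2: {'x': 45, 'y': 7}}
```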

This may not be the best way for you, but it's easy enough to implement regardless of DB vendor.
