
I have been working on a method to sync Core Data stored in an iPhone application between multiple devices, such as an iPad or a Mac. There are not many (if any at all) sync frameworks for use with Core Data on iOS. However, I have been thinking about the following concept:

  1. A change is made to the local Core Data store, and the change is saved. (a) If the device is online, it tries to send the change set to the server, including the device ID of the device which sent the change set. (b) If the change set does not reach the server, or if the device is not online, the app adds the change set to a queue to send when it does come online.
  2. The server, sitting in the cloud, merges the specific change sets it receives with its master database.
  3. After a change set (or a queue of change sets) is merged on the cloud server, the server pushes all of those change sets to the other devices registered with the server using some sort of polling system. (I thought to use Apple's Push services, but apparently according to the comments this is not a workable system.)
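
The client side of step 1 could be sketched roughly as follows. This is a minimal, language-agnostic sketch in Python rather than Core Data code; `ChangeSet`, `OutboundQueue`, and the `send` callable are hypothetical names, and the transport is assumed to return True once the server has acknowledged a change set.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class ChangeSet:
    device_id: str   # ID of the device that produced the change
    payload: dict    # the changed attributes (shape is an assumption)


@dataclass
class OutboundQueue:
    send: Callable[[ChangeSet], bool]  # hypothetical transport; True on server ack
    pending: Deque[ChangeSet] = field(default_factory=deque)

    def submit(self, change: ChangeSet, online: bool) -> None:
        """Queue the change, then try to deliver everything if online."""
        self.pending.append(change)
        if online:
            self.flush()

    def flush(self) -> int:
        """Send queued change sets in order; stop at the first failure."""
        sent = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break  # keep it queued and retry when back online
            self.pending.popleft()
            sent += 1
        return sent
```

Queuing every change first, and removing it only after an acknowledgment, means a change that fails in transit is never lost; it simply stays at the head of the queue until the next `flush()`.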

Is there anything fancy that I need to be thinking about? I have looked at REST frameworks such as ObjectiveResource, Core Resource, and RestfulCoreData. Of course, these are all working with Ruby on Rails, which I am not tied to, but it's a place to start. The main requirements I have for my solution are:

  1. Any changes should be sent in the background without pausing the main thread.
  2. It should use as little bandwidth as possible.

I have thought about a number of the challenges:

  1. Making sure that the object IDs for the different data stores on different devices are linked on the server. That is to say, I will have a table of object IDs and device IDs, which are tied via a reference to the object stored in the database. Each data record will look like (DatabaseId [unique to this table], ObjectId [unique to the item in the whole database], Datafield1, Datafield2). The ObjectId field will reference another table, AllObjects: (ObjectId, DeviceId, DeviceObjectId). When a device pushes up a change set, it will pass along the device ID and the object ID of the Core Data object in its local data store. My cloud server will then look up that device ID and object ID in the AllObjects table to find the record to change in the initial table.
  2. All changes should be timestamped, so that they can be merged.
  3. The device will have to poll the server, without using up too much battery.
  4. The local devices will also need to update anything held in memory if/when changes are received from the server.
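
The ID lookup described in challenge 1 might look something like the sketch below, using an in-memory SQLite database. The column names follow the question; the table name `Records`, the composite primary key, and the `apply_change` helper are assumptions made for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Records (
            DatabaseId INTEGER PRIMARY KEY,
            ObjectId   INTEGER UNIQUE NOT NULL,
            Datafield1 TEXT,
            Datafield2 TEXT
        );
        CREATE TABLE AllObjects (
            ObjectId       INTEGER NOT NULL,
            DeviceId       TEXT NOT NULL,
            DeviceObjectId TEXT NOT NULL,
            PRIMARY KEY (DeviceId, DeviceObjectId)
        );
    """)
    return db


def apply_change(db, device_id, device_object_id, datafield1):
    """Resolve the device-local ID to the global ObjectId and update the record."""
    row = db.execute(
        "SELECT ObjectId FROM AllObjects WHERE DeviceId = ? AND DeviceObjectId = ?",
        (device_id, device_object_id),
    ).fetchone()
    if row is None:
        return False  # unknown mapping; a real server would create one here
    db.execute(
        "UPDATE Records SET Datafield1 = ? WHERE ObjectId = ?",
        (datafield1, row[0]),
    )
    return True
```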

Is there anything else I am missing here? What kinds of frameworks should I look at to make this possible?

  • You cannot rely on Push Notifications being received. The user can simply tap them away and when a second notification arrives, the OS throws the first one away. IMO push notifications are a bad way to receive sync updates, anyway, because they interrupt the user. The app should initiate the sync whenever it is launched. Feb 17, 2011 at 23:07
  • OK. Thanks for the information - outside of constantly polling the server and checking for updates on launch, is there a way for the device to get updates? I am interested in making it work if the app is open on multiple devices simultaneously.
    – Jason
    Feb 17, 2011 at 23:10
  • (I know this is a bit late, but in case anybody comes across this and also wonders:) to keep multiple devices in sync simultaneously, you could keep an open connection with either the other device or a server, and send messages to tell the other device(s) when an update occurs (e.g. the way IRC / instant messaging works).
    – Dan2552
    Oct 6, 2012 at 19:28
  • @Dan2552: what you describe is known as long polling (en.wikipedia.org/wiki/…) and is a great idea; however, open connections consume quite a lot of battery and bandwidth on a mobile device.
    – johndodo
    May 22, 2013 at 16:16
  • Here's a good tutorial from Ray Wenderlich on how to sync data between your app and web service: raywenderlich.com/15916/… Jan 8, 2014 at 13:51

8 Answers

281

I've done something similar to what you're trying to do. Let me tell you what I've learned and how I did it.

I assume you have a one-to-one relationship between your Core Data object and the model (or db schema) on the server. You simply want to keep the server contents in sync with the clients, but clients can also modify and add data. If I got that right, then keep reading.

I added four fields to assist with synchronization:

  1. sync_status - Add this field to your core data model only. It's used by the app to determine if you have a pending change on the item. I use the following codes: 0 means no changes, 1 means it's queued to be synchronized to the server, and 2 means it's a temporary object and can be purged.
  2. is_deleted - Add this to the server and core data model. A delete event shouldn't actually delete a row from the database or from your client model, because that leaves you with nothing to synchronize back. With this simple boolean flag, you can set is_deleted to 1, synchronize it, and everyone will be happy. You must also modify the code on the server and client to query only non-deleted items with "is_deleted=0".
  3. last_modified - Add this to the server and core data model. This field should automatically be updated with the current date and time by the server whenever anything changes on that record. It should never be modified by the client.
  4. guid - Add a globally unique id (see http://en.wikipedia.org/wiki/Globally_unique_identifier) field to the server and core data model. This field becomes the primary key and becomes important when creating new records on the client. Normally your primary key is an incrementing integer on the server, but we have to keep in mind that content could be created offline and synchronized later. The GUID allows us to create a key while being offline.

On the client, add code to set sync_status to 1 on your model object whenever something changes and needs to be synchronized to the server. New model objects must generate a GUID.
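
As a rough illustration of the four fields on the client, here is a minimal sketch in plain Python rather than a Core Data model. The `Item` class, its `title` attribute, and the helper names are hypothetical; only the field names and the sync_status codes come from the description above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

SYNCED, PENDING, TEMPORARY = 0, 1, 2  # sync_status codes from the answer


@dataclass
class Item:
    title: str
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))  # can be created offline
    sync_status: int = PENDING             # client-only field; new objects need syncing
    is_deleted: bool = False               # soft-delete flag, synced both ways
    last_modified: Optional[float] = None  # set by the server only

    def update(self, title: str) -> None:
        self.title = title
        self.sync_status = PENDING  # any local edit queues the record

    def delete(self) -> None:
        self.is_deleted = True      # keep the row so the deletion can sync
        self.sync_status = PENDING


def visible(items):
    """Equivalent of querying with is_deleted=0."""
    return [i for i in items if not i.is_deleted]
```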

Synchronization is a single request. The request contains:

  • The MAX last_modified time stamp of your model objects. This tells the server you only want changes after this time stamp.
  • A JSON array containing all items with sync_status=1.

The server gets the request and does this:

  • It takes the contents from the JSON array and modifies or adds the records it contains. The last_modified field is automatically updated.
  • The server returns a JSON array containing all objects with a last_modified time stamp greater than the time stamp sent in the request. This will include the objects it just received, which serves as an acknowledgment that the record was successfully synchronized to the server.
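
The server's side of the request could be sketched like this. It is only an illustration under stated assumptions: records are plain dicts keyed by guid, the store is an in-memory dict, and an increasing counter stands in for the server's clock so that last_modified values are strictly ordered. `handle_sync` is a hypothetical name.

```python
import itertools
from typing import Dict, List

_clock = itertools.count(1)  # stand-in for the server's clock; strictly increasing


def handle_sync(store: Dict[str, dict], since: int, incoming: List[dict]) -> List[dict]:
    """Apply client changes, then return everything modified after `since`."""
    for record in incoming:
        saved = dict(record)
        saved["last_modified"] = next(_clock)  # the server, not the client, sets this
        store[saved["guid"]] = saved           # insert or overwrite by guid
    # Includes the records just received, which acts as the acknowledgment.
    return [r for r in store.values() if r["last_modified"] > since]
```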

The app receives the response and does this:

  • It takes the contents from the JSON array and modifies or adds the records it contains. Each record gets a sync_status of 0.
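
On the client, building the request and applying the response might look like the sketch below. Records are again modelled as plain dicts keyed by guid; `build_request` and `apply_response` are hypothetical helper names, and a real app would read from and write to its Core Data store instead.

```python
from typing import Dict, List, Tuple

SYNCED, PENDING = 0, 1


def build_request(local: Dict[str, dict]) -> Tuple[int, List[dict]]:
    """Return (MAX last_modified, all records with sync_status=1)."""
    since = max((r.get("last_modified") or 0 for r in local.values()), default=0)
    pending = [r for r in local.values() if r["sync_status"] == PENDING]
    return since, pending


def apply_response(local: Dict[str, dict], response: List[dict]) -> None:
    """Insert or overwrite each returned record and mark it as synced."""
    for record in response:
        merged = dict(record)
        merged["sync_status"] = SYNCED
        local[merged["guid"]] = merged
```

Records that were never synced have no last_modified yet, so they are treated as 0 when computing the maximum.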

I used the words record and model interchangeably, but I think you get the idea.

  • The last_modified field also exists in the local database, but it's not updated by the iPhone clock. It is set by the server, and synchronized back. The MAX(last_modified) date is what the app sends to the server to tell it to send back everything modified after that date.
    – chris
    Feb 21, 2011 at 14:51
  • A global value on the client could replace MAX(last_modified), but that would be redundant since MAX(last_modified) suffices. The sync_status has another role. As I wrote earlier, MAX(last_modified) determines what needs to be sync'd from the server, while sync_status determines what needs to be sync'd to the server.
    – chris
    Aug 13, 2011 at 7:19
  • @Flex_Addicted Thanks. Yes, you would need to replicate the fields for each entity that you wish to synchronize. However, you need to take greater care when synchronizing a model with a relationship (e.g., 1-to-many).
    – chris
    Sep 6, 2012 at 18:53
  • @BenPackard - You are correct. The approach doesn't do any conflict resolution so the last client will win. I haven't had to deal with this in my apps since records are edited by a single user. I'd be curious to know how you resolve this.
    – chris
    Mar 31, 2013 at 8:29
  • Hi @noilly, consider the following case: You make changes to a local object and need to synchronize it back to the server. The sync may only happen hours or days later (say if you've been offline for a while), and in that time the app may have been shut down and restarted a few times. In this case the methods on NSManagedObjectContext wouldn't help much.
    – chris
    Sep 26, 2014 at 6:07
147
+50

I suggest carefully reading and implementing the sync strategy discussed by Dan Grover at the iPhone 2009 conference, available here as a PDF document.

This is a viable solution and is not that difficult to implement (Dan implemented this in several of his applications), and it overlaps with the solution described by Chris. For an in-depth, theoretical discussion of syncing, see the paper from Russ Cox (MIT) and William Josephson (Princeton):

File Synchronization with Vector Time Pairs

which applies equally well to Core Data with some obvious modifications. This provides an overall much more robust and reliable sync strategy, but requires more effort to implement correctly.

EDIT:

It seems that Grover's PDF file is no longer available (broken link, March 2015). UPDATE: the link is available through the Wayback Machine here

The Objective-C framework called ZSync and developed by Marcus Zarra has been deprecated, given that iCloud finally seems to support correct core data synchronization.

  • Anyone have an updated link for the ZSync video? Also, is ZSync still maintained? I see it was last updated in 2010. Nov 24, 2011 at 6:53
  • ZSync's last commit on GitHub was in September 2010, which leads me to believe Marcus stopped supporting it. Feb 8, 2012 at 23:13
  • The algorithm described by Dan Grover is quite good. However, it will not work with multi-threaded server code (thus, it won't scale at all), since there is no way to make sure a client won't miss an update when the time is used to check for new updates. Please correct me if I'm wrong - I would kill to see a working implementation of this.
    – omni
    Feb 12, 2014 at 20:51
  • @Patt, I have just sent you the pdf file, as requested. Cheers, Massimo Cafaro. Mar 11, 2015 at 9:54
  • The missing Cross-Platform Data Synchronization PDF slides by Dan Grover are accessible through the Wayback Machine. Apr 1, 2015 at 5:32
11

If you are still looking for a way to go, look into Couchbase Mobile. It basically does all you want. (http://www.couchbase.com/nosql-databases/couchbase-mobile)

  • This only does what you want if you can express your data as documents rather than relational data. There are workarounds, but they are not always pretty or worth it. Nov 24, 2011 at 6:55
  • documents are enough for small applications Oct 17, 2014 at 10:21
  • @radiospiel Your link is broken
    – Mick
    Mar 11, 2015 at 4:18
  • This will also add the dependency that the backend needs to be built on Couchbase. I also started with the idea of NoSQL for syncing, but I cannot restrict my backend to NoSQL, as we have MS SQL running on the backend.
    – geekay
    Jul 13, 2015 at 12:49
  • @Mick: it seems to work again (or someone fixed the link? Thank you)
    – radiospiel
    Jul 16, 2015 at 7:46
7

Similar to @chris, I've implemented a class for synchronization between client and server and have solved all known problems so far (sending/receiving data to/from the server, merging conflicts based on timestamps, removing duplicate entries in unreliable network conditions, synchronizing nested data and files, etc.).

You just tell the class which entity and which columns it should sync and where your server is.

M3Synchronization * syncEntity = [[M3Synchronization alloc] initForClass: @"Car"
                                                              andContext: context
                                                            andServerUrl: kWebsiteUrl
                                             andServerReceiverScriptName: kServerReceiverScript
                                              andServerFetcherScriptName: kServerFetcherScript
                                                    ansSyncedTableFields:@[@"licenceNumber", @"manufacturer", @"model"]
                                                    andUniqueTableFields:@[@"licenceNumber"]];


syncEntity.delegate = self; // delegate should implement onComplete and onError methods
syncEntity.additionalPostParamsDictionary = ... // add some POST params to authenticate current user

[syncEntity sync];

You can find source, working example and more instructions here: github.com/knagode/M3Synchronization.

  • Will it be ok if we change the device time to an abnormal value?
    – Golden
    Mar 24, 2016 at 9:53
5

Notify the user to update data via a push notification. Use a background thread in the app to compare the local data with the data on the cloud server; when a change happens on the server, update the local data, and vice versa.

So I think the most difficult part is determining which side's data is out of date.

Hope this helps.

5

I have just posted the first version of my new Core Data cloud syncing API, known as SynCloud. SynCloud differs from iCloud in many ways because it allows for a multi-user sync interface. It also differs from other syncing APIs because it allows for multi-table, relational data.

Please find out more at http://www.syncloudapi.com

Built with the iOS 6 SDK, it is very up to date as of 9/27/2012.

5

I think a good solution to the GUID issue is a "distributed ID system". I'm not sure what the correct term is, but I think that's what MS SQL server docs used to call it (SQL uses/used this method for distributed/sync'ed databases). It's pretty simple:

The server assigns all IDs. Each time a sync is done, the first thing that is checked is "How many IDs do I have left on this client?" If the client is running low, it asks the server for a new block of IDs. The client then uses IDs in that range for new records. This works great for most needs, if you can assign a block large enough that it should "never" run out before the next sync, but not so large that the server runs out over time. If the client ever does run out, the handling can be pretty simple: just tell the user "sorry, you cannot add more items until you sync"... if they are adding that many items, shouldn't they sync to avoid stale data issues anyway?

I think this is superior to using random GUIDs because random GUIDs are not 100% safe, and usually need to be much longer than a standard ID (128 bits vs. 32 bits). You usually have indexes by ID and often keep ID numbers in memory, so it is important to keep them small.
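
The block allocation scheme could be sketched as follows. `IdServer`, `IdClient`, the block size, and the low-water mark are hypothetical choices for illustration, not taken from any particular database product.

```python
from typing import Optional, Tuple


class IdServer:
    """Hands out non-overlapping ranges of integer IDs."""

    def __init__(self, block_size: int = 100) -> None:
        self.block_size = block_size
        self.next_free = 1

    def allocate_block(self) -> Tuple[int, int]:
        start = self.next_free
        self.next_free += self.block_size
        return start, self.next_free - 1  # inclusive range


class IdClient:
    def __init__(self, low_water: int = 10) -> None:
        self.low_water = low_water
        self.next_id = 0
        self.last_id = -1  # empty range until the first sync

    def remaining(self) -> int:
        return self.last_id - self.next_id + 1

    def sync(self, server: IdServer) -> None:
        """First step of every sync: fetch a new block if running low."""
        if self.remaining() < self.low_water:
            # Unused IDs from the old block are simply discarded in this sketch.
            self.next_id, self.last_id = server.allocate_block()

    def new_id(self) -> Optional[int]:
        if self.remaining() <= 0:
            return None  # caller must ask the user to sync first
        value = self.next_id
        self.next_id += 1
        return value
```

Because each block is handed out exactly once, two offline clients can never create records with the same ID, at the cost of leaving gaps in the ID sequence.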

I didn't really want to post this as an answer, but I doubt anyone would see it as a comment, and I think it's important to this topic and not covered in other answers.

2

First you should think about how much data and how many tables and relations you will have. In my solution I’ve implemented syncing through Dropbox files. I observe changes in the main MOC and save the changed data to files (each row is saved as gzipped JSON). If there is a working internet connection, I check whether there are any changes on Dropbox (Dropbox gives me delta changes), download and merge them (latest wins), and finally upload the changed files. Before syncing I put a lock file on Dropbox to prevent other clients from syncing incomplete data.

It is safe if only part of the data is downloaded (e.g. because the internet connection is lost). When downloading finishes (fully or partially), the app starts loading the files into Core Data. When there are unresolved relations (not all files have been downloaded), it stops loading files and tries to finish downloading later. Relations are stored only as GUIDs, so I can easily check which files to load to keep full data integrity.

Syncing starts after changes are made to Core Data. If there are no changes, the app checks for changes on Dropbox every few minutes and on app startup. Additionally, when changes are sent to the server I send a broadcast to other devices to inform them about the changes, so they can sync faster.

Each synced entity has a GUID property (the GUID is also used as the filename for exchange files). I also have a sync database where I store the Dropbox revision of each file (I can compare it when the Dropbox delta resets its state). Files also contain the entity name, state (deleted/not deleted), GUID (same as the filename), database revision (to detect data migrations or to avoid syncing with newer app versions) and of course the data (if the row is not deleted).

This solution works for thousands of files and about 30 entities. Instead of Dropbox I could use a key/value store exposed as a REST web service, which I want to do later but have no time for :) For now, in my opinion, my solution is more reliable than iCloud and, which is very important, I have full control over how it works (mainly because it’s my own code).
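
The file-per-row exchange format described above might be sketched like this. It is only an illustration: the JSON key names, the `owner_guid` relation attribute, and the helper functions are assumptions, and a local folder stands in for the Dropbox directory.

```python
import gzip
import json
from pathlib import Path
from typing import Optional


def write_row(folder: Path, entity: str, guid: str, revision: int,
              data: Optional[dict]) -> Path:
    """Write one row as gzipped JSON; data=None marks a deleted row."""
    doc = {"entity": entity, "guid": guid, "revision": revision,
           "deleted": data is None, "data": data}
    path = folder / guid  # the guid doubles as the file name
    path.write_bytes(gzip.compress(json.dumps(doc).encode("utf-8")))
    return path


def read_row(path: Path) -> dict:
    return json.loads(gzip.decompress(path.read_bytes()).decode("utf-8"))


def loadable(folder: Path, relation_keys=("owner_guid",)) -> bool:
    """True if every GUID referenced by a relation has a file in `folder`."""
    for path in folder.iterdir():
        data = read_row(path)["data"] or {}
        for key in relation_keys:
            target = data.get(key)
            if target is not None and not (folder / target).exists():
                return False  # unresolved relation; finish downloading first
    return True
```

Storing relations as GUIDs is what makes the `loadable` check cheap: a relation is resolved exactly when a file with that name exists.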

Another solution is to save MOC changes as transactions. Far fewer files are exchanged with the server, but it’s harder to do the initial load in the proper order into an empty Core Data store. iCloud works this way, and other syncing solutions take a similar approach, e.g. TICoreDataSync.

-- UPDATE

After a while, I migrated to Ensembles - I recommend this solution over reinventing the wheel.
