Why Delta Sync Doesn’t Matter
The Hype
One of the most common requests we hear is for us to implement delta sync. Since beginning our implementation, we’ve found that it really doesn’t help as much as many expect. Many people expect delta sync to have a large impact on syncing speeds, often because others in the sync space have heavily advertised the feature. Many have seen demo videos like Dropbox’s, in which a large image is edited and only the small change needs to be uploaded. The video says that delta sync saved over 80% of the bandwidth.
Claims like these are just plain misleading. As the example below shows, the savings a normal user would see is actually 0%. Looking at the types of data we see users synchronizing, the fact is most people won’t see much benefit from delta sync.
What is Delta Sync?
For those unfamiliar with delta sync, it is a technology designed to detect and send only the parts of a file that have changed. If you have a large two-megabyte file and change only two bytes in it, delta sync lets you send just the two bytes that changed rather than re-upload the entire two-million-byte file. While it is less useful for most small files (the bookkeeping and header information for chunk tracking starts to eat into any benefit), at first glance delta sync would seem to provide a huge benefit for large files.
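To make the idea concrete, here is a minimal sketch in Python. It assumes a simplified scheme that compares fixed-size blocks by hash. The block size and the function names are hypothetical choices for illustration, and real tools such as rsync use rolling checksums so they can also cope with inserted or shifted data.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of blocks in `new` that differ from the same block in `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or digest != old_hashes[i]
    ]


# A two-million-byte file with two bytes changed in the middle.
old = bytes(2_000_000)
edited = bytearray(old)
edited[1_000_000:1_000_002] = b"\x01\x02"
new = bytes(edited)

dirty = changed_blocks(old, new)
bytes_to_send = sum(len(new[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]) for i in dirty)
print(f"{len(dirty)} block(s) changed, {bytes_to_send} of {len(new)} bytes to send")
```

In this simplified scheme the smallest unit that can be sent is one block, so a two-byte edit costs one block plus the hash bookkeeping. That overhead is why the benefit shrinks for small files.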
A Small Example
The reason delta sync doesn’t help as much with large files is that almost all large file types are stored compressed. Videos, music, digital photos, Photoshop files, PDFs, you name it: the files you deal with day-to-day are all stored compressed. Unfortunately, compression negates any benefit from delta sync. When a file is saved in a compressed format, it is run through a process that finds duplicate data and removes it. This means a change to the file, no matter how minor, can alter the stored bytes throughout the entire file.
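The effect is easy to reproduce. The sketch below is an illustration only: it uses Python's zlib as a stand-in for the compression built into real file formats, and the test data, block size, and function name are all made up for the example. It changes one byte of uncompressed data, then counts how many fixed-size blocks differ before and after compression.

```python
import random
import zlib

BLOCK_SIZE = 1024  # hypothetical comparison block size


def differing_blocks(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Count the fixed-size blocks that differ between a and b."""
    longest = max(len(a), len(b))
    return sum(
        1
        for i in range(0, longest, block_size)
        if a[i:i + block_size] != b[i:i + block_size]
    )


# Build repetitive, compressible stand-in data from a fixed seed.
rng = random.Random(0)
words = [b"sync", b"delta", b"file", b"block", b"photo", b"video", b"music", b"edit"]
original = b" ".join(rng.choices(words, k=40_000))

# Change a single byte near the start of the uncompressed data.
scratch = bytearray(original)
scratch[100] = ord("#")
edited = bytes(scratch)

raw_changed = differing_blocks(original, edited)
packed_changed = differing_blocks(zlib.compress(original), zlib.compress(edited))
packed_total = -(-len(zlib.compress(edited)) // BLOCK_SIZE)  # ceiling division

print(f"uncompressed: {raw_changed} block(s) differ")
print(f"compressed:   {packed_changed} of about {packed_total} block(s) differ")
```

Uncompressed, exactly one block differs. Compressed, the two streams tend to diverge from the point of the edit onward, although the exact fraction depends on the data and on the compressor.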
So how do we get these claims of bandwidth savings for large files? With a bit of sleight of hand and some contrived circumstances. To use a concrete example, I’ll use the Dropbox demo itself. In it, a picture of a platypus is drawn over with a white X and we see the claim of 80% bandwidth savings due to delta sync. A detail that is somewhat glossed over is that while the file shown before the edit is a JPEG (.JPG, the digital image format used by almost all digital cameras), the file actually edited is a bitmap (.BMP, an uncompressed and uncommonly used type).
I took an almost identical graphic and made a similar edit. Using the rsync tool, which implements the same rsync algorithm Dropbox uses, I measured the bandwidth savings for edits made to the compressed JPG and the uncompressed BMP files. The difference is striking.
For the uncompressed bitmap file, a 65K difference was generated for a 476K file, an 86.4% savings, in line with the demo.
For the compressed JPEG file, nothing was saved on the 76K file, a 0% savings. There were no bytes saved versus uploading the entire file.
The Dropbox demo doesn’t lie, but it is quite misleading. The 86.4% savings is nice, but the demo neglects to mention that no normal end user stores images as bitmaps, and that the 65K needed to send just the changes is almost as large as the entire 76K file when the image is stored in a proper file format.
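The arithmetic behind these figures can be checked in a few lines of Python. This is a sketch using the rounded sizes quoted above, so the bitmap result comes out near 86.3% rather than exactly 86.4% (the original figure was presumably computed from exact byte counts). The `savings` helper is a name made up for the example, and the JPEG case assumes the transfer was as large as the file itself, which is how I read "no bytes saved".

```python
def savings(full_size_kb: float, transferred_kb: float) -> float:
    """Fraction of bandwidth saved compared with uploading the whole file."""
    return 1 - transferred_kb / full_size_kb


bmp_savings = savings(476, 65)   # uncompressed bitmap: 65K delta for a 476K file
jpg_savings = savings(76, 76)    # compressed JPEG: assumed transfer equals the full 76K
delta_vs_jpeg = 65 / 76          # the bitmap's delta relative to the whole JPEG

print(f"BMP savings:  {bmp_savings:.1%}")
print(f"JPEG savings: {jpg_savings:.1%}")
print(f"BMP delta as a share of the full JPEG: {delta_vs_jpeg:.1%}")
```

The last figure is the one the demo leaves out: sending only the changes to the bitmap costs roughly 85% of what it costs to upload the whole JPEG.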
Try this with a music file, a video, Photoshop files, or any other large files and you’ll find that almost none of the large files people commonly use benefit from delta sync.
What Delta Sync is Good For
While delta sync doesn’t help much for most people’s day-to-day files, it is incredibly useful for cases in which large files are stored uncompressed. The most typical case is log files for system administrators. These gargantuan text files, recording things like web server accesses, are often hundreds of megabytes to many gigabytes in size. Data is just appended onto the end as additional activity is logged. This is a perfect case for delta sync.
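The append-only case is simple enough to sketch directly. The Python below is an illustration under stated assumptions, not how rsync or any particular product does it: the client is assumed to remember the length and SHA-256 digest of what it last synced, and `tail_to_send` is a hypothetical helper name.

```python
import hashlib
from typing import Optional


def tail_to_send(current: bytes, synced_len: int, synced_digest: str) -> Optional[bytes]:
    """Return only the appended bytes if the already-synced prefix is unchanged.

    Returns None when the file is not a pure append, meaning the caller
    should fall back to a full upload or a general delta algorithm.
    """
    if len(current) < synced_len:
        return None
    if hashlib.sha256(current[:synced_len]).hexdigest() != synced_digest:
        return None
    return current[synced_len:]


# A two-million-byte access log that then grows by one line.
line = b"GET /index.html 200\n"
synced = line * 100_000
digest = hashlib.sha256(synced).hexdigest()

grown = synced + b"GET /about.html 200\n"
tail = tail_to_send(grown, len(synced), digest)
print(f"{len(tail)} of {len(grown)} bytes need to be sent")
```

Because nothing before the old end of the file moves, the unchanged prefix can be confirmed with a single hash and only the new tail goes over the wire.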
Relative to the normal user, however, use cases like these are the exception rather than the rule. Next time you see delta sync marketed as a way to save gobs of bandwidth, take those claims with a grain of salt.
How I Tested
If you’d like to try repeating the results for yourself, you can download the files I used here.
I used rsync version 2.6.9 with the following command line (csh syntax) to force a delta sync:
foreach f (*Edited*)
    rsync --stats -e 'ssh' $f localhost:tmp/`echo $f | sed 's/_Edited//'`
end
@r_u_serious: This is based on our internal research. To be clear, I’m not saying there isn’t *any* benefit for anyone, but the benefit for most typical users is quite a bit smaller than people think.
To address your questions and examples:
At sizes below 200KB on standard 512KBps connections, delta sync really doesn’t help much, since more of the latency is in detecting the changed file and handshaking with the service to get the file uploaded. The benefit is even smaller for those with faster connections.
As I mentioned in the post, text files such as log files will see a benefit, but the typical user does not manipulate many large text files. What we see is that most users don’t manipulate 2+ MB text files, much less 1000 of them.
Pre-2007 Office documents can see some benefit, but again they’re typically small. Office 2007 documents see no benefit from delta sync as they now use a compressed file format as well (just change a Word file’s extension from .docx to .zip to see what I mean).
Database files and PSTs can see the most benefit, but again, typical users are not using DB files, particularly due to locking and open file backup issues.
Interesting. Thanks for your response. Your typical users don’t use Outlook pst files? Often the largest file on a typical user’s computer (other than photos, mp3 and movies)? Even your corporate and/or SOHO users?
You should probably also consider the fact that some of your past (and prospective) users may have many uncompressed files, and just moved to your competitors. Leaving the photo, mp3 and movie crew – i.e. your statistics may be skewed.
You are correct with images (.jpg, .png), but I’m curious to see test results for Outlook .pst files or for Office .ppt/doc/xls files.
One of the main things I use an online backup system for is to back up my Outlook .PST files. My PSTs can get pretty large (multiple gigs).
My personal testing with Mozy, Dropbox, and Syncplicity shows that delta sync does actually make an impact in these cases.
For example, my Mozy history shows that the first time my Archive.pst file was uploaded, it sent 1.8GB. Today, Outlook ran AutoArchive and sent the past week’s email to the archive. Syncplicity has to resync a 1.9GB file, whereas Mozy only had to upload 11MB after the AutoArchive.
@r_u_serious, @Anish: PST files are definitely one of the big reasons we’re looking to do delta sync. Frankly, all sync programs do a terrible job with them, whether or not they have delta sync. The problem lies in the fact that it’s really backup and not sync for PSTs. PSTs require special treatment beyond just delta sync as the files are open constantly, not meant to be shared across multiple machines simultaneously, and are, by default, in a hidden location. Delta sync is just one part of the puzzle. Other features like open file backup are required before PSTs can really be supported. Backup vendors do this well as they’ve optimized for this case, but none of the sync vendors out there do.
If it REALLY didn’t matter you wouldn’t be ‘looking to do it’.
The fact that you are makes this post seem like very polite mud-slinging.
What if I change the ID3 tag on a few 5MB .MP3 files in iTunes? With Dropbox, it uploads only 4K each. Syncplicity is slow!! It takes 10 minutes and a lot of CPU time.
This post is ridiculous. Many people use databases and large files. And of course your explanation about compression and so on is correct, but there are a lot of other file types which benefit from delta sync. I personally think this post is more a justification for not having delta sync when the competitors have it….
As I have often said before in the forums, if Syncplicity had delta sync I would switch from Dropbox, but I have databases and lots of large files (which really benefit from delta sync!), and so this is a no-go!
@v, @michael: You’re obviously welcome to use any product you want. The point of the post is that it shouldn’t be taken as a given that a typical user will see large improvements in their day-to-day usage.
There are cases where delta sync helps, but then those should be the cases that we highlight with the proper caveats and realism rather than some artificial scenario.
You’re forgetting one of the most important aspects of delta sync, and the one thing that makes Dropbox so efficient: it compares the blocks of packets that it sees it needs to upload to everything on its servers.
Here’s an example, because this is a little complicated. I had a 750-MB video file that I wanted to share with my mother who was across the country. It was an HD-video file (H.264, obviously), so yes, it was largely compressed on its own. The file was a common documentary file. After downloading it, I renamed it to fit my computer’s organization scheme. When I told Dropbox to upload it, I noticed that the network usage monitor on my Mac’s Dashboard only reported about 70 MB uploaded. I checked on Dropbox’s website, and yes, my complete file was there, although it had only uploaded 1/10 of its data.
It turns out that what makes Dropbox so efficient is that it looks for similarities in its blocks of packets in other users’ uploaded things. So this means that someone else using Dropbox has uploaded this particular HD video, even if it had a different filename and file metadata, but because the underlying video was the same, it only had to upload around 70 MB out of a 750-MB file.
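The behavior described in this comment is usually called block-level deduplication. The Python sketch below shows the general idea only, under heavy assumptions: `BlockStore`, its `upload` method, and the 4096-byte block size are hypothetical, and nothing here is confirmed to match how Dropbox actually works.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size


class BlockStore:
    """Toy content-addressed store: each block is kept once, keyed by its hash."""

    def __init__(self):
        self._blocks = {}

    def upload(self, data: bytes):
        """Store data; return (manifest of block hashes, bytes actually sent)."""
        manifest, sent = [], 0
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            if key not in self._blocks:  # only blocks the store lacks are sent
                self._blocks[key] = block
                sent += len(block)
            manifest.append(key)
        return manifest, sent


def make_blocks(ids):
    """Build stand-in file content with one distinct block per id."""
    return b"".join(n.to_bytes(4, "big") * (BLOCK_SIZE // 4) for n in ids)


store = BlockStore()

# One user uploads a 100-block file.
_, first_sent = store.upload(make_blocks(range(100)))

# Another user uploads a file sharing 90 of those blocks plus 10 new ones.
# The filename plays no part, since only block contents are hashed.
_, second_sent = store.upload(make_blocks(list(range(90)) + list(range(1000, 1010))))

print(f"first upload sent {first_sent} bytes, second sent {second_sent} bytes")
```

Note that this works even on compressed files, because it matches identical blocks across uploads rather than computing a difference against an earlier version of the same file.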
I’m not trying to say that you’re wrong, but saying that Dropbox isn’t anywhere near as efficient as they claim to be is misleading. Sure, in those situations you’re describing, you might be right. However, they have taken many, many steps to streamline the process. However, I am still a dedicated Syncplicity user and will leave Dropbox forever if/when the Mac client is re-released.
To add one more example: TrueCrypt volumes.
Dropbox handles them gracefully, uploading the file as soon as the disk is dismounted and I believe it does delta sync there, since my volume is rather large and yet after small changes transfer takes only a moment.
Renaming music in iTunes, for my situation:
Syncplicity: upload ~3MB for an average music file + 2MB for the new “iTunes Music Library.xml”.
Dropbox: upload < 100KB.
you probably could have implemented delta sync in the time it took you to write this idiotic article. or at least i could…
I have two good suggestions:
Why don’t you guys implement delta sync for files over X megabytes,
let’s say 10 or 100, and keep your regular sync for everything else, using a flag on the file to track the updates?
We have a 200MB QuickBooks file that changes just from being opened (we use that file every couple of minutes), and most of the time there are only small changes or no changes at all, just searching and printing. It takes a couple of hours to back up the file the way things are now, but with delta sync it would take only minutes, since we make just a few small changes a day. Sadly, as it stands we would lose half a day of work if the QuickBooks file were corrupted; with delta sync we would lose only minutes.
I also suggest prioritizing some folders (for example, back up c:\billing first and then c:\games second).
Another good idea is to schedule the backup.
It’s arrogant to title this blog post “Why Delta Sync Doesn’t Matter”.
As you admit in the post itself, it ‘doesn’t matter’ only if you work with small files or if your files are in a compressed format that delta sync can’t work its magic on.
Syncplicity seems to have given up on real-world computer users who produce large uncompressed files, like InDesign files (that’s me), Quicken files, PST files and the host of other types of large file that have been mentioned by other people.
The only way I can use Syncplicity is to leave it on pause all day long and turn it on when it’s time to go home. I try to be a ‘green’ computer user and prefer to turn my computer off at night.
Leonard Chung says “we’re looking to do delta sync.” We’ve been hearing that for almost a year.
Sorry – earlier Forum responses generally said “we’re working on it.” Now you’re only “looking to do”, which sounds more nebulous. This, combined with the arrogance implied in the blog post title, fills me with very low levels of confidence.
When can we expect delta sync, Leonard? Should I hang in there or move to Drop Box now?
This post concerns me.
There’s a world of difference between “we can’t figure out how to make this work” and “this doesn’t matter”.
Companies who start dictating to customers what they do and do not need don’t have the brightest of futures, I fear.
Let me be blunt about this. I *like* syncplicity and have been an advocate of the product – but this blog post has made me consider looking to dropbox or livemesh instead.
Propaganda. We want it, you can’t get it done, so you explain why it isn’t needed. Nice marketing effort, anyway.
Just hope Dropbox doesn’t get the multiple folders going, or a lot of people on the fence will be gone.
Oh yeah, and the common forum item mentioned is outlook .pst files. Doesn’t matter?
I totally agree with Mark Truelove’s comments about “propaganda”. The customers want it! How do you know it’s useless when a lot of us use *huge* files and request delta sync?
A lot of people are using PST files, dBase files and huge uncompressed files of every kind.
You guys at Syncplicity have a great product, but you should stop putting energy into arguing endlessly and start focusing on developing and meeting users’ needs!
Thanks

Goodness. Talk about sticking your head in the sand. What about text files, Word documents, Excel spreadsheets, SQL database files, encrypted partitions designed with deltas in mind, etc.? You know, the types of files where users make incremental changes?
The heart of your point is interesting – delta sync may be less important than people think. But to say that it doesn’t matter at all is just not true. Given that syncplicity does not have it, and your competitors do, I suggest that you justify your claim. So, have you done a statistical analysis of your user base? Or is the above “almost all large file types are stored compressed” just what your gut is telling you? And what about smaller files with incremental changes?
1000 x 2 meg text files updated to 1000 x 3 meg text files is a lot of repeated transferred data.