Google architectural history – Micah Lemonick – QCon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

Startup 2WebTechnologies made a product called XL2Web. It took a complex Excel spreadsheet (many worksheets and formulas) and made it reusable. The spreadsheet was converted to a middle-layer format, and Java then produced an in-memory version. They made a UI in Excel to change field variables and recalculate server side.

In 2004/2005, Google approached them and asked what would happen if they could drive the emulator through an actual web browser client. They made a prototype for IE 6 Service Pack 1 because “when a big company approaches you as a startup you usually do what they want”. It worked, and Google acquired the three-person company. Calculations stayed server side because they couldn’t figure out how to do everything in JavaScript at the time.

Then they wondered what would happen if a second web client connected to the in-memory spreadsheet at the same time. They got started with atomic broadcasting (multiple clients receiving events from a single server). Each client used a hanging GET to the server so the server could always respond to clients. A save was defined as a base state plus a series of mutations up to a savepoint. They realized they needed autosave. A side effect was that users could access any revision that ever existed.
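To make the "base state plus a series of mutations" idea concrete, here is a minimal sketch. The `RevisionLog` class, its method names and the cell/value mutation shape are all hypothetical and chosen for illustration; the talk did not describe the actual data structures.

```python
class RevisionLog:
    """Hypothetical model: snapshots at savepoints plus an ordered mutation log."""

    def __init__(self, base=None):
        # revision number -> full document state at that revision
        self.snapshots = {0: dict(base or {})}
        # mutations[i] takes the document from revision i to revision i + 1
        self.mutations = []

    def append(self, cell, value):
        """Record one mutation and return the new revision number."""
        self.mutations.append((cell, value))
        return len(self.mutations)

    def savepoint(self):
        """Store a full snapshot of the latest revision to shorten later replays."""
        revision = len(self.mutations)
        self.snapshots[revision] = self.state_at(revision)
        return revision

    def state_at(self, revision):
        """Rebuild any revision: nearest earlier snapshot plus replayed mutations."""
        if not 0 <= revision <= len(self.mutations):
            raise ValueError("unknown revision")
        start = max(r for r in self.snapshots if r <= revision)
        state = dict(self.snapshots[start])
        for cell, value in self.mutations[start:revision]:
            state[cell] = value
        return state


log = RevisionLog({"A1": 1})
log.append("A1", 6)
log.append("B1", 2)
log.savepoint()
log.append("A1", 7)
```

Because the log is never discarded in this sketch, `log.state_at(n)` can rebuild any earlier revision, which mirrors the side effect mentioned in the talk.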

Mutations must be commutative so multiple changes by multiple users are consistent. They used operational transformation, which is a mathematical approach to solving for non-commutativity. Messages must be applied in a certain order. All peers must transform incoming mutations according to their own messaging order.
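A small sketch of the operational transformation idea, restricted to text inserts. The `Insert` type, the `transform` function and the client-id tie-break are assumptions for illustration; this is the textbook insert/insert case, not the implementation described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: str  # assumption: used only to break ties deterministically


def apply(doc, op):
    """Apply an insert to a string document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other):
    """Rewrite `op` so it can be applied after `other` has already been applied."""
    shifts = other.pos < op.pos or (other.pos == op.pos and other.client < op.client)
    if shifts:
        return Insert(op.pos + len(other.text), op.text, op.client)
    return op


doc = "abc"
a = Insert(1, "X", "A")
b = Insert(2, "Y", "B")

# Without transformation the two orders disagree (the operations do not commute).
naive_ab = apply(apply(doc, a), b)
naive_ba = apply(apply(doc, b), a)

# With transformation both orders converge on the same document.
ot_ab = apply(apply(doc, a), transform(b, a))
ot_ba = apply(apply(doc, b), transform(a, b))
```

Each peer applies its own operation first and then the transformed remote one, so the two peers end with the same text even though they saw the operations in different orders.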

Collaborative undo – If A changes a value from 6 to 7 and B changes it from 7 to 8, and then A undoes, what should happen? The audience answered with a mix of 6, 7 and 8. 7 means a single undo stack for all collaborators. Users didn’t like this at all. 8 uses log omission: make it as if A’s mutation never happened, which invalidates snapshots. It crashed when a user undid many edits. They then went with 6, which means reverting to the state before A’s change. The client has a local undo log. 6 is what users expected to happen.
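The "6" behavior can be sketched as a per-client undo log that remembers the value each of that client's changes replaced. The `Client` class and the shared dict standing in for the server-side document are hypothetical simplifications.

```python
class Client:
    """Hypothetical collaborator with a local undo log."""

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc        # shared dict standing in for the server-side document
        self.undo_log = []    # local stack of (cell, value before my change)

    def set(self, cell, value):
        """Change a cell and remember what it held before this client's change."""
        self.undo_log.append((cell, self.doc.get(cell)))
        self.doc[cell] = value

    def undo(self):
        """Revert this client's latest change by writing the remembered value."""
        if not self.undo_log:
            return
        cell, previous = self.undo_log.pop()
        self.doc[cell] = previous  # issued as an ordinary forward mutation


doc = {"A1": 6}
a = Client("A", doc)
b = Client("B", doc)
a.set("A1", 7)
b.set("A1", 8)
a.undo()
result = doc["A1"]
```

In this sketch the undo is just another mutation, so nothing is removed from history and no earlier snapshot needs to be recomputed.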

Scaling – declare a master server for all writes to each document. If that server goes down, fail over to a different primary. Multiple servers can’t handle non-commutative messages at the same time. Could shard by chapter. Can’t shard below the smallest unit of non-commutativity. Once your unit of consistency is too large for one server, it’s no longer a unit of consistency.
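One possible way to pick a single write primary per document, with failover, is rendezvous hashing over the currently healthy servers. This is a swapped-in technique for illustration only; the talk did not say how Google assigns or re-elects primaries, and the server names and `primary_for` helper are hypothetical.

```python
import hashlib


def score(server, doc_id):
    """Deterministic weight for a (server, document) pair."""
    digest = hashlib.sha256(f"{server}:{doc_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def primary_for(doc_id, healthy_servers):
    """Route all writes for a document to the highest-scoring healthy server."""
    if not healthy_servers:
        raise RuntimeError("no healthy servers available")
    return max(healthy_servers, key=lambda server: score(server, doc_id))


servers = ["s1", "s2", "s3"]
primary = primary_for("doc-42", servers)

# Simulate the primary going down: the document fails over to another server.
remaining = [s for s in servers if s != primary]
new_primary = primary_for("doc-42", remaining)
```

Every write for a given document lands on one server at a time, and documents whose primary is still healthy keep that primary after another server fails.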

Google Forms is easier because each line is isolated. Easier to aggregate.

Google Hangouts has a 15 person limit to guarantee consistency.


  • What db given size of data? A BigTable-like thing.
  • How to deal with latency when users are in different parts of the world? Latency has to give to guarantee consistency. Local caching/async helps.
  • If the client doesn’t receive an echo, what happens? It will try to revert the mutation if it doesn’t get confirmation in 20 seconds, then pop up a “Google Docs crashed” message. They get billions of queries a day, so a one in a billion thing happens 20 times a day. They would rather the user see a crash message than risk data loss.
  • How to handle sub-sharding a document? Google doesn’t sub-shard chapters in a doc. It’s just that they could. Would lose functions like double spacing the entire doc.
  • Can share a doc with any other Google user, or send a link to someone outside and anyone can see it. Can you withdraw the sharing link? The link has entropy and Google is satisfied with that. [you can make the doc private though, right?]
  • How often does the client send? At most once a second. Can throttle based on network use?
  • On undo? Now set to new value. Would have to update undo stack.
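The client-side numbers from the Q&A (send at most once a second, give up after 20 seconds without an echo) can be sketched as a small outbound queue. The `OutboundQueue` class, the one-batch-in-flight rule and the choice to drop unconfirmed edits on timeout are assumptions for illustration, not details from the talk.

```python
class OutboundQueue:
    SEND_INTERVAL = 1.0   # seconds; "at most once a second" from the Q&A
    ACK_TIMEOUT = 20.0    # seconds; confirmation window from the Q&A

    def __init__(self, send):
        self.send = send              # callable that transmits a list of mutations
        self.buffer = []              # local edits not yet sent
        self.in_flight = None         # (batch, sent_at) awaiting the server echo
        self.last_sent = float("-inf")

    def edit(self, mutation):
        """Queue a local edit for the next send."""
        self.buffer.append(mutation)

    def ack(self):
        """Server echoed the batch back, so it is confirmed."""
        self.in_flight = None

    def tick(self, now):
        """Call periodically with the current time; returns "ok" or "crashed"."""
        if self.in_flight is not None:
            _, sent_at = self.in_flight
            if now - sent_at >= self.ACK_TIMEOUT:
                # Assumption: drop unconfirmed edits and surface a crash message.
                self.in_flight = None
                self.buffer.clear()
                return "crashed"
            return "ok"  # assumption: only one batch in flight at a time
        if self.buffer and now - self.last_sent >= self.SEND_INTERVAL:
            batch, self.buffer = self.buffer, []
            self.send(batch)
            self.in_flight = (batch, now)
            self.last_sent = now
        return "ok"


sent = []
queue = OutboundQueue(sent.append)
queue.edit("a")
queue.edit("b")
first = queue.tick(0.0)      # sends ["a", "b"] as one batch
queue.ack()
queue.edit("c")
throttled = queue.tick(0.5)  # too soon after the last send, nothing goes out
count_after_throttle = len(sent)
queue.tick(1.0)              # sends ["c"], which is never acknowledged here
waiting = queue.tick(20.9)
timed_out = queue.tick(21.0)
```

Passing the clock in as `now` keeps the sketch deterministic; a real client would read a monotonic clock instead.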
