Data as DNA: Building a Company on Data – Live Blogging at QCon

Data as DNA: Building a Company on Data
Speaker: Cathy Polinsky @cathy_polinsky


“90% of the world’s data has been created in the past 2 years” – this has been true for a number of years because we are in the age of data

Key Points

  • If the data is not visible, it is meaningless – need to be able to visualize, understand and interpret the data
  • Does everyone have access to the data? – be sure to strip/mask sensitive data or aggregate it (with a minimum sample size so that PII is not shared)
  • Determine what data/questions/metrics are important – need to know what you are optimizing for
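
To make the masking and aggregation point concrete, here is a minimal sketch in Python. It is not from the talk: the function names, the sample rows, and the threshold of 5 are all assumptions chosen for illustration. The idea is that a report only includes groups large enough that no single person can be picked out, and that identifiers are masked before data is shared.

```python
from collections import defaultdict

# Assumed threshold for illustration; the talk did not give a number.
MIN_GROUP_SIZE = 5


def aggregate_with_min_size(rows, group_key, value_key, min_size=MIN_GROUP_SIZE):
    """Return {group: (count, mean)}, dropping groups smaller than min_size."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: (len(values), sum(values) / len(values))
        for group, values in groups.items()
        if len(values) >= min_size
    }


def mask_email(email):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


# Hypothetical data: six customers in Austin, one in Reno.
rows = [{"city": "Austin", "spend": 10.0 * i} for i in range(1, 7)]
rows.append({"city": "Reno", "spend": 99.0})

report = aggregate_with_min_size(rows, "city", "spend")
masked = mask_email("jane.doe@example.com")
```

In this sketch the single Reno customer is left out of `report` entirely, because publishing an "average" over one person would reveal that person's spend.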


  • Showed Amazon example of when they had too many tabs.
  • A/B testing failed because people were used to the original.
  • A/B testing doesn’t help for short term effects.
  • Don’t look at results too early
  • Be explicit about what is important to you
  • Iterative testing is important
  • The closer you are to your goal, the more important it is to evaluate your goal metrics
  • Don’t want too much or too little data – Goldilocks
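
One common way to act on "don’t look at results too early" is to fix the sample size before the test starts and only evaluate once both variants reach it. The sketch below is an assumption about how that could look, not something shown in the talk. It uses the standard normal-approximation formulas for comparing two conversion rates; the function names, the 5% significance level, and the 80% power are all choices made here for illustration.

```python
import math

Z_ALPHA = 1.96  # two-sided 5% significance (assumed)
Z_BETA = 0.84   # 80% power (assumed)


def required_sample_size(p_base, min_lift):
    """Approximate users needed per variant to detect an absolute lift of min_lift."""
    p_new = p_base + min_lift
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((Z_ALPHA + Z_BETA) ** 2 * variance / min_lift ** 2)


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate between B and A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


def evaluate(conv_a, n_a, conv_b, n_b, planned_n):
    """Refuse to call a winner until both variants reach the planned size."""
    if min(n_a, n_b) < planned_n:
        return "keep collecting"
    z = two_proportion_z(conv_a, n_a, conv_b, n_b)
    return "significant" if abs(z) > Z_ALPHA else "no detectable difference"


# Hypothetical test: 10% baseline conversion, want to detect a 2-point lift.
planned = required_sample_size(0.10, 0.02)
early = evaluate(100, 1000, 130, 1000, planned)
final = evaluate(380, planned, 460, planned, planned)
```

Here `early` comes back as "keep collecting" even though variant B looks ahead at 1,000 users per arm. The sketch does not handle the edge case where no one (or everyone) converts, which would make the standard error zero.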


  • Personalization is not new – even the shopkeeper in a physical store made recommendations
  • Stitch Fix is a personalization company. A personal human stylist curates 5 pieces for you to try on at home. The stylist uses big data and algorithms to help.
  • Tune through feedback.
  • Most retailers have broken feedback loops. There is no reason for the consumer to share whether something was too expensive, didn’t fit right, etc. Customers need a compelling self-interest to give you data. The best way is if the data is used to make their experience even better.
  • Need to show trust and use data responsibly to get data and to get the customer to continue to give you data

Hard to predict what computers can’t do. Ex: driving, understanding human speech

Freestyle chess – humans paired with computers beating pure computers or pure humans. Use the best of human and computer skills. The winners also had a superior technique for how to use computers. Strategy matters!

Humans are good at interpreting a query or suggesting things. Review edge cases of output.