QCon 2018 – Privacy Ethics – A Big Data Problem

Title: Privacy Ethics – A Big Data Problem
Speaker: Raghu Gollamudi

See the table of contents for more blog posts from the conference.


GDPR (General Data Protection Regulation) – took effect May 25, 2018

Data is exploding

  • Cost of storing data so low that it is essentially free
  • 250 petabytes of data a month. What comes after petabytes?
  • Companies get more data when they acquire other companies
  • IoT data is ending up in massive data lakes

Sensitive information – varies by domain

  • usernames
  • user base – customers could be sensitive for a law firm
  • location – the issue with a fitness tracker identifying the location of a military base
  • purchases – disclosing someone is pregnant before they tell people
  • employee data

Changes over time – more data gets collected after the decision is made to log

Privacy vs security

  • privacy – individual right, focus on how data used, depends on context
  • security – protect information, focus on confidentiality/accessibility, explicit controls
  • privacy is an underinvested market. Security is more mature [but still an issue]

Solutions

  • culture
  • invest more – GDPR fines are orders of magnitude higher than the privacy budget
  • include in performance reviews
  • barrier to entry – must do at least what Facebook does if in that space
  • security – encrypt, anonymization/pseudonymization, audit logs, store credentials in a vault
  • reuse – use solutions available to you
  • design for data integrity, authorization, conservative approach to privacy settings
  • include privacy related tasks in sprint
  • design in data retention – how long do you need it for
  • automation – label data (tag/classify/confidence score) so compliance can be automated. The score helps reduce false positives
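
The talk did not show an example of the labeling idea, so here is a minimal sketch of what it could look like. Everything in it is my own assumption, not something from the session: the pattern names, the regular expressions, the confidence numbers, and the 0.8 threshold are all made up for illustration. It tags a value with the kinds of sensitive data it appears to contain, attaches a confidence score to each tag, and then only acts on tags above a threshold.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns and confidence scores, chosen only for illustration.
# A loose pattern (phone) gets a lower score than a distinctive one (email).
PATTERNS = {
    "email": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), 0.95),
    "us_phone": (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), 0.70),
}


@dataclass(frozen=True)
class Label:
    tag: str
    confidence: float


def classify(value: str) -> List[Label]:
    """Return a label for every pattern that matches somewhere in the value."""
    return [
        Label(tag, score)
        for tag, (pattern, score) in PATTERNS.items()
        if pattern.search(value)
    ]


def actionable(labels: List[Label], threshold: float = 0.8) -> List[Label]:
    """Keep only labels at or above the threshold (an assumed cutoff)."""
    return [label for label in labels if label.confidence >= threshold]


if __name__ == "__main__":
    for text in ["contact: alice@example.com", "call 555-123-4567", "hello"]:
        labels = classify(text)
        print(text, labels, actionable(labels))
```

With these made-up numbers, the email tag is acted on automatically while the phone tag falls below the cutoff and would go to a human instead. That is one way a score could reduce false positives.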

The EU currently has the strictest privacy policy. Germany and Brazil are working on policies too. There was a debate on whether GDPR applies to EU citizens or residents. There was mostly agreement that physical location matters.

My take

I was expecting this to be more technical. There was a little about the implications of big data, like automation, but it felt glossed over. I would have liked to see an example of some technique that involves big data. The session was fine. It covered a lot of areas in passing, which makes for a good opening session – it lets you know where to plan. I think not having the “what you will learn” section in the abstract made it harder to know what to expect. Maybe QCon should make this mandatory?