Data is a liability
The only data that can never be leaked is data we don’t have. This should be the basis of any good security program: don’t collect data you don’t need, and purge data as soon as you no longer need it.
These are also two key principles of the GDPR - data minimisation and storage limitation. Sure, the GDPR only applies to personal data, but these principles are universal.
Is data an asset, or a liability?
It may be possible to put a value on fresh data, but is there anything to monetize in knowing a user’s favorite brand of butter five years ago, or their complete shopping history from ten years ago? Trends are interesting, but do they require keeping data at such a detailed level?
Is there any reason to keep user accounts active when their owners haven’t logged in for five years, and only ever used our product once? Think of the 2013 MySpace hack, and of how many people received a breach notification without even remembering what the service was for!
And finally, what about websites that don’t provide a way to delete an account? Or worse, websites whose “delete” feature just disables or hides data (sorry, “marks data for deletion”) but actually keeps it virtually forever?
Marketing requires aggregated data? Aggregate!
Keeping detailed, personally identifiable information is not required to fulfill Business Intelligence goals and calculate metrics. Anonymizing old data (closed accounts for instance) while keeping required KPIs intact is possible:
Instead of storing… | We may only need… |
---|---|
e-mail addresses | a numeric identifier; at most the domain if needed |
full birthdate | the year, or the date blurred to within +/- 180 days, so we can still calculate users’ average age |
member-to-member chats | the average number of messages sent/received per week |
full name | nothing! |
credit card data | the credit card brand |
The older data gets, the less value it has. Data older than X years may even be anonymized further.
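As a rough illustration of the table above, here is a minimal Python sketch that reduces a closed account to the fields the KPIs still need. The `UserRecord` type, its field names and the `anonymize` function are hypothetical and only serve the example; a real schema will differ, and whether the numeric identifier may stay linkable to other systems is a decision this sketch does not make.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UserRecord:
    # Hypothetical shape of a stored account, for illustration only.
    user_id: int
    email: str
    full_name: str
    birthdate: date
    card_number: str
    card_brand: str
    messages_sent: int
    weeks_active: int


def anonymize(record: UserRecord) -> dict:
    """Keep only what aggregated metrics need; drop everything else."""
    # Keep the e-mail domain only, never the local part.
    domain = record.email.rsplit("@", 1)[-1].lower()
    # Avoid dividing by zero for accounts active less than a week.
    weeks = max(record.weeks_active, 1)
    return {
        "user_id": record.user_id,              # numeric identifier
        "email_domain": domain,                 # at most the domain
        "birth_year": record.birthdate.year,    # enough for an average age
        "avg_messages_per_week": round(record.messages_sent / weeks, 2),
        "card_brand": record.card_brand,        # brand only, no card number
        # full_name is intentionally not carried over.
    }
```

The full name, the card number and the message contents never reach the output, so there is nothing left to leak from them.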
Analytics teams also tend to think they “may at some point” need to calculate “something” from old data, and don’t want to get rid of it. If the anonymization scheme above doesn’t get approved, try the following: extract the data from your Information System, zip and encrypt it, export it somewhere, keep the key offline, and make sure BI can get it back when they need it. Truth is, they rarely come back.
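The extract, zip and encrypt steps could look roughly like the sketch below. It only uses the Python standard library for the export and the compression; the `encrypt` argument is a placeholder for whatever vetted encryption tool you already use (for instance the `encrypt` method of a `Fernet` object from the `cryptography` package, or a wrapper around GPG). The function name `build_archive` and the column names are assumptions made for the example.

```python
import csv
import gzip
import io
from typing import Callable, Iterable, Mapping, Sequence


def build_archive(
    rows: Iterable[Mapping[str, object]],
    fieldnames: Sequence[str],
    encrypt: Callable[[bytes], bytes],
) -> bytes:
    """Serialize rows to CSV, compress them, then encrypt the result."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    # Compress before encrypting: encrypted bytes do not compress well.
    compressed = gzip.compress(buffer.getvalue().encode("utf-8"))
    # The key lives inside the caller-supplied function, not in this code.
    return encrypt(compressed)
```

The returned bytes can then be written to cold storage and the rows deleted from the live system, while the key stays offline until BI actually asks for the data.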
And don’t forget employee data
What would happen if all internal e-mails were published tomorrow, as happened to Sony Pictures in 2014?
Employees may feel uncomfortable if e-mails disappeared after a hard deadline. After all, valuable attachments may get lost. But they probably don’t realize that informal chats may be kept forever on servers (think Microsoft Teams / Skype for Business, Slack, etc.). These are easier to purge, after as little as 14 days, with the users’ full support.
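Hosted chat platforms usually expose retention as a setting rather than as code, but for a self-managed message store the purge itself is a few lines. The sketch below assumes a hypothetical SQLite table `messages` whose `sent_at` column holds a Unix timestamp in seconds; the table, the column and the function name are illustrative, not taken from any real product.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window from the text: 14 days.
RETENTION = timedelta(days=14)


def purge_old_messages(conn: sqlite3.Connection,
                       now: Optional[datetime] = None) -> int:
    """Delete messages older than RETENTION and return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - RETENTION).timestamp())
    # "with conn" commits on success and rolls back on error.
    with conn:
        cursor = conn.execute(
            "DELETE FROM messages WHERE sent_at < ?", (cutoff,)
        )
    return cursor.rowcount
```

Run from a daily scheduled job, this keeps the store from ever holding more than about two weeks of chat history.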