- not identified by name; of unknown name
- having no outstanding, individual, or unusual features
(In fact the failure was worse than that - the de-identification algorithm only had 18.4m possibilities for drivers and 1.3m possibilities for vehicles. No human would want to take the time to sift through that lot, but to the computer it's done in the blink of an eye.)
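To make the "blink of an eye" concrete: if pseudonyms are produced by hashing identifiers drawn from a small, enumerable space, an attacker can simply precompute the hash of every possibility and invert the mapping. This is a minimal sketch of that style of attack, not the actual algorithm used on the dataset in question; the toy ID format and MD5 choice are illustrative assumptions.

```python
import hashlib

def build_lookup(id_space):
    """Map each possible ID's MD5 hash back to the original ID.

    For a space of ~18.4m possibilities this table fits comfortably
    in memory and takes seconds to build on commodity hardware.
    """
    return {hashlib.md5(i.encode()).hexdigest(): i for i in id_space}

# Toy ID space standing in for the real millions of possibilities
# (hypothetical "5Xnn" licence-plate-like format for illustration).
ids = [f"5X{n:02d}" for n in range(100)]
lookup = build_lookup(ids)

# A "de-identified" record exposes only the hash...
target = hashlib.md5("5X42".encode()).hexdigest()
# ...which the lookup table inverts instantly.
print(lookup[target])  # recovers "5X42"
```

The point is that hashing only anonymises when the input space is too large to enumerate; here the exhaustive search is the whole attack.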
Much fuss was kicked up about care.data recently; many factors contributed, but one that often receives attention is re-identification. I'm sure this is frustrating for the authors of the riveting page-turner "Anonymisation Standard for Publishing Health and Social Care Data Specification", who have gone to considerable lengths to consider a vast array of possible re-identification attacks. Yet those still concerned point out that we can't conceive of all the future data sources that could be correlated to enable re-identification, and so the risk is too high to accept.
We've recently been looking for inspiration in some of our related work within Horizon, which has looked at the predictability of human mobility from GPS traces (open access PDF). That work aims to put a limit on what could ever be achieved in mobility prediction, no matter how cunning our methods become. And of course location data is itself one of the most concerning privacy-violating data sources, one that many people unknowingly stream continually to random third parties from their smartphones.
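The limit in that line of work comes from information theory: given the entropy S of a person's location trace over N distinct places, Fano's inequality bounds the accuracy any predictor could ever achieve. A sketch of solving that bound numerically follows; the specific inputs (S in bits, N locations) are illustrative, and this is my reading of the standard entropy-based bound rather than a reproduction of the cited paper's code.

```python
from math import log2

def max_predictability(S, N, tol=1e-9):
    """Solve S = H(p) + (1 - p) * log2(N - 1) for p, the Fano upper
    bound on prediction accuracy over N locations with entropy S bits.

    The left-hand side decreases monotonically in p on (1/N, 1),
    so a simple bisection finds the unique solution.
    """
    def fano(p):
        h = -p * log2(p) - (1 - p) * log2(1 - p) if 0 < p < 1 else 0.0
        return h + (1 - p) * log2(N - 1)

    lo, hi = 1.0 / N, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > S:   # entropy term too big: true p is larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative numbers: 1 bit of entropy over 4 candidate locations
# caps any predictor at roughly 81% accuracy, however clever it is.
print(max_predictability(1.0, 4))
```

No amount of future cleverness in feature engineering or modelling can beat this number, which is what makes it useful as a fixed point in an otherwise open-ended risk argument.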
In any case we need to:
- remind ourselves about the definition of anonymous regularly
- somehow get beyond the unquantified risk argument...