From the course: Data Steward Foundations

Data anonymization

- [Instructor] One way that many organizations seek to protect themselves against accidental disclosures of personal information is to remove all identifying information from data sets before placing them in the cloud or with another service provider. De-identification is the process of moving through a data set and removing data that may be individually identifying. For example, you would certainly want to remove names, Social Security numbers, and other obvious identifiers. However, simple data de-identification is often insufficient to completely safeguard information. The reason for this is that you can often combine seemingly innocuous fields to uniquely identify an individual. A study conducted at Carnegie Mellon University analyzed three fields commonly retained in de-identified data sets: ZIP codes, dates of birth, and gender. Now, you wouldn't think any one of these fields used alone would allow you to identify someone. After all, a lot of people live in the same town as me and…
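
To make the point about combining fields concrete, here is a minimal Python sketch. It is not taken from the course: the records, the field names, and the helper functions `deidentify` and `unique_fraction` are all hypothetical, invented for illustration. It strips the obvious identifiers and then measures what share of the remaining records is still unique on a chosen set of fields.

```python
from collections import Counter

# Hypothetical records; every name and value is invented for illustration.
RECORDS = [
    {"name": "Ana Ruiz", "ssn": "000-00-0001", "zip": "15213",
     "dob": "1980-04-02", "gender": "F", "diagnosis": "asthma"},
    {"name": "Bea Kim", "ssn": "000-00-0002", "zip": "15213",
     "dob": "1980-04-02", "gender": "F", "diagnosis": "migraine"},
    {"name": "Carl Lee", "ssn": "000-00-0003", "zip": "15213",
     "dob": "1975-11-30", "gender": "M", "diagnosis": "diabetes"},
    {"name": "Dev Rao", "ssn": "000-00-0004", "zip": "15217",
     "dob": "1980-04-02", "gender": "M", "diagnosis": "asthma"},
]

# Assumed split between obvious identifiers and seemingly innocuous fields.
DIRECT_IDENTIFIERS = {"name", "ssn"}
QUASI_IDENTIFIERS = ("zip", "dob", "gender")


def deidentify(records, direct=DIRECT_IDENTIFIERS):
    """Return copies of the records with the direct identifiers removed."""
    return [{k: v for k, v in r.items() if k not in direct} for r in records]


def unique_fraction(records, fields=QUASI_IDENTIFIERS):
    """Return the share of records whose values for `fields` occur only once."""
    if not records:
        return 0.0
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


if __name__ == "__main__":
    deidentified = deidentify(RECORDS)
    for fields in (("zip",), ("dob",), ("gender",), QUASI_IDENTIFIERS):
        print(fields, unique_fraction(deidentified, fields))
```

In this toy data set, no record is unique on gender alone and only one in four is unique on ZIP code or date of birth alone, yet half of the records are unique once the three fields are combined. A record that is unique on those fields can be matched against any outside source that lists the same three values next to a name.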
