Data De-identification
De-identification is the removal of personally identifying information in order to protect personal privacy. For health information, data is considered de-identified under the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule when a number of specified data elements are removed.
What is De-Identified Data?
Data is de-identified when either of the following applies:
- All 18 HIPAA-specific direct and indirect identifiers have been removed (Safe Harbor method)
- The data is determined by expert opinion to have a low probability of re-identification (Expert Determination method)
Example:
Interviewer: When was the first time you heard about the [Organization]?
Interviewee: Ten years ago. I was a student at [University] and one of my professors told us about [Organization] and their work.
A complete example is available in De-identification-Sample.doc (29.0 KB).
What is PHI?
- Individually identifiable health information, including demographic information, that is created or received by a covered entity and that relates to the past, present, or future physical or mental health of an individual, provision of healthcare to an individual, or past, present, or future payment for the provision of healthcare to an individual.
- The presence of at least one of the following 18 HIPAA-designated direct and indirect identifiers in a data set makes the whole data set Protected Health Information (PHI):
- Names
- Social Security numbers
- Telephone numbers
- Addresses and all geographic information smaller than a state
- All elements of dates (except year), including date of birth, admission, discharge, and death; and all ages over 89
- Fax numbers
- E-mail addresses
- Medical record numbers
- Health Plan Beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Device identifiers and serial numbers
- Web Universal Resource Locators (URLs)
- Internet Protocol (IP) addresses
- Biometric identifiers, including finger and voice prints
- Full face photographic images and comparable images
- Any other unique identifying number, characteristic, or code, including any code or other means of record identification that is derived from PHI. These must also be removed for the data to be considered de-identified under the Safe Harbor method.
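The list above can be illustrated with a minimal Python sketch for structured records. It is an illustration under stated assumptions, not a compliance tool: the field names (`name`, `admission_date`, `age`, and so on) are hypothetical, the function `generalize_record` is invented for this example, and free text, images, and biometric data are not handled at all. The sketch drops direct identifier fields, reduces dates to the year, and groups ages over 89 into a single category. It also omits the birth year when the age is over 89, since the year alone would reveal that age.

```python
from datetime import date

# Hypothetical field names; a real data set will use its own schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "ssn", "phone", "fax", "email", "address", "city", "zip",
    "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
}
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date", "death_date"}


def generalize_record(record):
    """Return a copy of record with listed identifiers dropped or generalized."""
    age = record.get("age")
    over_89 = isinstance(age, int) and age > 89
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the field entirely
        if field in DATE_FIELDS and isinstance(value, date):
            if field == "birth_date" and over_89:
                continue  # a birth year would reveal an age over 89
            cleaned[field.replace("_date", "_year")] = value.year  # keep the year only
        elif field == "age" and over_89:
            cleaned[field] = "90+"  # aggregate all ages over 89
        else:
            cleaned[field] = value
    return cleaned


# Invented sample record, for illustration only.
sample = {
    "name": "Jane Doe",
    "age": 93,
    "birth_date": date(1927, 5, 2),
    "admission_date": date(2020, 3, 14),
    "diagnosis": "asthma",
    "state": "OH",
}
print(generalize_record(sample))
```

Any field not named in the two sets passes through unchanged, so a sketch like this is only as good as the field inventory behind it; unlisted columns and free-text notes would still need manual review.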
My research involves the use of PHI. What steps do I take?
- Entities covered by HIPAA may share a limited data set for research purposes permitted by the Privacy Rule under data use agreements. All recipients must be bound by a data use agreement with the originator of the data.
- If you are a researcher at a non-HIPAA covered entity and request PHI from a HIPAA covered entity for research purposes, then you may require a signed authorization for that use from the patient/participant, or otherwise justify an exception from that requirement.
- In either case, you will be required to have an IRB-approved protocol.
Tips for de-identifying participants
- Plan or apply editing at the time of transcription, except in longitudinal studies, where de-identification should wait until data collection is complete
- Avoid blanking out: use pseudonyms or replacements
- Avoid over-anonymising: removing or generalising information in text can distort data and make them unusable, unreliable or misleading
- Keep a log of all replacements, aggregations or removals made – keep separate from de-identified data files
- A text anonymisation helper tool can help you find disclosive information to remove or pseudonymise in text files
- An MS Word macro can find and highlight numbers and words starting with capital letters in text, which are often disclosive, e.g. names, companies, birth dates, addresses, educational institutions and countries
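The tips above can be sketched in Python for plain-text transcripts. This is an assumption-laden illustration, not the helper tool or Word macro mentioned above: the function names `find_candidates` and `pseudonymise` are hypothetical, and the sample sentence and names are invented. The first function flags numbers and capitalised words as candidates for human review; the second applies replacements a reviewer has approved and returns a log of what was changed.

```python
import re

# Numbers (optionally joined by / . or -) or words starting with a capital letter.
CANDIDATE_PATTERN = re.compile(r"\b(?:\d+(?:[/.\-]\d+)*|[A-Z]\w*)")


def find_candidates(text):
    """Return (start, end, token) for each number or capitalised word."""
    return [(m.start(), m.end(), m.group()) for m in CANDIDATE_PATTERN.finditer(text)]


def pseudonymise(text, replacements):
    """Apply reviewer-approved replacements and return (new_text, log)."""
    log = []
    for original, substitute in replacements.items():
        # Match the phrase only when it is not embedded in a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(original) + r"(?!\w)")
        text, count = pattern.subn(lambda _match: substitute, text)
        if count:
            log.append({"original": original, "replacement": substitute, "count": count})
    return text, log


# Invented transcript line, for illustration only.
transcript = "I was a student at Riverside College and met Dr. Alvarez in 2014."
for start, end, token in find_candidates(transcript):
    print(start, end, token)

clean_text, replacement_log = pseudonymise(
    transcript,
    {"Riverside College": "[University]", "Dr. Alvarez": "[Professor]"},
)
print(clean_text)
print(replacement_log)
```

The candidate search deliberately over-flags (sentence-initial words and "I" are included), because it is meant to direct a reader's attention rather than decide what is disclosive. The returned log contains the original values, so it should be written to a location separate from the de-identified files.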
Regulatory References:
Health Insurance Portability and Accountability Act of 1996 (HIPAA) (Pub. L. No. 104-191, § 264) (1996), codified at 42 U.S.C. § 1320d-2 (2002). Standards for Privacy of Individually Identifiable Health Information, 45 C.F.R. § 160 (2002), 45 C.F.R. § 164 subpts. A, E (2002).
45 CFR 46.102(e)(1), 45 CFR 164.514(b)(2)